Validating smarterscout from top to bottom
By Dan Altman, creator of smarterscout
How much can you trust our metrics? It's a critical question for many of our members. Our platform is just over two years old and already in use by clubs in the Premier League, Bundesliga, Serie A, MLS, and a host of other competitions, as well as by player agencies, media like The Athletic and talkSPORT, and thousands of other members. Naturally, we've had inquiries about validation – how we check our tools to ensure that they are as consistent and useful as possible.
This is a really important topic for us, and I thought all of our members might want to read about it. Since the beginning, smarterscout has been based on the principle of transparency. I strongly believe that people will trust and implement our tools only when they can understand them. That's why our FAQ details the methods behind our metrics, and that's why our podcast answers many more questions about them. Being transparent about the robustness of our models and algorithms is a natural extension of this principle. So here goes.
I've used event (or "on the ball") data from half a dozen different providers over the past eight years, and in each case I've taken some time to validate the data by checking it against video. In the simplest version of this process, I flatten the data from several matches into a spreadsheet and then watch those matches on video while looking at the data row by row. Some providers have more errors with locations, some with timestamps, others with contextual information, and I encountered one whose data tagged the wrong players quite a few times.
In other words, no one is perfect. Most providers offer data where about 90% to 95% of the events are completely error-free. That sort of error rate might raise eyebrows for people computing stats for a single match, but over the course of a season it usually just amounts to noise. Moreover, the errors often have a systematic aspect that can be corrected by recoding. One provider even changed the hardware used by their staff to record events after some concerns arose.
We spent months validating the data for smarterscout, and every day our system cleans, recodes, and transforms the data we receive to make it as accurate as possible. We also conduct spot checks to identify any changes in coding tendencies, and we report these to the provider. Finally, we correct names, birthdates, and even heights and weights whenever we find errors in them.
The dual cores of our performance metrics are two models of expected goals. One is the shot creation model, which measures the probability of scoring from each shot and then divides the credit between the players who contributed to the move. The other is the ball progression model, which gives credit to players who raise the probability of going on to score a goal by moving the ball to another location on the pitch. We also use each model to dole out equal and opposite demerits to the opponents who conceded the actions that raised the chances of scoring. Both models are calibrated over a 600-match moving window.
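To make the ball progression idea concrete, here is a minimal sketch in Python. It is an illustration only, not smarterscout's code: the three zones, the scoring probabilities in `SCORING_PROB`, and the `Progression` record are all hypothetical. It shows an attacker receiving the change in scoring probability as credit while the opponent who conceded the action receives an equal and opposite demerit. How moves that lower the probability are treated is not covered here; the sketch simply keeps the signed difference.

```python
from dataclasses import dataclass

# Hypothetical probabilities of going on to score from three coarse zones.
# These numbers are invented for illustration only.
SCORING_PROB = {"own_half": 0.005, "middle_third": 0.015, "final_third": 0.04}


@dataclass
class Progression:
    attacker: str
    defender: str  # opponent judged to have conceded the action
    start_zone: str
    end_zone: str


def credit_and_demerit(action: Progression) -> dict[str, float]:
    """Return the attacker's credit and the defender's equal and opposite demerit."""
    gain = SCORING_PROB[action.end_zone] - SCORING_PROB[action.start_zone]
    return {action.attacker: gain, action.defender: -gain}


move = Progression("Attacker A", "Defender B", "middle_third", "final_third")
ledger = credit_and_demerit(move)
for name, value in ledger.items():
    print(f"{name}: {value:+.3f}")
```

Because each credit has a matching demerit, the entries for any single action sum to zero.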
One crucial way of validating the shot creation model is to check whether small subgroups of shots resulted in the same scoring rates that we estimated. For instance, here are the results for non-headers in open play during the last full season of the Premier League, 2019-20:
The model looks well calibrated for these shots. We don't have any weird asymptotic dynamics going on at either extreme. That's what we want to see.
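A calibration check of this kind can be sketched as follows. This is a generic version under stated assumptions, not our exact procedure: the equal-width bins, the function name `calibration_table`, and the toy shots are all invented for illustration. Shots are grouped by estimated probability, and each group's mean estimate is compared with the share of shots that were actually scored.

```python
from collections import defaultdict


def calibration_table(predicted, scored, n_bins=10):
    """Group shots into equal-width probability bins and compare the mean
    predicted probability with the observed scoring rate in each bin."""
    bins = defaultdict(list)
    for p, goal in zip(predicted, scored):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, goal))

    rows = []
    for index in sorted(bins):
        shots = bins[index]
        mean_pred = sum(p for p, _ in shots) / len(shots)
        observed = sum(goal for _, goal in shots) / len(shots)
        rows.append((index, len(shots), mean_pred, observed))
    return rows


# Toy data, invented for illustration: estimated probability and 1 if goal else 0.
predicted = [0.05, 0.08, 0.03, 0.07, 0.32, 0.35, 0.38, 0.31]
scored = [0, 0, 0, 1, 0, 1, 0, 0]
for index, count, mean_pred, observed in calibration_table(predicted, scored):
    print(f"bin {index}: n={count} predicted={mean_pred:.2f} observed={observed:.2f}")
```

With a real season of shots, a well-calibrated model would show predicted and observed rates close to each other in every bin, including the lowest and highest ones.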
For the ball progression model, a similarly fundamental check is to make sure that the total value of all attacking actions is equal to the number of goals. Here's what that comparison looks like for all of our seasons of the Premier League:
| Regular goals | Penalty and own goals | Total goals | Total expected goals |
* to date
The totals are very close, with the biggest miss (still only 3%) coming in the current season. I suspect this has been due to the unusual prevalence of penalties since the onset of the pandemic and the increased use of VAR. We'll see if that continues next season with more fans in the seats and a likely revision of VAR and/or IFAB rules.
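The size of a miss like this is simple arithmetic, sketched below. The season names and totals are placeholders invented for illustration, not smarterscout figures, and comparing total expected goals with total goals (rather than regular goals only) is an assumption of the sketch.

```python
def relative_miss(total_goals: float, total_expected_goals: float) -> float:
    """Signed gap between modelled and actual totals, as a share of actual goals."""
    if total_goals <= 0:
        raise ValueError("total_goals must be positive")
    return (total_expected_goals - total_goals) / total_goals


# Placeholder season totals (goals, expected goals), invented for illustration.
seasons = {"season A": (1000, 990.0), "season B": (950, 978.5)}
misses = {name: relative_miss(goals, xg) for name, (goals, xg) in seasons.items()}

# The season with the largest absolute miss is the one to investigate first.
worst = max(misses, key=lambda name: abs(misses[name]))
for name, miss in misses.items():
    print(f"{name}: {miss:+.1%}")
```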
Our metrics are the products of formulas. We break down the credit in our models, choose appropriate denominators, standardise to each position on the pitch, and then adjust for our members' benchmark leagues. In this respect, there isn't anything obvious to validate; if you know how the metric is calculated – and that's what our FAQ is for – then you know what it's telling you and what it's not telling you. However, one of the purposes of our metrics is to give our members reliable estimates of players' underlying capabilities. In that sense, we'd expect our metrics to be somewhat consistent from season to season.
Now, there are many reasons why a player's performance levels might not be consistent from season to season. Some players are getting older, and some are getting younger. Some are moving to clubs with different styles of play. Some are changing roles at the same position. Some are carrying injuries. Some are playing for a new contract. And then there's also the inherent randomness of the game. So we should never expect a perfect correlation between the metrics for one season and the metrics for the next. But how good is it?
We'll start with the raw metrics for our measure of attacking output, which are the expected goals contributions in both models per minute in possession, standardised to each position and then benchmarked to the same league. Of course, we can't just look at all players at the same time. That's because players at different positions will have contributions in different ranges: forwards typically higher than midfielders, and midfielders typically higher than defenders. So we have to separate the players by position.
Here's how the league-adjusted metric from the shot creation model performs. These are players at the same position in domestic leagues over consecutive seasons. Below I've varied the minimums for minutes played in each season. And I've grouped matching positions together, so FB includes LBs and RBs, and so on:
| Correlation at 380'+ | 0.40 | 0.50 | 0.54 | 0.56 | 0.51 | 0.41 | 0.44 |
| Correlation at 570'+ | 0.43 | 0.52 | 0.60 | 0.60 | 0.55 | 0.48 | 0.46 |
| Correlation at 950'+ | 0.48 | 0.60 | 0.71 | 0.64 | 0.65 | 0.61 | 0.53 |
Because our metrics are league-adjusted, players who switched teams and leagues are included in these numbers. You can see that there are fewer players with a lot of minutes at WB/WM and CM in our data than at other positions. That's partly because so many clubs play a double pivot (where our data provider would classify both players as DMs) and so few use a back three or five, though those formations are becoming more popular again. The widest midfielders in a flat midfield four or three, playing right in front of the back line, are still classified as WMs.
In the table above, the higher we set the floor for minutes in both seasons, the higher the correlations we see. That's a good sign – it means that richer data make our metrics more consistent. By the time a player has 950'+ at the same position in consecutive seasons – the equivalent of ten full matches – the correlation is about 0.5 or better at all positions.
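The calculation behind a table like this can be sketched in a few lines of Python. The record layout (`position`, `minutes_s1`, `metric_s1` and so on), the toy players, and the cut-off of three players are assumptions made for illustration; the sketch uses a plain Pearson correlation and requires the minutes floor to be met in both seasons.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def season_to_season(records, position, min_minutes):
    """Correlate a metric across consecutive seasons for one position group,
    keeping only players above the minutes floor in BOTH seasons."""
    pairs = [
        (r["metric_s1"], r["metric_s2"])
        for r in records
        if r["position"] == position
        and r["minutes_s1"] >= min_minutes
        and r["minutes_s2"] >= min_minutes
    ]
    if len(pairs) < 3:
        return None  # too few players for a meaningful correlation
    first, second = zip(*pairs)
    return pearson(first, second)


# Toy records, invented for illustration only.
records = [
    {"position": "CB", "minutes_s1": 1200, "minutes_s2": 1500, "metric_s1": 0.10, "metric_s2": 0.12},
    {"position": "CB", "minutes_s1": 2000, "minutes_s2": 1800, "metric_s1": 0.20, "metric_s2": 0.18},
    {"position": "CB", "minutes_s1": 990, "minutes_s2": 1100, "metric_s1": 0.15, "metric_s2": 0.17},
    {"position": "CB", "minutes_s1": 400, "minutes_s2": 2500, "metric_s1": 0.30, "metric_s2": 0.05},
    {"position": "FB", "minutes_s1": 1500, "minutes_s2": 1600, "metric_s1": 0.25, "metric_s2": 0.22},
]
for floor in (380, 570, 950):
    print(floor, season_to_season(records, "CB", floor))
```

In the toy data, the fourth centre-back has only 400' in the first season and a very different figure in the second, so raising the floor from 380' to 570' or 950' removes that player and the correlation rises. That is the same pattern the table shows at scale.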
Here's the same table for the ball progression model:
| Correlation at 380'+ | 0.49 | 0.62 | 0.63 | 0.67 | 0.63 | 0.49 | 0.51 |
| Correlation at 570'+ | 0.55 | 0.64 | 0.68 | 0.70 | 0.66 | 0.55 | 0.54 |
| Correlation at 950'+ | 0.60 | 0.69 | 0.74 | 0.74 | 0.74 | 0.66 | 0.60 |
The correlations are higher here, starting at about 0.6 for players with 950'+ at the same position in consecutive seasons. This is probably in part because shots are more idiosyncratic. But also, we have to use rules of thumb to divide up contributions to expected goals in the shot creation model; in the ball progression model, each action is evaluated independently.
We can perform the same validation for our defending quantity metric, which is based on the volume of defending opportunities a player has per minute out of possession. Because we estimate defending opportunities using an algorithm – there's really no choice if you're using event data – we expect some noise here. And we get it:
| Correlation at 380'+ | 0.42 | 0.40 | 0.40 | 0.46 | 0.34 | 0.37 | 0.51 |
| Correlation at 570'+ | 0.46 | 0.45 | 0.46 | 0.50 | 0.37 | 0.42 | 0.54 |
| Correlation at 950'+ | 0.50 | 0.50 | 0.53 | 0.56 | 0.44 | 0.53 | 0.58 |
The noisiest position is CM, which stands to reason given the relatively small sample and the many profiles that might exist there. The noise will carry over into our defending quality ratings, too, since we are valuing the actions that players concede in each defending opportunity. Here are the correlations from season to season for the defending quality metric from our shot creation model:
| Correlation at 380'+ | 0.31 | 0.29 | 0.27 | 0.32 | 0.19 | 0.24 | 0.21 |
| Correlation at 570'+ | 0.36 | 0.34 | 0.31 | 0.37 | 0.24 | 0.26 | 0.23 |
| Correlation at 950'+ | 0.42 | 0.41 | 0.39 | 0.43 | 0.27 | 0.38 | 0.30 |
Not surprisingly, the correlations are lowest at CM. The good news here is that we have a decent correlation around 0.4 after 950' played for the main defending positions: CB, FB, WB/WM, and DM. But more generally, this is why we use lighter fonts on our player profile pages for data on positions where players have fewer than 950'. We're just not as confident in all of the metrics when the players have so few minutes. If we boost the minimum to 1,900' – the equivalent of 20 full matches – then the correlations for defenders reach about 0.5. And indeed, we usually advise clubs to look at defending metrics over an entire season.
Here's the same table for our measure of defending quality from the ball progression model:
| Correlation at 380'+ | 0.33 | 0.32 | 0.49 | 0.35 | 0.27 | 0.26 | 0.23 |
| Correlation at 570'+ | 0.37 | 0.37 | 0.52 | 0.39 | 0.36 | 0.28 | 0.27 |
| Correlation at 950'+ | 0.43 | 0.44 | 0.60 | 0.44 | 0.41 | 0.42 | 0.34 |
Again, the correlations are higher for the ball progression model than for the shot creation model. The ball progression model looks at every move, not just moves that ended in shots, which are somewhat idiosyncratic in themselves. But it's still worth using the shot creation model, since we want to understand how defenders concede shots (and actions that lead to shots) as well as territory.
For example, some defenders may have a "sixth sense" about when a move will lead to true danger. Others may have trouble anticipating and blocking shots. And it certainly seems like the metrics are collecting different information; the same-season correlation between the two defending metrics for a given position is about 0.6, no matter how many minutes we set as a minimum. This suggests there are real differences that aren't just down to noise. For the attacking metrics, the correlation is about 0.7.
Now let's put aside the expected goals models and turn to our ball retention metric. We try to focus on the riskiness of a player's style – and the likelihood of maintaining possession – by controlling for the actions the player attempts and the locations of those actions on the pitch, which might be affected by the player's role or the team's overall style. Here's the same table again:
| Correlation at 380'+ | 0.61 | 0.65 | 0.63 | 0.65 | 0.63 | 0.62 | 0.60 |
| Correlation at 570'+ | 0.64 | 0.69 | 0.67 | 0.68 | 0.68 | 0.69 | 0.64 |
| Correlation at 950'+ | 0.68 | 0.71 | 0.75 | 0.72 | 0.75 | 0.74 | 0.69 |
Ball retention is one of the metrics that marks the clearest separations between leagues of widely differing standards. A player from a lower-level league might have a tremendous attacking spell for a variety of reasons. But if the player can't keep possession of the ball so easily in a top competition, it might be even harder to replicate the performance. So it's gratifying that the underlying metric is so persistent in this case.
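One common way to control for the actions a player attempts and where they happen is to compare actual retention with the retention a benchmark would predict for the same mix of actions. The sketch below illustrates that general technique; it is not a description of how the smarterscout metric is computed. The action types, zones, and rates in `baseline` are hypothetical.

```python
def retention_vs_expected(player_actions, baseline):
    """Compare a player's actual retention rate with the rate a benchmark
    would predict for the same mix of action types and pitch zones."""
    if not player_actions:
        raise ValueError("no actions supplied")
    n = len(player_actions)
    actual = sum(kept for _, _, kept in player_actions)
    expected = sum(baseline[(action, zone)] for action, zone, _ in player_actions)
    return actual / n, expected / n


# Hypothetical benchmark retention rates by (action type, zone); invented numbers.
baseline = {
    ("pass", "own_half"): 0.90,
    ("pass", "final_third"): 0.70,
    ("dribble", "final_third"): 0.50,
}

# Each tuple: (action type, zone, 1 if possession was kept else 0).
actions = [
    ("pass", "own_half", 1),
    ("pass", "final_third", 1),
    ("pass", "final_third", 0),
    ("dribble", "final_third", 1),
]
actual_rate, expected_rate = retention_vs_expected(actions, baseline)
print(f"actual={actual_rate:.2f} expected={expected_rate:.2f}")
```

A player whose actual rate sits above the expected rate is keeping the ball better than the benchmark would for the same set of attempts, regardless of how risky those attempts were.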
Overall, I think these numbers are acceptable for our purposes. There's no such thing as certainty in football, so we're looking for metrics that can help us to make rational decisions in a probabilistic setting. Using the metrics above, we can get signals out of the noise with fairly small samples, even when players switch teams and leagues. The key thing to remember is this: once you understand where a metric comes from and its limitations, then it's just a fact – "these events lead to these numbers" – and you can use it as you see fit.
If you have comments or suggestions about our validation or our models that aren't answered in the FAQ, please feel free to send them to us via the contact link below. Thanks, and as always we hope you enjoy the platform.
[Photo: Steve Daniels]