smarterscout

faq

How many men's leagues does Smarterscout cover?

Smarterscout now has 55 leagues from around the world, including the UEFA Champions League and UEFA Europa League. All users can access Europe's top 5 leagues and 2 European competitions. Premium users get 23 more first-tier leagues and 6 second-tier leagues. Pro users add 14 more first-tier leagues, 3 more second-tier leagues, and the third and fourth tiers in England. Data from additional leagues is available to clients upon request.

Does Smarterscout cover women's leagues?

Coming soon! Smarterscout has procured access to a few dozen of the top women's leagues from around the world. We are currently testing the data with our models and hope to make it available before summer 2024.

Where does the data for smarterscout come from?

The data comes from a third-party provider that processes video from hundreds of football games every week. The provider records various aspects of each event on video – what action(s) happened, which players were involved, where it occurred on the field, and when it took place. The provider passes the data to us and we process the data to create the metrics and ratings you see on Smarterscout.

Why do some metrics appear in lighter colors?

The metrics are presented at three levels of confidence. Metrics calculated over less than 570 minutes of playing time (the equivalent of 6 full matches including injury time) have the lowest confidence level. Metrics calculated over at least 570 but less than 950 minutes (6-10 matches) have a moderate level of confidence. Metrics calculated over 950 minutes or more (the equivalent of at least 10 full matches) have high confidence.
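As a minimal sketch, the three tiers can be written as a simple threshold function. The function name and the tier labels are illustrative, and treating exactly 950 minutes as high confidence is an assumption based on the "950 minutes or more" wording.

```python
def confidence_level(minutes_played: float) -> str:
    """Map minutes of playing time to one of the three confidence tiers."""
    if minutes_played < 570:   # under about 6 full matches
        return "low"
    if minutes_played < 950:   # roughly 6 to 10 full matches
        return "moderate"
    return "high"              # at least about 10 full matches


print(confidence_level(400), confidence_level(700), confidence_level(950))
# prints: low moderate high
```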

Can smarterscout's content be republished or reproduced elsewhere?

Smarterscout's Terms of Use state that its content is for personal and professional use only. Requests to republish or reproduce specific content for commercial use should go to info@smarterscout.com.

How many people can use a single Smarterscout account?

Each user account is for one user only and will not work on more than one device at a time. If there is a login from a new device using the same account, the previous device will be logged off automatically. If you need multiple user accounts for your organization, please contact us.

How do the smarterscout ratings work?

Smarterscout uses proprietary algorithms and mathematical models to assess different aspects of players' performances and styles. These aspects are divided into overall performance, skills in duels, skills in shooting or saving, and playing style. We use event data collected in the same format across dozens of leagues to create the metrics presented by Smarterscout.

Overall performance

We use two primary mathematical models to evaluate players' attacking and defending: a shot creation model and a ball progression model. Both models calculate expected goals (xG) generated and conceded by teams during games. Then algorithms divide the models' estimates of xG for (xGF) and xG against (xGA) into 'credits' and 'demerits', respectively, for individual players.

These credits and demerits are then compiled into ratings at each of nine major positions on the field: goalkeeper, centerback, fullback, defensive or holding midfielder, central midfielder, wide midfielder or wingback, attacking midfielder or second striker, winger, and center forward.

The ratings are adjusted for differences in leagues. Each user of smarterscout selects a league to use as a benchmark when they sign up (which can be changed as needed), and all the ratings viewed by the user are adjusted to show how the player might perform in the selected benchmark league on a scale of 0 to 99. The league adjustments are based on playing careers over long periods of time. Players' experiences of moving between leagues during their careers create a vast and robust network from which the system can calibrate adjustments for each metric independently and with a high level of confidence.

Attacking

Our attacking output rating is a measure of contributions to xGF per minute that a player's team is in possession of the ball. A player cannot easily advance the ball or create a shot when his team does not have possession. Yet within the time his team has possession, a player may use any number of touches to contribute to xGF. This is why we use minutes in possession as a denominator.

Defending

The identity of the defender for each attacking action is not always clear from event data, which we use because of its wide coverage and availability. So we developed an algorithm to guess the defending players during each move. In validation with video, this algorithm has proven to be correct in two-thirds of events during typical games. Though imperfect, the algorithm picks up sufficient signal to assess defending over periods of several games or seasons.

We measure defending as a combination of defending quantity and defending quality. Defending quantity is the number of defending opportunities a player has per minute out of possession, estimated using the algorithm above. A player is not in complete control of the number of times he is called on to defend, though the most and least aggressive players can affect this frequency. Unlike in attacking, players usually cannot take multiple touches to defend an oncoming attacker. So defending quality is a measure of the contributions to xGA conceded per defending opportunity, rather than per minute out of possession.
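The different denominators for attacking and defending can be illustrated with a short sketch. The class and field names are hypothetical, and the functions return raw rates only; the standardization by position and the league adjustment described earlier are not shown.

```python
from dataclasses import dataclass


@dataclass
class PlayerTotals:
    xgf_credits: float                # attacking credits (player's share of team xGF)
    xga_demerits: float               # defending demerits (player's share of team xGA)
    minutes_in_possession: float
    minutes_out_of_possession: float
    defending_opportunities: int      # as estimated by the defender-guessing algorithm


def attacking_output(t: PlayerTotals) -> float:
    # Credits per minute that the player's team has the ball.
    return t.xgf_credits / t.minutes_in_possession


def defending_quantity(t: PlayerTotals) -> float:
    # Defending opportunities per minute out of possession.
    return t.defending_opportunities / t.minutes_out_of_possession


def defending_quality(t: PlayerTotals) -> float:
    # xGA conceded per defending opportunity (lower is better).
    return t.xga_demerits / t.defending_opportunities


totals = PlayerTotals(3.0, 2.4, 300.0, 400.0, 120)
print(attacking_output(totals), defending_quantity(totals), defending_quality(totals))
```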

Ball retention

We use a separate model to measure ball retention, the likelihood that a team will keep possession after a player touches the ball. This is not a measure of a player's contribution to winning but can be an important aspect of performance that combines skill and style.

The model calculates the likelihood of keeping possession when executing a specific action at a specific location on the pitch, then compares that likelihood to individual players' success in keeping possession in the same situations. The differences between players' success rates and the average success rates in their leagues are then averaged across all their actions and standardized as above.
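A minimal sketch of the comparison step, assuming each action already carries a model-estimated probability of keeping possession for its action type and location. The function name and input layout are illustrative, and the final standardization is omitted.

```python
def retention_over_expected(actions):
    """Average gap between actual and expected retention.

    actions: list of (kept_possession, expected_keep_probability) pairs,
    where the probability is the league-average rate for that action
    type and pitch location.
    """
    diffs = [(1.0 if kept else 0.0) - expected for kept, expected in actions]
    return sum(diffs) / len(diffs)


sample = [(True, 0.8), (True, 0.6), (False, 0.5), (True, 0.9)]
print(round(retention_over_expected(sample), 3))
# prints: 0.05
```

A positive value means the player kept the ball more often than the average player would in the same situations.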

Skill in duels

To evaluate skill in ground duels and aerial duels, we use an algorithm inspired by Arpad Elo's rating system for chess players, which has analogs in tennis rankings and many other areas. This adaptation was first proposed publicly by Todd Kingston for aerial duels. We do not use the classic Elo formula for reasons explained below, but the overall idea is similar.

Every time two players enter a one-on-one competition for the ball, they are in a duel. Generally one player will win, and one will lose. For a high-skilled player to win a duel against a low-skilled player is not surprising. The reverse is surprising. So in the first situation, the high-skilled player's rating should rise less than it would fall in the second situation, and the converse is true for the low-skilled player's rating. The updated ratings can be used to handicap the next duels for both players.

We created a formula that makes these computations to track and change players' ratings in duels over time. The formula ensures that sequences of equally likely events have the same effects on a player's rating, e.g. winning a duel with 36% probability versus losing two duels with 60% probability. The formula also requires that the value of winning or losing a duel converges to zero as the result of the duel becomes less and less surprising.
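The actual formula is not published here. One function that satisfies both stated properties is the negative logarithm of the probability of the observed result, so the sketch below uses it purely as an assumption. The scale factor `K` is hypothetical. Under this choice, winning a duel with 36% probability moves the rating up by the same amount that two losses, each with a 60% probability of losing, move it down.

```python
import math

K = 1.0  # hypothetical scale factor, not taken from the source


def rating_change(won: bool, win_probability: float) -> float:
    """Surprise-weighted update: -K*ln(p) for a win, +K*ln(1-p) for a loss.

    Because ln(p1 * p2) = ln(p1) + ln(p2), sequences of results with the
    same overall probability produce changes of the same size, and the
    change shrinks to zero as the result becomes certain.
    """
    if won:
        return -K * math.log(win_probability)
    return K * math.log(1.0 - win_probability)


one_upset_win = rating_change(True, 0.36)
two_likely_losses = 2 * rating_change(False, 0.40)  # each loss had probability 0.60
print(round(one_upset_win, 4), round(two_likely_losses, 4))
```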

In the case of ground duels, the algorithm tracks skill in holding the ball (dribbling) and regaining possession (tackling) separately and simultaneously. Each ground duel is handicapped using a baseline probability of winning a ground duel at a specific location, the dribbling rating of the player in possession, and the tackling rating of the defending player. The same two ratings change depending on the result of the duel. Aerial duels are also handicapped using baseline probabilities of winning in specific locations, as well as the players' ratings.
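How the baseline and the two ratings are combined is not specified. One common approach, shown here only as an assumption, is to shift the baseline probability in log-odds space by the gap between the dribbling and tackling ratings. The function name and the rating scale are hypothetical.

```python
import math


def dribbler_win_probability(baseline: float, dribbling: float, tackling: float) -> float:
    """Handicap a ground duel for the player in possession.

    baseline: probability of the ball carrier winning at this location
              when both players are of average skill.
    dribbling, tackling: ratings on a hypothetical log-odds scale.
    """
    log_odds = math.log(baseline / (1.0 - baseline)) + (dribbling - tackling)
    return 1.0 / (1.0 + math.exp(-log_odds))


print(round(dribbler_win_probability(0.4, 1.0, 1.0), 3))  # equal ratings: baseline
print(round(dribbler_win_probability(0.4, 1.5, 1.0), 3))  # stronger dribbler
```

With equal ratings the function returns the baseline, and the probability rises or falls as the rating gap widens in either direction.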

As with the ratings for overall performance, the ratings for skill in duels are standardized at each position and adjusted across leagues.

Skill in shooting and saving

Each shot can be thought of as a duel between the striker and the goalkeeper, and each duel can be handicapped using the baseline probability of scoring (xG) and the skill levels of the two players. As above, shooting and saving skill can be tracked separately and simultaneously. However, computing ratings based on single shots, as with the duels above, creates a problem because of the relative rarity of goals. Our solution to this problem is to group shots into batches.

Using batches smooths out the ratings and allows strikers and goalkeepers to be assessed across large numbers of opponents. Each striker is given a rating based on his likelihood of scoring with a batch of shots from which a generic striker would have a 50% chance of scoring at least once against a generic goalkeeper. Each goalkeeper is given a rating based on his likelihood of stopping a batch of shots from which a generic striker would have a 50% chance of scoring at least once against a generic goalkeeper.
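A rough sketch of how shots might be grouped into such batches, assuming shots are independent so that the chance of at least one goal is one minus the product of the miss probabilities. Closing a batch as soon as that chance reaches 50% is a simplification for illustration, not necessarily how the site forms its batches.

```python
def batch_shots(xg_values, target=0.5):
    """Group consecutive shots until P(at least one goal) reaches the target.

    Returns (complete_batches, leftover_shots).
    """
    batches, current, p_no_goal = [], [], 1.0
    for xg in xg_values:
        current.append(xg)
        p_no_goal *= 1.0 - xg              # chance that every shot so far misses
        if 1.0 - p_no_goal >= target:      # batch is now worth about one coin flip
            batches.append(current)
            current, p_no_goal = [], 1.0
    return batches, current


batches, leftover = batch_shots([0.3, 0.3, 0.1, 0.5, 0.2, 0.2])
print(batches, leftover)
# prints: [[0.3, 0.3], [0.1, 0.5]] [0.2, 0.2]
```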

Style

We measure playing style across eight dimensions, defined as follows.

Disrupting opposition moves: attempt to break up an opposition move by tackling or fouling a player, or by clearing, punching, blocking, or kicking the ball out of bounds (per minute out of possession)

Recovering a moving ball: regain possession by intercepting, saving, smothering, or otherwise picking up the ball (per minute out of possession)

Aerial duels: engage in an aerial duel (per minute played)

Passing toward goal: a pass that brings the ball at least 10 meters closer to the center of the opponent's goal (per attacking touch)

Link-up passing: any other pass (per attacking touch)

Dribbling: move the ball by advancing it uncontested at least 10% of the length of the field or by taking on a player (per attacking touch)

Receiving in the box: receive the ball in the opposition penalty area (per attacking touch)

Shooting: shoot the ball except from a penalty (per attacking touch)

Frequencies of these actions are standardized at each position, and the resulting ratings may have different significances at different positions. These ratings are not adjusted for differences between leagues.
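As an illustration of the two passing definitions above, the sketch below classifies a pass by how much closer it brings the ball to the center of the opponent's goal. The pitch dimensions (105 m by 68 m), the coordinate convention, and the function name are assumptions.

```python
import math

# Assumed coordinates in meters on a 105 x 68 pitch, attacking toward x = 105.
GOAL_CENTER = (105.0, 34.0)


def classify_pass(start, end):
    """Label a pass as 'passing toward goal' or 'link-up passing'."""
    gained = math.dist(start, GOAL_CENTER) - math.dist(end, GOAL_CENTER)
    return "passing toward goal" if gained >= 10.0 else "link-up passing"


print(classify_pass((50.0, 34.0), (65.0, 34.0)))  # 15 m closer to goal
print(classify_pass((50.0, 34.0), (50.0, 10.0)))  # sideways, slightly farther away
```

Dividing a player's count of each pass type by his attacking touches would then give the raw frequency that is standardized at each position.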

Statistics

Most of the metrics in this section of the player profiles are self-explanatory. Minutes in possession are measured as the cumulative time between actions on the ball by the same team. Because of stoppages and loose balls, a typical team might spend a third of the official match time in possession.
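The possession clock can be sketched as follows, assuming a time-ordered list of on-ball actions and reading "between actions on the ball by the same team" as gaps between consecutive actions. The event layout and function name are illustrative.

```python
def possession_seconds(events):
    """Sum the time between consecutive on-ball actions by the same team.

    events: time-ordered list of (timestamp_seconds, team) pairs.
    Gaps where the ball changes hands are not credited to either team.
    """
    totals = {}
    for (t0, team0), (t1, team1) in zip(events, events[1:]):
        if team0 == team1:
            totals[team0] = totals.get(team0, 0.0) + (t1 - t0)
    return totals


events = [(0, "A"), (4, "A"), (9, "B"), (12, "B"), (20, "A"), (23, "A")]
print(possession_seconds(events))
# prints: {'A': 7.0, 'B': 3.0}
```

In this toy sequence only 10 of the 23 seconds count as possession, which is consistent with teams spending well under half of the match clock in possession.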

The percentages of expected goals from passing and receiving are from the attacking part of the ball progression model. They show how important passing and receiving are, in relative terms, to a player's attacking output in that model. The remainder is the percentage of expected goals from individual actions such as winning duels and carrying the ball forward.

The involvement metrics measure a player's participation in moves (in open play) while on the field. As well as showing the share of moves leading to shots and goals where the player was involved, the site shows the share of the team's total expected goals (from the shot creation model discussed below) that was generated from moves involving the player. The denominators are compiled only from events that occurred while the player was on the field.

Fantasy

The site offers several metrics connected to accumulating points in fantasy leagues, for both home and away games. The metrics for playing time and attacking are calculated only from games where the player was on the field as either a starter or substitute. The metrics for scoring are calculated from the shot creation model of expected goals discussed below. As such, they depend on the likely chances of scoring rather than actual goals scored in the past.

The metrics related to defending are calculated across all games, whether or not the player featured, because a single player will generally have a minority influence on them. Again, these metrics are generated using the shot creation model (and a simulation), to convey the underlying probabilities of conceding goals and keeping clean sheets.
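A simulation of this kind might look like the sketch below, which treats each opposition shot as an independent chance with probability equal to its expected goals value. The independence assumption, the function name, and the parameters are illustrative and not taken from the site.

```python
import random


def clean_sheet_probability(opponent_shot_xg, n_sims=20000, seed=0):
    """Estimate the chance of conceding zero goals by simulating each shot."""
    rng = random.Random(seed)
    clean_sheets = 0
    for _ in range(n_sims):
        # A shot is saved or missed when the random draw is at least its xG.
        if all(rng.random() >= xg for xg in opponent_shot_xg):
            clean_sheets += 1
    return clean_sheets / n_sims


shots = [0.10, 0.30, 0.05, 0.20]
print(clean_sheet_probability(shots))
```

For independent shots the estimate should land close to the product of the miss probabilities, here 0.9 × 0.7 × 0.95 × 0.8 = 0.4788.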

How does Smarterscout search for similar players to a model player?

The site uses a distance formula based exclusively on the eight aspects of playing style described above. In the search results, the players are ranked so that the ones who have the most similar styles to the model player's style come first.
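The exact distance formula is not stated. The sketch below assumes a plain Euclidean distance over the eight style ratings and ranks candidates from closest to farthest; the function names and sample ratings are hypothetical.

```python
import math


def style_distance(a, b):
    """Euclidean distance between two eight-element style profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(model_style, candidates, top_n=3):
    """Return candidate names ordered from most to least similar style."""
    ranked = sorted(candidates.items(),
                    key=lambda item: style_distance(model_style, item[1]))
    return [name for name, _ in ranked[:top_n]]


model = [50] * 8
candidates = {
    "Player A": [52] * 8,
    "Player B": [80] * 8,
    "Player C": [50] * 7 + [49],
}
print(most_similar(model, candidates))
# prints: ['Player C', 'Player A', 'Player B']
```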

What are the player badges?

Premium and Pro users can see the player badges, and Pro users can use them to search for players. We use model ratings to suggest which players are likely to be high performers and low performers for a user's benchmark league. We also denote players who have had their first seasons with regular playing time in our database, as well as those who have had a dip in minutes versus the previous season. The final badge identifies players flagged by our algorithm as young prospects.

How does Smarterscout identify young prospects?

The system uses an exclusive algorithm calibrated to nine seasons of debuts in Europe's top five leagues through out-of-sample testing. The idea behind the algorithm is to look for measures of young players' actions that are persistent across seasons, choose the persistent measures that are most relevant to specific positions, and then create Boolean rules that pick future stars based on the levels of these measures. In other words, the algorithm looks at young players' actions to see whether they fit historical profiles – sometimes more than one per position – of debutants who have gone on to star at a higher level. Players with at least 380 minutes at a position are eligible up to age 22.

Are time-series versions of Smarterscout's skill metrics available?

Yes, for an additional fee – please send an email to info@smarterscout.com with your request.

What is a shot creation expected goals model?

Every shot taken in football is unique, but shots often have characteristics in common: the location of the shot, the situation at the time of the shot, the action before the shot, etc. By looking historically at shots with similar characteristics, it's possible to estimate the probability of a shot being scored by a generic striker against a generic goalkeeper in a given league. This probability, between zero and one, is the "expected goals" from a shot. This amount of expected goals can be split between all the players involved in the attack, so that players who did not take or assist the shot also receive credit for their contributions.
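In its simplest form, the idea of looking at historical shots with similar characteristics amounts to a lookup table of scoring rates. The sketch below assumes shots are grouped by an exact match on a tuple of characteristics; a production model would more likely use a regression or similar technique, and the names here are illustrative.

```python
from collections import defaultdict


def fit_xg_table(historical_shots):
    """Estimate scoring probability for each combination of shot characteristics.

    historical_shots: iterable of (characteristics_tuple, scored) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # characteristics -> [goals, shots]
    for characteristics, scored in historical_shots:
        counts[characteristics][0] += int(scored)
        counts[characteristics][1] += 1
    return {key: goals / shots for key, (goals, shots) in counts.items()}


history = [
    (("in box", "open play"), True),
    (("in box", "open play"), False),
    (("in box", "open play"), False),
    (("in box", "open play"), False),
    (("outside box", "free kick"), False),
]
xg_table = fit_xg_table(history)
print(xg_table[("in box", "open play")])
# prints: 0.25
```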

What is a ball progression expected goals model?

Players in possession of the ball pass through the same location on the pitch thousands of times in a given season. Each time, the player's team may go on to score a goal or not. Historical data on these possessions can generate a probability of going on to score from every location on the pitch. Because the probability changes from one location to the next, players may add "expected goals" by progressing the ball to more advantageous locations. These expected goals offer another way to measure the players' contributions to attacks.

The "Statistics" tab in the player profile displays the percentage of ball progression expected goals that a player accrues from passing and receiving. (The value of each pass is divided evenly between the passer and receiver.) The remainder of a player's ball progression expected goals come from his individual actions.
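The even split of a pass's value can be sketched as follows, assuming a hypothetical table of scoring probabilities by pitch zone. The zone names and values are invented for illustration.

```python
def pass_credits(value_at, start_zone, end_zone):
    """Split the expected goals added by a pass evenly between passer and receiver."""
    added = value_at[end_zone] - value_at[start_zone]
    return added / 2.0, added / 2.0  # (passer credit, receiver credit)


def attacking_shares(passing, receiving, individual):
    """Express each source as a fraction of the player's ball progression xG."""
    total = passing + receiving + individual
    return passing / total, receiving / total, individual / total


# Hypothetical probabilities of going on to score from each zone.
value_at = {"midfield": 0.02, "final third": 0.06}
print(pass_credits(value_at, "midfield", "final third"))
print(attacking_shares(passing=1.0, receiving=0.5, individual=0.5))
# second line prints: (0.5, 0.25, 0.25)
```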

Why doesn't smarterscout use post-shot expected goals to evaluate goalkeepers?

Post-shot expected goals are problematic, because they assume the goalkeeper is not part of the duel with the striker until the shot is struck. In truth, the goalkeeper can affect the shot before it is struck through positioning or challenging the striker. This idea was first suggested in statistical fashion for ice hockey.

For example, imagine a scenario where the goalkeeper moves to cut off the striker's angle on goal, and the shot goes wide. In this case, post-shot expected goals are zero – a shot that is already missing the goal has no chance of scoring – so the goalkeeper receives no credit for what may have been an effective action.

Consider a second case where the goalkeeper moves the wrong way against an oncoming striker. At this point, post-shot expected goals are grading the goalkeeper on the ability to correct a mistake, while ignoring the mistake itself. The goalkeeper has actually created a generous handicap by raising the save’s degree of difficulty. (Note that not all post-shot expected goals models incorporate information about the goalkeeper's position.)

In the first case, the goalkeeper's shotstopping rating will be biased downward. In the second, it will be biased upward. The effect of these biases is to make goalkeepers look less different than they really are. Goalkeepers who make good decisions might not look so good, and goalkeepers who make bad decisions might not look so bad.

This situation would not be so problematic if the biases were somehow symmetrical. Then a ranking of goalkeepers by shotstopping skill might still be correct. But the biases aren’t symmetrical, since they stem from very different qualities of goalkeepers.

How are the xGAR metrics calculated?

The xGAR metrics are explained in detail here.

Why don't maps appear for players saved in my history?

Maps only appear when a user searches for a player's most recent data. The system does not save players' old maps in users' histories; it only saves the players' metrics from the moments when they were viewed.

How can I find out more about the calculation of Smarterscout's metrics?

In the Smarterscout podcast, "The Why in Analytics", you can listen to in-depth discussions of several topics in football analytics, where we explain the smarterscout approach in greater detail. The podcast is available through most major outlets. We also have an online course available.

Why did we launch Smarterscout?

We believe that football analytics and technical scouting should be available to everyone, from fans checking on their squads' new signings to club executives looking for the next big star. In sports like baseball and basketball, advanced metrics are in the public domain – now that's true for football as well.