smarterscout now has 48 leagues from around the world, including the UEFA Champions League and UEFA Europa League. There are 27 first-tier leagues available to all users, six additional second-tier leagues for Premium and Pro users, and 15 more leagues for Pro users only. In the near future, smarterscout may expand to as many as 55 leagues, plus women's leagues. In the meantime, data from additional leagues are available to private clients.
Yes, as smarterscout expands, it will have coverage for women's leagues where robust and detailed data are available.
The data come from a third-party provider that processes video from hundreds of football games every week. The provider records various aspects of each event on video – what happened, which players were involved, where it occurred on the field, and when. Then the provider passes the data to clients, including North Yard Analytics, which further processes the data to create the metrics and ratings for smarterscout.
The metrics are presented at three levels of confidence. Metrics calculated over fewer than 570 minutes of playing time – the equivalent of six full matches including injury time – have a low confidence level and are presented for the sake of completeness. Metrics calculated over 570 to 950 minutes have a slightly higher level of confidence. Metrics calculated over 950 minutes or more – the equivalent of at least ten full matches – have high confidence.
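The three tiers work as a threshold rule on minutes played. A minimal sketch follows. The tier labels "low", "medium", and "high" are our own, because the text calls the middle tier only "slightly higher". Treating exactly 950 minutes as high confidence is our reading of the boundary. The 95-minute match length follows from the figures above, since 570 / 6 = 950 / 10 = 95.

```python
MINUTES_PER_FULL_MATCH = 95  # 570 / 6 = 950 / 10 = 95, including injury time

def confidence_level(minutes: float) -> str:
    """Return the confidence tier for a metric computed over `minutes` of play.

    Tier labels are our own; the text calls the middle tier "slightly higher".
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if minutes < 6 * MINUTES_PER_FULL_MATCH:    # fewer than 570 minutes
        return "low"
    if minutes < 10 * MINUTES_PER_FULL_MATCH:   # 570 up to, but not including, 950
        return "medium"
    return "high"                               # 950 minutes or more (our boundary reading)

print(confidence_level(6 * MINUTES_PER_FULL_MATCH))  # medium
```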
Each account will work on only one device at a time. If there is a login from a new device using the same account, the previous device will be logged off.
smarterscout uses algorithms and mathematical models developed by North Yard Analytics (NYA) to assess different aspects of players' performances and styles. These aspects are divided into overall performance, skills in duels, skills in shooting or saving, and playing style. NYA uses event data collected in the same format across dozens of leagues to create the metrics presented by smarterscout.
NYA uses two mathematical models to evaluate players' attacking and defending: a shot creation model and a ball progression model. Both models calculate expected goals (xG) generated and conceded by teams during games. Then algorithms divide the models' estimates of xG for (xGF) and xG against (xGA) into credits and demerits, respectively, for individual players.
These credits and demerits are then compiled into ratings at each of nine major positions on the field: goalkeeper, centerback, fullback, defensive or holding midfielder, central midfielder, wide midfielder or wingback, attacking midfielder or second striker, winger, and center forward.
Lastly, the ratings are adjusted for differences in leagues. Each registered user of smarterscout selects a league to use as a benchmark; all the ratings viewed by the user are adjusted to show how the player might perform in the benchmark league on a scale of 0 to 99. The league adjustments are based on playing careers going back several years; players' experiences of moving between leagues create a vast network from which the system can calibrate adjustments for each metric independently.
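The text does not publish how the league adjustments are calibrated. The sketch below shows one way a network of moves can pin down such adjustments. It fits an additive offset for each league by least squares, under a hypothetical assumption: a player's ratings in two leagues differ only by the two leagues' offsets. The function name, league names, and ratings are invented for the example. The final mapping onto the 0 to 99 scale is not shown.

```python
def fit_league_offsets(moves, benchmark, iterations=200):
    """Fit an additive offset per league from players who played in two leagues.

    moves: (league_a, rating_a, league_b, rating_b) tuples, one per mover.
    Hypothetical model: rating_a + offset[league_a] ~= rating_b + offset[league_b].
    The benchmark offset is fixed at 0. The others are fitted by coordinate descent
    on the squared residuals (Gauss-Seidel). This converges to the least-squares
    solution when every league is connected to the benchmark through moves.
    """
    leagues = {lg for a, _, b, _ in moves for lg in (a, b)}
    if benchmark not in leagues:
        raise ValueError("benchmark league must appear in at least one move")
    offset = {lg: 0.0 for lg in leagues}
    for _ in range(iterations):
        for lg in leagues - {benchmark}:
            implied = []  # the offset for lg implied by each move that touches lg
            for a, ra, b, rb in moves:
                if a == lg:
                    implied.append(rb + offset[b] - ra)
                elif b == lg:
                    implied.append(ra + offset[a] - rb)
            offset[lg] = sum(implied) / len(implied)
    return offset

moves = [
    ("Eredivisie", 70.0, "EPL", 60.0),
    ("Championship", 65.0, "EPL", 50.0),
    ("Eredivisie", 60.0, "Championship", 65.0),
]
offsets = fit_league_offsets(moves, benchmark="EPL")
print({lg: round(v, 3) for lg, v in sorted(offsets.items())})
# {'Championship': -15.0, 'EPL': 0.0, 'Eredivisie': -10.0}
```

In this sketch, a rating adjusted to the benchmark would be `raw_rating + offsets[league]`. Running the same fit separately for each metric would mirror the idea of calibrating each metric independently. Moves also link leagues indirectly; here the Eredivisie and the Championship are tied to each other as well as to the benchmark. As a result, leagues with few direct moves to the benchmark can still be placed through the rest of the network.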
NYA's attacking output rating is a measure of contributions to xGF per minute that a player's team is in possession of the ball. A player cannot easily advance the ball or create a shot when his team does not have possession. Yet within the time his team has possession, a player may use any number of touches to contribute to xGF. This is why NYA uses minutes in possession as a denominator.
The identity of the defender for each attacking action is not always clear from event data, which NYA uses because of its wide coverage and availability. So NYA developed an algorithm to guess the defending players during each move. In validation with video, this algorithm has proven to be correct in two thirds of events during typical games. Though imperfect, the algorithm picks up sufficient signal to assess defending over periods of several games or seasons.
NYA measures defending as a combination of defending quantity and defending quality. Defending quantity is the number of defending opportunities a player has per minute out of possession, estimated using the algorithm above. A player is not in complete control of the number of times he is called on to defend, though the most and least aggressive players can affect this frequency. Unlike in attacking, players usually cannot take multiple touches to defend an oncoming attacker. So defending quality is a measure of a player's contributions to xGA per defending opportunity, rather than per minute out of possession.
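The attacking and defending measures above differ mainly in their denominators. The sketch below assumes that the inputs already exist: credits, demerits, possession minutes, and the counts of defending opportunities from the defender-identification algorithm. The `PlayerTotals` class and its field names are hypothetical, and the sample numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class PlayerTotals:
    """Hypothetical per-player totals, assumed to come from the models described above."""
    xgf_credits: float                # summed contributions to xGF
    xga_demerits: float               # summed contributions to xGA
    minutes_in_possession: float      # team possession minutes while the player was on the field
    minutes_out_of_possession: float  # opponent possession minutes while the player was on the field
    defending_opportunities: int      # as estimated by the defender-identification algorithm

def attacking_output(p: PlayerTotals) -> float:
    """xGF contributed per minute that the player's team has the ball."""
    return p.xgf_credits / p.minutes_in_possession

def defending_quantity(p: PlayerTotals) -> float:
    """Defending opportunities per minute out of possession."""
    return p.defending_opportunities / p.minutes_out_of_possession

def defending_quality(p: PlayerTotals) -> float:
    """xGA conceded per defending opportunity (a lower raw value means fewer expected goals conceded)."""
    return p.xga_demerits / p.defending_opportunities

sample = PlayerTotals(xgf_credits=1.2, xga_demerits=0.6, minutes_in_possession=400.0,
                      minutes_out_of_possession=500.0, defending_opportunities=120)
print(round(attacking_output(sample), 4), round(defending_quantity(sample), 2),
      round(defending_quality(sample), 4))  # 0.003 0.24 0.005
```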
NYA uses a separate model to measure ball retention, the likelihood that a team will keep possession after a player touches the ball. This is not a measure of a player's contribution to winning but can be an important aspect of performance that combines skill and style.
The model calculates the likelihood of keeping possession when executing a specific action at a specific location on the pitch, then compares that likelihood to individual players' success in keeping possession in the same situations. The differences between players' success rates and the average success rates in their leagues are then averaged across all their actions and standardized as above.
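Below is a minimal sketch of the comparison described above. It assumes that locations are bucketed into named zones. It also assumes that the league baseline for each (action, zone) pair is simply the share of such actions that kept possession. NYA's model may estimate these likelihoods differently, and the final standardization step is omitted. The function names and zone labels are hypothetical.

```python
from collections import defaultdict

def retention_baselines(league_actions):
    """League-wide share of actions that kept possession, per (action, zone).

    league_actions: iterable of (action_type, zone, kept_possession) tuples.
    Zones are a hypothetical coarse bucketing of pitch locations.
    """
    tallies = defaultdict(lambda: [0, 0])  # (action, zone) -> [kept, total]
    for action, zone, kept in league_actions:
        tallies[(action, zone)][0] += int(kept)
        tallies[(action, zone)][1] += 1
    return {key: kept / total for key, (kept, total) in tallies.items()}

def retention_score(player_actions, baselines):
    """Mean of (actual outcome - league baseline) over a player's actions."""
    diffs = [int(kept) - baselines[(action, zone)] for action, zone, kept in player_actions]
    return sum(diffs) / len(diffs)

league = ([("pass", "own_half", True)] * 9 + [("pass", "own_half", False)]
          + [("dribble", "final_third", True)] * 2 + [("dribble", "final_third", False)] * 3)
baselines = retention_baselines(league)
player = [("pass", "own_half", True), ("dribble", "final_third", True),
          ("dribble", "final_third", False)]
print(round(retention_score(player, baselines), 3))  # 0.1
```

A positive score means the player keeps the ball more often than the league average in the same situations.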
To evaluate skill in ground duels and aerial duels, NYA uses an algorithm inspired by Arpad Elo's rating system for chess players, which has analogs in tennis rankings and many other areas. This adaptation was first proposed publicly by Todd Kingston for aerial duels. NYA does not use the classic Elo formula for reasons explained below, but the overall idea is similar.
Every time two players enter a one-on-one competition for the ball, they are in a duel. Generally one player will win, and one will lose. For a high-skilled player to win a duel against a low-skilled player is not surprising. The reverse is surprising. So in the first situation, the high-skilled player's rating should rise less than it would fall in the second situation, and the converse is true for the low-skilled player's rating. The updated ratings can be used to handicap the next duels for both players.
NYA created a formula that makes these computations to track and change players' ratings in duels over time. The formula ensures that sequences of equally likely events have the same effects on a player's rating, e.g., a single result with a 36% probability versus two consecutive results with a 60% probability each (0.6 × 0.6 = 0.36). The formula also ensures that the value of winning or losing a duel converges to zero as the result of the duel becomes less and less surprising.
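NYA's formula is not published. One simple rule with both stated properties makes each rating change proportional to the negative log of the observed result's probability. Because the probabilities of independent results multiply, their log changes add. A result with probability near 1 produces a change near 0. The sketch below implements that hypothetical rule alongside a chess-style logistic handicap. The constants `k` and `scale` are illustrative.

```python
import math

def win_probability(rating, opponent_rating, scale=400.0):
    """Chess-style logistic handicap (the 400-point scale is borrowed from Elo)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / scale))

def rating_change(p_win, won, k=10.0):
    """Hypothetical update proportional to -ln(probability of the observed result).

    Independent results multiply probabilities, so their rating changes add.
    A near-certain result (probability -> 1) changes the rating by almost nothing.
    """
    p_result = p_win if won else 1.0 - p_win
    magnitude = -k * math.log(p_result)
    return magnitude if won else -magnitude

# One win at 36% moves the rating as much as two wins at 60% each (0.6 * 0.6 = 0.36).
print(rating_change(0.36, True), 2 * rating_change(0.6, True))

# A favorite (80% to win) gains little from winning but loses a lot from losing.
print(rating_change(0.8, True), rating_change(0.8, False))
```

There are two caveats. First, the equivalence holds only while the probabilities stay fixed, whereas in practice a rating moves after each duel. Second, unlike classic Elo, this rule's expected change is not zero for unevenly matched players, which a production formula would need to address.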
In the case of ground duels, the NYA algorithm tracks skill in holding the ball (dribbling) and regaining possession (tackling) separately and simultaneously. Each ground duel is handicapped using a baseline probability of winning a ground duel at a specific location, the dribbling rating of the player in possession, and the tackling rating of the defending player. The same two ratings change depending on the result of the duel. Aerial duels are also handicapped using baseline probabilities of winning in specific locations, as well as the players' ratings.
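One way to combine a location baseline with the two players' ratings is to add them on the log-odds scale, as in the sketch below. This is an assumption for illustration, not NYA's published method. Ratings here are centered on 0 for an average player, and `scale` is an arbitrary constant. For brevity, the update uses a classic Elo-style step (actual minus expected result), which moves the dribbling and tackling ratings by equal and opposite amounts.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def keep_ball_probability(baseline_p, dribbling, tackling, scale=1.0):
    """Hypothetical handicap: the chance that the player in possession wins the ground duel.

    baseline_p: league-wide chance of keeping the ball in a ground duel at this location.
    dribbling, tackling: ratings on a log-odds scale, 0 for an average player (assumption).
    """
    return sigmoid(logit(baseline_p) + (dribbling - tackling) / scale)

def update_ground_duel(dribbling, tackling, baseline_p, kept_ball, k=0.1):
    """Classic Elo-style step (actual minus expected), applied to both ratings."""
    expected = keep_ball_probability(baseline_p, dribbling, tackling)
    delta = k * ((1.0 if kept_ball else 0.0) - expected)
    return dribbling + delta, tackling - delta

# Average players at a location where dribblers keep the ball 40% of the time.
print(round(keep_ball_probability(0.4, 0.0, 0.0), 3))    # 0.4
print(update_ground_duel(0.0, 0.0, 0.4, kept_ball=True))  # dribbler up, tackler down
```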
As with the ratings for overall performance, the ratings for skill in duels are standardized at each position and adjusted across leagues.
Each shot can be thought of as a duel between the striker and the goalkeeper, and each duel can be handicapped using the baseline probability of scoring (xG) and the skill levels of the two players. As above, shooting and saving skill can be tracked separately and simultaneously. However, computing ratings based on single shots, as with the duels above, creates a problem because of the relative rarity of goals. NYA's solution to this problem is to group shots into batches.
Using batches smooths out the ratings and allows strikers and goalkeepers to be assessed across large numbers of opponents. Each striker is given a rating based on his likelihood of scoring with a batch of shots from which a generic striker would have a 50% chance of scoring at least once against a generic goalkeeper. Each goalkeeper is given a rating based on his likelihood of stopping a batch of shots from which a generic striker would have a 50% chance of scoring at least once against a generic goalkeeper.
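Below is a minimal sketch of batching. It assumes shots are grouped in chronological order. Each batch closes as soon as the chance of at least one goal reaches 50%, where that chance is 1 minus the product of (1 - xG) over the batch's shots. A batch usually overshoots 50% slightly, so this only approximates the equal-probability batches described above. The function names are hypothetical.

```python
def batch_shots(shot_xg, target=0.5):
    """Group consecutive shots into batches with P(at least one goal) >= target.

    For a generic striker against a generic goalkeeper, the chance of scoring at
    least once from a batch is 1 - prod(1 - xG). Returns the completed batches
    (lists of shot indices) and any trailing shots that have not yet filled a batch.
    """
    batches, current, p_no_goal = [], [], 1.0
    for i, xg in enumerate(shot_xg):
        current.append(i)
        p_no_goal *= 1.0 - xg
        if 1.0 - p_no_goal >= target:
            batches.append(current)
            current, p_no_goal = [], 1.0
    return batches, current

def batch_scoring_rate(batches, goals):
    """Share of completed batches in which at least one shot was scored."""
    if not batches:
        raise ValueError("no completed batches")
    return sum(any(goals[i] for i in batch) for batch in batches) / len(batches)

xg = [0.3, 0.3, 0.1, 0.5, 0.2, 0.05]
goals = [False, True, False, False, True, False]
batches, leftover = batch_shots(xg)
print(batches, leftover)                   # [[0, 1], [2, 3]] [4, 5]
print(batch_scoring_rate(batches, goals))  # 0.5
```

If a striker's share of batches with a goal sits well above one half, he is outscoring a generic striker. Read from the goalkeeper's side, the same batches give the share of batches he stopped.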
NYA measures playing style across eight dimensions, defined as follows.
Disrupting opposition moves: attempt to break up an opposition move by tackling or fouling a player, or by clearing, punching, blocking, or kicking the ball out of bounds (per minute out of possession)
Recovering a moving ball: regain possession by intercepting, saving, smothering, or otherwise picking up the ball (per minute out of possession)
Aerial duels: engage in an aerial duel (per minute played)
Passing toward goal: a pass that brings the ball at least 10 meters closer to the center of the opponent's goal (per attacking touch)
Link-up passing: any other pass (per attacking touch)
Dribbling: move the ball by advancing it uncontested at least 10% of the length of the field or by taking on a player (per attacking touch)
Receiving in the box: receive the ball in the opposition penalty area (per attacking touch)
Shooting: shoot the ball except from a penalty (per attacking touch)
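Two of the definitions above depend on distances. The sketch below assumes a 105 × 68 m pitch with the team attacking toward x = 105. It measures how far a carry "advances" along the pitch's length. The coordinate convention and function names are assumptions, and data providers often use other units.

```python
import math

PITCH_LENGTH = 105.0  # meters; assumed pitch dimensions
PITCH_WIDTH = 68.0
GOAL_CENTER = (PITCH_LENGTH, PITCH_WIDTH / 2)  # the team attacks toward x = 105 (assumption)

def distance_to_goal(x, y):
    """Straight-line distance from (x, y) to the center of the opponent's goal."""
    return math.hypot(GOAL_CENTER[0] - x, GOAL_CENTER[1] - y)

def is_pass_toward_goal(start, end):
    """A pass that brings the ball at least 10 m closer to the center of the opponent's goal."""
    return distance_to_goal(*start) - distance_to_goal(*end) >= 10.0

def is_uncontested_dribble(start, end):
    """A carry that advances the ball at least 10% of the pitch length."""
    return end[0] - start[0] >= 0.1 * PITCH_LENGTH

print(is_pass_toward_goal((50, 34), (65, 34)), is_uncontested_dribble((30, 10), (40, 10)))
# True False
```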
Frequencies of these actions are standardized at each position, and the resulting ratings may mean different things at different positions. These ratings are not adjusted for differences between leagues.
Most of the metrics in this section of the player profiles are self-explanatory. Minutes in possession are measured as the cumulative time between actions on the ball by the same team. Because of stoppages and loose balls, a typical team might spend a third of the official match time in possession.
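The possession clock described above can be sketched from an ordered list of on-ball actions. The sketch assumes that a gap between two consecutive actions counts only when both actions are by the same team. As an optional, hypothetical stand-in for stoppage detection, it also ignores gaps longer than `max_gap_seconds`.

```python
def minutes_in_possession(events, max_gap_seconds=None):
    """Sum the time between consecutive on-ball actions by the same team.

    events: chronologically ordered (time_in_seconds, team) tuples.
    max_gap_seconds: hypothetical cutoff above which a gap is treated as a stoppage.
    """
    totals = {}
    for (t0, team0), (t1, team1) in zip(events, events[1:]):
        gap = t1 - t0
        if team0 != team1:
            continue  # the ball changed hands: credit neither team
        if max_gap_seconds is not None and gap > max_gap_seconds:
            continue  # a long gap is assumed to contain a stoppage
        totals[team0] = totals.get(team0, 0.0) + gap
    return {team: seconds / 60.0 for team, seconds in totals.items()}

events = [(0, "A"), (30, "A"), (60, "B"), (90, "B"), (120, "A"), (150, "A")]
print(minutes_in_possession(events))  # {'A': 1.0, 'B': 0.5}
```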
The percentages of expected goals from passing and receiving are from the attacking part of the ball progression model. They show how important passing and receiving are, in relative terms, to a player's attacking output in that model. The remainder is the percentage of expected goals from individual actions such as winning duels and carrying the ball forward.
The involvement metrics measure a player's participation in moves (in open play) while on the field. As well as showing the share of moves leading to shots and goals where the player was involved, the site shows the share of the team's total expected goals (from the shot creation model discussed below) that was generated from moves involving the player. The denominators are compiled only from events that occurred while the player was on the field.
The site offers several metrics connected to accumulating points in fantasy leagues, for both home and away games. The metrics for playing time and attacking are calculated only from games where the player was on the field as either a starter or substitute. The metrics for scoring are calculated from the shot creation model of expected goals discussed below. As such, they depend on the likely chances of scoring rather than actual goals scored in the past (which can be more idiosyncratic).
The metrics related to defending are calculated across all games, whether or not the player featured, because a single player will generally have a minority influence on them. Again, these metrics are generated using the shot creation model (and a simulation), to convey the underlying probabilities of conceding goals and keeping clean sheets.
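The site does not describe its simulation. As a hedged illustration, the sketch treats each shot conceded as an independent trial that scores with probability equal to its shot creation xG. Under that assumption the clean-sheet probability has an exact closed form. A Monte Carlo run recovers the same value and also gives the full distribution of goals conceded. The shot values are invented.

```python
import math
import random

def clean_sheet_probability(shot_xg):
    """Chance of conceding no goals, treating each shot as an independent trial."""
    return math.prod(1.0 - xg for xg in shot_xg)

def simulate_goals_conceded(shot_xg, n_sims=100_000, seed=0):
    """Monte Carlo distribution of goals conceded; each shot scores with probability xG."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        goals = sum(rng.random() < xg for xg in shot_xg)
        counts[goals] = counts.get(goals, 0) + 1
    return {g: c / n_sims for g, c in sorted(counts.items())}

shots_conceded = [0.1, 0.3, 0.05]  # invented xG values for one match
print(round(clean_sheet_probability(shots_conceded), 4))  # 0.5985
print(simulate_goals_conceded(shots_conceded)[0])         # close to 0.5985
```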
The site uses a distance formula based exclusively on the eight aspects of playing style described above. In the search results, the players are ranked so that the ones who have the most similar styles to the model player's style come first.
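The text does not name the distance formula. The sketch below assumes plain Euclidean distance over the eight style ratings, using hypothetical dimension keys. It sorts candidates so that the closest style comes first.

```python
import math

STYLE_DIMENSIONS = (  # hypothetical keys for the eight style ratings
    "disrupting", "recovering", "aerial", "passing_toward_goal",
    "link_up", "dribbling", "receiving_in_box", "shooting",
)

def style_distance(a, b):
    """Euclidean distance between two players' style ratings (assumed formula)."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in STYLE_DIMENSIONS))

def most_similar(model_player, candidates):
    """Rank candidate names by style distance to the model player, most similar first."""
    return sorted(candidates, key=lambda name: style_distance(model_player, candidates[name]))

model = {d: 50 for d in STYLE_DIMENSIONS}
candidates = {
    "Player Y": dict(model, shooting=90),         # identical except for much more shooting
    "Player X": {d: 52 for d in STYLE_DIMENSIONS},  # slightly higher everywhere
}
print(most_similar(model, candidates))  # ['Player X', 'Player Y']
```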
Premium and Pro users can see the player badges, and Pro users can use them to search for players. We use model ratings to suggest which players are likely to be high performers and low performers for a user's benchmark league. We also denote players who have had their first seasons with regular playing time in our database, as well as those who have had a dip in minutes versus the previous season. The final badge identifies players flagged by our algorithm as young prospects.
The system uses an exclusive algorithm calibrated to nine seasons of debuts in Europe's top five leagues through out-of-sample testing. The idea behind the algorithm is to look for measures of young players' actions that are persistent across seasons, choose the persistent measures that are most relevant to specific positions, and then create Boolean rules that pick future stars based on the levels of these measures. In other words, the algorithm looks at young players' actions to see whether they fit historical profiles – sometimes more than one per position – of debutants who have gone on to star at a higher level. Players with at least 380 minutes at a position are eligible up to age 22.
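Only the eligibility criteria are stated here; the rules themselves are proprietary. The sketch applies the stated eligibility: at least 380 minutes at a position, and age 22 or under, which is our reading of "up to age 22". It then represents each historical profile as a hypothetical Boolean rule of minimum levels. A player is flagged if he clears every threshold of at least one profile. The measure names and thresholds are invented.

```python
from dataclasses import dataclass

@dataclass
class PositionRecord:
    """A young player's record at one position (the fields are hypothetical)."""
    age: int
    minutes_at_position: float
    measures: dict  # persistent measures of actions, keyed by name

def is_eligible(record):
    """Stated criteria: at least 380 minutes at the position and age 22 or under."""
    return record.minutes_at_position >= 380 and record.age <= 22

def flag_prospect(record, profiles):
    """Flag an eligible player who clears every minimum in at least one profile.

    profiles: hypothetical Boolean rules for one position, each a dict mapping a
    measure name to the minimum level that a past star's profile required.
    """
    if not is_eligible(record):
        return False
    return any(
        all(record.measures.get(name, float("-inf")) >= level for name, level in rule.items())
        for rule in profiles
    )

winger_profiles = [
    {"dribbling": 70, "receiving_in_box": 60},  # invented thresholds
    {"passing_toward_goal": 80},
]
record = PositionRecord(age=20, minutes_at_position=450,
                        measures={"dribbling": 75, "receiving_in_box": 65})
print(flag_prospect(record, winger_profiles))  # True
```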
Yes, for an additional fee – please send an email to firstname.lastname@example.org with your request.
Every shot taken in football is unique, but shots often have characteristics in common: the location of the shot, the situation at the time of the shot, the action before the shot, etc. By looking historically at shots with similar characteristics, it's possible to estimate the probability of a shot being scored by a generic striker against a generic goalkeeper in a given league. This probability, between zero and one, is the "expected goals" from a shot. This amount of expected goals can be split between all the players involved in the attack, so that players who did not take or assist the shot also receive credit for their contributions.
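The simplest version of this idea is a lookup table: group past shots by shared characteristics and take the share that were scored. The sketch below assumes the characteristics are already bucketed into a hashable tuple, and the bucket labels are hypothetical. In practice, a fitted statistical model with smoothing is more common than raw counts. The split of xG among the players involved is not shown.

```python
from collections import defaultdict

def fit_xg_table(historical_shots):
    """Empirical xG: the scoring rate among past shots with the same characteristics.

    historical_shots: iterable of (characteristics, scored) pairs, where
    characteristics is a hashable tuple such as (distance_band, situation, prior_action).
    """
    tallies = defaultdict(lambda: [0, 0])  # characteristics -> [goals, shots]
    for key, scored in historical_shots:
        tallies[key][0] += int(scored)
        tallies[key][1] += 1
    return {key: goals / shots for key, (goals, shots) in tallies.items()}

close_cross = ("6-12m", "open_play", "cross")  # hypothetical bucket labels
history = [(close_cross, True)] * 2 + [(close_cross, False)] * 8
xg_table = fit_xg_table(history)
print(xg_table[close_cross])  # 0.2
```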
Players in possession of the ball pass through the same location on the pitch thousands of times in a given season. Each time, the player's team may go on to score a goal or not. Historical data on these possessions can generate a probability of going on to score from every location on the pitch. Because the probability changes from one location to the next, players may add "expected goals" by progressing the ball to more advantageous locations. These expected goals offer another way to measure the players' contributions to attacks.
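Below is a minimal sketch of the location-based approach. It assumes a coarse 6 × 3 grid on a 105 × 68 m pitch and that each historical visit records whether the team went on to score. The value of moving the ball is then the difference in scoring probability between the end cell and the start cell. The grid size, function names, and sample counts are assumptions.

```python
from collections import defaultdict

def zone_of(x, y, length=105.0, width=68.0, nx=6, ny=3):
    """Map pitch coordinates (meters) to a coarse grid cell; the grid size is an assumption."""
    col = min(int(x / length * nx), nx - 1)
    row = min(int(y / width * ny), ny - 1)
    return col, row

def fit_scoring_probabilities(possession_visits):
    """possession_visits: iterable of ((x, y), team_went_on_to_score) pairs."""
    tallies = defaultdict(lambda: [0, 0])  # zone -> [goals, visits]
    for (x, y), scored in possession_visits:
        cell = tallies[zone_of(x, y)]
        cell[0] += int(scored)
        cell[1] += 1
    return {zone: goals / visits for zone, (goals, visits) in tallies.items()}

def progression_value(start, end, probs):
    """Expected goals added by moving the ball from start to end."""
    return probs[zone_of(*end)] - probs[zone_of(*start)]

visits = ([((10, 30), True)] + [((10, 30), False)] * 99      # deep: 1 goal in 100 visits
          + [((95, 34), True)] * 4 + [((95, 34), False)] * 16)  # near goal: 4 in 20
probs = fit_scoring_probabilities(visits)
print(round(progression_value((10, 30), (95, 34), probs), 2))  # 0.19
```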
The "Statistics" tab in the player profile displays the percentage of ball progression expected goals that a player accrues from passing and receiving. (The value of each pass is divided evenly between the passer and receiver.) The remainder of a player's ball progression expected goals comes from his individual actions.
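The even split between passer and receiver, and the resulting percentages, can be sketched as follows. Each action is assumed to carry the ball progression xG it added. The action format and function names are hypothetical. Backward passes can subtract value, so a player's total can be negative or close to zero. In that case the percentages become unstable, and the text does not say how the site handles it.

```python
from collections import defaultdict

def allocate_progression_xg(actions):
    """Split each action's ball progression xG among players.

    actions: (kind, value, player, receiver) tuples, where kind is "pass" or an
    individual action such as "carry"; receiver is None for individual actions.
    A pass's value is split evenly between passer and receiver.
    """
    credit = defaultdict(lambda: {"passing": 0.0, "receiving": 0.0, "individual": 0.0})
    for kind, value, player, receiver in actions:
        if kind == "pass":
            credit[player]["passing"] += value / 2
            credit[receiver]["receiving"] += value / 2
        else:
            credit[player]["individual"] += value
    return credit

def percentage_shares(player_credit):
    """Percentages of a player's total that come from passing, receiving, and individual actions."""
    total = sum(player_credit.values())
    if total == 0:
        raise ValueError("player has no net ball progression xG")
    return {part: 100.0 * v / total for part, v in player_credit.items()}

actions = [("pass", 0.04, "A", "B"), ("carry", 0.02, "B", None), ("pass", 0.02, "B", "A")]
credit = allocate_progression_xg(actions)
print({k: round(v, 1) for k, v in percentage_shares(credit["A"]).items()})
# {'passing': 66.7, 'receiving': 33.3, 'individual': 0.0}
```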
Post-shot expected goals are problematic, because they assume the goalkeeper is not part of the duel with the striker until the shot is struck. In truth, the goalkeeper can affect the shot before it is struck through positioning or challenging the striker. This idea was first suggested in statistical fashion for ice hockey, by "DTMaboutheart" and "asmean" at hockey-graphs.com.
For example, imagine a scenario where the goalkeeper moves to cut off the striker's angle on goal, and the shot goes wide. In this case, post-shot expected goals are zero – a shot that is already missing the goal has no chance of scoring – so the goalkeeper receives no credit for what may have been an effective action.
Consider a second case where the goalkeeper moves the wrong way against an oncoming striker. At this point, post-shot expected goals are grading the goalkeeper on the ability to correct a mistake, while ignoring the mistake itself. The goalkeeper has actually created a generous handicap by raising the save’s degree of difficulty.
In the first case, the goalkeeper's shotstopping rating will be biased downward. In the second, it will be biased upward. The effect of these biases is to make goalkeepers look less different than they really are. Goalkeepers who make good decisions might not look so good, and goalkeepers who make bad decisions might not look so bad.
This situation would not be so problematic if the biases were somehow symmetrical. Then a ranking of goalkeepers by shotstopping skill might still be correct. But the biases aren’t symmetrical, since they stem from very different qualities of goalkeepers.
Maps only appear when a user searches for a player's most recent data. The system does not save players' old maps in users' histories; it only saves the players' metrics from the moments when they were viewed.
In the smarterscout podcast, "The Why in Analytics", Dan Altman offers in-depth discussions of several topics in football analytics and explains the smarterscout approach in greater detail. The podcast is available through most major outlets.
North Yard Analytics LLC was founded by Dan Altman as a vehicle for his sports data consulting. NYA has worked with clients in several sports, including football clubs ranging from the Champions League and Europa League to lower divisions in Europe and other competitions around the world.
North Yard Analytics believes that football analytics and technical scouting should be available to everyone, from fans checking on their squads' new signings to club executives looking for the next big thing. In sports like baseball and basketball, advanced metrics are in the public domain – now that's true for football as well.
At present the smarterscout system is fully automated, but it may expand in the future to include new content. Any opportunities will be posted on the site's home page.