How we spot smarterscout young prospects

By Dan Altman, creator of smarterscout

Identifying the most promising young players is a critical part of a club's long-term recruitment strategy. Whether a club wants to increase its revenues by trading players or save on transfer fees by developing its own stars, young players are the key. But sending scouts to see thousands of debutants around the world is hardly practical. Can data help?

I think the answer is yes. But it's not as straightforward as looking at a few metrics. I found this out many years ago when I tried to identify top prospects using the same mathematical models I'd been working on for peak-age players. What I didn't understand then was that young players didn't stand out for the same reasons.

Peak-age players are expected to be integral parts of their squads, offering levels of performance that translate into wins and points in the table. Their games are well-rounded, with the rough edges smoothed off through experience. That's not always the case for younger players. Some are dynamic and take too many risks. Others play it safe too often for fear of making a mistake.

What seems to matter most for young players is a combination of getting involved and being successful in their individual actions. We want to see players who are trusted by their teammates and get stuck in on both sides of the ball. And we want these players to go toe-to-toe with seasoned pros in one-on-one situations.

Because my first attempt at identifying young stars didn't work out too well, I decided to take a different approach the second time around. I wanted to find players who would eventually perform at the highest level, so I used data on debutants from Europe's top five leagues. I started out with simple actions that were easy to track, even on the training ground or in academy matches – things like aerials, 1v1s, wall passes, switches of play, etc.

Choosing the right denominators for these actions was crucial. For example, when trying to gauge a player's involvement in a match, we shouldn't look at his attacking 1v1s overall or even per 90 minutes; we should divide by his team's minutes in possession. After all, he can't attack if his team doesn't have the ball.
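To make the denominator point concrete, here is a minimal sketch in Python. The function name and all the numbers are hypothetical, and it assumes possession time is measured only while the player is on the pitch; it is not the smarterscout implementation.

```python
def rate_per_90(action_count: float, minutes_played: float) -> float:
    """Conventional rate: actions per 90 minutes on the pitch."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return action_count / minutes_played * 90


def rate_per_possession_minute(action_count: float, team_possession_minutes: float) -> float:
    """Actions per minute that the player's team had the ball.

    Assumption: team_possession_minutes covers only the time the
    player was on the pitch.
    """
    if team_possession_minutes <= 0:
        raise ValueError("team_possession_minutes must be positive")
    return action_count / team_possession_minutes


# Hypothetical example: two players with identical raw numbers
# (12 attacking 1v1s in 270 minutes) on teams that see very
# different amounts of the ball.
per_90_a = rate_per_90(12, 270)  # 4.0
per_90_b = rate_per_90(12, 270)  # 4.0

involvement_a = rate_per_possession_minute(12, 150.0)  # 0.08
involvement_b = rate_per_possession_minute(12, 240.0)  # 0.05
```

Per 90 minutes the two players look identical. Per minute of team possession, player A is attempting 1v1s far more often relative to his opportunities.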

The denominators transformed my counts of simple actions into useful metrics. The next step was to see which of them were persistent – in other words, which ones tended to be similar for players at a given position from year to year. Among these persistent metrics, I needed to select the ones that predicted future success.
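One simple way to check persistence is to correlate each player's value for a metric in one season with his value in the next. The sketch below assumes a Pearson correlation is a reasonable measure; the article does not say which statistic was actually used, and the player IDs and values are invented for illustration.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical metric values for players at one position in consecutive seasons.
season_1 = {"p1": 0.08, "p2": 0.05, "p3": 0.11, "p4": 0.03}
season_2 = {"p1": 0.07, "p2": 0.06, "p3": 0.10, "p4": 0.04}

# Only players observed in both seasons can be compared.
shared = sorted(season_1.keys() & season_2.keys())
persistence = pearson(
    [season_1[p] for p in shared],
    [season_2[p] for p in shared],
)
```

A value near 1 suggests the metric reflects something stable about the player; a value near 0 suggests it is mostly noise from year to year.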

I began with a three-year cohort of debutants – youngsters who had come into their senior squads before age 23. I tracked them for a further three years to see which ones had become regulars in Europe's top five leagues. Then I experimented with selection criteria based on the players' metrics to see if I could isolate the stars and exclude the ones who didn't make it.

I knew there had to be more than one profile at each position; for instance, a ball-playing central defender versus a specialist in aerials and clearances. I refined these profiles until they minimised the false positives and false negatives at each position. Lastly, I tested the profiles on the next three-year cohort of debutants, tracking them for a further three years as well. I finalised the profiles based on this second sample.
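The idea of refining a criterion to minimise false positives and false negatives can be sketched with a deliberately simplified case: one metric and one cut-off. Real profiles would combine several metrics, and the data, names and error measure (an unweighted sum of both error types) below are all assumptions made for illustration.

```python
def count_errors(values: list[float], made_it: list[bool], threshold: float) -> tuple[int, int]:
    """Return (false_positives, false_negatives) when flagging values >= threshold."""
    false_positives = sum(1 for v, ok in zip(values, made_it) if v >= threshold and not ok)
    false_negatives = sum(1 for v, ok in zip(values, made_it) if v < threshold and ok)
    return false_positives, false_negatives


def best_threshold(values: list[float], made_it: list[bool]) -> float:
    """Pick the observed value that minimises false positives plus false negatives.

    Ties are broken in favour of the lowest threshold.
    """
    candidates = sorted(set(values))
    return min(candidates, key=lambda t: sum(count_errors(values, made_it, t)))


# Hypothetical training cohort: one metric per player, plus whether he
# became a regular in a top-five league within three years.
train_values = [0.03, 0.04, 0.05, 0.07, 0.08, 0.10]
train_made_it = [False, False, True, False, True, True]

threshold = best_threshold(train_values, train_made_it)
train_errors = count_errors(train_values, train_made_it, threshold)

# Hypothetical second cohort, used only to check the chosen threshold.
test_values = [0.02, 0.06, 0.09]
test_made_it = [False, True, True]
test_errors = count_errors(test_values, test_made_it, threshold)
```

Checking the threshold against a separate cohort, as in the last three lines, guards against a cut-off that only fits the quirks of the first sample.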

Today the smarterscout system uses these profiles to flag the most promising young players in more than 60 leagues around the world. We can identify players after as few as 380' – the equivalent of four full matches – at a given position. And we've found that the profiles are consistently effective, starting as far back as age 16, even in academy data.

So, what do the results look like? Here are all the players in the EFL Championship who were under age 22 at the start of the 2018-19 season and ended up with at least 380' at any single position. I've picked an earlier season, so we can see how the players progressed. The names marked in red went on to play in the Premier League. Those in green played in another one of Europe's top five leagues (and some, like Fikayo Tomori, did both):

It's a rough measure here, but you can see that the rate of progression is much higher among flagged players. Some of them were merely on loan in the Championship from Premier League clubs, and others were promoted. But the vast majority were capable of playing at the highest level. And at least one, Josh Dasilva, may only have been kept off the pitch in the Premier League through injury.

We did miss a few players who went on to bigger things, like Everton's Ben Godfrey and Chelsea's Trevoh Chalobah. These guys could be false negatives – we can't identify every successful profile, after all – and we probably have some false positives as well. But we should expect these errors with any kind of scouting. Moreover, if data gave us exactly the same opinion as the eye test, then there'd be no point in using it in leagues we knew well. And data are a huge help in leagues where we can't send out scouts every day.

We also won't find the same number of young prospects in every competition. Because we're using criteria based on debuts in Europe's top five leagues, we usually find fewer prospects in leagues with a slower pace. And, all other things being equal, we naturally flag more players in leagues with more clubs and/or a greater tendency to give youth a chance.

With the new FA rules on Governing Body Endorsements for work permits, the ability to find young prospects all over the globe is especially important. The FA's system awards points for youth internationals and continental competitions overseas, so a surprising number of starlets may be eligible to come to English football. But regardless of where a club is recruiting, we think smarterscout's algorithmic approach can help to find the needles in those faraway haystacks.

UPDATE: For those interested in conventional statistics on specificity and sensitivity, here are some more numbers. The players who started play in the 2018-19 Championship at age 22 or under and had at least 380' at a position accounted for a total of 117 player-position combinations. Each of those player-position combinations was essentially a test of the algorithm. In this exercise, I'll consider playing in one of Europe's top five leagues in the same or a later season as a success. Here are the results:

Flagged, Successful: 53

Flagged, Unsuccessful: 18

Not flagged, Successful: 16

Not flagged, Unsuccessful: 30

Here the sensitivity of the test is 53 / (53 + 16) = 77%, and the specificity of the test is 30 / (30 + 18) = 62.5%. The overall efficiency of the test, meaning the share of all 117 cases classified correctly, is (53 + 30) / 117 = 71%. To put it another way, in 7 out of 10 cases the algorithm correctly predicted whether or not EFL Championship players with 380'+ at a position would go on to play that same position in any of the top five leagues, when the players were tracked over a total of three seasons.
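For anyone who wants to reproduce the arithmetic, here is a short Python sketch using the four counts listed above. The variable names are my own; precision is an extra figure derived from the same counts and is not quoted in the text.

```python
# Counts from the 2018-19 EFL Championship exercise.
flagged_successful = 53        # true positives
flagged_unsuccessful = 18      # false positives
not_flagged_successful = 16    # false negatives
not_flagged_unsuccessful = 30  # true negatives

total = (
    flagged_successful
    + flagged_unsuccessful
    + not_flagged_successful
    + not_flagged_unsuccessful
)

# Share of eventual successes that the algorithm flagged.
sensitivity = flagged_successful / (flagged_successful + not_flagged_successful)

# Share of eventual non-successes that the algorithm left unflagged.
specificity = not_flagged_unsuccessful / (not_flagged_unsuccessful + flagged_unsuccessful)

# Share of all player-position combinations classified correctly.
efficiency = (flagged_successful + not_flagged_unsuccessful) / total

# Share of flagged combinations that turned out to be successes.
precision = flagged_successful / (flagged_successful + flagged_unsuccessful)

print(f"total cases: {total}")             # total cases: 117
print(f"sensitivity: {sensitivity:.1%}")   # sensitivity: 76.8%
print(f"specificity: {specificity:.1%}")   # specificity: 62.5%
print(f"efficiency:  {efficiency:.1%}")    # efficiency:  70.9%
print(f"precision:   {precision:.1%}")     # precision:   74.6%
```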

Of course, these tests only validate the algorithm to the extent that we believe the clubs in the top five leagues make correct decisions about whom to sign. It may be that players capable of playing at the highest level are being left in the EFL Championship, and/or that clubs are signing EFL Championship players who won't make the grade.

Pro members of smarterscout can search specifically for players flagged as young prospects by selecting the badge at the bottom of Search by metrics or smartersearch.

[Photo: via Instagram]
