One of the frustrations of being passionate about football analytics, and being involved in the football analytics community, is the contrast between the great work going on at the forefront of the field and how the public and the media perceive it. We should really be well past the point of debating the value of analytics in football, and we certainly don’t need to keep debating whether xG is actually useful (it is). It would be wonderful if the football analytics community were spending more time talking about new research and exciting innovations in the field, instead of defending its existence. Unfortunately, that is often not the case, because there are still plenty of doubters out there. Some of that is because the community needs to improve how it communicates its work, but it doesn’t help when people with a limited understanding of the subject cast doubt based on ignorance, instead of looking to learn.
But the good news is that the last two days have been spent focusing on the good stuff, all thanks to leading data provider StatsBomb! StatsBomb have been at the forefront of football analytics for a couple of years now, and they have grown into a real heavyweight in the sport, boasting 76 professional clubs as clients, including two Champions League semi-finalists.
StatsBomb hosted a virtual event yesterday, called StatsBomb Evolve, to unveil a host of new developments in the products StatsBomb offers. These include a new dataset, StatsBomb 360, live in-game data, and new metrics for better assigning value to actions all over the pitch. I was able to watch the presentation live, and have provided below an overview of some of StatsBomb’s exciting new products.
The centerpiece of the event was StatsBomb 360, a new dataset that the data provider is hailing as the most advanced data in the industry. To demonstrate just how significant this new dataset is, they also announced that the first club to sign on as a customer was Liverpool FC!
360 is a dataset that is collected using computer vision technology (a multidisciplinary field of artificial intelligence and machine learning). It provides a massive amount of context to event data, collecting information about all players on camera at the point of every event that occurs (which they estimate to be ~3300 events per match). This will help to quantify areas of the game that have otherwise been very difficult to pin down, like the passing lanes available to a player, the space they are in when they receive the ball, the distance to every defender when in possession, and the defensive shape at the time of the event.
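To make the idea of added context concrete, here is a minimal sketch of one feature mentioned above: the distance from the player on the ball to the nearest visible defender at the moment of an event. The `VisiblePlayer` structure, its field names, and the coordinates are my own assumptions for illustration and are not taken from StatsBomb's actual data format.

```python
import math
from dataclasses import dataclass


@dataclass
class VisiblePlayer:
    """Hypothetical record for one player visible on camera at an event."""
    x: float        # pitch coordinate (arbitrary pitch units)
    y: float
    teammate: bool  # True if on the same team as the player on the ball


def nearest_defender_distance(ball_x, ball_y, frame):
    """Return the distance to the closest visible opponent, or None if none are visible."""
    distances = [
        math.hypot(p.x - ball_x, p.y - ball_y)
        for p in frame
        if not p.teammate
    ]
    return min(distances) if distances else None


# Made-up snapshot: two opponents and one teammate around the ball carrier.
frame = [
    VisiblePlayer(63.0, 44.0, teammate=False),
    VisiblePlayer(70.0, 30.0, teammate=True),
    VisiblePlayer(72.0, 52.0, teammate=False),
]
pressure_distance = nearest_defender_distance(60.0, 40.0, frame)
print(pressure_distance)
```

The same snapshot could feed other context features, such as counting opponents between the ball and the goal, which is the kind of information plain event data cannot provide.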
One area of (misplaced) skepticism when it comes to football analytics is the argument that, without tracking data, metrics are ultimately not able to tell us a great deal about the sport. However, tracking data isn’t the be-all and end-all that it is sometimes portrayed as. It is expensive, it is difficult to work with, and it’s not always that easy to pull valuable information out of this kind of data. That said, tracking data does contain vital contextual information that is lacking in event data. The 360 dataset looks to bridge this gap, not by providing tracking data, but by arming event data with most of the useful information you could take from tracking data, placing events in wider context.
One of the more granular details about this data is that StatsBomb have been working to make it frame-precise, which means that they attach a detailed timestamp to the data that is as precise as possible. This means that StatsBomb 360 can be integrated with tracking data effectively.
To better understand how 360 is changing the football analytics landscape, consider what is available now compared to what 360 can offer.
As the above image demonstrates, this new dataset will provide much more context, and make it much easier to judge player actions, even opening up the possibility of evaluating a player’s decision-making by measuring the value of the other choices that were available to them.
This really is a huge leap forward for the football analytics community. It will arm analysts with so much more information, which will improve the accuracy of existing metrics and open up the potential for new metrics to be developed.
The second big development discussed was StatsBomb LIVE. This seems like a product that needs little explanation, but just in case it is unclear, StatsBomb LIVE offers live data. It will provide real-time data and a feed of useful statistics that is sure to find its way onto the TVs of ordinary fans in the near future.
While I don’t think this warrants a significant focus in this article, this is a big development, and could have a significant impact on the way we all watch football, if big media outlets start tapping into this resource.
For the many people in the football analytics community who do not work for a football club, perhaps the developments of greatest immediate consequence were the analytics presented by StatsBomb’s Data Scientist, Dinesh Vatvani. While the whole event was very interesting, Dinesh’s presentation really got my nerd juices flowing. Dinesh unveiled several new metrics that the Data Science team has been working on, in particular a new possession value model called On-Ball Value (OBV).
To explain OBV in the simplest possible terms, it measures the value of actions based on how much they change the probability of scoring or conceding. It is a possession value model, and it computes a plus/minus value for each event, which provides a much more granular look at player contributions in a game.
Rather than thinking in terms of the individual player actions, OBV instead uses “possession states” to measure value. Possession states are a snapshot of the game at that moment in time, such as a player in possession, with the ball at his feet in open play, while being pressured by the opponent. By training a model to value each possession state, it is possible to quantify actions as well, as they are the transition between each possession state. The benefit of focusing on possession states instead of actions is that it simply creates a lot more data to work with. The example given in the presentation was a particular type of pass, of which there were about 37,000 similar passes in the data, compared with 3,600,000 similar start possession states and 5,900,000 end possession states.
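The idea of valuing an action as the transition between two possession states can be sketched in a few lines. This is only an illustration of the general principle described above, not StatsBomb's implementation: the `StateValue` class, the `action_value` function, and every probability below are hypothetical, whereas in the real model the state values come from a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateValue:
    """Hypothetical model output for one possession state."""
    p_score: float    # estimated probability that the team in possession scores
    p_concede: float  # estimated probability that the team in possession concedes

    @property
    def net(self) -> float:
        # Net value of the state: chance of scoring minus chance of conceding.
        return self.p_score - self.p_concede


def action_value(start: StateValue, end: StateValue) -> float:
    """Value of an action as the change in net value between two states."""
    return end.net - start.net


# Made-up numbers: a pass that moves the ball into a more dangerous area.
before = StateValue(p_score=0.020, p_concede=0.005)
after = StateValue(p_score=0.045, p_concede=0.004)
pass_value = action_value(before, after)
print(round(pass_value, 3))
```

A positive result means the action improved the team's outlook, and a negative one means it made things worse, which is where the plus/minus framing comes from.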
This is a big step in the development of football analytics, moving from a focus on the value of shots and everything surrounding shots to thinking in more atomic terms and measuring the individual value of each action in a game. StatsBomb are certainly not the first to come up with a possession value model, as the analytics community has been working on these for some time (for example, American Soccer Analysis’ Goals Added model, which I wrote about some time ago), but combined with the more in-depth data StatsBomb are now collecting, this obviously has a huge amount of potential.
"Really excited to have had the chance to talk about the models we've been working on at StatsBomb today! Here's a little snippet of the possession value surfaces that drive our On-Ball Value model for anyone who couldn't make the event #StatsBombEvolve pic.twitter.com/5pzfOaWOAx" (Dinesh Vatvani, @d_vatvani, March 17, 2021)
There is a ton of detail that I’m not including here, because I could easily spend a couple of thousand words talking about OBV, but what I can say is that I was really impressed, and that seems to be the consensus view in the football analytics community. They’ve done a really great job putting together a model that builds on all the good work that precedes it, incorporates some fresh ideas, and leverages their strengths as the best data provider in the business. One of the big strengths that OBV has over other models is the work done to filter out bias resulting from team strength, producing a more accurate assessment of player contribution from individual actions.
One of the really incredible aspects of this kind of model is that it unifies the scale on which different actions are measured (net expected goal difference). This means it is possible to make comparisons across different types of events. You can compare the value contributed by different actions or different types of players. For example, the image below plots Thiago’s contributions based on OBV.
Of course, the best part of a model that measures all player actions on a unified scale is that you can measure the total value of each player, and use that to assess the best players in football. Dinesh presented the findings from the model’s results using the last five seasons of data, and the three best players were Lionel Messi, Kylian Mbappé, and our very own Mats Hummels!
Beyond OBV, Dinesh also discussed a number of other really interesting developments, including a data-driven approach to classifying player role and position, a clustering model for identifying types of passes, and a metric called xPass, which computes the expected probability of completing a pass.
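To give a feel for what an expected pass completion metric produces, here is a toy sketch. I should stress that this overview does not cover how StatsBomb actually builds xPass, so the logistic form, the two input features, and the coefficients below are all my own assumptions, chosen purely for illustration.

```python
import math


def xpass(pass_length: float, under_pressure: bool) -> float:
    """Toy estimate of the probability that a pass is completed.

    The coefficients are hypothetical and not fitted to any data.
    """
    logit = 2.5 - 0.05 * pass_length - 0.8 * (1.0 if under_pressure else 0.0)
    # The logistic function maps the score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-logit))


# A short pass in space versus a long pass played under pressure.
short_open = xpass(10.0, under_pressure=False)
long_pressed = xpass(40.0, under_pressure=True)
print(round(short_open, 2), round(long_pressed, 2))
```

In this toy version, longer passes and passes under pressure get a lower completion probability. A real model would presumably draw on far richer information than these two inputs.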
Obviously this is just an overview of everything that was featured in what was a two-hour event, so there is plenty of important detail missing. However, I hope that it is a helpful summary, and really shows some of the great work that is going on. Everyone has been raving about what we saw in these presentations, and for good reason. This is all very exciting!
It’s also worth noting that these are, without a doubt, extensions of some of the great work that has already been done by others in the analytics community. I don’t mean this to criticize or diminish StatsBomb’s work. I only mention it because it shows how the football analytics community is forging a path forward through a process that is both competitive and collaborative.
Feel free to give me a shout if you have any questions about anything you see in the article, and hopefully I can help. Failing that, you’ll just have to pester StatsBomb themselves!