Westfalenstats: Return of the Math

I’m back on my bullshit

Technology - The 2014 Mobile World Congress Photo by AOP.Press/Corbis via Getty Images

A couple months ago I decided to take a short hiatus from writing Westfalenstats, our semi-regular feature covering some interesting football analytics that I’ve recently come across. Some personal and political distractions meant that I didn’t have the time to produce these articles, so I shelved the idea, with every intention of returning to it a couple weeks later. Since then, I’ve started every week thinking that I’ll finally get these back on track, but have somehow failed in that endeavor. But here we are, with the much-awaited return of some football analytics. What a delight! I promise to never leave you stranded like this again. Or at least until I have too much on my plate, in which case you’re on your own, nerds.

Learning some MIY: Math it Yourself

I have previously shared some of the resources that exist for learning about football analytics, data analytics, and coding, but there is so much out there I could easily fill several articles with this stuff. I thought I’d share a few more today.

First, Devin Pleuler’s open-source Soccer Analytics Handbook is a really great, hands-on approach to learning about sports analytics. Everybody learns differently, but I know that I learn a lot more effectively by doing than by reading a book or watching videos.

The Soccer Analytics Handbook contains a series of Jupyter Notebooks, which can be used to run Python (as well as R, Julia, and several other languages). You can use these notebooks by running them as is, and learning the process while doing so, but you can also learn a great deal by editing them and trying things out, and even adapting them to try and answer different questions that you are interested in.

Another resource that is well worth checking out is the Friends of Tracking Data GitHub. I’ve mentioned Friends of Tracking Data before, but their GitHub is full of great repositories, in both R and Python, that will also supplement the work in Pleuler’s handbook.

Finally, for those of you who prefer a nice read, Measurables Podcast put together the following Twitter thread of books for learning sports analytics, and there’s also this Twitter thread of resources for learning to code.

Soccer Analytics Library

With the rapid growth of analytics in football, it is hard to keep track of everything that is coming out, and perhaps most importantly, everything that came before. In science, research has to be well-grounded in relevant prior work, because that helps frame the current work in terms of the bigger picture, and it justifies any assumptions that are fundamental to the research. In reality, a lot of sports analytics research occurs outside of an academic setting, so the barriers to entry are naturally a little lower. That can be a good thing, but it does mean that it is easier to lose important work that has been done before. Lars Maurath attempts to solve this problem with the Soccer Analytics Library.

I think this is a really great idea. Not only does it help track and frame all research in the wider body of literature, it will help anyone that is seeking to understand a certain aspect of football analytics, and it will make it much easier to find the relevant literature when researching a certain topic. As an example of this, I’ve been interested in understanding and measuring playing style recently, and this library makes the process of identifying and reading previous research much, much quicker. Lars’s introduction and justification for the library, as well as some example use cases, can be found here.

What if the Bad Shots are Actually Good?

One of the biggest impacts that analytics has had on the sport is the increased focus on producing high-probability chances. In recent years, the number of long shots has decreased, as teams focus on trying to get the ball into high-probability areas, on the understanding that fewer shots are okay, as long as those shots are higher probability. This principle is grounded in our understanding of probability and Expected Goals (xG). However, does this hold across all contexts?

Ben Torvaney has looked at the idea that, under certain circumstances, lots of low probability shots is a better strategy than a few high probability shots. The idea is based on a theory put forward by Kees van Hemmen, that bad teams would prefer a dozen low probability chances to one great chance.

Ben creates a hypothetical situation where two teams each create a total of 0.95 xG in a game, with one team producing that xG from 32 shots, and the other from just three shots. Interestingly, while the team taking a few high-quality shots is more likely to score one or two goals, it is the team with many low-quality chances that is more likely to score three goals.

[Figure: Probability of n goals from 0.95 xG. Credit: Ben Torvaney]

This means that, in games where these two teams concede two goals, and so need at least three to win, the team producing many bad chances is actually the more likely to win, even though it’s less likely to draw. Ben builds on this to make an argument that teams with poor defenses should shoot from distance more often.
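To get a feel for these numbers, here is a minimal Python sketch of the comparison. It is not Ben's code: it assumes every shot a team takes has the same xG (the total divided by the number of shots) and that shots are independent, so the number of goals follows a binomial distribution. The function names are my own.

```python
from math import comb

def goal_distribution(total_xg, shots):
    """P(0 goals), P(1 goal), ..., P(shots goals).

    Assumption: every shot has the same xG (total_xg / shots) and
    shots are independent, so goals are binomially distributed.
    """
    p = total_xg / shots
    return [comb(shots, k) * p**k * (1 - p) ** (shots - k)
            for k in range(shots + 1)]

def prob_at_least(dist, n):
    """Probability of scoring n or more goals."""
    return sum(dist[n:])

few = goal_distribution(0.95, 3)    # three high-quality shots
many = goal_distribution(0.95, 32)  # 32 low-quality shots

for n in range(4):
    print(f"{n} goals: 3 shots {few[n]:.3f} | 32 shots {many[n]:.3f}")

print(f"3+ goals: 3 shots {prob_at_least(few, 3):.3f} | "
      f"32 shots {prob_at_least(many, 3):.3f}")
```

Under those assumptions, the three-shot team comes out ahead on exactly one and exactly two goals, while the 32-shot team comes out ahead on three or more, and is also more likely to score none at all.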

Though I think this is conceptually a little counter-intuitive, Ben quotes Kees in the article, and this really helps simplify the idea: “One 0.6 xG shot can yield at most one goal, whereas 12 0.05 xG shots can yield many goals but also are more likely to yield none.”

While one really good chance is more likely to end up in the back of the net than one really poor chance, there’s an upper bound on the number of goals that can result from that chance (one... obviously). Therefore, when in search of many goals, it’s better to have many bad chances than only a few really good chances.
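Kees's example can be checked directly. The sketch below is my own illustration, assuming shots are independent: it builds the distribution of goals one shot at a time, so it also works when shots have different xG values. The function name is hypothetical.

```python
def goals_from_shots(shot_xgs):
    """Distribution of total goals from a list of per-shot xG values.

    Assumption: shots are independent. dist[k] holds the probability
    of exactly k goals after the shots processed so far.
    """
    dist = [1.0]  # before any shot: zero goals with certainty
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # this shot is missed
            new[k + 1] += prob * p     # this shot is scored
        dist = new
    return dist

one_big = goals_from_shots([0.6])           # one 0.6 xG shot
many_small = goals_from_shots([0.05] * 12)  # twelve 0.05 xG shots

print(f"P(no goals):  one shot {one_big[0]:.3f} | "
      f"twelve shots {many_small[0]:.3f}")
print(f"P(2+ goals):  one shot {sum(one_big[2:]):.3f} | "
      f"twelve shots {sum(many_small[2:]):.3f}")
```

Both sets of shots add up to 0.6 xG, but only the twelve small chances leave any probability on two or more goals, at the cost of a higher probability of scoring nothing.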

The article builds on this and tests the idea a little more, including computing the probability of winning based on a team’s xG per shot and their average goals conceded. Ben demonstrates that a team with a very bad defense is better off producing a lot of very bad chances, though the change in win probability is minimal.
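A rough version of that calculation can be sketched as follows. This is not Ben's model: I am assuming equal-xG, independent shots for the attacking team, and that goals conceded follow a Poisson distribution with the given average, independent of goals scored. The shot counts reuse the 0.95 xG example, and the averages of 0.5 and 3.0 goals conceded are made-up values for a good and a very bad defense.

```python
from math import comb, exp, factorial

def goal_distribution(total_xg, shots):
    """Binomial goal distribution, assuming equal-xG independent shots."""
    p = total_xg / shots
    return [comb(shots, k) * p**k * (1 - p) ** (shots - k)
            for k in range(shots + 1)]

def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return exp(-mean) * mean**k / factorial(k)

def win_probability(total_xg, shots, mean_conceded):
    """P(goals scored > goals conceded).

    Assumption: goals conceded are Poisson with mean `mean_conceded`
    and independent of goals scored.
    """
    win = 0.0
    p_conceded_fewer = 0.0  # P(conceded < g), updated as g increases
    for g, p_scored in enumerate(goal_distribution(total_xg, shots)):
        win += p_scored * p_conceded_fewer
        p_conceded_fewer += poisson_pmf(g, mean_conceded)
    return win

for mean_conceded in (0.5, 3.0):  # hypothetical good and bad defenses
    few = win_probability(0.95, 3, mean_conceded)
    many = win_probability(0.95, 32, mean_conceded)
    print(f"conceding {mean_conceded} per game: "
          f"3 shots {few:.3f} | 32 shots {many:.3f}")
```

With these assumptions the few-good-shots approach wins more often behind the good defense, and the many-bad-shots approach edges it behind the very bad one, with only a small gap between the two in the latter case.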

[Figure: Bad Chances are Better for Bad Defenses]

The article concludes that this is ultimately unlikely to have a significant impact on the tactics employed by bad teams, and that there is sufficient uncertainty that this may not really translate to the real world. Nonetheless, I think it’s a fun and interesting way of challenging the assumptions built out of xG, and I do think there are some edge cases where it would make sense for teams to apply these principles. If a team, say Borussia Dortmund in a game against a relegation-threatened side, were 3-0 down, they might as well start hitting and hoping when they’re in the final third. Just give it some welly and see what happens; you’re already losing, so who cares?

Your Thoughts

Does Ben Torvaney’s article change your thoughts on how BVB build their offense? Is the utility of long shots, as a means of making an offense more unpredictable, something that Borussia Dortmund seem to be missing? Please leave your thoughts, and any questions, below.