
The Value of Expected Goals: Part 2

Using xG, and other variables, to explain outcomes


Borussia Dortmund v Tottenham Hotspur - UEFA Champions League Round of 16: Second Leg Photo by Dean Mouhtaropoulos/Getty Images

Having taken a look at Expected Goals (xG) as a tool for predicting outcomes last week, I now plan to use xG to explain points totals in the Bundesliga, before taking a look at the best individual performers in the league last season. Hopefully, this should help us better understand how xG works, and when it is at its best. Additionally, this article should continue to shed some light on the statistical tools available to us for analyzing football. While the article is a little longer than I had originally planned, I made sure to include lots of pretty graphs to keep you interested.

Using xG to explain Bundesliga points totals

In order to test the effect of xG and our other explanatory variables on points totals, I will use regression analysis. Regression analysis is a technique for modeling the relationship between a dependent variable that you want to explain (in this case, points totals) and one or more independent variables that you think will explain it. In this case, I use a linear regression (the simplest approach), because simplicity is more valuable in this instance than the marginal gains to be had from more complex models. I use final points totals from the last five seasons as the dependent variable, which only gives us 90 observations (18 clubs in each of five seasons). That is a small sample, but in our case it is sufficient to tell some basic stories.
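To make the idea concrete, here is a minimal sketch of a one-variable linear regression in Python. It is not the code behind this article: the data are randomly generated, and the function names (simple_ols, r_squared) and the slope of 0.9 baked into the fake data are assumptions chosen purely for illustration.

```python
import random


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic data only: 90 made-up team-seasons, not real Bundesliga figures.
random.seed(0)
xg = [random.uniform(25, 90) for _ in range(90)]
points = [5 + 0.9 * value + random.gauss(0, 5) for value in xg]

intercept, slope = simple_ols(xg, points)
fit = r_squared(xg, points, intercept, slope)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, R^2={fit:.2f}")
```

The slope is the number to focus on: it is the average change in points associated with one additional expected goal, which is how the coefficients in the models below should be read.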

The main independent variable in the model is (as I’m sure you’ve figured out) xG, compiled from Understat. I also include a variable that measures the number of games missed by first-team players for each club in a season, through injury and suspension, using data compiled from Transfermarkt. However, I only include players who play regularly enough for their absence to matter.

The final set of variables relate to team expenditure. Previous research has found a relationship between individual/team performance and earnings (Torgler & Schmidt 2007, Torgler, Schmidt, & Frey 2006), and team performance and transfer spending (McDowell 2010). However, there is some debate as to how beneficial increases in transfer spending are to the biggest teams (Burdekin & Franklin 2015).

I have compiled data on transfer expenditure and market value from the Transfermarkt website, measured in millions of euros per season. Unfortunately, data on club wage bills is relatively hard to come by. Instead, I have taken Transfermarkt’s market value data as a proxy for team salary, following an approach used by Frick (2008) and Torgler et al. (2006). I also compared the salary data that is available against market value, finding a correlation of 0.97, which suggests it is an appropriate proxy.
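For readers who want to see what a proxy check like this involves, the sketch below computes a Pearson correlation between two series. The figures are invented for illustration (they are not the Transfermarkt or salary data used here), and pearson_r is simply a name I have picked for the helper.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical values in millions of euros, made up for this example.
market_value = [40, 75, 90, 150, 300, 800]
wage_bill = [15, 28, 30, 60, 110, 290]

r = pearson_r(market_value, wage_bill)
print(f"correlation = {r:.2f}")
```

A value close to 1 means the two series rise and fall together almost in lockstep, which is the property a proxy variable needs.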

Finally, in order to avoid issues of multicollinearity (when an independent variable can be largely explained by other independent variables), I run three separate models. The first model tests the effect of xG on points totals, the second tests transfer expenditure and includes absences, and the third does the same but with team market values.


By now it should be relatively unsurprising to discover that xG explains a great deal of the variance in points totals, as shown in Table 1. xG is statistically significant (as represented by the stars; more stars indicate a higher level of significance). According to Model 1, a one-unit increase in xG is associated with an average increase of almost 1 point. That is a substantively large effect. For example, if Dortmund had improved by just over 3 expected goals over the course of the season, they may well have been champions.

Table 1: OLS Regressions of xG, Market Value, and Absences, on Points Totals

Following this, Models 2 and 3 illustrate the effects of Transfer Spending, Market Value, and Absences. Absences is statistically significant in Model 2 (though not by much) and Model 3. The substantive effect also moves in the right direction in both models, as we should expect an increase in absences to have a negative effect on outcomes. However, the substantive effect is relatively small. I think this is largely a product of the model not really capturing everything. I suspect that the effects of injuries depend on who is injured, the depth available to replace them, and the number of injuries occurring at any one time. If we wanted to really understand the effect that player absences have on outcomes, a more complex model would be necessary. That’s something I hope to explore at a later date.

Finally, regarding Transfer Spending and Market Value, both have statistically significant effects on total points. These results are also visually represented in Figure 1. There does appear to be quite a bit of variance, but I think a more complex model that accounts for the effects of previous transfer windows would reduce this.

Figure 1: The Effect of Transfer Spending on Total Points

Transfer spending has an especially strong effect (I also looked at net spending, and the effects weren’t as strong). For every one unit increase in transfer spending, which is a €1m increase, there is an average increase of 0.28 points. From this model, an increase of €10m in our transfer spending last season may have helped us claim the title. Market value is producing a smaller substantive effect, though it is not meaningless. Consider that the mean market value is around €145m (median value ~€90m) and the maximum value is €845m. An increase of around €40m could result in an increase of 3 points. That’s asking a lot for smaller clubs in the league, but for the bigger teams, that is pretty attainable.
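The arithmetic behind those two claims is just a coefficient multiplied by a change in spending, as the short sketch below shows. The 0.28 figure is the transfer spending coefficient quoted above. The market value coefficient is not quoted directly, so the 3 / 40 used here is my own back-calculation from the "around €40m for 3 points" example and should be treated as an approximation. The function name points_change is also just an illustrative choice.

```python
def points_change(coefficient, change_in_millions):
    """Average change in points implied by a linear model coefficient."""
    return coefficient * change_in_millions


# Points per extra EUR 1m of transfer spending, as quoted in the text.
TRANSFER_COEF = 0.28
# Assumption: back-calculated from "around EUR 40m for 3 points", not a reported value.
MARKET_VALUE_COEF = 3 / 40

extra_from_transfers = points_change(TRANSFER_COEF, 10)
extra_from_value = points_change(MARKET_VALUE_COEF, 40)

print(f"EUR 10m more in transfer spending: {extra_from_transfers:.1f} points")
print(f"EUR 40m more in market value: {extra_from_value:.1f} points")
```

Keep in mind that these are average effects across all clubs and seasons in the sample, so they describe a tendency rather than a guarantee for any single team.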

xG is clearly the most powerful variable in the model, and the only alternative that is likely to explain more is the actual number of goals scored by a team. But there is still more to the game that can’t be explained by Expected Goals, as shown by the significant effects that expenditure and player absences have on the points totals.

Measuring individual performances

Hopefully the previous section gives a little insight into the methods available for statistical analysis, the kind of variables that can be used to explain points totals in football, and xG’s role in modeling performance and outcomes. However, xG is also useful for comparing individual performances. The following section will consider Expected Goals, and two further measures that have been developed from it, Expected Assists (xA) and Expected Goals Against (xGA), using them to look at some of last season’s top performers in three key positions. Data in the following section is compiled from Understat, WhoScored, and Fox Sports.


xG is perfect for judging the performance of forwards over the course of a season. We can use Expected Goals to compare players across the league, and in particular, we can compare their actual goals with their expected goals, and analyze the differences between the two.
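As a rough sketch of that comparison, the snippet below subtracts xG from goals and ranks players by the difference. The three forwards and their numbers are hypothetical, invented only to show the calculation, and finishing_difference is a name I made up for the helper.

```python
def finishing_difference(goals, xg):
    """Goals minus xG: positive is over-performance, negative is under-performance."""
    return goals - xg


# Hypothetical forwards with made-up numbers, not real Bundesliga data.
forwards = {
    "Forward A": {"goals": 20, "xg": 26.5},
    "Forward B": {"goals": 17, "xg": 10.0},
    "Forward C": {"goals": 12, "xg": 12.4},
}

differences = {
    name: finishing_difference(stats["goals"], stats["xg"])
    for name, stats in forwards.items()
}

# Sort from biggest over-performer to biggest under-performer.
ranked = sorted(differences.items(), key=lambda item: item[1], reverse=True)
for name, diff in ranked:
    print(f"{name}: {diff:+.1f}")
```

A large positive difference can reflect excellent finishing, good fortune, or both, which is why the figures that follow bring in extra variables rather than relying on this one number.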

Figure 2 ranks the top 20 goalscorers in the last Bundesliga season, with their xG numbers included for comparison. Perhaps the biggest story here is the over- and under-performance of the top two.

Figure 2: Forwards ranked by Goals and xG

Lewandowski is the top goalscorer, but according to his xG, he should have won by a lot more. Paco Alcácer, meanwhile, came in second, despite an xG that was lower than the majority of the 19 other players included in this analysis. I suspect Paco’s scoring will probably prove to be unsustainable, but over-performance of this kind also appears to be a common trend across a number of attacking players in the Dortmund side. In Lewandowski’s case, I suspect some of it is bad luck: it is hard to explain an xG so much higher than his actual goals any other way when he has a record as one of the most lethal finishers in the world. With that said, I think there are likely some system effects going on too.

Figure 3 plots each player’s total shots and their total xG for the season, with each observation color-coded by the number of big chances missed.

Figure 3: Shots vs xG

This figure really highlights the incredible production Lewandowski offers, and how much of Bayern’s offense runs through him. His xG is much higher than anyone else in the league, and he takes many more shots than everyone else too. However, he does miss a lot more big chances as well. I think this is probably the product of being the main man up top for Bayern. Efficiency is less important in a team that will produce a lot of chances, but doesn’t have tons of goalscorers (Gnabry is the only other Bayern player that makes the list, and Goretzka just missed out). In order for Bayern to succeed, they need Lewandowski to be putting shots on goal as often as possible.

In addition, it’s interesting to note that all three Dortmund players produce high xG totals from relatively few shots. Sancho is particularly efficient. This lends weight to the idea that part of this over-performance is systematic efficiency as a product of the way Favre plays. I do think there is an element of luck involved too, but I think the efficiency and quality of the chances created is a part of the story.


xG doesn’t really capture a lot of what is being done by players who are expected to create quality chances rather than convert them. That is where Expected Assists comes in: xA measures the number of assists a player would be expected to claim, given the quality of the chances they create. Figure 4 ranks the top 20 creative players in the league by their number of assists, and includes xA for comparison.

Figure 4: Creators ranked by Assists and xA

Again, Sancho is vastly over-performing. However, assists are an extremely noisy statistic. Perhaps Sancho is just in the right place at the right time? We can try and look through some of this noise by including a couple of variables in addition to xA. Figures 5 & 6 show the relationship between xA and Big Chances Created/Assists, and include a color-coded variable of Key Passes.

Figure 5: Big Chances Created vs xA
Figure 6: Assists vs xA

The first thing I see when I look at these figures is the incredible production by Joshua Kimmich. He ranked second behind Sancho for the number of assists he claimed last season, but Figures 5 & 6 show that Kimmich is a creative force. He made many key passes and big chances, and his xA numbers are the best in the league. These numbers really show that Kimmich is a brilliant player. In addition, we can also see that Thomas Muller is creating a lot of chances. He didn’t make the cut of the top finishers in the league, but he clearly plays a major role in Bayern’s success.

In terms of Dortmund players, Sancho, Gotze, and Reus are all ranked highly. Figure 6 highlights Sancho’s crazy over-performance. This is, again, likely to regress over time. But I think these numbers are also the product of Sancho’s quality, and his playing in a team that puts him in positions to do some real damage. Finally, it is rather exciting to see all three big summer signings on the list. Brandt and Hazard are both creating lots of chances, and Brandt is leading the key passes as well (alongside Kimmich). Schulz’s assists were not extremely high last season, but his xA was, and given the way Dortmund plays, it is reasonable to expect he will be creating lots of chances next season.


The third group of players I will look at are goalkeepers. In this case, I don’t rank goalkeepers by goals conceded, as it tells us a lot less. Even when including only the top 20 goalkeepers by appearances, the top performers are those goalkeepers that played the fewest games. Figures 7 & 8 help us understand goalkeeper performance. xGA pretty closely approximates the number of goals conceded, and over-/under-performance is less prevalent. This is because it has fewer confounding variables affecting it. When a goalkeeper faces a shot that has a high probability of going in, there is typically very little to get in the way of this, other than poor finishing or good goalkeeping.

Figure 7: Goals Against vs xGA
Figure 8: Shots Against vs xGA

Nonetheless, there are a few goalkeepers that are doing particularly well, most notably RB Leipzig’s Péter Gulácsi. Gulácsi over-performs his xGA by more than any other goalkeeper. This is especially impressive because he is already expected to concede so few. All of this comes despite having to make more saves than the goalkeepers around him, and facing a close-to-league-average number of shots. Every goalkeeper who conceded fewer goals than him did so by playing fewer games. His numbers are pretty astonishing, and suggest he is one of the best goalkeepers in the world right now, and a huge part of Leipzig’s success. In addition, Sommer, Pavlenka, and Jarstein are also over-performing their xGA by considerable margins.

Interestingly, despite being one of Dortmund’s best players all season, Burki doesn’t actually stand out at all. His ratio of goals conceded to xGA is close to 1. Of the goalkeepers that played close to the full season, he conceded fewer than most others, but that is to be expected. On the whole, Burki was absolutely incredible last season, but he still has a propensity for the occasional howler, and I suspect those terrible goals held his numbers back a little.

Finally, Manuel Neuer conceded the second-fewest goals of any goalkeeper in the analysis, though this is in large part due to his playing fewer games (and playing those games for Bayern). He conceded about 3 more goals than his xGA, despite facing the fewest shots (58) and having to make the fewest saves (34) in the league. He actually had the third-most clean sheets (11), ahead of Burki (10), but I think this has much more to do with Bayern’s overall quality, and their ability to keep the ball and not allow the opponent to threaten them often. Because Neuer is at the tail of the distribution, I’m not sure Figures 7 & 8 are totally capturing his performance, but I think there are signs that he was pretty poor. He has been struggling with fitness and injury issues though, so it will be interesting to see if he returns to his former best.
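The two goalkeeper comparisons used above, the ratio of goals conceded to xGA and the gap between them, can be sketched as follows. The keepers and their numbers are hypothetical and exist only to show the calculation, and keeper_summary is a name I have chosen for the helper.

```python
def keeper_summary(goals_conceded, xga):
    """Return (ratio, goals_prevented) for a goalkeeper.

    ratio below 1 means fewer goals conceded than expected;
    goals_prevented is xGA minus goals conceded, so positive is good.
    """
    ratio = goals_conceded / xga
    goals_prevented = xga - goals_conceded
    return ratio, goals_prevented


# Hypothetical keepers as (goals conceded, xGA); made-up numbers, not real data.
keepers = {
    "Keeper A": (27, 35.0),
    "Keeper B": (40, 40.5),
    "Keeper C": (20, 17.0),
}

summaries = {name: keeper_summary(*stats) for name, stats in keepers.items()}
for name, (ratio, prevented) in summaries.items():
    print(f"{name}: ratio={ratio:.2f}, goals prevented={prevented:+.1f}")
```

In this made-up example, Keeper A is well below a ratio of 1, Keeper B sits almost exactly on it, and Keeper C concedes more than expected even though his raw goals conceded total is the lowest of the three.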


So to wrap things up, hopefully the uses and value of statistics like xG are now clearer, and hopefully this proved to be an interesting look into the statistical tools available to analysts seeking to understand, explain, or predict football. Apologies that part two proved to be a little longer, but I think there are a ton of really interesting stories here, a number of which I intend to explore further in future articles. Though xG is obviously the latest fad, that doesn’t mean it is without merit. It offers a great deal of insight into performance, and helps us identify trends that our eyes might not see. It is imperfect, of course, but I think that’s fine. As long as you understand its limitations and when it is appropriate, it can be an excellent tool for analyzing football.