#23 – FBRef and Progressive Passes


There’s more data available for football matches now than there has ever been. StatsBomb and Metrica Sports have some fantastic initiatives making event and tracking data freely available. There’s also lots of other information widely available on the web at the likes of https://www.transfermarkt.co.uk/ and, more recently, fbref.com. These won’t be as detailed match to match, but they offer a wider overview of what’s happening on and off the pitch.

Accessing information from the web can be time consuming unless it’s automated, which is made very easy by some powerful Python packages and widely available tutorials, like FCPython. There are limits to what you can/should access automatically, since we don’t want to overload websites with requests. More information on scraping can be found here.

There are many public functions for scraping different sites, but my advice would be to try to make your own so that you actually understand what you’re getting. I’ll still add my own take on a web scraping function, which works for player/squad tables on fbref. Here’s a link to the GitHub page with the functions I used and some examples: https://github.com/ciaran-grant/fbref_data

Progressive Passes

There are lots of types of passes available on fbref, and having all this data available to explore is really appreciated. Here I am taking a look at progressive passes, with a view to seeing who’s leading the way this year and whether there are any styles we can infer.

Of the progressive passing metrics available, the ones I’ve focused on are below:

  • Total Progressive Distance
  • Number of Progressive Passes
  • Passes into the Final Third
  • Passes into the Penalty Area
  • Key Passes
  • Expected Assists

These have been chosen to try to get a cross between both quantity and quality of ball progression.

As each player has played a different number of minutes, I have used per 90 minutes to compare players. It may be more suitable to use number of minutes in possession for offensive passing such as this, and also number of minutes out of possession for defensive measures. But per 90 minutes goes most of the way there.

Each metric has a significantly different range of values. For example, Total Progressive Distance per 90 minutes will be in the 100s/1000s, whilst xA per 90 minutes will usually be between 0 and 0.5. An extra 1 unit of progressive distance is way less impressive than an extra 1 xA. To compare between statistics I’ve normalised each one relative to its best performer, which makes every comparison relative to a player’s peers and their productivity per 90 minutes.
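
As a rough illustration of the two steps described above (per 90 scaling, then dividing by the best performer), here is a minimal Python sketch. The player names and season totals are made up, and the helper names `per_90` and `normalise_to_best` are my own for this example rather than anything from the linked repo.

```python
def per_90(total, minutes):
    """Scale a season total to a per-90-minutes rate."""
    return total * 90.0 / minutes


def normalise_to_best(values):
    """Divide every value by the maximum, so the best performer scores 1.0."""
    best = max(values.values())
    return {name: value / best for name, value in values.items()}


# Made-up season totals: (minutes played, total progressive distance, xA)
players = {
    "Player A": (2700, 12000.0, 9.0),
    "Player B": (1800, 6000.0, 9.0),
    "Player C": (900, 2000.0, 1.5),
}

# Step 1: put everyone on the same per-90 footing
prog_dist_p90 = {name: per_90(dist, mins) for name, (mins, dist, _) in players.items()}
xa_p90 = {name: per_90(xa, mins) for name, (mins, _, xa) in players.items()}

# Step 2: express each metric relative to its own best performer
prog_dist_norm = normalise_to_best(prog_dist_p90)
xa_norm = normalise_to_best(xa_p90)

for name in players:
    print(name, round(prog_dist_norm[name], 2), round(xa_norm[name], 2))
```

After this both metrics sit on a 0 to 1 scale, so they can share a chart even though the raw values differ by orders of magnitude.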

This sometimes makes it hard to compare between different groups, but as long as you’re aware of what everything is relative to, that problem should be minimised.

On to the fun stuff!

Out of all players in the 2019/20 season, it’s no surprise that two of the most complete progressive passers were Lionel Messi and Kevin De Bruyne. We’ve passed the Messi sense-check at least.

Both are among the top across almost all statistics in both ball progression and actually creating quality chances.

There seem to be two styles that most players fit into (the above two are aliens, so they don’t count). There are chance creators:

Angel Di Maria leads everyone in passes into the penalty area and xA, with lots of key passes too. These types of players seem to be great at using the ball in and around the box, converting possession into chances.

There are also deep progressors:

This is among all players, position agnostic, and David Alaba appears as the best passer into the final third, whilst also posting high volume and distance in progressive passes. These are deeper players who can move the ball from the halfway line into the final third for your chance creators to thrive.

I have identified these styles just using some intuition. I think the next steps will be to test my theory and apply some clustering or PCA to these players to try to identify more styles.

As everyone is secretly wondering, here is a selection of some of the best u23 performers I have found. In no particular order we have Christopher Nkunku, Martin Odegaard, Jadon Sancho and Trent Alexander-Arnold. These are normalised relative to other u23 players:

For perspective on just how good these guys are, here they are relative to everyone. They are some of the best players in the world already, pretty scary.

#19: How to quantify the prevention of a potential goal scoring opportunity


Chances Created and Chances Missed

Chances Created is a metric which tries to quantify the number of goal scoring opportunities that a player is directly involved in.
Opta’s definition is Assists + Key Passes, where Assists are passes (final touches) which result in a goal from the subsequent play and Key Passes (KP) are passes which result in a shot that doesn’t become a goal.
So chances created can be reduced to the final passes (touches) before another player has an attempt at goal (and scores or doesn’t).
As I’ve discussed on The Monthly Football Podcast, Assists alone are pretty random since you rely on the shot actually going in the goal. Chances created is a bit less noisy and should predict future assists more reliably than assists themselves do. That is down to the sheer volume of chances created and opportunities for goals to be scored, rather than relying on goals actually being scored, which is hard.

Chances created relies on a player actually having a shot at the end, otherwise there is no record of the opportunity. Opta also have ‘Chance Missed’, which is defined as a big chance opportunity where the player doesn’t get a shot away. A chance missed is attributed to the player who has the big chance and decides not to shoot, which doesn’t help the creator who provided the chance. If we assume that the miss is largely due to the player not executing an attempt, then mapping these chances missed back to the creator, in addition to chances created, would give credit for creating the opportunity and not punish them for something out of their control, such as the forward delaying a shot and missing the chance to take it.

Chances Denied Metric

As usual, chances created and most quantified statistics deal with the offensive side of the game since it’s more tangible. Shots are there and they happen, so counting them is pretty straightforward. A bit less straightforward is counting the passes prior to shots, as with chances created. Both of these can be tracked over many events and used to quantify expected outcomes based on similar situations in the past, which results in expected goals and expected assists. What is not straightforward is how to quantify the benefit of defensive actions.

We can count tackles, blocks, interceptions and recoveries. However, much like steals, blocks and rebounds in the NBA, they don’t quite tell the whole story about how a defence works. Weaker teams are asked to defend more since they have less possession. This means they have more chances to rack up interceptions, tackles and recoveries. Possession adjusting these measures helps somewhat to normalise these differences, which means that we can compare the frequency of each action assuming all players have equal chances to make them. However, it’s still hard to differentiate the quality of the actions, or how important they were to each team.

Chances denied are an attempt to quantify how much of an opportunity was denied by an interception or ball recovery. In a purely defensive sense, meaning denying your opponent a goal scoring opportunity, recovering the ball in the middle of the pitch is not as important as recovering the ball on the edge of your own box. Expected threat (xT), created by Karun Singh (@karun1710), is a metric which quantifies how likely a team is to score from each location on the pitch within the next 5 actions. If we assign the xT of the location where a recovery, interception or tackle occurs to that action, then we may get a proxy for how important each action was. Since defensive teams will get more opportunities, it may be worth possession adjusting this too, to compare like for like.
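
To make the idea a bit more concrete, here is a minimal Python sketch of crediting a defensive action with the xT of the spot where it happened. Everything in it is an assumption for illustration: the 3x4 grid values are invented (a real xT surface is much finer and fitted from data), the 120x80 pitch follows StatsBomb-style coordinates, and the function names are my own.

```python
PITCH_LENGTH, PITCH_WIDTH = 120.0, 80.0

# Invented xT values from the attacking team's point of view.
# Rows run across the pitch width, columns run towards the goal being attacked.
XT_GRID = [
    [0.01, 0.02, 0.05, 0.12],
    [0.01, 0.03, 0.08, 0.30],
    [0.01, 0.02, 0.05, 0.12],
]


def xt_at(x, y):
    """Look up the xT of a location given in the attacking team's coordinates."""
    n_rows, n_cols = len(XT_GRID), len(XT_GRID[0])
    col = min(int(x / PITCH_LENGTH * n_cols), n_cols - 1)
    row = min(int(y / PITCH_WIDTH * n_rows), n_rows - 1)
    return XT_GRID[row][col]


def chance_denied(x, y):
    """xT the opponent held at a defender's location.

    The defender's own goal is assumed to be at x = 0, so the location is
    mirrored to read the grid from the opponent's attacking point of view.
    """
    return xt_at(PITCH_LENGTH - x, PITCH_WIDTH - y)


# Made-up defensive actions: (player, x, y) in the defender's own coordinates
actions = [
    ("Defender A", 18, 40),    # recovery on the edge of their own box
    ("Defender A", 60, 40),    # interception in the centre circle
    ("Midfielder B", 60, 40),
]

denied_totals = {}
for player, x, y in actions:
    denied_totals[player] = denied_totals.get(player, 0.0) + chance_denied(x, y)

print(denied_totals)
```

With these invented numbers the recovery on the edge of the box is worth far more than the one in the centre circle, which is the behaviour the metric is after. Possession adjusting the totals would be a further step on top of this.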

The general concept I’m trying to capture here is the quality of chance, or potential quality of chance, that is denied by the action the defender takes. This quantity can be given solely to the defender making the action or collectively assigned to the players involved to appreciate the team aspect of defensive play. There is a question of whether to include tactical fouls here as well as legal ball recoveries, but I will save that for another time.



#16: StatsBomb – Messi Ball Receipt Locations


** Re-uploaded with correct y-axis (duh, Messi played on the right..)


Over the last few months, StatsBomb have released all of the event data for Lionel Messi’s La Liga matches. Data this detailed and clean is incredibly hard (expensive) to come by, so to give free access to everyone is amazing and much appreciated!

I’ve only just had a chance to take a look at the data, and have seen many great pieces put out already. Considering who the data is about, I don’t think it will ever go out of fashion, so it’s never too late to start playing around.

You can get access to the data and there’s a very helpful getting started guide here:

** You do need to have the latest version of R and the StatsbombR package installed

There are almost too many things to look at in this data set, so I’ve decided to try to focus on a specific part of Messi’s game and see what I find.

Messi gets the ball, a lot. And he obviously does great things with it once he’s got it, but taking a look at where/how he manages to get the ball would be interesting. Surely the one thing an opposing team would try to do against him would be to try to stop him getting the ball or at least limit him receiving the ball in dangerous areas. That’s the inspiration for taking a look at where he receives the ball on the pitch.

Data Prep

Load all the necessary libraries. The usual suspects for R data manipulation like plyr/tidyverse/magrittr, plotting graphs and pitches with FC_RStats’ SBpitch/ggplot2/cowplot and access to the data in StatsBombR.

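# Libraries ---------------
library(plyr)        # load before tidyverse so dplyr's verbs are not masked
library(tidyverse)   # dplyr, ggplot2 and friends
library(magrittr)    # %<>% assignment pipe
library(cowplot)     # marginal density plots
library(SBpitch)     # create_Pitch()
library(StatsBombR)  # StatsBomb data access and cleaning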

We are only looking at La Liga matches, so let’s only load matches from that competition. There is even a cleaning function ‘allclean’ which adds in some extra columns which will be of use such as x/y locations. We have joined on the season names also as they’re much more intuitive than the season id that has been assigned.

There are events from matches from Messi’s debut season 2004/05 through to 2015/16, consisting of events such as shots, passes and even nutmegs. We’re interested in passes received by the man himself. Note there is also an indicator “ball_receipt.outcome.name” that identifies when a pass is missed, we want to exclude these and only look at passes to Messi that he received (NA values).

Plotting a pitch

To get some perspective relative to an actual football pitch and use StatsBomb’s event location data, FC_RStats has created a function “create_Pitch” which does exactly that. Using ggplot2 and a set of pitch type parameters, it’s easy to plot a pitch with the same proportions as the event data collected by StatsBomb.

This pitch can be used as the base to visualise all events by plotting the x/y locations.

goaltype = "box"
grass_colour = "#202020"
line_colour =  "#797876"
background_colour = "#202020" 
goal_colour = "#131313"

ymin <- 0 # minimum width
ymax <- 80 # maximum width
xmin <- 0 # minimum length
xmax <- 120 # maximum length

blank_pitch <- create_Pitch(
  goaltype = goaltype,
  grass_colour = grass_colour, 
  line_colour =  line_colour, 
  background_colour = background_colour, 
  goal_colour = goal_colour,
  padding = 0
)


Quick Look

Initial data and visual processing is done, we can now start to take a look at the interesting stuff!

At a high level, we can have a look at all of the times Messi received the ball and plot them on a pitch. This will probably get overcrowded but can start to provide some understanding.

# All ball receipts ------------
Messi_Plot <- 
  blank_pitch +
    geom_point(data = Messi_Ball_Receipts, aes(x=location.x, y=80-location.y), colour = "purple") +
    ggtitle("Messi Ball Receipts") +
    theme(plot.background = element_rect(fill = grass_colour),
          plot.title = element_text(hjust = 0.5, colour = line_colour))

Since each time Messi receives the ball is in a specific location, we’ve used points to represent this on the pitch. This looks okay initially, but it’s pretty hard to work out exactly what’s going on and doesn’t really tell us anything we didn’t already know. Messi gets the ball a lot in the opposition’s half.

There are lots of overlapping points, let’s try to get a view of the density distribution to see where specifically he has received the ball the most.

# Density Receipts ----------------
Messi_Density_Plot <- 
  blank_pitch +
    geom_density_2d(data = Messi_Ball_Receipts, aes(x=location.x, y=80-location.y), colour = "purple") +
    ggtitle("Messi Ball Receipts - Density") +
    theme(plot.background = element_rect(fill = grass_colour),
          plot.title = element_text(hjust = 0.5, colour = line_colour))

This mostly suggests the same thing, Messi likes to receive the ball in the opposition half. Though we can now also see that there are two “peaks”, one far out wide near the top and one closer to the centre. More central areas are more dangerous, whereas you might get more space out wide to be able to receive the ball easier.

Luckily (definitely not luckily) StatsBomb just so happen to have a flag which identifies events that occurred under pressure. I believe under pressure is taken as having an opposition player within X metres of you actively affecting your decision making.

Let’s take a look at Messi’s ball receipts whilst under pressure and under no pressure. I would expect that you would be under pressure more often the closer you get to the opposition’s goal.

# Pressure --------------
Messi_Ball_Receipts <- Messi_Ball_Receipts %>%
  mutate(pressure = ifelse(is.na(under_pressure), "No Pressure", "Pressure"))

Messi_Pressure_Plot <-
  blank_pitch +
  geom_point(data = Messi_Ball_Receipts, aes(x=location.x, y=80-location.y, colour = pressure)) +
  ggtitle("Messi Ball Receipts by Pressure") +
  theme(plot.background = element_rect(fill = grass_colour),
        plot.title = element_text(hjust = 0.5, colour = line_colour),
        legend.position = "bottom",
        legend.title = element_blank(),
        legend.background = element_rect(fill = grass_colour),
        legend.text = element_text(color = line_colour))


I’m not really sure what I expected. It’s pretty hard to distinguish between the two as there are so many points. To further filter the data we can take a look at this in each season.

# Pressure Season Loop ----------------
for (i in rev(La_Liga$season_name)) {
  # ggplot objects are not drawn automatically inside a loop, so print each one
  print(
    blank_pitch +
      geom_point(data = Messi_Ball_Receipts %>% filter(season_name == i), 
                 aes(x=location.x, y=80-location.y, colour = pressure))  +
      ggtitle(paste0("Messi Ball Receipts by Pressure - ", i)) +
      theme(plot.background = element_rect(fill = grass_colour),
            plot.title = element_text(hjust = 0.5, colour = line_colour),
            legend.position = "bottom",
            legend.title = element_blank(),
            legend.background = element_rect(fill = grass_colour),
            legend.text = element_text(color = line_colour))
  )
}

Now there’s a lot less going on. Remember those two peaks of ball receipts from above? We can see here that this is due to Messi receiving the ball in different areas of the pitch in different seasons. Again, this is something we already probably knew. Messi started his career as a wide forward, so he received the ball out wide most of the time. From 2009/10 onwards he starts to receive the ball much more centrally, coinciding with his time playing as a “False 9” up front. Coincidentally, his already ridiculous production output skyrocketed. Messi getting the ball in central areas = goal machine.

This still hasn’t really answered the question of where Messi receives the ball under pressure, as it’s hard to tell if there’s a pattern to the blue/red or if it’s all just random.

Something that can help here are marginal density plots. These can be plotted along each axis separately and can hopefully display the distribution of ball receipts more intuitively.

Taking a look at all seasons initially.

xdens_pressure <- axis_canvas(Messi_Pressure_Plot, axis = "x") +
  geom_density(data = Messi_Ball_Receipts, aes(x=location.x, fill = pressure), alpha = 0.5) +
  xlim(xmin, xmax)

combined_pressure_plot <- insert_xaxis_grob(Messi_Pressure_Plot, xdens_pressure, position = "top") 

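# The right-hand marginal needs a flipped canvas on the y axis
ydens_pressure <- axis_canvas(Messi_Pressure_Plot, axis = "y", coord_flip = TRUE) +
  geom_density(data = Messi_Ball_Receipts, aes(x=80-location.y, fill = pressure), alpha = 0.5) +
  xlim(ymin, ymax) +
  coord_flip()

combined_pressure_plot %<>%
  insert_yaxis_grob(., ydens_pressure, position = "right")

# Draw the pitch with both marginal densities attached
ggdraw(combined_pressure_plot)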

Again there’s a bit too much going on on the pitch here, but looking at the marginal distributions across each axis is interesting.

Across the top it looks like there’s not too much difference in distribution between “Pressure” and “No Pressure”. There is a higher peak for “No Pressure” about halfway inside the opposition’s half, which could be due to Barcelona practically camping outside the opposition box, with all defenders on the edge of their own box for the majority of most games.

Along the right is as expected: there are many more ball receipts under no pressure out wide.

And for each season separately.

for (i in rev(La_Liga$season_name)) {
    p <- blank_pitch +
      geom_point(data = Messi_Ball_Receipts %>% filter(season_name == i), aes(x=location.x, y=80-location.y, colour = pressure)) +
      ggtitle(paste0("Messi Ball Receipts - ", i)) +
      theme(plot.background = element_rect(fill = grass_colour),
            plot.title = element_text(hjust = 0.5, colour = line_colour),
            legend.position = "bottom",
            legend.title = element_blank(),
            legend.background = element_rect(fill = grass_colour),
            legend.text = element_text(color = line_colour))

    xdens <- axis_canvas(p, axis = "x") +
      geom_density(data = Messi_Ball_Receipts %>% filter(season_name == i), aes(x=location.x, fill = pressure), alpha = 0.5) +
      xlim(xmin, xmax)
    xplot <- insert_xaxis_grob(p, xdens, position = "top") 

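    # The right-hand marginal needs a flipped canvas on the y axis
    ydens <- axis_canvas(p, axis = "y", coord_flip = TRUE) +
      geom_density(data = Messi_Ball_Receipts %>% filter(season_name == i), aes(x=80-location.y, fill = pressure), alpha = 0.5) +
      xlim(ymin, ymax) +
      coord_flip()

    comb_plot <- insert_yaxis_grob(xplot, ydens, position = "right")

    # Draw the combined plot for this season
    print(ggdraw(comb_plot))
}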

Now this is what we all came here to see.

For the first 6 seasons of his career (2004/05 – 2009/10), Messi actually received the ball closer to goal under no pressure than he did under pressure (distribution across the top), which is pretty incredible and opposed to both what you would expect and what we saw overall. These are the wide forward Messi seasons, which shows just how good he was and how good he was getting at being a wide forward. Where he most often received the ball under no pressure (peak of the top distribution) actually moves closer to the opposition goal until 2008/09.

Then in 2009/10 something magical happens. Somehow he manages to receive the ball under no pressure both closer to the goal (across the top) AND dead in the centre of the field (along the right). Which of course is a recipe for success.

From then on, it looks like teams at least tried to put pressure on him when he received the ball close to the goal. Not really sure that worked so much though.

There are a lot more amazing things from Messi’s career hidden away in this amazing data set. Thanks again to StatsBomb for the free access to explore and show off some things that are possible with the data.


#14 What Defines a Successful Season?


No matter what happens, Liverpool’s season is a success.

With the best Premier League title race going down to the last day of the season, it’s a large contrast to last season, when Manchester City had the league wrapped up and were aiming for 100 points. They became the Premier League team with the most points in a season, beating Chelsea’s 95 points from 2004/05 by 5 points, and arguably became the most successful Premier League team. They were so good that it’s not much of a surprise that this year Manchester City are on 95 points with a game to play, potentially finishing on 98 points and recording the second highest points total the year after smashing the record. The surprise this year is that despite Manchester City being so good, the title is still going down to the final day. Liverpool have 94 points with a game to play and a win will take them to 97 points, giving them at least the third highest points total, depending on how Manchester City’s game turns out. We are likely seeing two of the top three Premier League sides ever in the same season, with the best being one of these teams last year! It’s truly an incredible season and hopefully we appreciate how good these teams are.

This brings us to the imminent question and judgement on the whole season that comes from these last games. One team will be champions and one team will not. One team’s season will be a success and one team’s won’t. That may seem unfair since as discussed, these could be two of the three best teams to be seen in the Premier League.

However, there are of course more trophies to be won and routes to success than just the Premier League. Manchester City got 100 points last year, are on course for 98 points this year, have already won the Carabao Cup and Community Shield, and are in the FA Cup final. They are on course for the domestic treble and the only team to get more points than them in a season were themselves last year, but they were knocked out of the Champions League in the quarter-finals by Tottenham. That is what people are keen to focus on: despite domestic success, Manchester City once again failed in Europe. The criticism is fair. Manchester City were favourites to beat Tottenham over two legs but they didn’t, largely due to prioritising the league over their first leg match. That’s where the problem with success lies: Manchester City were going for the Quadruple and it looks likely they will have to settle for a domestic treble. This just shows how high their standards are and what perceived success is for a team of their quality. With two games to go, one in the League and one in the FA Cup final, they expect to go on to win both. However, if they don’t, they already have the joint second highest points total and have won the Carabao Cup. That is probably not successful considering how close they got to all of their goals, but it is one hell of a season, with every chance to do the same again next year.

In comparison and with the incredible Champions League semi-finals just behind us, Liverpool have made it to the Champions League final for the second time in two years and are favourites to win this time against Tottenham. Liverpool are in contention to win the Premier League and the Champions League this year, that is an incredible achievement in itself. They lost to the Real Madrid three-peat side with Cristiano Ronaldo and without Mohamed Salah last year, as expected. Most teams don’t get to a single European final, let alone get to back to back finals. They have managed to beat Paris St Germain, Bayern Munich and Barcelona on their way to the final, even with Lionel Messi largely pulling the semi-final tie away from them in the first leg, they were the better team across both legs and you can’t argue they don’t deserve to be there.

As a worst-case scenario for the finish to this season, if Liverpool lose in their final Premier League match and lose the Champions League final to Tottenham, they will still have the fourth highest points total in a season and have got to back to back Champions League finals. Even at worst case scenario, you could argue that’s a successful season. They are expected to win the Champions League and beat Wolves on the final day, ultimately getting 97 points and coming second to the second-best team in the Premier League. Their expected finish to the season is definitely a success. If Manchester City were to drop points and Liverpool won the League title, doing the Premier League and Champions League double whilst getting the second highest points total in a season would cement this team among the Premier League’s best. It’s not possible to on one hand potentially be considered the best ever, but also potentially be considered to have an unsuccessful season based on 2 games of football. No matter what happens, Liverpool’s season is a success.


#12 Statsbomb Event Data – Fernandinho Replacements


Manchester City find themselves once again top of the Premier League, with the chance to retain the title for the first time in 10 years since Manchester United in 2008/09. However they also find themselves without Fernandinho, the only seemingly irreplaceable player in their squad that overflows with talent. Fernandinho has missed four Premier League games so far this season, the two at the end of December in which they lost and left the league title in Liverpool’s hands and the two most recent games which were both dominating 1-0 wins. Even if their performances were no worse off and just lacked some luck, no doubt there is nobody else in their squad who can do exactly what Fernandinho does.

Even Guardiola has commented that there is no doubt they will be looking to bring in a replacement:

“I think with the way we play we need a guy who has of course physicality, is quick in the head and reading where our spaces to attack are”


In this post I will try to scout a replacement for Fernandinho using Statsbomb’s 2018 FIFA World Cup Event data. This is a small sample size, so will only include players and their performances in the World Cup. I will define some metrics that could be used to describe the type of player that would fit the role that Fernandinho plays and identify those players that performed best during the World Cup.

Guardiola talks about physicality, quickness of thought and reading where the spaces will be to attack. It is hard to quantify those qualities, however adapting some simpler metrics could give a good shortlist.

We know that Manchester City will have the ball a lot and want to get the ball forwards to their more attacking players in attacking areas, relying on Fernandinho to progress the ball. Using Statsbomb’s passing events, with the start and end location in x, y coordinates, I have defined a ‘Progressive Pass’ to be one that moves up the pitch more than 10m. Players who have the ability to progress the ball forwards are desired. It could be argued that we also want to only include players who progress the ball from deeper positions so as to more accurately emulate Fernandinho’s role, however we have a small sample as it is and the ability to play progressive passes is what we are looking for.
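
As a rough sketch of the progressive pass definition above, the check is just a comparison of start and end x coordinates. This Python example is illustrative only: the passes are made up, the names `is_progressive` and `PROGRESSIVE_THRESHOLD` are my own, and it assumes x runs from 0 to 120 towards the opponent’s goal, with 10 units of x standing in for the 10m threshold.

```python
from collections import Counter

# Assumed threshold: a pass must gain more than 10 units of x to count
PROGRESSIVE_THRESHOLD = 10.0


def is_progressive(start_x, end_x, threshold=PROGRESSIVE_THRESHOLD):
    """True when the pass moves the ball up the pitch by more than the threshold."""
    return (end_x - start_x) > threshold


# Made-up passes; x increases towards the opponent's goal
passes = [
    {"player": "Midfielder A", "start_x": 40.0, "end_x": 65.0},  # +25, progressive
    {"player": "Midfielder A", "start_x": 50.0, "end_x": 55.0},  # +5, too short
    {"player": "Midfielder A", "start_x": 60.0, "end_x": 70.0},  # exactly +10, not counted
    {"player": "Midfielder B", "start_x": 70.0, "end_x": 60.0},  # backwards
    {"player": "Midfielder B", "start_x": 30.0, "end_x": 45.0},  # +15, progressive
]

# Count progressive passes per player
progressive_counts = Counter(
    p["player"] for p in passes if is_progressive(p["start_x"], p["end_x"])
)

print(progressive_counts.most_common())
```

Restricting this to passes that start in deeper positions, as discussed above, would only need an extra condition on `start_x`.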

Whilst lots of players are great at passing, what makes Manchester City so special and Fernandinho so hard to replace, is their ability or willingness to win the ball higher up the pitch. Check out a previous post in the link below where I show how many more times they win the ball back in the opposition’s half. In the same vein, using Statsbomb’s ball recovery event with the x, y location I create a count of times that a player has recovered the ball in the opponent’s half. This tries to emulate the ability to win the ball back quickly after losing it and pinning the opposition back.
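
The recovery count described above can be sketched in the same spirit. This is a minimal Python example with made-up recoveries; it assumes a 120 unit long pitch with each team’s own goal at x = 0, so anything beyond x = 60 is in the opponent’s half. The names used are my own for illustration.

```python
from collections import Counter

# Assumed halfway line on a 120 unit long pitch
HALFWAY_X = 60.0


def in_opponent_half(x):
    """True when a location is beyond the halfway line, towards the opponent's goal."""
    return x > HALFWAY_X


# Made-up ball recoveries: (player, x location)
recoveries = [
    ("Midfielder A", 75.0),
    ("Midfielder A", 88.0),
    ("Midfielder A", 35.0),  # own half, not counted
    ("Midfielder B", 62.0),
    ("Midfielder B", 20.0),  # own half, not counted
]

# Count only the recoveries made in the opponent's half
high_recoveries = Counter(player for player, x in recoveries if in_opponent_half(x))

print(high_recoveries.most_common())
```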


The combination of progressive passes and high ball recovery is used as a proxy for the type of skills that Fernandinho portrays and can be used to get a shortlist of players that perform similarly. Looking at only the players who played positions considered as central midfield or defensive midfield, the top 10 is below.

Figure 1: Midfield Progressive Passes and Opponent Half Recoveries Top 10 from 2018 FIFA World Cup

One thing to note is that these are pure counts and not per game or per 90min. It would be worth taking a look at that to account for the differences in games and minutes played. For example, Croatia making the Final and Germany getting knocked out in Group Stage is a difference of four games, so Toni Kroos making it to 2nd on the absolute list is incredible.

Initially it looks like the list makes sense, players like Kroos, Modric, Rakitic are all players who you could see being able to play in a deeper midfield role. Mascherano is also in the same mould, even more so considering he has played at Centre Back most of the time for Barcelona and Fernandinho has begun to slot in there to bring the ball out.

Those players are all 30+ years old so no better than Fernandinho in terms of potential replacements. Granit Xhaka and Marcelo Brozovic are two that are just entering their prime midfield years at the age of 26. This is where it’s important to note that when scouting, context is important and large sample sizes are encouraged. Xhaka may have the progressive passing ability and love of yellow cards, but probably wouldn’t have the discipline.

This post has looked at outlining a way to narrow down a shortlist of potential replacements for Fernandinho. The methods can be used to find similar players for any player, as long as you can identify what you are looking for. Ideally you would get a much larger sample size of games and could look at a player’s contribution per game or per 90mins to get a more stable shortlist. In the future I would like to look at some unsupervised methods which don’t require you to specify or create the similar fields as I have done here.

I have included the total passing heatmaps and the recovery maps of selected players; if you want to see any players specifically from the World Cup from any position then give me a shout!

Once again, massive shout out to Statsbomb for providing the free source of event level data, it’s hard to come by and even harder to collect so it’s much appreciated!


#11 Normalizing xG Chain – Are all actions created equal?


In this post I will be taking a look at the concepts of xG Chain (xGC) and xG Buildup (xGB), why they are useful and how we can develop these concepts to get even more use from them. Both of these concepts further the expected goals (xG) and expected assists (xA) metrics, allowing the contribution of players not directly involved in a goal to be accounted for.

xG is a likelihood attached to each shot that gives the chance of that shot being a goal. This metric is only really useful for players who take lots of shots, such as forwards.

xA is attached to a pass that immediately precedes a shot. It measures the likelihood that the pass will become an assist from the following shot. This metric aims to widen the influence of the xG metric and attribute play to the creative players who create the shots that xG provides information for.

Both of these are intuitive and simple concepts that provide an estimate for specific actions on the pitch. Since goals and assists are key events in a match, it makes sense to focus analysis on them since they are incredibly predictive. xG and xA are very limited, however. They only care about a shot and the preceding pass, so they don’t tell us anything about any of the play that happens leading up to that point. It turns out that the majority of football isn’t just taking turns taking shots, so it would be nice to be able to do something like xG/xA for other actions on the pitch.

Just as xA extends xG by attributing the result to the preceding pass, xG Chain extends xA by doing the same thing for the whole preceding possession chain. In this way you can widen the influence of xG to all players that are involved in the preceding possession. Where xG mainly highlights forwards and xA mainly highlights creative players, xG Chain aims to highlight players that make contributions to the possessions that end up with a shot. These could include your ‘assisting the assister’ players, your deep lying playmakers like Jorginho who get criticised for lack of assists, or your progressive passing defenders that wouldn’t usually get the credit they potentially deserve for starting effective possessions.

Calculating xG Chain: https://statsbomb.com/2018/08/introducing-xgchain-and-xgbuildup/

  • Find all possessions each player is involved in
  • Find all shots within those possessions
  • Sum the xG of those shots (usually take the highest xG per possession)
  • Assign that sum to each player, however involved they are
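
The steps above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the event layout (dicts with `possession`, `player`, `type` and `xg` keys) is made up for illustration and is not the StatsBomb schema, and each possession is valued by its highest-xG shot as noted in the list.

```python
from collections import defaultdict


def xg_chain(events):
    """Return {player: xG Chain} for a list of event dicts.

    Assumed (hypothetical) event layout:
    {"possession": int, "player": str, "type": str, "xg": float}
    where "xg" is only present on shots.
    """
    involved = defaultdict(set)    # possession id -> players who touched the ball
    best_xg = defaultdict(float)   # possession id -> highest shot xG (0.0 if no shot)

    for event in events:
        chain = event["possession"]
        involved[chain].add(event["player"])
        if event["type"] == "shot":
            best_xg[chain] = max(best_xg[chain], event["xg"])

    totals = defaultdict(float)
    for chain, players in involved.items():
        for player in players:
            # Every player in the chain gets the full value, however involved.
            totals[player] += best_xg[chain]
    return dict(totals)


# Toy data, invented purely to show the mechanics.
toy_events = [
    {"possession": 1, "player": "A", "type": "pass"},
    {"possession": 1, "player": "B", "type": "pass"},
    {"possession": 1, "player": "C", "type": "shot", "xg": 0.3},
    {"possession": 2, "player": "A", "type": "pass"},
    {"possession": 2, "player": "B", "type": "shot", "xg": 0.1},
    {"possession": 2, "player": "B", "type": "shot", "xg": 0.2},
    {"possession": 3, "player": "A", "type": "pass"},  # no shot, adds nothing
]
chain_totals = xg_chain(toy_events)
```

Dividing each total by minutes played and multiplying by 90 would give the per 90 version.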

You can normalise xGC per 90 mins to see contributions per match. However, this still highlights forwards and creative players: if they are the players getting the shots, they get all the credit for their own shots plus any other possession chains they are involved in.

Since the aim is to highlight players that xG and xA don’t directly pick up, you can calculate xGC without including the shots and assists to get xG Buildup. This leaves all of the actions preceding the assist and the shot, or all of the build up play as it were. By removing assists and shots, the dominance of forwards is removed and the remaining players are those heavily involved in the play up to just before the defining assist and shot. You can also normalise xGB per 90 mins to see contributions per match. Again, each player involved gets equal contribution as long as they are involved in the possession chain in some way.
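
A minimal sketch of the xG Buildup idea, again under assumptions: the event layout and the `key_pass` flag are hypothetical, and a player only earns buildup credit for a possession if they did something in it other than the shot or the pass that set it up.

```python
from collections import defaultdict


def xg_buildup(events):
    """Return {player: xG Buildup} for a list of event dicts.

    Assumed (hypothetical) event layout:
    {"possession": int, "player": str, "type": str,
     "xg": float (shots only), "key_pass": bool (optional)}
    """
    builders = defaultdict(set)   # possession id -> players involved in the build up
    best_xg = defaultdict(float)  # possession id -> highest shot xG

    for event in events:
        chain = event["possession"]
        if event["type"] == "shot":
            best_xg[chain] = max(best_xg[chain], event["xg"])
        elif not event.get("key_pass", False):
            # Shots and key passes are skipped; everything else is build up.
            builders[chain].add(event["player"])

    totals = defaultdict(float)
    for chain, players in builders.items():
        for player in players:
            totals[player] += best_xg[chain]
    return dict(totals)


def per_90(value, minutes):
    """Scale a season total to a per 90 minutes rate."""
    return value * 90.0 / minutes


# Toy data, invented purely to show the mechanics.
toy_events = [
    {"possession": 1, "player": "A", "type": "pass"},
    {"possession": 1, "player": "B", "type": "pass", "key_pass": True},
    {"possession": 1, "player": "C", "type": "shot", "xg": 0.4},
    {"possession": 2, "player": "B", "type": "pass"},
    {"possession": 2, "player": "A", "type": "pass", "key_pass": True},
    {"possession": 2, "player": "B", "type": "shot", "xg": 0.1},
]
buildup_totals = xg_buildup(toy_events)
```

In the toy data, player C only ever shoots so never appears in the result, whereas player B is credited for possession 2 because of the earlier pass, not the shot.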

xG Chain and especially xG Buildup are great metrics that highlight the contributions of players leading up to assists and shots. They allow players that don’t contribute directly to goals to make a case for their own importance. Normalising per 90 mins is an effective way to allow for reduced player minutes due to injury or substitutions, and evaluate all players on the same basis.

As great as the concepts of xGC and xGB are, there is a clear and influential flaw in the calculation when assigning the xG of the possession chain to the players involved. Each player gets equal contribution no matter how involved they were. So player A, who makes a simple 5 yard pass in their own half, gets the same assigned contribution as player B, who made the decisive through ball to a player who squared it for an open goal. Neither player would get credit in xG/xA, but both would get the same xGC/xGB contribution despite the fact that player A’s contribution was potentially incidental and player B’s turned the possession chain from probing to penetrating and a shot on goal.

Another way to consider the contributions of each player is if you were to remove the action of that player, how likely was the possession chain to have still occurred. If you remove player A’s simple pass, it doesn’t take much for the possession chain to maintain its low threat whereas if you remove player B’s decisive through ball then it’s unlikely that the possession chain continues in the same way. In this way, player B’s contribution could be argued to be more important than player A’s.

This leads to considering other ways of normalising xGC and xGB. Each method of assigning contribution and normalising will highlight different aspects of the build up.

Since you have all the information of each possession chain, you may have access to the number of passes or touches that each player contributed to the chain. If you proportion the xGC out by the frequency of passes or touches you can get a good idea of the proportion of involvement that each player has in each possession chain. For example, take a possession chain involving two players, C and D, where player C made 3 passes and player D made 4 passes, with a resulting shot that has an xG of 0.7. Player C contributed 3/7 passes so gets an xGC of 3/7 * 0.7 = 0.3, and player D contributed 4/7 passes so gets an xGC of 4/7 * 0.7 = 0.4. Since player D was involved slightly more than player C, player D gets a higher xGC. A similar calculation can be made using touches, which will give more credit to players who dribble than just counting passes.
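
The worked example with players C and D can be reproduced with a short sketch. The function name and input format (one list entry per pass or touch) are my own assumptions for illustration.

```python
from collections import Counter


def share_xg_by_involvement(actions, chain_xg):
    """Split a chain's xG between players in proportion to their action counts.

    actions:  list of player names, one entry per pass (or touch) in the chain
    chain_xg: xG of the shot that ended the chain
    """
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to share out
    return {player: chain_xg * n / total for player, n in counts.items()}


# Player C made 3 passes, player D made 4, and the shot was worth 0.7 xG.
passes = ["C"] * 3 + ["D"] * 4
shares = share_xg_by_involvement(passes, 0.7)
# shares["C"] is roughly 0.3 and shares["D"] roughly 0.4, matching the text.
```

Swapping the list of passes for a list of touches, or only the progressive passes, changes the allocation without changing the code.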

You aren’t limited to just counting passes or touches of the ball, you can get more creative with the allocations if you want to credit specific types of actions. You could only count progressive passes that move the ball forward by at least 10 yards, try to quantify the most important or necessary actions of a possession chain (decisive through ball/taking on a player in the box) or count the number of opposition players taken out of the game by each player involved, where ‘taking a player out the game’ may be defined as moving the ball closer to the defending team’s goal than the player.

xG Chain and xG Buildup are both intuitive and simple metrics that assign contributions to players that don’t get directly involved in taking shots or assists but are frequently involved in preceding actions to these events. On their own they can already highlight players that seem to contribute well under the ‘eye-test’ when you watch them, but they can be misleading and provide many false positives since all actions are considered equal under xG Chain.


Credit to StatsBomb and Thom Lawrence for introducing the concepts and providing clear explanations and examples. They even include free data sets for FAWSL and the 2018 FIFA World Cup if anyone wants to try themselves. Check them out here:


#10 Match Report: Man City 2 – 1 Liverpool


Liverpool head into their first game of 2019 still unbeaten and 7 points clear of arguably the best ever Premier League side, reigning champions Manchester City. Manchester City were on course for another incredible year, and still are by anyone else’s standards, however losing at home to Crystal Palace and then away to Leicester in 2 of their last 3 games was not in the script for their next documentary.

Up to Christmas, City had been unbeaten too, sitting top of the league having already played all of the other ‘Top 6’ sides away from home. It was looking like the question was whether City could go unbeaten, with Liverpool doing amazingly well just to keep up. A severe dip in form, a key injury and some incredible shooting against them saw City relinquish the lead in the title race, with Liverpool not looking like slowing down at all.

A Liverpool win at the Etihad and the gap becomes 10 points, arguably the title race is over without a Liverpool collapse (not impossible). A draw would maintain the 7-point gap, but would also give Liverpool hope that they can continue in their excellent season since the champions couldn’t beat them at their own ground. Whilst a win for City would reduce the gap down to 4 points, which means City are still relying on Liverpool messing up, but it also means that Liverpool are no longer untouchable and City will have put doubt in Liverpool’s minds.

Considering City finished champions 25 points ahead of Liverpool and won 5-0 in this fixture last season, it would be surprising to say the least if I told you that this was the most even game of the season so far. It shows how far Liverpool have come in such a short space of time that this is indeed the case: the game was incredibly even and almost any result could’ve happened if repeated.

City did end up winning 2-1, however the Expected Goals (xG) from Understat suggest it wasn’t an easy win. The xG score was City 1.18 – 1.38 Liverpool, suggesting arguably Liverpool would win this game more often than City if repeated and a draw is most likely. For a game with two of the highest scoring teams in the league, there were not that many shots or chances created with only 9 – 7 for City – Liverpool respectively. This low shot volume adds to the variance in xG numbers and emphasises that it would be more down to individual skill at finishing or luck to determine this game rather than an overwhelming inevitability that someone would score.

Figure 1: Size of bubble = Expected Goals (xG), Location = Location of shot, Stars = Goals

In terms of finishing and scoring goals, Liverpool were not very clinical, however they did create the best chance of the game with a lovely cross field pass followed by a first time cross across the box for a tap in to an empty goal. They also had a ball cleared off the line by centimetres following a scramble after a rebound off the post. Other than those two, Liverpool were limited to shots through crowds of bodies. City managed to manufacture some chances through counter attacks, and also capitalise on the fact that Sergio Aguero is an incredible finisher from tight angles. Whilst Liverpool scored with their highest xG chance (0.62), City missed both of their highest xG chances (0.49, 0.32) and scored from two lower xG chances (0.06, 0.05), which suggests that City’s finishing when it mattered was the difference in goal scoring.

Since not very many chances were being made, most of the game and its interesting plays took place between the two boxes. There are three players I’d like to highlight, all playing central midfield: Fernandinho, Bernardo Silva and James Milner. It’s hard to quantify the effect that these players had on the game, but all three were excellent in denying the opposition any space or progression up the pitch.

No player had more ball recoveries than Silva with 10; Fernandinho had 9 and Milner, whilst only being on the pitch for about an hour, had 7. In and of themselves ball recoveries don’t mean much, however especially for City players it’s the area of the pitch where they win the ball back that’s so great.

Figure 2: 25/65 Man City recoveries in Liverpool’s half, 10/56 Liverpool recoveries in Man City’s half


5 out of the 10 ball recoveries for Silva and 4 out of 9 for Fernandinho were in Liverpool’s half, which suggests that City were winning the ball back high up the pitch and not allowing Liverpool to progress much further. Compared to other players with high recoveries, this is significant. Not only recoveries, but Silva also completed 3 tackles on the halfway line out of 8 (!) attempts and made 4 interceptions in Liverpool’s half. As you can imagine Silva got around the pitch a lot this game and managed to cover 13.7km which is the most in a game this season. I don’t usually like those kinds of stats since they don’t suggest anything about a player’s involvement in a game but maybe suggest that they’re just out of position recovering for the whole game. However, Silva was definitely involved and sometimes that extra effort you put in makes others do the same.

A lot of Fernandinho’s work is done off the ball, in ways that aren’t quantifiable by tackles or interceptions or distance covered. It’s clear how large an impact he has in City’s midfield since the two games he didn’t play due to injury were the two games they lost so far this season. Fernandinho deserves more than a paragraph on one game to highlight his skills, so he’ll be the focus of an upcoming post. But City need to find a replacement quickly for him, or find a way of playing that doesn’t rely so heavily on him sweeping up behind the front 5’s press.

It’s a shame that James Milner had to be the one to come off early in the second half. Milner plays similarly to Bernardo Silva when Liverpool have the three in midfield and was as effective as Silva defensively until he got taken off. Moving to 4-2-3-1 since they needed to score was probably a sensible move, however needing a goal and leaving Jordan Henderson on the pitch alongside Fabinho (better version of Henderson) doesn’t always end well. It worked out since Liverpool scored an amazing team goal, but they may have been more of a threat if Milner was alongside Fabinho. Also, it doesn’t help pushing Wijnaldum out to left wing with several wingers sitting on the bench but hey.

Come the end of the season, this game will be regarded as a turning point whatever happens. Whether Liverpool collapse and City come back to win their second title in a row or Liverpool brush it off and continue in the same manner we will find out, but Manchester City have shown their hand and they are here to stay until the end of the season. We have our first real title race in years, take it in and enjoy it.

Thanks to @StatsZone and Understat for images, stats and xG numbers.


#7 Defensive Metrics [Decision Making]


“If I have to make a tackle then I have already made a mistake.”

Paolo Maldini

It’s a famous quote I’m sure you’ll have heard, but you can hear the penny drop in every single person who hears it for the first time. One of the best defenders (if not the best) to have played football couldn’t be wrong, could he? Yet defenders and defensive players are judged mainly on statistics such as number of tackles or blocks. Tackles and blocks are usually last-ditch attempts to prevent an opponent from progressing.

Defending is a constant ongoing process that is happening throughout a football match, no matter who has the ball or where the ball is on the pitch. As a collective team, and individually, every player is moving into positions that adhere to a defensive structure with the aim of conceding the fewest goals possible. Each player will contribute to that by performing defensive actions, usually known as tackles or blocks. However, to perform a tackle or block first requires the opposition to have the ball in a potentially dangerous area, or rather first requires you to allow the opposition to have the ball in a potentially dangerous area. More important, and less easy to quantify, would be the actions and ability to prevent a forward getting the ball in dangerous areas in the first place.

It doesn’t seem a stretch to suggest that something better than blocking every shot on goal is to prevent every shot being taken in the first place.

When a forward has the ball, they will have an aim in mind of what they want to achieve with their possession. There will be a hierarchy of aims ranging from scoring a goal down to retaining possession of the ball. A defender will also have an aim in mind when a forward has the ball. Their hierarchy of aims will be a version of the reverse of the forward’s, ranging from not conceding a goal to winning the ball back. The immediate aim of both the forward and the defender will depend on factors such as location on the pitch, time of the game, game state and each player’s perception of the other’s abilities.

For example, if the striker has the ball in the penalty area then their primary aim may be to take a shot to score a goal, whilst the defender’s primary aim may be to not concede a goal.

If the fullback had the ball in their own half then their primary aim probably won’t be to score a goal straight away, but rather progress the ball up the pitch either through midfield or down the line to the winger. If those two options are not available then they potentially need to regress their aim down to maintaining possession and recycle the ball back to the goalkeeper or centre backs. In this case, the defender may be a striker or a winger who has closed the ball down; the defender’s primary aim here may be to prevent forward progression of the ball towards a more dangerous position.

Figure 1: Davies’ decision making options v Chelsea

These thought processes will be going back and forth between each player at all times throughout a match. Even whilst nowhere near the ball, these are things players need to consider at maybe a more minute level. Furthering the example above with the fullback and winger, the fullback’s aim is ball progression and the winger’s aim is to prevent ball progression. If possible, the fullback would play the ball straight into the striker so that they could progress the ball up the pitch as far as possible as quickly as possible, however collectively the defence need to negate that as an option. Maybe the defending centre back is marking the striker tightly with the defending central midfielder also blocking off any direct pass, just enough so that the fullback doesn’t consider passing to the striker a viable option.

Figure 2: Chelsea unable to prevent Davies from progressing the ball

If the defending team sufficiently prevent efficient progress into dangerous areas of the pitch then their job is made much easier. As we can see in Figure 1 and Figure 2, Chelsea were unable to prevent ball progression, as a result they are left to defend a more dangerous situation and even resort to tackling or blocking (!).

The decisions that each player has, defender or forward, aren’t limited to just marking or blocking passing options and passing or shooting. Forwards may want to dribble past players, cross the ball from wide or even, off the ball, make runs into space to receive the ball. These decisions of the forwards cause defenders to react accordingly; how well they deal with the questions asked by the forwards depends on the abilities of the team and players in question.

It would be interesting to look at the decision making of defenders and forwards in different situations by counting the number of times or frequency of a decision overall and whether that depends on who they are facing or where they are on the pitch. A decision here for a forward would be a simple action such as attempt a shot, attempt a dribble, pass the ball up the line or retain possession. Whilst a defensive decision would depend on the decision of the forward, it would be interesting to see if players change their decisions significantly when playing against certain players. It could be a way to measure to what degree a defender can force a forward into uncomfortable positions and into making unfavourable decisions or decisions lower down on the forward’s hierarchy of aims.
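
As a rough sketch of what that counting could look like, the snippet below tallies a forward’s decisions against each defender and converts them to shares. The record format, player labels and decision names are all made up for illustration; real data would need someone to tag each decision first.

```python
from collections import Counter, defaultdict


def decision_shares(records):
    """Share of each decision a forward makes against each defender.

    records: iterable of (forward, defender, decision) tuples (hypothetical format)
    returns: {(forward, defender): {decision: share of that pair's decisions}}
    """
    counts = defaultdict(Counter)
    for forward, defender, decision in records:
        counts[(forward, defender)][decision] += 1

    shares = {}
    for pair, counter in counts.items():
        total = sum(counter.values())
        shares[pair] = {decision: n / total for decision, n in counter.items()}
    return shares


# Invented records: the same forward facing two different defenders.
toy_records = [
    ("Forward", "Defender 1", "shot"),
    ("Forward", "Defender 1", "shot"),
    ("Forward", "Defender 1", "dribble"),
    ("Forward", "Defender 1", "retain"),
    ("Forward", "Defender 2", "retain"),
    ("Forward", "Defender 2", "retain"),
]
toy_shares = decision_shares(toy_records)
```

A forward whose share of ‘retain’ decisions jumps against one particular defender would be a hint that the defender is pushing them down their hierarchy of aims.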

As always, any feedback or questions are welcome. These are primitive ideas and just looking to provoke thoughts of football analytics from a different perspective.


#6 Defensive Metrics [Optimal Positioning]


Even though the most tracked part of a football match is where the ball is, the most interesting things often happen off the ball. The ball is only in play for 50-60 minutes of a 90-minute match, and each individual player is only on the ball a minimal part of that time. The majority of the game is played off the ball by all players, they need to move about the pitch in relation to their teammates, the opposition and the ball. A forward can move off the ball to find space between defenders to receive a pass, whilst the defenders need to keep an eye on the forwards and track these attempts.

For a specific player, at a given point of the game, there are locations on the pitch that would be considered worse positions to be in than others. For example, if the opposition had the ball on the edge of your penalty area, you would consider your central defender to be at a worse position if they were standing by the opposition’s corner flag than if they were marking the opposition’s forward. Since there is a concept of better or worse positions, that leads to the possibility of there being an optimal position for a specific player at a given point of the game. You could also think of it this way: if you were to remove a single player from the game, where would you put them back such that you couldn’t move them to a better place?

Several factors could affect the perception of a position at a given time being better or worse. These could be physical states of the game, such as locations of teammates, opposition or the ball. They could also be non-physical, such as score, the aim of the tactics or time on the clock. Considering where the ball is and who is in control of the ball will dictate the general area where your teammates and opposition will set up. Considering the tactics and formation that you and the opposition are looking to play will dictate the general areas of where individual players will set up.

Different tactical styles, scores and time remaining will affect what the aims of a team are. Some teams, such as Manchester City, want to control the largest surface area of the pitch possible. Control of the pitch can be determined by which team is likely to get possession of the ball if the ball was located in that area. Whilst other teams are aware that they can’t afford to try to control the largest surface area of the pitch, but rather look to control the areas of high interest such as around their own penalty area. Individual players need to position themselves with appropriate distances between each other to reflect their tactical style and goal. Certain distances between certain players would be better or worse than others, so again there must be an optimal distance with respect to tactical aim. When each player achieves this optimal distance, the collective team would appear to perform optimally.

A geometric way of viewing the areas of control on a pitch would be to look at Voronoi diagrams. 

Figure 1: Voronoi Diagram Red v Blue

If we look at Figure 1 with team red against team blue, each of the polygons surrounding each player would correspond to the areas on the pitch that they control. If the ball happened to be within their boundaries, they would most likely get to the ball first (considering each player has the same speed). This concept has been around for a while and has been made possible due to the technology available to football clubs, player tracking is everywhere nowadays and is crucial to understanding how your team is performing.

Voronoi diagrams can be used at a team level to understand structure and how well a team can transition between situations, and are also useful at the player level, as you can identify which players find the most space or which players are the best at denying space.

In terms of quantifying better or worse positions of an individual player, the surface area of a player’s Voronoi region can be indicative of how well positioned they are. It is important to note that not all spaces on the football pitch are equal: controlling the areas closer to the goals is more beneficial than controlling the centre of the pitch. Perhaps a weighted surface area would be a better quantifier of control of the pitch and would be another contributor to identifying optimal positioning.
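
One simple way to approximate these areas without any geometry library is to lay a grid over the pitch and give each cell to the nearest player, which is what a Voronoi region is under the equal speed assumption. The sketch below does that in plain Python; the 105 x 68 pitch, the 1 metre grid, the player positions and the weighting function are all assumptions chosen for illustration.

```python
def control_shares(players, length=105.0, width=68.0, step=1.0, weight=None):
    """Approximate each player's share of (optionally weighted) pitch area.

    players: {name: (x, y)} positions in metres
    weight:  optional function (x, y) -> importance of that spot on the pitch
    """
    if weight is None:
        weight = lambda x, y: 1.0  # plain surface area

    totals = {name: 0.0 for name in players}
    for i in range(int(length / step)):
        for j in range(int(width / step)):
            x, y = (i + 0.5) * step, (j + 0.5) * step  # centre of the grid cell
            nearest = min(
                players,
                key=lambda n: (players[n][0] - x) ** 2 + (players[n][1] - y) ** 2,
            )
            totals[nearest] += weight(x, y)

    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}


def distance_from_halfway(x, y, length=105.0):
    """Toy weighting: space counts for more the closer it is to either goal line."""
    return abs(x - length / 2)


# Three invented players spread along the middle of the pitch.
positions = {"cb": (10.0, 34.0), "cm": (52.5, 34.0), "st": (95.0, 34.0)}
plain = control_shares(positions)
weighted = control_shares(positions, weight=distance_from_halfway)
```

With this toy weighting the central midfielder’s share drops sharply, because the space they control is around the halfway line. A more realistic weight might use something like the xG of a shot taken from each location.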


Special mention to below for their work already on Football Geometry and Voronoi Diagrams:
@Soccermatics –  https://medium.com/@Soccermatics/the-geometry-of-attacking-football-bee87e7a749
@UTVilla –  http://durtal.github.io/interactives/Football-Voronoi/

#5 Defensive Metric Concepts [Expected Shot Block]


There are emerging metrics in football such as Expected Goals, Key Passes, Progressive Passes and even now Expected Assists. These are all measuring single events in a football match and quantifying their utility or effectiveness with an indicator or a probability of happening. These are also all measurements of how effective a player is at executing actions whilst on the ball, particularly in offensive positions such as shooting or creating shots from passes. They give us a better idea for which players and teams are most (or least) effective at offensive events. The higher your Expected Goals and the more Key Passes you make, the more goals your team is likely to score. However, there aren’t similar metrics that measure defensive contribution. This may be due to the act of contributing to goal scoring being an objective decision, with each goal scored there is a single player who scored it and it’s easy to allocate contributions. With allocating defensive contribution, it is hard to quantify the presence of a non-event. It is hard to quantify how much of an effect a player or team has on the opposition not scoring. I think that if it’s possible to quantify a sensible defensive metric of any kind then it could be as useful as any of the offensive metrics above, I will use this series to brainstorm some ideas of such defensive metrics and how it would be possible to compute them.

Expected Shot Block:

Where better to start with defensive metrics than with the clearest act of denying a goal, the shot block. This concept is in direct competition to and is inspired by the concept of Expected Goals.

For Expected Goals, each shot is given a probability of being a goal based on historical shots of a similar type. For example, a shot that is taken with the head from a cross may be given 0.1 xG whereas a shot from a counter attack inside the 6-yard box may be given 0.5 xG. It suggests that the shot from a counter is 5 times more likely to go in than the headed effort. These may not be realistic numbers but the concept stands. Some shots are more likely to go in than others depending on a number of factors including where on the pitch the shot was taken, what play led to the shot, what the shot is taken with and game state of the match.

The concept of the Expected Shot Block would be to calculate the probability that the shot is blocked by a defender. This would require more information than just the event data of the shot, it would require the knowledge of the presence of a defender and how likely it is that a defender makes a block in a similar situation based on historically similar shots. The time this decision is made would be at point of contact of the shot. Based on shots from the past, you can categorise them into similar categories as that of Expected Goals but with the added factor of the presence of a defender or defenders between the ball location and the goal. The ball location and the two posts of the goal create a triangle and if there are any defenders in this area then the shot would be identified as having the potential to be blocked. The presence of the goalkeeper is expected and since we are looking at shots being blocked not saved then the goalkeeper’s location can be acknowledged but not required for calculation. The location of the goalkeeper may alter the shot direction of the attacker so may affect shot blocking numbers.
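
The triangle check described above is a standard point-in-triangle test. The sketch below assumes a 105 x 68 metre pitch with the attacked goal on the line x = 105 and the posts 7.32 metres apart; the coordinate system and the player positions are my own assumptions, not taken from any data provider.

```python
def _cross(o, a, b):
    """z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_shot_triangle(ball, defender, post_a=(105.0, 30.34), post_b=(105.0, 37.66)):
    """True if the defender stands inside the triangle ball / post_a / post_b."""
    d1 = _cross(ball, post_a, defender)
    d2 = _cross(post_a, post_b, defender)
    d3 = _cross(post_b, ball, defender)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # Inside (or on an edge) when the point is on the same side of all three edges.
    return not (has_negative and has_positive)


def potential_blockers(ball, defenders):
    """Names of defenders positioned between the ball and the goal mouth."""
    return [name for name, pos in defenders.items() if in_shot_triangle(ball, pos)]


# Invented positions: a shot from the penalty spot area with three defenders nearby.
ball = (94.0, 34.0)
defenders = {
    "in front": (100.0, 34.0),   # directly between ball and goal
    "out wide": (100.0, 20.0),   # level with the box but outside the triangle
    "behind": (90.0, 34.0),      # further from goal than the ball
}
blockers = potential_blockers(ball, defenders)
```

Any shot with at least one name in `blockers` would then be flagged as having the potential to be blocked, and could feed into a model alongside the usual xG features.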

Expected Shot Block - FM

Furthering the concept of an Expected Shot Block would be to calculate the percentage of the goal that is open to the shot at the point of contact. When identifying if a shot has the potential to be blocked, you can calculate the percentage of the goal that is available to be shot at where the defenders wouldn’t make a block. This calculation could be done in either 2D or 3D. You can assume an average area for the defender’s body and block out the area of the goal that the defender is in front of. In 2D this would be less accurate than in 3D, since it would be assumed that the defender can block a shot of any height, whereas in 3D you could create silhouettes of the defender’s limbs and create a more accurate percentage that way.
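
A minimal 2D sketch of that calculation: each defender is treated as a flat bar parallel to the goal line, its ‘shadow’ is projected from the ball onto the goal line, and the overlapping shadows are merged. The 0.4 metre half-width for a defender, the pitch coordinates and the straight-line shot are all assumptions for illustration.

```python
def open_goal_fraction(ball, defenders, goal_x=105.0, posts=(30.34, 37.66),
                       half_width=0.4):
    """Fraction of the goal mouth (2D) not shadowed by any defender.

    ball:      (x, y) of the shot, attacking towards x = goal_x
    defenders: list of (x, y) positions
    """
    bx, by = ball
    low_post, high_post = posts
    shadows = []
    for dx, dy in defenders:
        if not bx < dx < goal_x:
            continue  # not between the ball and the goal line
        scale = (goal_x - bx) / (dx - bx)
        # Project both edges of the defender from the ball onto the goal line.
        y1 = by + (dy - half_width - by) * scale
        y2 = by + (dy + half_width - by) * scale
        low, high = max(y1, low_post), min(y2, high_post)
        if low < high:
            shadows.append((low, high))

    # Merge overlapping shadows so no part of the goal is counted twice.
    shadows.sort()
    covered, end = 0.0, low_post
    for low, high in shadows:
        low = max(low, end)
        if high > low:
            covered += high - low
            end = high
    return 1.0 - covered / (high_post - low_post)


# Invented example: one defender halfway between the ball and the goal line.
fraction_open = open_goal_fraction((94.0, 34.0), [(99.5, 34.0)])
```

The closer the defender gets to the ball, the larger the shadow, which matches the intuition that closing the shooter down blocks off more of the goal.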

There are some problems with this concept but I think it has potential. You need more than just event information, you also need player location data which is harder to get. It also assumes that the shot is a direct hit straight at goal, whereas many players attempt to bend and curl the ball around defenders. Just because a defender is in the way of the goal, doesn’t guarantee a blocked shot in that location. There are many times where a shot goes through a defender’s legs or just past a limb, defenders and players aren’t perfect.

It’s hard to quantify defensive actions, and shot blocking seemed the most relevant place to start as it’s related to the current set of offensive metrics, but it’s not perfect. If anyone has any thoughts or comments regarding other issues I may have missed, please do let me know!


#4 Team Analysis: AS Monaco – Realistic Expectations


In this piece I will take a look at AS Monaco’s Ligue 1 performance to see why they are sitting in 19th after 13 games. Looking at their recent transfers, I’ll investigate what their expectations would have been compared to what their expectations should be.

This is a team that only two years ago won the Ligue 1 title, holding top spot for most of the season, and was a team full of emerging young talent. Players such as Kylian Mbappe, Bernardo Silva, Thomas Lemar, Benjamin Mendy, Tiemoue Bakayoko and Fabinho all contributed greatly alongside veterans such as Radamel Falcao and Joao Moutinho. This season the only one of those players still at the club is Falcao. Monaco were a club that thrived off the talent of these rising stars and cashed in. Larger European clubs such as Paris Saint Germain, Manchester City, Atletico Madrid, Chelsea and Liverpool all came in and stripped Monaco of their title winning team.

Since the system worked so well previously, it’s not hard to see why they have tried to replicate their success and have looked to reinvest in a new crop of youngsters to complement the likes of Falcao once again. They have brought in the likes of Youri Tielemans, Pietro Pellegri, Willem Geubbels, Benjamin Henrichs and Aleksandr Golovin who are all under 22, with Pellegri and Geubbels both 16 at the time they were brought in. The concept which they are trying to repeat is to build the team around giving these promising youngsters lots of playing time, hoping to accelerate their development and help them mature early, therefore prolonging their careers at the highest level.

This transition couldn’t happen overnight, and it has taken two years for these recruitment changes to occur. Last year Monaco managed a 2nd place finish behind a resurgent PSG team, which is respectable considering they loaned Paris their best asset in Mbappe for the year and lost Bernardo Silva and Mendy to Manchester City. That’s a lot of attacking threat to lose, however they managed to keep hold of Lemar, Fabinho and Moutinho. Fabinho and Moutinho are two competent central midfielders who can take control of any given game, allowing Monaco the foundation to let their forwards do their thing. It could be suggested that losing Fabinho to Liverpool and Moutinho to Wolves in the summer before this season are the losses that were hardest to replace. Fabinho has proved himself worthy of a spot in a Jurgen Klopp midfield three, which is saying something, and Moutinho is part of the Portuguese midfield duo at a Wolves team proving themselves already a competent Premier League team.

Out of those youngsters brought in over the last two years, only Youri Tielemans has the suggested promise to be able to replace either. However, Tielemans has been playing in a more offensive midfield role previously at Anderlecht and Belgium, so relying on a young player who’s still getting used to controlling a game from deep may not be the best idea.

That moves us on to Monaco’s current crisis. They sit 19th in Ligue 1 after 13 games and have just been thrashed 4-0 for the second time by PSG [PSG have won their opening 13 games and sit 13 points clear]. After 9 games, Leonardo Jardim, who was in charge of their title winning season, mutually agreed with the club to leave and has been replaced by Thierry Henry in his first managerial role.

Monaco v PSG 11Nov18

A team that has finished 1st and 2nd in the previous two seasons shouldn’t be anywhere near the bottom of the league at this point of the season. They have underperformed their xG and xGA across the 13 games, so they haven’t scored as many as they should and have conceded more than they should have, though not by a huge margin: they have 12 goals from 16.31 xG and have conceded 22 from 17.47 xGA. Regressing to the mean, we can expect Monaco to perform better than current standings suggest, but that is nowhere near challenging for the title. Based on previous seasons, expectations would be to dominate most games by creating lots of chances and giving away few. Their goal difference of -10 against an xGD of -1.16 shows the difference between expectation and reality, but comparing that to 2nd place Lille’s xGD of +7.28 [PSG’s xGD = 23.66] shows how far away from pre-season expectations they are.
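
The arithmetic behind those numbers is simple enough to check in a few lines. The function and key names below are my own; the inputs are the Monaco figures quoted above.

```python
def performance_summary(goals_for, goals_against, xg, xga):
    """Compare actual goals with expected goals for and against."""
    return {
        "goal_difference": goals_for - goals_against,
        "xgd": round(xg - xga, 2),                         # expected goal difference
        "scored_minus_xg": round(goals_for - xg, 2),       # negative = underperforming in attack
        "conceded_minus_xga": round(goals_against - xga, 2),  # positive = conceding more than expected
    }


# Monaco after 13 games: 12 goals from 16.31 xG, 22 conceded from 17.47 xGA.
monaco = performance_summary(12, 22, 16.31, 17.47)
```

This gives a goal difference of -10 against an xGD of -1.16, made up of scoring about 4.3 fewer than expected and conceding about 4.5 more than expected.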

Except, there doesn’t seem to be anything clearly wrong. Their defence is as leaky as suggested, conceding 1.69 goals per game. They aren’t creating enough good chances to score the goals to win games, scoring 0.92 goals per game. They have had 150 shots, creating 15 big chances, but conceded 154 shots and 19 big chances so far. Most worrying is that they aren’t controlling games, they aren’t putting the opposition under pressure and they seem to be playing in matches on a level playing field with many of the teams in the league. So far this season they are performing like an unlucky mid table side, nowhere near their expectations of European qualification.

When I say there doesn’t seem to be anything clearly wrong, I mean that there doesn’t seem to be anything immediately fixable wrong. It’s not the case that they can change just one thing and go back to being the title challenging side they used to be. That’s because they are literally a completely different team to that one. Even if the expectation hasn’t changed, the players definitely have, and they are not as good as those who left. Unfortunately, it seems as though Monaco’s attempts to recruit a new group of young title winners, or at least challengers, haven’t worked so far. Which isn’t surprising. It will take time for the players to get used to playing in a top 5 European league, playing with each other and handling the expectations all at once. They aren’t suddenly a bad team, just not what they were last year, and the expectations surrounding the team need to reflect that. It is also worth noting that they have had some serious injury concerns which have forced them to play maybe more youngsters than planned.

*credit to Understat and @Statszone for the numbers and figure


#25 – Why Liverpool’s Expected Goal Conundrum Makes Sense

As the new 2020/21 Premier League season is about to get underway, a big question is whether Liverpool can replicate their utter dominance. They’ve won the league by an unprecedented margin, and it never really looked like anything else was going to happen. Not even Pep Guardiola’s Manchester City, who achieved 100 and 98 points in the previous two years, could come close; they weren’t able to maintain that pace for a third season. Liverpool have just had consecutive 97 and 99 point seasons and are hoping to replicate that form in the upcoming third season.

It had been noted that despite winning the league in record time, Liverpool’s expected goals and goal difference were still not as good as Manchester City’s. This suggests that Manchester City were in fact the better team over the course of the season and that Liverpool were merely lucky to win the league by such a margin.

Using shot distance, shot times and expected goals from www.fbref.com, I’ve approximated expected goals per shot for both Liverpool and Manchester City’s 2019/20 Premier League seasons. The expected goals total for each match has been proportioned out across that match’s shots using shot distance. Per shot information then allows expected goals and minutes to be aggregated by gamestate.
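To give a feel for what proportioning out by distance could look like, here is a minimal Python sketch. The inverse-distance weighting and the function name are my assumptions for illustration; the post only says that shot distance was used, not exactly how.

```python
def apportion_match_xg(match_xg, shot_distances):
    """Split one match's xG total across its shots, closer shots getting more.

    Assumption: each shot's weight is 1 / distance.
    """
    if not shot_distances:
        return []
    # Floor the distance at 1 so a shot from the goal line cannot divide by zero
    weights = [1.0 / max(distance, 1.0) for distance in shot_distances]
    total_weight = sum(weights)
    return [match_xg * weight / total_weight for weight in weights]

# Hypothetical match: 2.0 xG in total from three shots
shot_xg = apportion_match_xg(2.0, [6.0, 12.0, 24.0])
print(shot_xg)
```

Whatever weighting is used, the per shot values always add back up to the match total, so no expected goals are created or lost.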

Gamestate is an important factor in contextualising football matches. Stronger teams usually spend more time winning a game than weaker teams. Teams that are winning are no longer under obligation to push forward as much, with the losing team responsible for trying to get back into the game.
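As a rough illustration of splitting a match into gamestates, here is a Python sketch that works out the minutes a team spends at each goal difference from a list of goal times. The function name, the input format and the flat 90 minute match length (no stoppage time) are assumptions for the example.

```python
from collections import defaultdict

def minutes_by_gamestate(goals, team, match_length=90, cap=3):
    """Minutes one team spent at each goal difference in a single match.

    goals: list of (minute, scoring_team) tuples.
    Goal differences beyond +/-cap are grouped together, e.g. "3+ ahead".
    """
    minutes = defaultdict(int)
    goal_diff = 0
    last_minute = 0
    for minute, scorer in sorted(goals):
        # Credit the time since the previous goal to the gamestate that applied
        state = max(-cap, min(cap, goal_diff))
        minutes[state] += minute - last_minute
        goal_diff += 1 if scorer == team else -1
        last_minute = minute
    # Time from the last goal to the final whistle
    state = max(-cap, min(cap, goal_diff))
    minutes[state] += match_length - last_minute
    return dict(minutes)

# Hypothetical 2-1 win: goals on 10 and 80, conceding on 55
example = minutes_by_gamestate(
    [(10, "Liverpool"), (55, "Opponent"), (80, "Liverpool")], "Liverpool"
)
print(example)
```

Summing these dictionaries over a season gives the minutes per gamestate, and the same bucketing can be applied to each shot’s minute to aggregate expected goals.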

Across both the 97 and 99 point Liverpool seasons in 2018/19 and 2019/20, Liverpool achieved the majority of their expected goals in winning gamestates. Notably, in the recent 2019/20 season they actually achieved more expected goals when winning by a single goal than when drawing.

For Manchester City, however, there is a clear distinction between their 98 point 2018/19 season and the recent 2019/20 season. In 2019/20 they earned a much larger proportion of their expected goals at a neutral gamestate than in their title winning season, so they were clearly creating the chances but had perhaps been wasteful. It’s also noticeable that they created lots of their expected goals when they were already 3+ ahead, and lots more when losing than in the previous year.

Now let’s take a look at how long each team has spent in each gamestate across the season.

Much like the expected goals charts, the proportion of minutes played by each club suggests a clear difference in the approach that each team needed to adopt. Liverpool spent a much larger proportion of their time winning by 1 goal, whilst Manchester City spent more time losing and winning by 3+.

They both spent a similar proportion of time at neutral gamestates, though Manchester City’s expected goals at this gamestate were much higher. This suggests they needed more chances and shots to go ahead in the game, whereas Liverpool went ahead more efficiently and spent more time ahead at +1.

As mentioned earlier, your approach can change once you are winning. You no longer need to force anything or take risks; the responsibility to equalise or reduce the deficit is on the opponents, so they need to take risks. Playing with no risks allows for a higher floor in performance, and no doubt being in winning positions so much helped Liverpool maintain their momentum throughout the season. When you aren’t winning, you are required to create chances and shoot more, which in turn helps build up your expected goals numbers.

Manchester City built up a lot of expected goals whilst at a neutral gamestate, when they were losing and when they were 3+ ahead. At neutral gamestates, these are the goals that convert into points most easily, and Liverpool were more efficient than Manchester City. When losing, you need to create shots (rack up expected goals numbers) to get back in the games, but you’re only losing because you didn’t score the first goal. When you’re 3+ ahead, these shots and expected goals likely won’t change the points returns of the match. Manchester City spent lots of time either needing to score goals in losing or neutral gamestates or absolutely crushing teams, and little in between, which perhaps explains their ridiculous expected goals numbers.

Liverpool spent little time and few expected goals getting from neutral to +1 gamestates, meaning they spent less time with the responsibility to take risks. They spent little time losing and lots of time ahead at +1, with little time spent at 3+. They got ahead early and then not much else happened in the game, which is a pretty good strategy to win. They’re deserving champions, and this perhaps explains why their expected goals aren’t as bonkers as Manchester City’s.


#24 Premier League Points History

It’s finally nearing the end of this current season, so I wanted to have a look at past league points totals to get some context for how this season is shaping up.

To do so, I’ve taken league tables from www.fbref.com for the last 24 years of 38 game seasons in the Premier League. A Jupyter notebook used to get the league tables and create the plots is in my GitHub here: https://github.com/ciaran-grant/premier_league_points

This post looks to investigate the following three questions:

  • How do points totals recently compare to early Premier League seasons?
  • What has happened to the gap between the champions and relegation survivors?
  • How many points do you tend to need to qualify for Europe recently?

History of Premier League Points

Looks like the top 6 have been getting more points recently at the expense of the bottom half. There are only a set number of points available across all teams, so the more points the big boys get, the fewer are available for the lower teams.

Does this mean there is an increasing gap between the top teams and the rest? We’ve gone through several incarnations with a top 4, then top 6 and now arguably a top 2 with Liverpool and Manchester City.

Champions and Relegation Survival

Relegation survival has been calculated as one more point than the points achieved by 18th place for simplicity of not getting too picky about goal difference.
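That rule is simple enough to write down directly. Here is a short Python sketch with a made-up 20 team table; the function name and the points totals are purely illustrative and are not taken from the notebook.

```python
def survival_points(final_points):
    """Points needed to survive: one more than the 18th placed side's total.

    final_points: season points totals for all 20 teams, in any order.
    """
    ranked = sorted(final_points, reverse=True)
    return ranked[17] + 1  # index 17 is 18th place, the highest relegated side

# Hypothetical final table, made up for illustration
table = [93, 81, 66, 66, 62, 59, 56, 54, 52, 49,
         45, 44, 43, 41, 39, 37, 35, 34, 31, 21]
needed = survival_points(table)
print(needed)
```

Ignoring goal difference means this slightly overstates the requirement in seasons where 17th and 18th finish level on points.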

Only 4 times out of the 24 seasons has a team required 40 points to survive, whilst it seems around 35 will usually be enough to be safe. Of course you want to aim for more points, but this seems to bust the myth of 40 points required for survival, often less is sufficient.

The points required to become champions have increased, meaning the gap between champions and relegation survival has increased in recent years. 2016/17, 17/18 and 18/19 produced 3 of the highest points totals ever, with only Chelsea’s 95 in 04/05 in amongst them.

Could just be recency bias, but Liverpool were on track for 100+ this year. They’re likely to actually get 95+, and both Liverpool and City are expected to get close to 90+ again next year. They’re making 90 points seem normal, and it is becoming the minimum needed to win the league now.

How about qualifying for Europe?

European Qualification

It’s not only the Champions that have seen points inflation. In line with the whole top 6 sweeping up more points, it looks like more points are required to qualify for both European competitions as well.

The last decade has seen a higher average points requirement for getting both Champions League and Europa League qualifications, with 70 points for Champions League and 60 points for Europa League.

Current Season Context

Champions Liverpool are on 93 points with 2 games left, and will likely break 95. This will make the last 3 seasons the highest 3 totals ever by a Premier League Champion. They could still win the league this year with fewer points than they got last year (97) and be in the top 4 highest points scoring teams ever.

Champions League qualification looks like coming up short of the 70 points mark. Chelsea, Leicester and Manchester United are all comfortably in Europe, sitting above 60 points, but it’s all to play for with 2 games left. It will be a low bar for both European competitions this year.

As for relegation survival, Bournemouth and Aston Villa both sit on 31 points with 2 games left. If things stay as they are, 32 points required for survival is the lowest since 09/10 where 31 points were needed. That year 40 points would’ve been good enough to finish 14th, this year 40 points looks good for 15th.

Final Thoughts

This season seems to be getting stretched by how ridiculously good Liverpool have been for 90% of the season, before subconsciously or otherwise taking their foot off the gas once the title was wrapped up. Apart from the prime Messi/Ronaldo Barcelona and Real Madrid teams, I wouldn’t expect many teams to even get above 90 points, let alone dream of nearing 100.

Are these points totals also consistent across other 38 game seasons in France, Italy and Spain? How does this change when you consider Germany’s 34 game season or the 46 game seasons in the Football League?


#22 Friends of Tracking Challenge – Appendix

Goals Overview [Tab 1]

Designed to get a general overview of what the opposition team does well. This could be another subset of patterns of play that an opponent takes rather than just goals such as shots in the box, passes into the final 3rd, etc.

This tab shows the events and tracking data for the respective frames for all the Liverpool goals provided by Last Row.

Events – Pitch Value [Tab 2]

Designed to add context to the view of events that take place in previous tab. This will show what options were available to the player on the ball and what options the defence has covered. It can start discussions about player positioning off the ball and decisions made between events.

Pitch Value is created by adding the context of relevance (PT) and scoring opportunity (PS) to Pitch Control (PPCF) as outlined by @the_spearman.

pitch_value_model.py, generate_pitch_value, line 322

Pitch Control is also the implementation from @EightyFivePoint’s tutorials. This computes the probability that each team will control the ball in each position on the pitch, subject to interceptions, time to control the ball and player velocities.

pitch_value_model.py, lastrow_generate_pitch_control_for_event, line 259

Relevance is computed as a normally distributed probability constrained by ball travel time and Pitch Control, with a mean of 14m as per @the_spearman’s Beyond Expected Goals, 2018.

pitch_value_model.py, generate_relevance_at_event, line 270

Scoring opportunity is calculated as a normally distributed probability subject to the distance to goal.

pitch_value_model.py, generate_scoring_opportunity, line 298
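As a rough sketch of how the three layers could be combined, here is a Python example that multiplies them cell by cell, which is how Spearman’s paper combines control, transition and scoring probabilities. Whether generate_pitch_value does exactly this is an assumption on my part, and the tiny 2x3 grids and the function name combine_layers are made up for illustration.

```python
def combine_layers(pitch_control, relevance, scoring):
    """Multiply three equally sized grids (lists of rows) cell by cell."""
    return [
        [c * r * s for c, r, s in zip(control_row, relevance_row, scoring_row)]
        for control_row, relevance_row, scoring_row
        in zip(pitch_control, relevance, scoring)
    ]

# Hypothetical 2x3 grids, values chosen only to show the effect
control = [[0.9, 0.5, 0.2], [0.8, 0.6, 0.1]]    # chance the team controls the ball there
relevance = [[0.1, 0.4, 0.2], [0.1, 0.5, 0.3]]  # chance the ball goes there next
scoring = [[0.0, 0.1, 0.5], [0.0, 0.2, 0.6]]    # chance of scoring from there

pitch_value = combine_layers(control, relevance, scoring)
print(pitch_value)
```

A cell only ends up with a high Pitch Value if it scores well on all three layers, so a well controlled area near your own goal still comes out close to zero.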

Player Displacement – Pitch Value [Tab 3]

Once an area for improvement has been established, this tab will provide the opportunity to manually adjust a player’s position and get an updated view of Pitch Value. This can help to understand where players should be positioned at each event, with the consequences laid out. Reducing the Relative Pitch Control will help to prevent the opposition from scoring. Reducing their number of options will make them more predictable.

Pitch Value is as calculated from Tab 2.

Relative Pitch Value is calculated as the Pitch Value of each area divided by the Pitch Value at the current ball location for the event and frame. This shows the areas where moving would increase your Pitch Control, potentially providing suggestions for on/off ball decision making.

pitch_value_model.py, generate_relative_pitch_value, line 365
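For anyone who prefers to see the division written out, here is a minimal Python sketch on a toy grid. The grid, the cell indexing and the function name are assumptions for the example and do not mirror the signature used in pitch_value_model.py.

```python
def relative_pitch_value(pitch_value, ball_cell):
    """Divide every cell's Pitch Value by the value at the ball's cell."""
    ball_row, ball_col = ball_cell
    ball_value = pitch_value[ball_row][ball_col]
    if ball_value == 0:
        raise ValueError("Pitch Value at the ball location is zero")
    return [[value / ball_value for value in row] for row in pitch_value]

# Hypothetical 2x3 Pitch Value grid with the ball in row 0, column 1
grid = [[0.01, 0.02, 0.04], [0.02, 0.02, 0.08]]
relative = relative_pitch_value(grid, ball_cell=(0, 1))
print(relative)
```

Cells above 1 are more valuable than where the ball currently is, and cells below 1 are less valuable.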

Hopefully this allows anyone who uses the app to understand how the values are computed without having to trawl through my messy code structure on GitHub.

Any further questions please do get in touch at @Ciaran_Grant or @TLMAnalytics

Lastly thanks again to all those contributing at Friends of Tracking for providing all the great content that they have been putting out. It really helps and inspires those of us who have been looking in from the outside!

@Soccermatics / @EightyFivePoint / @the_spearman / @JaviOnData

Beyond Expected Goals, Spearman W. – http://www.sloansportsconference.com/wp-content/uploads/2018/02/2002.pdf

#21 A View of the Suspended 2019/20 Season by Shots

Here’s a quick look back to see where the suspended seasons were left. I’m going to take a look at how efficient teams have been at converting shots into shots on target whilst simultaneously limiting shots on target against. How many shots a team takes is the most basic indicator of attacking output. You don’t shoot, you don’t score after all. Shots on target goes a step further to add a qualitative element to the shots. You don’t shoot on target, you don’t score.

If shots are indicative of how likely a team is to score and shots on target are even better, then we can also turn that around defensively. To win a game you need to score more goals than the opponent, which means that as well as trying to score yourself, you need to prevent the opposition from scoring. Reducing the shots conceded likely reduces the chances to concede a goal, and reducing the shots on target conceded does even better.

During a game, teams will try to maximise their own chances of scoring and reduce the opposition’s chances of scoring at the same time. A measure that captures both aspects of this sufficiently is known as the total shots ratio, with a respective total shots on target ratio as well. For a specific game, the total shot ratio is calculated for each respective team below:

Total Shot Ratio = Shots by Team / Total Shots of both Teams

Since every shot one team takes is a shot conceded by the other team, the sum of ratios for each team will always be 1. This also means that a total shots conceded ratio can be calculated, which turns out to be equal to the total shot ratio for the opposition in a single game.
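Here is the same calculation as a small Python sketch for a single made-up match. The function name is mine, and treating a match with no shots at all as an even 0.5 split is an assumption.

```python
def total_shot_ratio(shots_for, shots_against):
    """Share of all shots in a match that were taken by one team."""
    total = shots_for + shots_against
    if total == 0:
        return 0.5  # assumption: treat a match with no shots as even
    return shots_for / total

# Hypothetical match: home side takes 15 shots, away side takes 5
home_tsr = total_shot_ratio(15, 5)
away_tsr = total_shot_ratio(5, 15)
print(home_tsr, away_tsr, home_tsr + away_tsr)
```

Swapping the shot counts for shots on target gives the total shots on target ratio in exactly the same way.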

This measure considers the proportion of shots you take against the total shots in a game. This means that if both teams have lots of shots, then the match is more likely to be equal. Whereas if one team takes lots of shots AND stops the opponent from taking lots of shots then that must be better, which is reflected here.

I’ve taken the shots and shots on target information for each match so far in each league to calculate the total shots ratio and total shots on target ratio for each match. Although not every team has played each other twice just yet, an average of these per game ratios was the easiest way to represent how each team has performed so far. Teams with high total shots on target ratios are likely to be the stronger teams in the league. Teams with higher total shots on target ratios than total shots ratios appear to be more efficient in terms of creating shots on target for themselves and limiting their opponents to just shots. Teams with lower total shots on target ratios than total shots ratios seem to favour quantity over quality when it comes to shot selection, or are susceptible to conceding lots of shots on target.
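The season level numbers can be sketched in Python like this. The dictionary keys, the helper names and the two made-up matches are assumptions for illustration; the real data from football-data.co.uk would need mapping into this shape first.

```python
from statistics import mean

def _ratio(made, conceded):
    """Share of the match total; an empty match counts as even (assumption)."""
    total = made + conceded
    return made / total if total else 0.5

def season_ratios(matches):
    """Average per game shot ratios for one team.

    Returns (total shots ratio, total shots on target ratio, efficiency),
    where efficiency is the on target ratio minus the shots ratio.
    """
    tsr = mean(_ratio(m["shots_for"], m["shots_against"]) for m in matches)
    tsotr = mean(_ratio(m["sot_for"], m["sot_against"]) for m in matches)
    return tsr, tsotr, tsotr - tsr

# Two hypothetical matches for one team
matches = [
    {"shots_for": 12, "shots_against": 8, "sot_for": 6, "sot_against": 2},
    {"shots_for": 10, "shots_against": 10, "sot_for": 3, "sot_against": 3},
]
tsr, tsotr, efficiency = season_ratios(matches)
print(tsr, tsotr, efficiency)
```

A positive efficiency value corresponds to a team whose share of shots on target is higher than its share of all shots.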

Below are the results for each league. They seem to be pretty good approximations of the current league tables and potentially manage to group teams into tiers.

La Liga

La Liga 2019/20 – Total Shots Ratios and Total Shots on Target Ratios
  • Barcelona and Valencia are both getting a much higher proportion of shots on target in their matches, which suggests that they are perhaps hesitant to shoot and would rather manufacture better chances. Or they are great at limiting their opponents to settling for off target shots
  • Sevilla, Eibar and Espanyol are at the opposite end of the spectrum, pretty inefficient at both ends
  • Real Sociedad are deserving of their top 4 place and Bilbao arguably should be better off than their mid table place suggests

Serie A

Serie A 2019/20 – Total Shots Ratios and Total Shots on Target Ratios
  • Yet another reason why Atalanta are so good this year, they create shots on target at a higher rate than their opponents more than anyone else in the league
  • 2nd down to 7th consists of the remaining European challengers and Sampdoria, who actually sit 16th! Potentially unlucky to be that far down based on shot counts

Premier League

Premier League 2019/20 – Total Shots Ratios and Total Shots on Target Ratios
  • Man City and Chelsea lead the way but are both pretty inefficient considering their shot dominance.
  • Liverpool have clearly been the best team in the league and are 3rd here, with a suggestion they have been one of the more efficient teams. This goes to show that game state can have an impact on shot counts.
  • Doesn’t look so good for Tottenham/Arsenal, who are actually below 0.5 for Total Shots on Target Ratios. Either by chance or design, neither is getting shots on target as often as top half sides would expect, let alone Champions League teams.

Ligue 1

Ligue 1 2019/20 – Total Shots Ratios and Total Shots on Target Ratios
  • Lille are way up there almost with Paris which bodes well for them!
  • Lyon look to be an efficient mid table team, which goes with their below par season so far. Expected them up with Marseille/Lille at least.


Bundesliga

Bundesliga 2019/20 – Total Shots Ratios and Total Shots on Target Ratios
  • Top 4 looks as expected, with Gladbach’s incredibly efficient shot to shots on target helping them keep pace.
  • The bottom half seems to be pretty inefficient, Hertha don’t appear to like shooting on target that much.

All data from: http://www.football-data.co.uk


#20 Age x Minutes Played in Top 5 European Leagues

Considering the suspensions of all major leagues, I thought it would be a good chance to catch up on each one. Specifically, here I’m taking a look at the distribution of minutes played by players in each team by the age of those players. The inspiration for this came from Real Sociedad and seeing that the core of their team has been a bunch of players in their early twenties, and they would finish 4th should the leagues end now. This did seem unusual, but comparing them with other teams sparked a project that I have now found much more time to complete.

The data for this analysis has come from transfermarkt.com. Using the helpful tutorials on FC Python, specifically https://fcpython.com/scraping/introduction-scraping-data-transfermarkt, means all of the data you see is possible to get into a much cleaner, easier to use table or dataframe.

Minutes played is in all competitions during the 2019/2020 season up to the latest round of games before league suspensions.

I’ve taken a look at the top 5 European leagues and noted down some interesting, some weird and some funny things that came up.

La Liga

  • Real Sociedad

The initial inspiration for this project, so it’s nice to confirm the intuition that allocating lots of minutes to a younger group of players isn’t the norm. They have done well this year, I expect this team to have players poached sooner rather than later.

  • Real Madrid

What struck me with this was the gap between the core 7/8 players in their prime ages and the rest of the squad. They are always in ‘win now’ mode so it’s hard to ease in any youngsters, especially with the expectations of the last decade. But they’re going to need to start to trust a few more of the younger players they actually have an abundance of.

  • Atletico Madrid

In my head, Atletico’s team is full of 30+ year olds. They are all just past their prime and have all the experience and tactical nous a single team could contain. Then I see that the majority of their team is late 20s, actually just hitting their prime. They’ve got some kind of weird, Simeone style conveyor belt going on behind the scenes there.

Ligue 1

  • Lyon/Lille

Another case of lots of minutes played by young players, not too surprising that it’s Lyon and Lille. However worth pointing out again because this is still very much out of the ordinary and some teams seem to be able to consistently do this.

  • Rennes

This outlier up in the top left corner is Eduardo Camavinga, born in November 2002. He’s still 17. He’s on course to play 3000 minutes this year at centre midfield. He’s pretty good.

  • Montpellier

This other outlier in the top right is Vitorino Hilton, born in September 1977. He’s 42. He’s on course to play 3000 minutes this year at centre back. I had never heard of him before. He’s 42 and playing 3000 minutes in Ligue 1. I have gained much respect for him.


Bundesliga

  • Gladbach/BVB

Obligatory lots of minutes for young players alert here. No surprise from Dortmund, but Gladbach are just a notch or two below in terms of the quality and quantity of minutes they’re getting from younger players this year.

  • Leipzig

RB Leipzig should also be in the lots of minutes for young players, but I thought they deserved a separate mention just because they ONLY play young players. There might be a virtual barrier around the training ground which doesn’t let you in after your 30th birthday.

  • Paderborn

Paderborn are currently bottom of the Bundesliga and look to have generally spread the minutes around, maybe trying different players or tactics to find something that works. I haven’t watched them. But interesting that the player with most minutes played is 22 year old Sebastian Vasiliadis in centre midfield. Is he the player the coaches trust the most? Or have they had injury troubles to other key players? He could be staying in the Bundesliga next year.

Serie A

  • Juventus

If there was ever a ‘win now’ team’s age profile, it would be this one. High proportion of minutes given to age 30+ players means that surely a rebuild is coming soon. De Ligt/Demiral are probably the start of that. (Note Buffon in the bottom right skewing the x-axis for every other team.)

  • AC Milan

The only really negatively correlated distribution I’ve seen, lots of minutes for young and not so many minutes for old players. I’d heard that Milan had made a decision to go in a different direction than overpaying just-past-their-prime players on long contracts, seems like they’re at least giving younger guys a go.

  • Atalanta

Special mention here goes to the island of players in their prime that Atalanta seem to have collected together. Hopefully they have a Gasperini type conveyor belt ready to go.

Premier League

  • Dwight McNeil

This is probably my favourite distribution of all. And yet there is nothing actually surprising about what there is to see. Burnley are a team of all old/in their prime players, and then there’s Dwight McNeil. Come on Sean, give him someone from his own generation to talk to at least.

  • Wolves, Sheffield Utd don’t rotate

There’s definitely something to be said here about team understanding and having consistent line-ups. All these teams have a group of players playing the majority of their minutes, all these teams arguably perform above expectations. This may be confirmation bias as I haven’t checked the cases where core teams underperform. Wolves have played more games than the other teams in Europe, and still don’t rotate. Looking at you Conor Coady at 4000+ minutes already.

  • Aston Villa

The obligatory team with lots of minutes for young players is Aston Villa? They’re not exactly seeing the success that similar teams around Europe are having, probably because these are mostly debut seasons happening all at once in a relegation fight.

That about wraps it up, if there are any interesting things you have spotted then give me a shout! Once again, this wouldn’t have been possible without starting off with getting the data from transfermarkt so huge props to FCPython and their website which does a great job. Check them out at @FC_Python.


#18: Space & Structure: Attempting to quantify (Pt. 2)


Following on from the previous post where I tried to talk through why space and structure matters in football, this post here will try (emphasis on try) to quantify the concepts. The inspiration for this method of quantifying structure has come from looking at event data and trying to combine that with a notion of structure for each event.

When picturing each event, it’s at a point in time of a match with the known locations of each player. They’re snapshots in time with a load of information about how each team is organised for this specific event. These can be plotted individually on a pitch, with a visual representation for each event to provide context. If you do plot each event individually, as you’d expect, each plot will look different as players move around the pitch. This difference is what I would like to quantify. Each event has a bunch of player locations in relation to the pitch and ball locations, so is there a way to quantify this specific set of locations for an event? If so, can we see how that quantity changes when moving on to the next event?

If a team is defending, they have a certain quantity attached to their structure during a specific event. When the attacking team does something which changes the defending team’s structure massively and causes a goal scoring chance, I’d expect this quantity to change significantly to reflect the change in scenario on the pitch.

In the brief research I’ve done trying to find something that fits these criteria, it seemed sensible to take a look at network graphs. They have nodes which can represent players and edges which can be weighted depending on distances between the players at each event. Networks have been used in football in the past with reference to passing metrics and average positions to quantify and qualify different styles of play. They’re usually aggregated views of whole matches, whereas this idea is to create a time series of individual sequential events, each represented by a network using distances, to represent how a match unfolds. For each match there would be a network representing each event, which allows a distance metric to be calculated between consecutive events to represent how significant or important an event was in changing the structure or immediate flow of the game. Since networks don’t inherently care for locations, to compensate for this I have added in the locations of the centre of the pitch, each corner and each goal to reflect movement around the pitch.

For example, a pass between centre backs intending to maintain possession and cycle the ball may not produce such a large difference in either team’s structure before and after the pass. Whereas a through ball which plays a forward through on goal and cuts through the defence will produce a much larger difference in structures before and after the through ball. This is the type of distinction I hope will be quantified.

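# Libraries ---------------
library(SBpitch)  # create_Pitch()
library(ggplot2)  # geom_point(), scale_colour_manual(), theme()
library(dplyr)    # %>% and select()
library(igraph)   # graph_from_adjacency_matrix()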


Enough theory, let’s try and see a simplified extreme example. Since for each event all player locations are necessary and that data is both hard and expensive to come by, I have created an extremely unrealistic event which hopefully shows the concept.

Reminder on how to go about plotting a pitch using the ‘SBpitch’ library.

goaltype = "box"
grass_colour = "#202020"
line_colour =  "#797876"
background_colour = "#202020" 
goal_colour = "#131313"

ymin <- 0 # minimum width
ymax <- 80 # maximum width
xmin <- 0 # minimum length
xmax <- 120 # maximum length

blank_pitch <- create_Pitch(
  goaltype = goaltype,
  grass_colour = grass_colour, 
  line_colour =  line_colour, 
  background_colour = background_colour, 
  goal_colour = goal_colour,
  padding = 0
)

plot of chunk blank_pitch

Locations on the pitch are static, so will be universal for each event.

# Pitch Locations
centre_circle <- c((xmin+xmax)/2, (ymin+ymax)/2)
bottom_left_corner <- c(xmin, ymin)
top_left_corner <- c(xmin, ymax)
bottom_right_corner <- c(xmax, ymin)
top_right_corner <- c(xmax, ymax)
left_goal <- c(xmin, (ymin+ymax)/2)
right_goal <- c(xmax, (ymin+ymax)/2)

Event 1

The first event is a pass, where both teams are in relatively neutral positions, almost like from a kick-off. The ball is on the half way line.

# Ball Location
ball_xy_1 <- centre_circle

# Player Locations
player_1_position_1 <- c(10,40)
player_2_position_1 <- c(20, 15)
player_3_position_1 <- c(20, 30)
player_4_position_1 <- c(20, 50)
player_5_position_1 <- c(20, 65)
player_6_position_1 <- c(50, 15)
player_7_position_1 <- c(50, 30)
player_8_position_1 <- c(59, 40)
player_9_position_1 <- c(50, 65)
player_10_position_1 <- c(80, 30)
player_11_position_1 <- c(80, 50)

player_12_position_1 <- c(120-10, 80-40)
player_13_position_1 <- c(120-20, 80-15)
player_14_position_1 <- c(120-20, 80-30)
player_15_position_1 <- c(120-20, 80-50)
player_16_position_1 <- c(120-20, 80-65)
player_17_position_1 <- c(120-50, 80-15)
player_18_position_1 <- c(120-50, 80-30)
player_19_position_1 <- c(120-50, 80-50)
player_20_position_1 <- c(120-50, 80-65)
player_21_position_1 <- c(120-80, 80-30)
player_22_position_1 <- c(120-80, 80-50)

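# Complete Location DataFrame
# Column order (7 pitch points, ball, team 1, team 2) is assumed from the
# teams and player_size vectors below
event_1_player_locations <- as.data.frame(cbind(
  centre_circle, bottom_left_corner, top_left_corner, bottom_right_corner,
  top_right_corner, left_goal, right_goal, ball_xy_1,
  player_1_position_1, player_2_position_1, player_3_position_1, player_4_position_1,
  player_5_position_1, player_6_position_1, player_7_position_1, player_8_position_1,
  player_9_position_1, player_10_position_1, player_11_position_1,
  player_12_position_1, player_13_position_1, player_14_position_1, player_15_position_1,
  player_16_position_1, player_17_position_1, player_18_position_1, player_19_position_1,
  player_20_position_1, player_21_position_1, player_22_position_1))
# Transpose so that each row is one location with an x and a y column
event_1_xy <- as.data.frame(t(event_1_player_locations)) 
names(event_1_xy)[1] <- "location_x"
names(event_1_xy)[2] <- "location_y"
# One label per row; the "ball" entry is assumed, to match the 30 sizes and 4 colours
teams <- c("pitch", "pitch", "pitch", "pitch", "pitch", "pitch", "pitch", "ball",
           "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
           "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2")
player_size <- c(1, 1, 1, 1, 1, 1, 1, 1.5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
team_colours <- c("cyan", "Red", "White", "Green")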

# Plot event 1 on the pitch
blank_pitch +
  geom_point(data = event_1_xy, aes(x=location_x, y = location_y, colour = teams),  size = player_size) +
  scale_colour_manual(values = team_colours) +
  theme(legend.position = "none")

plot of chunk event_1_pitch

There are no immediate goal scoring chances for the team in possession. We can calculate a distance matrix based on the distances between each player, the ball and points on the pitch. This distance matrix will help to create a network which looks entirely unappealing but contains all the information we want.

# Create distance matrix (weighted adjacency matrix)
dist_event_1 <- as.matrix(dist(event_1_xy %>% select(c("location_x", "location_y"))))

# create Network for all
graph_event_1 <- graph_from_adjacency_matrix(dist_event_1,
                                             mode = "undirected",
                                             weighted = TRUE,
                                             diag = FALSE)

Event 2

The second event is immediately after the first event above, a pass has been made to the forward who receives the ball in a much more advanced position. Amazingly, there has been no attempt to block or deny him any space, if anything the defence has moved completely out of the way. This is pretty poor defending with a pretty poor defensive structure after the pass.

# Ball Location
ball_xy_2 <- c(100, 40) # Edge of the right penalty area

# Player Locations
player_1_position_2 <- c(10+10, 40)
player_2_position_2 <- c(20+19, 15)
player_3_position_2 <- c(20+19, 30)
player_4_position_2 <- c(20+19, 50)
player_5_position_2 <- c(20+19, 65)
player_6_position_2 <- c(50+19, 15)
player_7_position_2 <- c(50+19, 30)
player_8_position_2 <- c(50+19, 50)
player_9_position_2 <- c(50+19, 65)
player_10_position_2 <- c(80+19, 30)
player_11_position_2 <- c(80+19, 40)

player_12_position_2 <- c(120-10, 80-40)
player_13_position_2 <- c(120-30, 80-10)
player_14_position_2 <- c(120-20, 80-25)
player_15_position_2 <- c(120-20, 80-55)
player_16_position_2 <- c(120-30, 80-70)
player_17_position_2 <- c(120-50, 80-15)
player_18_position_2 <- c(120-50, 80-30)
player_19_position_2 <- c(120-50, 80-50)
player_20_position_2 <- c(120-50, 80-65)
player_21_position_2 <- c(120-80, 80-30)
player_22_position_2 <- c(120-80, 80-50)

event_2_player_locations <- as.data.frame(cbind(
event_2_xy <- as.data.frame(t(event_2_player_locations))
names(event_2_xy)[1] <- "location_x"
names(event_2_xy)[2] <- "location_y"
teams <- c("pitch", "pitch", "pitch", "pitch", "pitch", "pitch", "pitch",
           "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
           "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2")
player_size <- c(1, 1, 1, 1, 1, 1, 1, 1.5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
team_colours <- c("cyan", "Red", "White", "Green")

# Plot event 2 on the pitch
blank_pitch +
  geom_point(data = event_2_xy, aes(x=location_x, y = location_y, colour = teams),  size = player_size) +
  scale_colour_manual(values = team_colours) +
  theme(legend.position = "none")

plot of chunk event_2_pitch

Again we can create a distance matrix, which helps create an unappealing network representation of each player’s location in relation to each other, the ball and points on the pitch.

# Create distance matrix (weighted adjacency matrix)
dist_event_2 <- as.matrix(dist(event_2_xy %>% select(c("location_x", "location_y"))))

# Create Network for all
graph_event_2 <- graph_from_adjacency_matrix(dist_event_2,
                                             mode = "undirected",
                                             weighted = TRUE,
                                             diag = FALSE)

Computing Network Differences

We have two events: one before a pass is made, with the defence in a seemingly comfortable position, and the second after a brilliantly executed pass that caused the defence to immediately abort mission. There is a clear difference in the players’ locations and the scenario that can be seen from the pitch view; hopefully comparing the networks can reflect this intuitive difference.

The package NetworkDistance has several metrics which aim to compute how different several networks are from each other using their adjacency matrices. The distance matrices that we have calculated are weighted adjacency matrices and should be appropriate candidates. It’s worth noting the obvious: these are fully connected, undirected and weighted networks, so not all measures will be appropriate. I am no expert in quantifying networks, however some research into the measures led to the intuition that the best candidates for representing the differences seen on the pitch won’t involve counting or replacing edges or looking at node-specific measures. This is because each network is fully connected, so the only difference between networks is the weighting between each pair of nodes. We are also interested in overall differences rather than individual nodes just yet, although there may be some merit to looking at those later.

The types of distance measures that seem to fit these criteria are (weighted) spectral distances, which compare the eigenvalues (the spectra) of a matrix representation of each network, such as its adjacency or Laplacian matrix, and the diffusion distance, which is based on heat diffusion and has an element of time involved.
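To make the idea of a spectral distance concrete, here is a minimal sketch in base R. It assumes the simplest possible version: take the eigenvalues of each weighted adjacency matrix and measure the Euclidean distance between the two sorted spectra. The helper names `spectrum` and `spectral_distance` and the toy locations are made up for illustration, and this is not necessarily the exact calculation that NetworkDistance performs.

```r
# Toy (x, y) locations: three points, then the same points with one moved
xy_a <- matrix(c(0, 0,
                 3, 0,
                 0, 4), ncol = 2, byrow = TRUE)
xy_b <- xy_a
xy_b[3, ] <- c(0, 8)

# Weighted adjacency (distance) matrices, built the same way as in the post
dist_a <- as.matrix(dist(xy_a))
dist_b <- as.matrix(dist(xy_b))

# Hypothetical helper: eigenvalues of a symmetric matrix, largest first
spectrum <- function(adj) {
  sort(eigen(adj, symmetric = TRUE, only.values = TRUE)$values,
       decreasing = TRUE)
}

# Hypothetical helper: Euclidean distance between two sorted spectra
spectral_distance <- function(adj_1, adj_2) {
  sqrt(sum((spectrum(adj_1) - spectrum(adj_2))^2))
}

spectral_distance(dist_a, dist_a)  # identical networks give zero
spectral_distance(dist_a, dist_b)  # moving one point gives a positive distance
```

Because only the eigenvalues are compared, a measure like this reflects the overall shape of the network rather than any individual node, which is the property argued for above.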

The measures for each are below, though I haven’t got an intuition for the scale to expect from these numbers, as there’s no real situation for comparison. Each result is split into the pairwise distances between each network and a matrix representing the respective spectra. In this case there are only two networks for two events and so a single pairwise distance.

# Calculate distance between events (can do all events at once)
events_list <- list(

# Centrality
# event_centrality_close <- nd.centrality(events_list, out.dist = TRUE, mode = "Close", directed = FALSE)
# event_centrality_degree <- nd.centrality(events_list, out.dist = TRUE, mode = "Degree", directed = FALSE)
# event_centrality_btness <- nd.centrality(events_list, out.dist = TRUE, mode = "Between", directed = FALSE)

# L2 Distance of Continuous Spectral Densities
event_csd <- nd.csd(events_list, out.dist = TRUE, bandwidth = 1)

# Discrete Spectral Distance
event_dsd_adj <- nd.dsd(events_list, out.dist = TRUE, type = "Adj")
event_dsd_lap <- nd.dsd(events_list, out.dist = TRUE, type = "Lap")
event_dsd_slap <- nd.dsd(events_list, out.dist = TRUE, type = "SLap")
event_dsd_nlap <- nd.dsd(events_list, out.dist = TRUE, type = "NLap")

# Edge Difference Distance
# event_edd <- nd.edd(events_list, out.dist = TRUE)

# Extremal Distance with top-k Eigenvalues
# event_extremal <- nd.extremal(events_list, out.dist = TRUE, k = ceiling(nrow(events_list)/5))

# Graph Diffusion Distance
event_gdd <- nd.gdd(events_list, out.dist = TRUE, vect = seq(from = 0.1, to = 1, length.out = 10))

# Hamming Distance
# event_hamming <- nd.hamming(events_list, out.dist = TRUE)

# HIM Distance
# event_him <- nd.him(events_list, out.dist = TRUE, xi = 1, ntest = 10)

# Network Flow Distance
# event_nfd <- nd.nfd(events_list, order = 0, out.dist = TRUE, vect = seq(from = 0, to = 10, length.out = 1000))

# Distance with Weighted Spectral Distribution
event_wsd <- nd.wsd(events_list, out.dist = TRUE, K = 50, wN = 4)

# Combine Relevant Distances
events_distance <- data.frame(cbind(


Still to come

There are of course some factors that haven’t been considered here, such as some areas of the pitch being more important than others. Future ideas would ideally include a weighting that is applied depending on where the ball is located on the pitch. A further improvement would be to separate each team for the same event and evaluate their network differences separately, alongside the overall difference.
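As a rough idea of how the per-team split could work, the sketch below subsets one overall distance matrix into a separate matrix for each team, keeping the ball in both. The toy locations, the `"ball"` label and the helper `team_distance_matrix` are assumptions made for illustration only; they are not part of the code above.

```r
# Toy locations: one ball and two players per team (hypothetical values)
toy_xy <- data.frame(location_x = c(60, 40, 50, 80, 70),
                     location_y = c(40, 30, 50, 30, 50))
toy_teams <- c("ball", "1", "1", "2", "2")

# Overall distance matrix, built the same way as for each event
dist_all <- as.matrix(dist(toy_xy))

# Hypothetical helper: keep only the rows and columns belonging to one team,
# plus any labels listed in `keep` (here the ball)
team_distance_matrix <- function(dist_matrix, teams, team, keep = "ball") {
  idx <- teams %in% c(team, keep)
  dist_matrix[idx, idx, drop = FALSE]
}

dist_team_1 <- team_distance_matrix(dist_all, toy_teams, "1")
dist_team_2 <- team_distance_matrix(dist_all, toy_teams, "2")
```

Each per-team matrix is still a symmetric weighted adjacency matrix, so in principle it could be compared between events in the same way as the overall matrix.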

This concept is still a work in progress, but I thought it was worth throwing the idea out there. Any questions, suggestions or errors, please give me a shout!


#17 Space & Structure: Why it matters (Pt. 1)

I always like the idea that Lionel Messi is such a good player that he creates more space by standing still than other players do running around. Space is at a premium on the football pitch as it is, so being able to manufacture more space is a great skill to have. I should clarify: space in important areas of the pitch is at a premium. There is actually lots of space on the pitch, it’s just that the majority of it isn’t contested as it is considered unimportant. Exactly where these important areas are is up for debate, but it is generally accepted that the areas close to each goal are more important than others. Being able to manufacture space in these important areas of the pitch is a great skill to have, and one that ultimately determines how well a player and team will perform.

This is the first part of what will likely be a few entries on my thoughts about Space and Structure’s importance in football.

Football is a simple game; the opposition players make it complicated. Putting the small round thing in the large rectangular white thing is a pretty simple concept, and is easy to do when there are no other players trying to stop you. For example, it’s much easier to score an open goal than it is to score in the resulting play from your own goal kick. Aside from the distance, the main obstruction is other players actively preventing your progress up the field towards their goal.

Denying the opposition comfortable possession in important areas of the pitch is a desirable feature of a well working defensive system, whilst gaining comfortable possession in important areas of the pitch is desirable when attacking. Whole styles of play are designed and implemented with both of these dimensions in mind.

  • Guardiola’s Barcelona used quick, short passing to probe and pull a defence out of shape before exploiting the available space to create high probability goal scoring chances.
  • Simeone’s Atlético Madrid employ a disciplined low block where defenders deny space and cover each other, whilst not over extending.

These are two sides of the same coin: both systems have an established structure which maintains control over the space in important areas of the pitch.

Guardiola’s Barcelona worked so well for many reasons, but partly because they were able to consistently pull defenders out of position and distort the structure that the defending team had employed. Once the structure was out of shape, it became much easier to manufacture space where they wanted and create goal scoring opportunities.

Simeone’s Atlético Madrid consistently denied goal scoring opportunities partly due to their individual and structural discipline. They maintain their structure and ensure that contingency plans, such as double teams and covering defenders, are available and effective when required.

It’s clear that managing and controlling space in important areas of the pitch is crucial both with and without the ball. The best attacking teams seem to make defensive efforts obsolete, whilst the best defensive teams make defending look so simple. Quantifying contributions to attacking play is more well established due to individual measurements such as shots, goals, assists and now also ‘Expected’ metrics. This is because scoring goals has been considered an individual achievement, readily quantified by how many goals a player scores themselves. Defensive contributions are harder to quantify and much more nuanced. There are individual metrics such as blocks, tackles and interceptions, however these don’t always correlate with fewer goals conceded. Weaker teams are put in positions to perform these actions more often than stronger teams, but they also concede more goals.

As individual contributions are less important defensively, it seems more reasonable to seek to quantify defensive efforts using team-oriented measurements. In different phases of the game, teams will adopt a shape which reflects both what they want to protect and how they want to protect it. For example, a team may retreat and allow the opposition to carry the ball out of the opposition’s half, but apply pressure as soon as the ball enters the team’s own half. This suggests that they see the important areas of the pitch as being in their own half and want to protect this area. Alternatively, a team may adopt a high press and immediately press the opposition in the opposition’s half, with the aim of winning the ball back and countering nearer to the opposition’s goal. This approach suggests that their important areas of the pitch cover a much larger area of the field, with a lower emphasis on their own half. These examples are specific to a point in time and will evolve as the ball moves around the pitch. Constantly re-evaluating the important areas and reacting to the other team are decisions that individual players must make throughout a match.

Without the ball, a team will want to ensure that their intended structure is maintained. With the ball, a team will try to break the defensive team’s structure. The assumption here is that it’s much harder to create goal scoring opportunities when attacking a defence’s intended structure than once you’ve forced them out of their comfort zone. Great attacking teams can force defences out of their structure more easily, usually by precise ball movement or individual skill moving the ball past opponents. Great defensive teams avoid being disrupted from their structure, usually by being comfortable in a wide array of structures and so effectively always in an adequate defensive shape or forcing the attacking team to play by their rules.

It seems obvious that the team which controls the space on the pitch will control the game. When watching football matches, we can get an intuition about how each team sets up and how that affects the flow of a game, however it’s hard to quantify that intuition. It’s hard to determine exactly what makes Barcelona’s and Atlético Madrid’s use of structure and space so good, as they appear to have completely opposite styles. In the following parts of my thoughts on structure and space, I’ll take a look at potential ways to quantify or measure their use of space and structure.