#16: StatsBomb – Messi Ball Receipt Locations

Introduction

Over the last few months, StatsBomb have released a huge amount of free event data, including all of Lionel Messi’s La Liga matches. Data this detailed and clean is incredibly hard (and expensive) to come by, so giving everyone free access to it is amazing and much appreciated!

I’ve only just had a chance to take a look at the data, and have already seen many great pieces put out using it. Considering whose data it is, I don’t think it will ever go out of fashion, so it’s never too late to start playing around with it.

You can get access to the data and there’s a very helpful getting started guide here:
https://statsbomb.com/resource-centre/
https://statsbomb.com/2019/07/messi-data-release-part-1-working-with-statsbomb-data-in-r/

** You do need to have the latest version of R and the StatsBombR package installed

There are almost too many things to look at in this data set, so I’ve decided to try to focus on a specific part of Messi’s game and see what I find.

Messi gets the ball a lot. He obviously does great things with it once he’s got it, but looking at where and how he manages to get the ball in the first place would be interesting. Surely the one thing an opposing team would try to do against him is stop him getting the ball, or at least limit how often he receives it in dangerous areas. That’s the inspiration for looking at where on the pitch he receives the ball.

Data Prep

Load all the necessary libraries: the usual suspects for R data manipulation (plyr/tidyverse/magrittr), FC_RStats’ SBpitch alongside ggplot2/cowplot for plotting graphs and pitches, and StatsBombR for access to the data.

# Libraries ---------------
library(plyr)
library(StatsBombR)
library("SBpitch")
library("ggplot2")
library(tidyverse)
library(magrittr)
library(cowplot)

We are only looking at La Liga matches, so let’s only load matches from that competition. There is even a cleaning function, ‘allclean’, which adds some extra useful columns such as the x/y locations. We have also joined on the season names, as they’re much more intuitive than the assigned season IDs.

The events run from Messi’s debut season, 2004/05, through to 2015/16, and cover everything from shots and passes to even nutmegs. We’re interested in passes received by the man himself. Note there is also an indicator, “ball_receipt.outcome.name”, that identifies when a pass was missed; we want to exclude these and only keep the passes Messi actually received (the NA values).
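The filtering described above can be sketched in base R on a small made-up events table. This is only an illustration under assumptions: it uses the flattened StatsBomb column names type.name, player.name and ball_receipt.outcome.name, assumes ball receipts are typed “Ball Receipt*” and that Messi appears as “Lionel Andrés Messi Cuccittini”, and the season IDs and names are invented. Check all of these against your own copy of the data. With dplyr, the same conditions would go inside filter().

```r
# Made-up events in the shape of StatsBomb's flattened event data
messi <- "Lionel Andr\u00e9s Messi Cuccittini"  # \u00e9 is an escaped "é"

events <- data.frame(
  season_id   = c(101, 101, 101, 102, 102),
  type.name   = c("Ball Receipt*", "Ball Receipt*", "Pass", "Ball Receipt*", "Ball Receipt*"),
  player.name = c(messi, messi, messi, "Another Player", messi),
  ball_receipt.outcome.name = c(NA, "Incomplete", NA, NA, NA),
  location.x  = c(95, 60, 70, 50, 102),
  location.y  = c(20, 40, 35, 30, 42),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from season id to a readable season name
seasons <- data.frame(season_id   = c(101, 102),
                      season_name = c("2008/2009", "2009/2010"),
                      stringsAsFactors = FALSE)

# Keep Messi's ball receipts that he actually received:
# an NA outcome means the pass to him was not missed
receipts <- subset(events,
                   type.name == "Ball Receipt*" &
                     player.name == messi &
                     is.na(ball_receipt.outcome.name))

# Join on the season names, which are easier to read than the ids
receipts <- merge(receipts, seasons, by = "season_id")
print(receipts[, c("season_name", "location.x", "location.y")])
```

Of the five made-up events, only two survive: the pass, the missed receipt and the other player’s receipt are all dropped.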

Plotting a pitch

To get some perspective relative to an actual football pitch when using StatsBomb’s event location data, FC_RStats has created a function, “create_Pitch”, which does exactly that. Using ggplot2 and a set of pitch-type parameters, it’s easy to plot a pitch with the same proportions as the event data collected by StatsBomb.

This pitch can be used as the base to visualise all events by plotting the x/y locations.

goaltype = "box"
grass_colour = "#202020"
line_colour =  "#797876"
background_colour = "#202020" 
goal_colour = "#131313"

ymin <- 0 # minimum width
ymax <- 80 # maximum width
xmin <- 0 # minimum length
xmax <- 120 # maximum length

blank_pitch <- create_Pitch(
  goaltype = goaltype,
  grass_colour = grass_colour, 
  line_colour =  line_colour, 
  background_colour = background_colour, 
  goal_colour = goal_colour,
  padding = 0
)

plot(blank_pitch)

Quick Look

With the initial data and visual processing done, we can now start to take a look at the interesting stuff!

At a high level, we can look at every time Messi received the ball and plot them all on a pitch. This will probably get overcrowded, but it can start to provide some understanding.

# All ball receipts ------------
Messi_Plot <- 
  blank_pitch +
    geom_point(data = Messi_Ball_Receipts, aes(x=location.x, y=location.y), colour = "purple") +
    ggtitle("Messi Ball Receipts") +
    theme(plot.background = element_rect(fill = grass_colour),
          plot.title = element_text(hjust = 0.5, colour = line_colour))
plot(Messi_Plot)

Since each ball receipt happens at a specific location, we’ve used points to represent them on the pitch. This looks okay initially, but it’s pretty hard to work out exactly what’s going on, and it doesn’t really tell us anything we didn’t already know: Messi gets the ball a lot in the opposition’s half.

There are lots of overlapping points, so let’s try to get a view of the density distribution to see where specifically he has received the ball the most.

# Density Receipts ----------------
Messi_Density_Plot <- 
  blank_pitch +
    geom_density_2d(data = Messi_Ball_Receipts, aes(x=location.x, y=location.y), colour = "purple") +
    ggtitle("Messi Ball Receipts - Density") +
    theme(plot.background = element_rect(fill = grass_colour),
          plot.title = element_text(hjust = 0.5, colour = line_colour))
plot(Messi_Density_Plot)

This mostly suggests the same thing: Messi likes to receive the ball in the opposition half. However, we can now also see that there are two “peaks”, one far out wide near the top and one closer to the centre. Central areas are more dangerous, whereas out wide you might get more space and be able to receive the ball more easily.
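To put rough coordinates on those two peaks rather than reading them off the contours, one option is to compute the 2D kernel density estimate directly. The sketch below uses MASS::kde2d() (which, as far as I know, is the estimator geom_density_2d() uses under the hood) on made-up receipts drawn from one wide cluster and one central cluster. It then finds the highest density within a wide band and a central band of the pitch; the band edges at 25 and 55 are my own arbitrary choice.

```r
library(MASS)  # kde2d(); MASS is a recommended package bundled with R

set.seed(1)
# Made-up receipts in StatsBomb's 120 x 80 units: one cluster wide, one central
wide    <- data.frame(x = rnorm(200, mean = 90, sd = 6), y = rnorm(200, mean = 10, sd = 4))
central <- data.frame(x = rnorm(150, mean = 95, sd = 6), y = rnorm(150, mean = 40, sd = 4))
pts <- rbind(wide, central)

# 2D kernel density estimate on a 121 x 121 grid covering the whole pitch
dens <- kde2d(pts$x, pts$y, n = 121, lims = c(0, 120, 0, 80))

# Grid point with the highest density among the y values in [ylo, yhi];
# rows of dens$z follow dens$x and columns follow dens$y
peak_in_band <- function(d, ylo, yhi) {
  cols <- which(d$y >= ylo & d$y <= yhi)
  sub  <- d$z[, cols, drop = FALSE]
  idx  <- which(sub == max(sub), arr.ind = TRUE)[1, ]
  c(x = d$x[idx[[1]]], y = d$y[cols[idx[[2]]]])
}

wide_peak    <- peak_in_band(dens, 0, 25)
central_peak <- peak_in_band(dens, 25, 55)
print(rbind(wide = wide_peak, central = central_peak))
```

Swapping the toy points for the real location.x and location.y columns would give approximate coordinates for the wide and central peaks in the plot above.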

Luckily (definitely not luckily), StatsBomb just so happen to have a flag which identifies events that occurred under pressure. I believe “under pressure” is taken as having an opposition player within X metres of you who is actively affecting your decision making.

Let’s take a look at Messi’s ball receipts under pressure and under no pressure. I would expect him to be under pressure more often the closer he gets to the opposition’s goal.
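That expectation can also be checked numerically rather than by eye: bin the receipts by pitch third and compute the share under pressure in each. Below is a base-R sketch on made-up numbers. The cut points at 40 and 80 are my own choice, and it assumes under_pressure is TRUE when pressed and NA otherwise, which is what the ifelse() in the code that follows implies.

```r
# Made-up receipts: location.x runs 0-120 towards the opposition goal, and
# under_pressure is TRUE when pressed and NA otherwise
receipts <- data.frame(
  location.x     = c(20, 35, 50, 60, 75, 85, 95, 100, 105, 110),
  under_pressure = c(NA, NA, TRUE, NA, NA, TRUE, TRUE, NA, TRUE, TRUE)
)

receipts$pressure <- ifelse(is.na(receipts$under_pressure), "No Pressure", "Pressure")

# Split the 120-unit length into thirds (cut points at 40 and 80 are arbitrary)
receipts$third <- cut(receipts$location.x, breaks = c(0, 40, 80, 120),
                      labels = c("Defensive", "Middle", "Final"), include.lowest = TRUE)

# Row-wise proportions: the share of each third's receipts with and without pressure
shares <- prop.table(table(receipts$third, receipts$pressure), margin = 1)
print(round(shares, 2))
```

If the expectation holds in the real data, the “Pressure” column should rise from the defensive third to the final third, as it does in this toy example.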

# Pressure --------------
Messi_Ball_Receipts <- Messi_Ball_Receipts %>%
  mutate(pressure = ifelse(is.na(under_pressure), "No Pressure", "Pressure"))

Messi_Pressure_Plot <-
  blank_pitch +
  geom_point(data = Messi_Ball_Receipts, aes(x=location.x, y=location.y, colour = pressure)) +
  ggtitle("Messi Ball Receipts by Pressure") +
  theme(plot.background = element_rect(fill = grass_colour),
        plot.title = element_text(hjust = 0.5, colour = line_colour),
        legend.position = "bottom",
        legend.title = element_blank(),
        legend.background = element_rect(fill = grass_colour),
        legend.text = element_text(color = line_colour))

Messi_Pressure_Plot

I’m not really sure what I expected. It’s pretty hard to distinguish between the two as there are so many points. To filter the data further, we can look at each season separately.

# Pressure Season Loop ----------------
for (i in rev(La_Liga$season_name)) {
    print(
      blank_pitch +
        geom_point(data = Messi_Ball_Receipts %>% filter(season_name == i), 
                   aes(x=location.x, y=location.y, colour = pressure))  +
        ggtitle(paste0("Messi Ball Receipts by Pressure - ", i)) +
        theme(plot.background = element_rect(fill = grass_colour),
              plot.title = element_text(hjust = 0.5, colour = line_colour),
              legend.position = "bottom",
              legend.title = element_blank(),
              legend.background = element_rect(fill = grass_colour),
              legend.text = element_text(color = line_colour))
      ) 
}

Now there’s a lot less going on. Remember those two peaks of ball receipts from above? We can see here that they are due to Messi receiving the ball in different areas of the pitch in different seasons. Again, this is something we probably already knew. Messi started his career as a wide forward, so he received the ball out wide most of the time. From 2009/10 onwards he starts to receive the ball much more centrally, coinciding with his time playing as a “False 9” up front. Coincidentally, his already ridiculous output skyrocketed. Messi getting the ball in central areas = goal machine.
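One way to put a number on that shift from wide to central is the share of each season’s receipts that land in a central channel. Below is a base-R sketch on made-up data. The channel (30 to 50 across the 80-unit width) is my own arbitrary definition, not a StatsBomb one.

```r
# Made-up receipts for two seasons; y runs across the pitch from 0 to 80
receipts <- data.frame(
  season_name = rep(c("2008/2009", "2009/2010"), each = 5),
  location.y  = c(8, 12, 15, 45, 70, 38, 42, 35, 50, 10),
  stringsAsFactors = FALSE
)

# Central channel: the middle 20 units of the 80-unit width (an arbitrary choice)
in_centre <- receipts$location.y >= 30 & receipts$location.y <= 50

# The mean of a logical is the proportion TRUE, i.e. each season's central share
central_share <- tapply(in_centre, receipts$season_name, mean)
print(central_share)
```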

This still hasn’t really answered the question of where Messi receives the ball under pressure, as it’s hard to tell whether there’s a pattern to the blue/red points or whether it’s all just random.

Something that can help here are marginal density plots. These can be plotted along each axis separately and can hopefully display the distribution of ball receipts more intuitively.

Let’s take a look at all seasons together first.

xdens_pressure <- axis_canvas(Messi_Pressure_Plot, axis = "x") +
  geom_density(data = Messi_Ball_Receipts, aes(x=location.x, fill = pressure), alpha = 0.5) +
  xlim(xmin, xmax)

combined_pressure_plot <- insert_xaxis_grob(Messi_Pressure_Plot, xdens_pressure, position = "top") 

ydens_pressure <- axis_canvas(Messi_Pressure_Plot, axis = "x") +
  geom_density(data = Messi_Ball_Receipts, aes(x=location.y, fill = pressure), alpha = 0.5) +
  xlim(ymin, ymax) +
  coord_flip()

combined_pressure_plot %<>%
  insert_yaxis_grob(., ydens_pressure, position = "right")
ggdraw(combined_pressure_plot)

Again, there’s a bit too much going on on the pitch here, but the marginal distributions along each axis are interesting.

Across the top, there doesn’t seem to be much difference in distribution between “Pressure” and “No Pressure”. There is a higher peak for “No Pressure” about halfway inside the opposition’s half, which could be because Barcelona practically camp outside the opposition box for the majority of most games, with the opposition’s defenders all on the edge of their own box.

The distribution along the right is as expected: there are many more receipts under no pressure out wide.
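The peaks of these marginal curves can also be pulled out numerically. geom_density() is built on the same kernel density estimate as stats::density(), so a small helper that returns where density() reaches its highest point gives the peak for each group. The y values below are made up purely to illustrate the idea.

```r
set.seed(42)
# Made-up y locations (0-80 across the pitch) for each pressure group:
# "No Pressure" receipts cluster out wide, "Pressure" receipts more centrally
y_no_pressure <- rnorm(300, mean = 12, sd = 5)
y_pressure    <- rnorm(300, mean = 38, sd = 8)

# Location of the highest point of a 1D kernel density estimate on [from, to]
density_peak <- function(v, from, to) {
  d <- density(v, from = from, to = to)
  d$x[which.max(d$y)]
}

peaks <- c(
  "No Pressure" = density_peak(y_no_pressure, 0, 80),
  "Pressure"    = density_peak(y_pressure, 0, 80)
)
print(round(peaks, 1))
```

Running the same helper on location.x or location.y for each season would track how the “No Pressure” peak moves from year to year.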

And for each season separately.

for (i in rev(La_Liga$season_name)) {
    p <- blank_pitch +
      geom_point(data = Messi_Ball_Receipts %>% filter(season_name == i), aes(x=location.x, y=location.y, colour = pressure)) +
      ggtitle(paste0("Messi Ball Receipts - ", i)) +
      theme(plot.background = element_rect(fill = grass_colour),
            plot.title = element_text(hjust = 0.5, colour = line_colour),
            legend.position = "bottom",
            legend.title = element_blank(),
            legend.background = element_rect(fill = grass_colour),
            legend.text = element_text(color = line_colour))


    xdens <- axis_canvas(p, axis = "x") +
      geom_density(data = Messi_Ball_Receipts %>% filter(season_name == i), aes(x=location.x, fill = pressure), alpha = 0.5) +
      xlim(xmin, xmax)
    xplot <- insert_xaxis_grob(p, xdens, position = "top") 

    ydens <- axis_canvas(p, axis = "x") +
      geom_density(data = Messi_Ball_Receipts %>% filter(season_name == i), aes(x=location.y, fill = pressure), alpha = 0.5) +
      xlim(ymin, ymax) +
      coord_flip()

    comb_plot <- insert_yaxis_grob(xplot, ydens, position = "right")
    print(ggdraw(comb_plot))
}

Now this is what we all came here to see.

For the first 6 seasons of his career (2004/05 – 2009/10), Messi actually received the ball closer to goal under no pressure than under pressure (the distribution across the top). That is pretty incredible, and contrary both to what you would expect and to what we saw overall. These are the wide forward Messi seasons, which shows just how good he was, and how good he was getting, at being a wide forward. Where he most often received the ball under no pressure (the peak of the top distribution) actually moves closer to the opposition goal until 2008/09.

Then in 2009/10 something magical happens. Somehow he manages to receive the ball under no pressure both closer to goal (across the top) AND dead in the centre of the pitch (along the right), which of course is a recipe for success.

From then on, it looks like teams at least tried to put pressure on him when he received the ball close to goal. I’m not really sure that worked so well, though.

There are many more amazing things from Messi’s career hidden away in this data set. Thanks again to StatsBomb for the free access to explore it and show off some of what is possible with the data.

@TLMAnalytics

#15: Getting Started with Free StatsBomb Event Data – xG Shot Map Tutorial

Introduction

After attending StatsBomb’s Introduction to Football Analytics last week, I was inspired to take another look at the free event data that they offer. One of the main obstacles to breaking into the football analytics industry is getting data to play around with and show what you can do, which is why StatsBomb’s commitment to offering such samples of data for free is so amazing and should be taken advantage of! There are endless possibilities for insights and visualisations using the event data, limited only by your creativity.

Support and tutorials for using the data in R are also freely available, including StatsBomb’s own StatsBombR package and FCrStats’s Twitter and GitHub, which provide functions for creating custom pitches for visualisations. Did I mention they were both free?

https://github.com/statsbomb/StatsBombR & @StatsBomb
https://github.com/FCrSTATS & @FC_rstats

It can be intimidating to start to work with complex data like this, so I will go through step by step and create a version of a popular match visualisation: an Expected Goals Shot Map.

Since the FIFA Women’s World Cup is currently taking place, and the StatsBombR package is continually being updated with new games as they are played, I thought I’d use the recent England v Argentina game as an example.

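Install R and RStudio

For those completely new to R, you can download the latest version of R here (RStudio, the popular editor for working with R, is a separate download):

https://cran.r-project.org/

And install packages from CRAN using:

install.packages("…package name here…")

StatsBombR and SBpitch aren’t on CRAN, so install those from GitHub with the devtools package instead:

install.packages("devtools")
devtools::install_github("statsbomb/StatsBombR")
devtools::install_github("FCrSTATS/SBpitch")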

Load Libraries

To start we will load the relevant libraries.

library("StatsBombR") # Event data
library("SBpitch")    # Custom functions for creating pitches
library("ggplot2")    # Building visualisations 
library("tidyverse")  # Data manipulation

Create Blank Pitch

Using FCrStats’s SBpitch package, you can create a pitch for custom visualisations with the create_Pitch() function. You can specify the colours and which lines you want to see. For the xG Shot Map, we will use the whole pitch.

# Create a blank pitch using create_Pitch()
blank_pitch <- create_Pitch(
  goaltype = "box",
  grass_colour = "#202020", 
  line_colour =  "#797876", 
  background_colour = "#202020", 
  goal_colour = "#131313"
)

blank_pitch


Get StatsBomb Data

Using the StatsBombR package, getting access to the free event data is as simple as running the StatsBombFreeEvents() function as below and storing the result in your environment.

statsbomb_events <- StatsBombFreeEvents()

## [1] "Whilst we are keen to share data and facilitate research, we also urge you to be responsible with the data. Please register your details on https://www.statsbomb.com/resource-centre and read our User Agreement carefully."

Get Match Info

There are over 100 variables for each event of each match, so we want to narrow the data set down to a single match: the FIFA Women’s World Cup match between England and Argentina. We are also only interested in shots, so we will only include those types of events.

I have also included the colours for the respective teams to use later on.

The x,y location of each event is stored as an array in a single variable. Using the separate() function in the tidyverse, we can extract these into new variables called “location_x” and “location_y”. We then use as.numeric() to make the new location variables numeric so we can plot them later, subtracting location_y from 80 to flip the y-axis.

event_type <- "Shot"
team1_colour <- "red4"
team2_colour <- "lightblue"

# Narrow down to a specific match: England Women's v Argentina Women's
match <- statsbomb_events %>%
  filter(# FIFA Women's World Cup competition ID
           competition_id == 72 & 
           # England Women's v Argentina Women's match ID
           match_id == 22962 & 
           # Only keep events that are shots
           type.name == event_type ) %>%
  # X,Y locations are stored in a single array column, so separate() them into two columns.
  # Split on anything that isn't a digit or a decimal point: the default separator
  # would also split at the "." in decimal values such as 94.5
  separate(col = location, into = c(NA, "location_x", "location_y"),
           sep = "[^0-9.]+", extra = "drop") %>%
  mutate(location_x = as.numeric(location_x),
         # Flip the y-axis by subtracting from the pitch width (80)
         location_y = 80 - as.numeric(location_y))
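Because location holds a pair of numbers for each event, another option is to take the numbers out directly instead of going via text (as far as I can tell, separate() first converts each pair into a string such as “c(94.5, 34.2)”). Below is a base-R sketch on a made-up list column, assuming location is a list of numeric x, y pairs.

```r
# Made-up list column: each element is one event's c(x, y) location
location <- list(c(94.5, 34.2), c(108, 41.7))

# Pull out the first (x) and second (y) number of every pair;
# vapply() checks that each result is a single double
location_x <- vapply(location, function(p) p[1], numeric(1))
location_y <- vapply(location, function(p) p[2], numeric(1))

# Same vertical flip as the 80 - location_y step above
location_y <- 80 - location_y
print(data.frame(location_x, location_y))
```

Inside the tidyverse pipeline, the same idea would go in a mutate() step.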

Create Goal and xG Indicators

Since we are interested in the actual goals and expected goals of each shot, we can create a goal indicator variable and corresponding expected goals variables for the shots of each team.

match <- match %>%
  mutate(# Create a goal indicator
         Goal = ifelse(shot.outcome.name == "Goal","1","0"),
         # Create England goal indicator and xG
         team1_Goal = ifelse((shot.outcome.name == "Goal" & team.name == unique(match$team.name)[1]),"1","0"),
         team1_xG = ifelse(team.name == unique(match$team.name)[1],shot.statsbomb_xg,NA),
         # Create Argentina goal indicator and xG
         team2_Goal = ifelse((shot.outcome.name == "Goal" & team.name == unique(match$team.name)[2]),"1","0"),
         team2_xG = ifelse(team.name == unique(match$team.name)[2],shot.statsbomb_xg,NA)
)
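As a compact alternative to separate goal and xG columns for each team, the same totals can be produced in one step by grouping on team.name. Below is a base-R sketch on made-up shots (the team names and xG values are invented). Storing the goal flag as an integer rather than the text “1”/“0” avoids the as.numeric() conversions later, although the text version is handy for mapping goals to a shape.

```r
# Made-up shots: team, outcome and StatsBomb-style xG for each shot
shots <- data.frame(
  team.name         = c("Team A", "Team A", "Team A", "Team B"),
  shot.outcome.name = c("Goal", "Saved", "Off T", "Blocked"),
  shot.statsbomb_xg = c(0.45, 0.10, 0.05, 0.03),
  stringsAsFactors  = FALSE
)

# 1 for a goal, 0 otherwise, stored as an integer
shots$goal <- as.integer(shots$shot.outcome.name == "Goal")

# Per-team totals, kept in the order the teams first appear
teams <- unique(shots$team.name)
summary_tbl <- data.frame(
  team  = teams,
  xG    = as.numeric(tapply(shots$shot.statsbomb_xg, shots$team.name, sum)[teams]),
  Goals = as.numeric(tapply(shots$goal, shots$team.name, sum)[teams]),
  stringsAsFactors = FALSE
)

# Positive means the team scored more than its chances suggested
summary_tbl$xG_Difference <- round(summary_tbl$Goals - summary_tbl$xG, 2)
print(summary_tbl)
```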

Plot Shot Locations

Okay, lots of preparation done so far. Let’s plot some shots!

ggplot2 builds plots from the ground upwards. Remember the blank_pitch we made earlier? We use that as a base and add the shot locations on top using geom_point() to add points/dots.

# Plotting raw shot locations
blank_pitch + 
  geom_point(data = match, aes(x=location_x, y=location_y), colour = "white")


Oops, looks like all the shots happened at the same end, regardless of team. We need to reverse the shot locations of one team. Since we know from create_Pitch() that the pitch dimensions are 120 x 80, we can use those.

# Looks like all shots are at the same end, need to reverse the locations of one team
match <- match %>%
  mutate(location_x = ifelse(team.name == unique(match$team.name)[1],
                             120 - location_x,
                             location_x),
         location_y = ifelse(team.name == unique(match$team.name)[1],
                             80 - location_y,
                             location_y)
         )
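Flipping one team’s shots like this is a 180-degree rotation of the pitch. Both coordinates are subtracted from the pitch dimensions, so a shot from a team’s left side stays on its left once it is moved to the other end. Below is a small base-R sketch with made-up shots; the helper name flip_to_other_end() is my own.

```r
# Hypothetical helper: move locations to the opposite end of a 120 x 80 pitch
# (subtracting both coordinates is a 180-degree rotation about the centre spot)
flip_to_other_end <- function(x, y, pitch_length = 120, pitch_width = 80) {
  list(x = pitch_length - x, y = pitch_width - y)
}

# Made-up shots from two teams, all recorded attacking the same end
shots <- data.frame(team = c("A", "A", "B"),
                    x    = c(110, 100, 105),
                    y    = c(40, 30, 50),
                    stringsAsFactors = FALSE)

# Flip only team A so the two teams shoot at opposite ends
is_a <- shots$team == "A"
flipped <- flip_to_other_end(shots$x[is_a], shots$y[is_a])
shots$x[is_a] <- flipped$x
shots$y[is_a] <- flipped$y
print(shots)
```

Subtracting only x would mirror the shots instead, swapping each team’s left and right.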

Plot Respective Coloured Locations

Let’s see if it worked!

# Try again, with different colours for each team
blank_pitch + 
  geom_point(data = match, aes(x=location_x, y=location_y, colour = team.name)) +
  theme(legend.position="none") + 
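  # Name the colours so each team keeps the same colour in every plot
  scale_colour_manual(values = setNames(c(team1_colour, team2_colour), unique(match$team.name)))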


Oof, looks like England had lots of shots and denied Argentina anything significant.

Highlight Goals

We have shot locations, but it would be nice to see which shots were goals. Using the goal indicator we created earlier, we can give goals a different shape (a triangle) to tell them apart.

# Now highlight the goals
blank_pitch + 
  geom_point(data = match, aes(x=location_x, y=location_y, colour = team.name, shape = Goal)) +
  theme(legend.position="none") + 
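  # Name the colours so each team keeps the same colour in every plot
  scale_colour_manual(values = setNames(c(team1_colour, team2_colour), unique(match$team.name)))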

Plot Size of xG

Looks like England scored from their shot closest to the goal in the Argentina 6-yard box. Let’s see how likely they were to score by using the size of the points to reflect the expected goals.

# Now use size to reflect shot xG
blank_pitch + 
  geom_point(data = match, aes(x=location_x, y=location_y, colour = team.name, shape = Goal, size = shot.statsbomb_xg)) +
  theme(legend.position="none") + 
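  # Name the colours so each team keeps the same colour in every plot
  scale_colour_manual(values = setNames(c(team1_colour, team2_colour), unique(match$team.name)))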

Looks like England scored with their best chance and could potentially have scored a few more, considering their volume of relatively good shots. This is the skeleton of an Expected Goals Shot Map; we can add annotations to make the final plot more presentable and to quantify each team’s expected goals versus actual goals.

Add Titles and Annotations

# Totals for each team, computed once and reused in the annotations below
team1_name        <- unique(match$team.name)[1]  # England
team2_name        <- unique(match$team.name)[2]  # Argentina
team1_xG_total    <- round(sum(match$team1_xG, na.rm = TRUE), 2)
team2_xG_total    <- round(sum(match$team2_xG, na.rm = TRUE), 2)
team1_goals_total <- sum(as.numeric(match$team1_Goal), na.rm = TRUE)
team2_goals_total <- sum(as.numeric(match$team2_Goal), na.rm = TRUE)

blank_pitch + 
  geom_point(data = match, aes(x=location_x, y=location_y, colour = team.name, shape = Goal, size = shot.statsbomb_xg)) +
  theme(legend.position="none") + 
  # Name the colours so each team keeps its own colour whatever the factor order
  scale_colour_manual(values = setNames(c(team1_colour, team2_colour), c(team1_name, team2_name))) + 
  # England's xG (team 1), top left; annotate() draws each label exactly once
  annotate("text", x = 2, y = 78, label = team1_name, hjust = 0, size = 5, colour = team1_colour) +
  annotate("text", x = 2, y = 75, label = paste0("Expected Goals (xG): ", team1_xG_total), hjust = 0, size = 3, colour = team1_colour) + 
  annotate("text", x = 2, y = 73, label = paste0("Actual Goals: ", team1_goals_total), hjust = 0, size = 3, colour = team1_colour) + 
  annotate("text", x = 2, y = 71, label = paste0("xG Difference: ", round(team1_goals_total - team1_xG_total, 2)), hjust = 0, size = 3, colour = team1_colour) +
  # Argentina's xG (team 2), starting at x = 80
  annotate("text", x = 80, y = 78, label = team2_name, hjust = 0, size = 5, colour = team2_colour) +
  annotate("text", x = 80, y = 75, label = paste0("Expected Goals (xG): ", team2_xG_total), hjust = 0, size = 3, colour = team2_colour) + 
  annotate("text", x = 80, y = 73, label = paste0("Actual Goals: ", team2_goals_total), hjust = 0, size = 3, colour = team2_colour) + 
  annotate("text", x = 80, y = 71, label = paste0("xG Difference: ", round(team2_goals_total - team2_xG_total, 2)), hjust = 0, size = 3, colour = team2_colour)

That looks a little better: we now know the score and how each team did compared to their expected goals. After creating a blank pitch, we only need to add layers to get a visualisation of the information we want, which is incredibly powerful. To get the visualisation for another match, simply change the match_id (and team colours) above!

The only packages used to create this are those loaded above, with the free event data provided by StatsBomb and extra functions/tutorials by FCrStats.

Again, you can find those two here:
@StatsBomb
@FC_rstats

Hopefully this will help some people get started and overcome any initial intimidation. I will look to provide more of these step-by-step guides the more I get to play around with the data.

@TLMAnalytics

#14 What Defines a Successful Season?


No matter what happens, Liverpool’s season is a success.

With the best Premier League title race going down to the last day of the season, it’s a stark contrast to last season, when Manchester City had the league wrapped up and were aiming for 100 points. They became the Premier League team with the most points in a season, beating Chelsea’s 95 points from 04/05 by 5 points and arguably becoming the most successful Premier League team. They were so good that it’s no great surprise that this year Manchester City are on 95 points with a game to play, potentially reaching 98 points and recording the second most points ever the year after smashing the record. The surprise this year is that, despite Manchester City being so good, the title is still going down to the final day. Liverpool have 94 points with a game to play, and a win would take them to 97 points, making them at least the team with the third most points, depending on how Manchester City’s game turns out. We are likely seeing two of the top three Premier League sides ever in the same season, with the best of them being one of these teams last year! It’s truly an incredible season, and hopefully we appreciate how good these teams are.

This brings us to the imminent question, and the judgement on the whole season that comes from these last games. One team will be champions and one team will not. One team’s season will be a success and one team’s won’t. That may seem unfair since, as discussed, these could be two of the three best teams ever seen in the Premier League.

However, there are of course more trophies to be won, and more routes to success, than just the Premier League. Manchester City got 100 points last year, are on course for 98 points this year, have already won the Carabao Cup and the Community Shield, and are in the FA Cup final. They are on course for the domestic treble, and the only team to get more points than them in a season were themselves last year, but they got knocked out of the Champions League quarter-finals by Tottenham. That is what people are keen to focus on: despite domestic success, once again Manchester City failed in Europe. The criticism is fair. Manchester City were favourites to beat Tottenham over two legs but didn’t, largely due to prioritising the league over their first-leg match. That’s where the problem with success lies: Manchester City were going for the Quadruple, and it looks likely they will have to settle for a domestic treble. This just shows how high their standards are and what counts as success for a team of their quality. With two games to go, one in the League and the FA Cup final, they expect to go on and win both. Even if they don’t, they already have the joint second highest points total and have won the Carabao Cup. That is probably not a success, considering how close they got to all of their goals, but it is one hell of a season, with every chance of doing the same again next year.

In comparison, and with the incredible Champions League semi-finals just behind us, Liverpool have made it to the Champions League final for the second year in a row and are favourites to win it this time, against Tottenham. Liverpool are in contention to win both the Premier League and the Champions League this year, which is an incredible achievement in itself. Last year they lost, as expected, to the three-peat Real Madrid side with Cristiano Ronaldo, and without Mohamed Salah. Most teams don’t get to a single European final, let alone back-to-back finals. They have beaten Paris St Germain, Bayern Munich and Barcelona on their way to the final. Even with Lionel Messi largely pulling the semi-final tie away from them in the first leg, they were the better team across both legs, and you can’t argue they don’t deserve to be there.

As a worst-case scenario for the finish to this season, if Liverpool lose their final Premier League match and lose the Champions League final to Tottenham, they will still have the fourth highest points total in a season and will have reached back-to-back Champions League finals. Even in the worst case, you could argue that’s a successful season. They are expected to win the Champions League and beat Wolves on the final day, ultimately getting 97 points and finishing second to the second-best team in Premier League history. Their expected finish to the season is definitely a success. If Manchester City were to drop points and Liverpool won the League title, doing the Premier League and Champions League double whilst getting the second highest points total in a season would cement this team among the Premier League’s best. A team can’t, on the one hand, potentially be considered the best ever and, on the other, be considered to have had an unsuccessful season based on 2 games of football. No matter what happens, Liverpool’s season is a success.

@TLMAnalytics

#13 The Top 4 Race

At the end of 2018, Liverpool were 7 points clear of Manchester City, with the title fully in their hands after Man City’s slip-up over Christmas. Tottenham were only 2 points behind City and, based on points alone, arguably had to be considered in the title race too. Chelsea were a further two points behind Spurs, and while not honestly title contenders, they were very much in control of the final Champions League spot in fourth. Arsenal and Man Utd were seemingly in no man’s land, behind the top 4 but ahead of the bottom 14.

Figure 1: Premier League table @ 31/12/2018

This was halfway through the season, and it looked like we weren’t going to have a title race or even a top 4 race at all. Fast forward ~10 more games and things look a whole lot different. The two at the top have cemented themselves as the only title challengers, whilst Spurs and Chelsea have been dragged back into a race for fourth by the upturn in form of both Arsenal and Man Utd. It makes the remaining games of the season all the more meaningful.

Figure 2: Premier League table @ 29/03/2019

I’m going to focus on the top 4 race, with only four points separating Spurs in 3rd from Chelsea in 6th: I’ll take a look at each team’s chances and suggest who might miss out.

Spurs:  

Up until the end of 2018, Spurs were having their best season ever. Despite still playing home games at Wembley, injury problems for Kane and Alli, and not having a central midfield, they managed to grind out win after win. It seems that after perhaps overperforming in the first half of the season to keep up with the top 2, they have regressed to the level you would expect of them in terms of points. In recent seasons Spurs have been comfortably behind City and Liverpool but better than the rest, and the table now reflects that.

It has really only been the last 4 games that have dragged Spurs back into the pack, with the losses to Burnley and Southampton the only truly unexpected bad results. But surround those with a loss to Chelsea and a lucky draw with Arsenal, two rivals for the top 4, and 1 point out of 12 will stop any team in its tracks.

Looking forward, Spurs still have to play away to the top 2, so they could have a huge influence on where the title goes. But these aren’t the games they should be worrying about. Moving into a new stadium midway through a season isn’t the norm, and considering they have 5 winnable home games left, how well they settle into New White Hart Lane will determine whether they comfortably remain in 3rd or make things hard for themselves and have to rely on the rest faltering too.

They are still in the Champions League, but they face Manchester City in the quarter-finals, so I wouldn’t rely on that route too heavily.

Arsenal:

So far, the first season without Arsene Wenger hasn’t gone as badly as it could have. They have always been comfortably a top 6 team, but it was still a question whether they would be able to challenge for the top 4. This season they seem to be one of the most polarising teams when it comes to playing at home or away. The last time they didn’t win a home game was in November, and the last time they won an away game other than at Huddersfield was also in November. Arsenal at the Emirates are a completely different beast from Arsenal anywhere else.

In 2019, they’ve won all 6 of their home games but only 1 of their 4 away games (v Huddersfield). Admittedly, drawing away to Spurs when they deserved to win and losing away to Manchester City are nothing to be ashamed of, but losing to West Ham isn’t great. Recent form has been fantastic as a result of that home record, but while you know what you’ll get with Arsenal at home, anything could happen away.

Their remaining fixtures look great on paper, arguably the easiest in the league. The highest-placed team they face is Wolves, and they play Newcastle, Crystal Palace and Brighton all at home. Home form isn’t an issue, and I would expect them to win all of those games; however, that means they still have 5 away games to play against mid-table sides, the level of team just below the top 6 that isn’t going to get relegated. West Ham are also in that group.

To keep their top four spot, Arsenal will have to be better away from home. It’s non-negotiable as they will likely need more than 9-12 points out of the remaining 8 games.

Manchester United:

United have just appointed Ole Gunnar Solskjaer as their permanent manager, which doesn’t come as a surprise given the form he has brought with him. After cutting the 10-point gap to Spurs down to only 4 points in 2019 and progressing to the Champions League quarter-finals by beating Paris St Germain, it doesn’t seem like he could have done any more. The change in mindset around Old Trafford since Mourinho left has been exactly what they needed to turn their slow start around.

In 2019 they’ve won 5 of their 6 away games (including a 1-3 win away in Paris), losing only away to Arsenal, who we’ve established are pretty good at home. But they’ve only won half of their 4 home games, drawing with Liverpool and Burnley. The style of play OGS has Man Utd playing reflects the results pretty well: giving licence to a mobile front three of Martial, Rashford and Lingard, with Pogba supporting from behind, makes for a devastating counter-attacking team that relies on opponents playing high defensive lines with lots of space behind them to exploit. Away from home it’s easier to set up this way because the home team has more confidence; however, when teams come to Old Trafford and set up not to lose, denying United space in behind, United seem to find it harder to break them down.

This hasn’t been much of a problem so far, as for the most part they’ve managed not to lose when they can’t win. United only have 3 more games away from home (Wolves, Everton and Huddersfield), and you wouldn’t expect Wolves or Huddersfield to allow much space behind their defence, while Everton have been much more robust defensively this year despite appearances.

With 5 games at home, they will need to improve their home form to push into the top 4. They will be playing Manchester City and Chelsea at Old Trafford and, depending on results up to that point, all three teams could see these as must-win games. That could bode well for United if City and Chelsea push forward a bit too much trying to win.

Unfortunately, they have been drawn against Barcelona in the quarter final of the Champions League; however, we didn’t give them any chance against Paris St Germain either, and look what happened.

Chelsea:

Chelsea are a peculiar team: they have much of the same squad that has won multiple Premier League titles, but every other season they seem uninterested. This is starting to look like one of those seasons. By the end of 2018 they looked well established as the fourth best team in the league, but that was probably due to Man Utd and Arsenal underperforming rather than Chelsea doing anything exceptional. Since the two teams below them went on a great run of form, Chelsea haven’t been able to respond. They are an incredibly inconsistent team, and their performance seems to depend on Maurizio Sarri’s mood or his ability to motivate his players, something he claims he can’t do.

In 2019 their form has been pretty poor, and not exactly what you’d expect of a top four side. They’ve won 4 and lost 4 of their last 10 games, including 6-0 and 4-0 thrashings by Manchester City and Bournemouth, both away from home. Aside from those, they’ve lost to Everton and Arsenal and have only beaten Fulham away this year. The most worrying thing is the lack of goals: they’ve only scored 2 away goals (against Fulham) this whole year. Their home form looks okay at best, beating Spurs, Newcastle and Huddersfield and drawing with Wolves and Southampton.

Chelsea have a real problem away from home, where they can’t seem to score and let in far too many goals. At home they can’t seem to score enough either, but they don’t let many in. It might be worth looking in more detail at why this is happening, as they regularly play (arguably) the best defensive midfielder in the world, albeit out of position, with an incredible controller of the game behind him. That combination shouldn’t be getting overrun in midfield.

Chelsea still have to go away to Liverpool and Man Utd, which sounds scary, though they do have winnable home games and need to win those if they want to remain in the top four race.

Overview:

Spurs – Focus on 5 winnable home games at their new stadium

Arsenal – Need to bring home form away with them as 5 tricky mid table away games remaining

Man Utd – Need to do better at home, more dominant with the ball to break down teams

Chelsea – Open up more at home to try to win games, get tighter away and don’t collapse.

#12 Statsbomb Event Data – Fernandinho Replacements

Manchester City find themselves once again top of the Premier League, with the chance to become the first team to retain the title in 10 years, since Manchester United in 2008/09. However, they also find themselves without Fernandinho, seemingly the only irreplaceable player in a squad that overflows with talent. Fernandinho has missed four Premier League games so far this season: the two at the end of December, which they lost, leaving the league title in Liverpool’s hands, and the two most recent games, which were both dominant 1-0 wins. Even if their performances were no worse and they simply lacked some luck, there is no doubt that nobody else in their squad can do exactly what Fernandinho does.

Even Guardiola has commented that there is no doubt they will be looking to bring in a replacement:

“I think with the way we play we need a guy who has of course physicality, is quick in the head and reading where our spaces to attack are”

Guardiola

In this post I will try to scout a replacement for Fernandinho using Statsbomb’s 2018 FIFA World Cup event data. This is a small sample, so it only includes players and their performances at the World Cup. I will define some metrics that could describe the type of player who would fit the role Fernandinho plays, and identify the players that performed best on them during the World Cup.

Guardiola talks about physicality, quickness of thought and reading where the spaces to attack will be. It is hard to quantify those qualities; however, adapting some simpler metrics could give a good shortlist.

We know that Manchester City will have the ball a lot and want to get the ball forwards to their more attacking players in attacking areas, relying on Fernandinho to progress the ball. Using Statsbomb’s passing events, with the start and end location in x, y coordinates, I have defined a ‘Progressive Pass’ to be one that moves up the pitch more than 10m. Players who have the ability to progress the ball forwards are desired. It could be argued that we also want to only include players who progress the ball from deeper positions so as to more accurately emulate Fernandinho’s role, however we have a small sample as it is and the ability to play progressive passes is what we are looking for.
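A minimal sketch of the ‘Progressive Pass’ flag in base R, assuming StatsBomb-style coordinates (a 120 x 80 pitch measured in yards, attacking towards x = 120), so the 10m threshold is converted to yards first. The `passes` data frame and its `player`, `start_x` and `end_x` columns are hypothetical stand-ins for the passing events, not StatsBombR’s own column names.

```r
# Sketch: flag 'progressive passes' that move the ball more than 10m upfield.
# Assumes StatsBomb-style coordinates (120 x 80 yards, attacking towards
# x = 120); the data frame and its column names are hypothetical.
metres_to_yards <- function(m) m / 0.9144

is_progressive <- function(start_x, end_x, min_gain_m = 10) {
  (end_x - start_x) > metres_to_yards(min_gain_m)
}

count_progressive <- function(passes, min_gain_m = 10) {
  passes$progressive <- is_progressive(passes$start_x, passes$end_x, min_gain_m)
  # TRUE counts as 1, so summing gives progressive passes per player
  out <- aggregate(progressive ~ player, data = passes, FUN = sum)
  names(out)[2] <- "progressive_passes"
  out
}

passes <- data.frame(
  player  = c("A", "A", "B", "B"),
  start_x = c(30, 50, 40, 60),
  end_x   = c(45, 55, 52, 59)
)
count_progressive(passes)
```

Raising `min_gain_m`, or only counting passes that start in the team’s own half, would be easy variations if you wanted to mirror Fernandinho’s deeper starting position more closely.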

Whilst lots of players are great at passing, what makes Manchester City so special and Fernandinho so hard to replace, is their ability or willingness to win the ball higher up the pitch. Check out a previous post in the link below where I show how many more times they win the ball back in the opposition’s half. In the same vein, using Statsbomb’s ball recovery event with the x, y location I create a count of times that a player has recovered the ball in the opponent’s half. This tries to emulate the ability to win the ball back quickly after losing it and pinning the opposition back.

https://thelastmananalytics.home.blog/2018/11/06/3-are-man-city-better-without-the-ball-defensive-analysis/
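Counting recoveries in the opponent’s half can be sketched the same way. This assumes the x coordinate runs from 0 to 120 towards the opponent’s goal (halfway line at x = 60), as in StatsBomb’s event data; the `recoveries` table and its columns are hypothetical.

```r
# Sketch: count each player's ball recoveries in the opponent's half.
# Assumes x runs from 0 to 120 towards the opponent's goal (halfway at 60);
# the 'recoveries' table and its columns are hypothetical.
count_high_recoveries <- function(recoveries, halfway_x = 60) {
  recoveries$opp_half <- recoveries$x > halfway_x
  out <- aggregate(opp_half ~ player, data = recoveries, FUN = sum)
  names(out)[2] <- "opp_half_recoveries"
  out
}

recoveries <- data.frame(
  player = c("A", "A", "A", "B", "B"),
  x      = c(70, 40, 95, 55, 61)
)
count_high_recoveries(recoveries)
```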

The combination of progressive passes and high ball recoveries is used as a proxy for the type of skills that Fernandinho displays and can be used to get a shortlist of players that perform similarly. Looking only at players who played positions considered central midfield or defensive midfield, the top 10 is below.

Figure 1: Midfield Progressive Passes and Opponent Half Recoveries Top 10 from 2018 FIFA World Cup
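The post doesn’t spell out exactly how the two metrics were combined, so this sketch simply adds the two counts and ranks midfielders by the total. The position filter assumes StatsBomb-style position names such as "Center Defensive Midfield" and "Left Center Midfield"; the `players` table and its values are made up for illustration.

```r
# Sketch: shortlist central/defensive midfielders by a combined score.
# Assumptions: the two counts are simply added, and positions follow
# StatsBomb-style names; the data below is hypothetical.
shortlist <- function(players, n = 10) {
  # Keep central and defensive midfielders only
  mids <- players[grepl("Defensive Midfield|Center Midfield", players$position), ]
  # Assumed combination rule: simple sum of the two counts
  mids$score <- mids$progressive_passes + mids$opp_half_recoveries
  head(mids[order(-mids$score), ], n)
}

players <- data.frame(
  player              = c("P1", "P2", "P3", "P4"),
  position            = c("Center Defensive Midfield", "Left Center Midfield",
                          "Center Forward", "Right Center Back"),
  progressive_passes  = c(40, 55, 60, 30),
  opp_half_recoveries = c(12, 5, 2, 1)
)
shortlist(players, n = 10)
```

A plain sum lets whichever metric has the bigger numbers dominate; ranking each metric separately or standardising them first would be reasonable alternatives.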

One thing to note is that these are raw counts, not per game or per 90 minutes. It would be worth looking at those rates to account for differences in games and minutes played. For example, Croatia making the Final and Germany getting knocked out in the Group Stage is a difference of four games, so Toni Kroos making it to 2nd on the absolute list is incredible.
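Per 90 normalisation is a one-liner. The sketch below uses two hypothetical players, one who played seven full matches and one who played three (roughly the Croatia/Germany gap mentioned above), to show how raw counts and per 90 rates can rank players differently.

```r
# Sketch: convert raw counts into per 90 minute rates.
# 'minutes' is a hypothetical minutes-played column; the players are made up.
per90 <- function(count, minutes) count / minutes * 90

players <- data.frame(
  player             = c("Seven matches", "Three matches"),
  minutes            = c(630, 270),
  progressive_passes = c(70, 45)
)
players$prog_per90 <- per90(players$progressive_passes, players$minutes)
players[order(-players$prog_per90), ]
```

With very few minutes the rates get noisy, so in practice you would probably also set a minimum-minutes cut-off before ranking.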

At first glance the list makes sense: Kroos, Modric and Rakitic are all players you could see playing in a deeper midfield role. Mascherano is also in the same mould, even more so considering he has played at Centre Back most of the time for Barcelona, and Fernandinho has begun to slot in there to bring the ball out.

Those players are all 30+ years old, so in terms of age they are no better than Fernandinho as replacements. Granit Xhaka and Marcelo Brozovic are two that are just entering their prime midfield years at the age of 26. This is where it’s important to note that when scouting, context matters and large sample sizes are encouraged. Xhaka may have the progressive passing ability (and the love of yellow cards), but probably wouldn’t have the discipline.

This post has outlined a way to narrow down a shortlist of potential replacements for Fernandinho; the method can be used to find players similar to any player, as long as you can identify what you are looking for. Ideally you would use a much larger sample of games and look at a player’s contribution per game or per 90 mins to get a more stable shortlist. In the future I would like to look at some unsupervised methods, which don’t require you to specify or create the comparison metrics by hand as I have done here.

I have included the total passing heatmaps and the recovery maps of selected players; if you want to see any players specifically from the World Cup from any position then give me a shout!

Once again, massive shout out to Statsbomb for providing the free source of event level data, it’s hard to come by and even harder to collect so it’s much appreciated!

@TLMAnalytics

#11 Normalizing xG Chain – Are all actions created equal?

In this post I will be taking a look at the concepts of xG Chain (xGC) and xG Buildup (xGB), why they are useful and how we can develop these concepts to get even more use from them. Both of these concepts build on the expected goals (xG) and expected assists (xA) metrics, allowing the contribution of players not directly involved in a goal to be accounted for.

xG is a probability attached to each shot that estimates the chance of that shot being a goal. On its own, this metric is only really useful for players who take lots of shots, such as forwards.

xA is attached to the pass that immediately precedes a shot and measures the likelihood that the pass will become an assist from the following shot. This metric widens the reach of xG, attributing play to the creative players who set up the shots that xG describes.

Both of these are intuitive and simple concepts that provide an estimate for specific actions on the pitch. Goals and assists are key events in a match, so it makes sense to focus analysis on them, especially as they are incredibly predictive. However, xG and xA are very limited: they only care about a shot and the preceding pass, so they don’t tell us anything about the play leading up to that point. It turns out that the majority of football isn’t just taking turns taking shots, so it would be nice to be able to do something like xG/xA for other actions on the pitch.

Just as xA extends xG by attributing the result to the preceding pass, xG Chain extends xA by doing the same for the whole preceding possession chain. In this way you can widen the influence of xG to all players that are involved in the preceding possession. Where xG mainly highlights forwards and xA mainly highlights creative players, xG Chain aims to highlight players that make contributions to the possessions that end up with a shot. These could include your ‘assisting the assister’ players, your deep lying playmakers like Jorginho who get criticised for lack of assists, or your progressive passing defenders that wouldn’t usually get the credit they potentially deserve for starting effective possessions.

Calculating xG Chain: https://statsbomb.com/2018/08/introducing-xgchain-and-xgbuildup/

  • Find all possessions each player is involved in
  • Find all shots within those possessions
  • Sum the xG of those shots (usually take the highest xG per possession)
  • Assign that sum to each player, however involved they are
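The four steps above can be sketched in base R as follows. The `events` table (one row per action with a possession id, the player and an `xg` value that is NA for anything other than a shot) is a hypothetical simplification of real event data; each possession is valued by its single highest-xG shot, as in the third step.

```r
# Sketch of xG Chain on a hypothetical, simplified event table.
xg_chain <- function(events) {
  # Step 2: the shots within each possession
  shots <- events[!is.na(events$xg), ]
  # Step 3: value each possession by its highest-xG shot
  poss <- aggregate(xg ~ possession, data = shots, FUN = max)
  # Steps 1 and 4: each player counts once per possession they appear in
  involved <- unique(events[, c("possession", "player")])
  involved$xg <- poss$xg[match(involved$possession, poss$possession)]
  involved$xg[is.na(involved$xg)] <- 0  # possessions without a shot
  out <- aggregate(xg ~ player, data = involved, FUN = sum)
  names(out)[2] <- "xg_chain"
  out
}

events <- data.frame(
  possession = c(1, 1, 1, 2, 2, 3, 3),
  player     = c("A", "B", "C", "A", "C", "B", "B"),
  xg         = c(NA, NA, 0.30, NA, NA, NA, 0.10)
)
xg_chain(events)
```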

You can normalise xGC per 90 mins to see contributions per match; however, this still highlights forwards and creative players, since the players taking the shots get all the credit for their own shots plus any other possession chains they are involved in.

Since the aim is to highlight players that xG and xA don’t directly pick up, you can calculate xGC without including the shots and assists to get xG Buildup. This leaves all of the preceding actions to the assist and the shot, or all of the build up play as it were. By removing assists and shots, the dominance of forwards is removed and the remaining players are heavily involved in all the play up to just before the defining assist and shot. You can also normalize xGB per 90 mins to see contributions per match. Again, each player involved gets equal contribution as long as they are involved in the possession chain in some way.
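A sketch of xG Buildup using a similar hypothetical event table, now with a `type` column and a `key_pass` flag (both assumptions): the possession is still valued by its best shot, but the shot and the key pass are removed before working out who was involved, so a player only gets credit for build-up actions.

```r
# Sketch of xG Buildup: like xG Chain, but shots and key passes don't count
# as involvement. Column names ('type', 'key_pass') are hypothetical.
xg_buildup <- function(events) {
  shots <- events[events$type == "Shot", ]
  poss <- aggregate(xg ~ possession, data = shots, FUN = max)
  # Drop the shot and the key pass (assist) before working out involvement
  buildup <- events[events$type != "Shot" & !events$key_pass, ]
  involved <- unique(buildup[, c("possession", "player")])
  involved$xg <- poss$xg[match(involved$possession, poss$possession)]
  involved$xg[is.na(involved$xg)] <- 0
  out <- aggregate(xg ~ player, data = involved, FUN = sum)
  names(out)[2] <- "xg_buildup"
  out
}

events <- data.frame(
  possession = c(1, 1, 1, 2, 2, 2),
  player     = c("A", "B", "C", "C", "A", "B"),
  type       = c("Pass", "Pass", "Shot", "Pass", "Pass", "Shot"),
  key_pass   = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  xg         = c(NA, NA, 0.30, NA, NA, 0.20)
)
xg_buildup(events)
```

Under xG Chain all three players would get 0.5 here; under this buildup version B (who only made a key pass and took a shot) drops out entirely.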

xG Chain and especially xG Buildup are great metrics that highlight the contributions of players leading up to assists and shots. They allow players that don’t contribute directly to goals to make a case for their own importance. Normalising per 90 mins is an effective way to allow for reduced player minutes due to injury or substitutions, and evaluate all players on the same basis.

As great as the concepts of xGC and xGB are, there is a clear and influential flaw in how the xG of the possession chain is assigned to the players involved: each player gets an equal contribution no matter how involved they were. So player A, who makes a simple 5-yard pass in their own half, gets the same contribution as player B, who makes the decisive through ball to a player who squares it for an open goal. Neither player would get credit in xG/xA, but both would get the same xGC/xGB contribution, despite player A’s contribution potentially being arbitrary and player B’s turning the possession chain from probing to penetrating and a shot on goal.

Another way to consider each player’s contribution is to ask: if you removed that player’s action, how likely is it that the possession chain would still have happened? If you remove player A’s simple pass, it doesn’t take much for the possession chain to maintain its low threat, whereas if you remove player B’s decisive through ball then it’s unlikely that the possession chain continues in the same way. In this way, player B’s contribution could be argued to be more important than player A’s.

This leads to considering other ways of normalising xGC and xGB; each method of assigning contribution and normalising will highlight different aspects of the build-up.

Since you have all the information about each possession chain, you may have access to the number of passes or touches that each player contributed to it. If you share out the xGC in proportion to passes or touches, you get a good idea of each player’s share of involvement in each possession chain. For example, suppose a possession chain involves two players, C and D: player C made 3 passes, player D made 4 passes, and the resulting shot has an xG of 0.7. Player C contributed 3/7 of the passes, so gets an xGC of 3/7 * 0.7 = 0.3, and player D contributed 4/7, so gets an xGC of 4/7 * 0.7 = 0.4. Since player D was involved slightly more than player C, player D gets a higher xGC. A similar calculation can be made using touches, which also credits players who dribble rather than just counting passes.
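The pass-weighted allocation can be sketched like this, reproducing the C and D example above. The `passes` table (one row per pass with a possession id and player) and the `poss_xg` table (one xG value per possession) are hypothetical; a touch-weighted version would just count touches instead of passes.

```r
# Sketch: share each possession's xG in proportion to passes made in it.
# 'passes' and 'poss_xg' are hypothetical tables.
proportional_xgc <- function(passes, poss_xg) {
  passes$n <- 1
  # Passes per player within each possession
  counts <- aggregate(n ~ possession + player, data = passes, FUN = sum)
  # Total passes in each possession
  totals <- ave(counts$n, counts$possession, FUN = sum)
  value  <- poss_xg$xg[match(counts$possession, poss_xg$possession)]
  counts$xgc <- counts$n / totals * value
  # Possessions with no xG value give NA and are dropped by aggregate()
  aggregate(xgc ~ player, data = counts, FUN = sum)
}

# The worked example: C makes 3 passes, D makes 4, shot xG = 0.7
passes  <- data.frame(possession = 1, player = c(rep("C", 3), rep("D", 4)))
poss_xg <- data.frame(possession = 1, xg = 0.7)
proportional_xgc(passes, poss_xg)
```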

You aren’t limited to just counting passes or touches of the ball, you can get more creative with the allocations if you want to credit specific types of actions. You could only count progressive passes that move the ball forward by at least 10 yards, try to quantify the most important or necessary actions of a possession chain (decisive through ball/taking on a player in the box) or count the number of opposition players taken out of the game by each player involved, where ‘taking a player out the game’ may be defined as moving the ball closer to the defending team’s goal than the player.
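One possible reading of ‘taking a player out of the game’ is sketched below: count the defenders who were goal-side of the ball before the action and behind it afterwards. It assumes the attacking team plays towards x = 120 and that defender positions at the moment of the action are known, which needs tracking or freeze-frame data that standard event data mostly doesn’t have; the function name is hypothetical.

```r
# Sketch: number of defenders bypassed by one action (pass or carry).
# Assumes attack towards x = 120 and known defender x positions at the time
# of the action (tracking/freeze-frame data); names are hypothetical.
players_bypassed <- function(start_x, end_x, defender_x) {
  # A defender is bypassed if they were goal-side of the ball before the
  # action and the ball ends up closer to goal than them afterwards
  sum(defender_x > start_x & defender_x < end_x)
}

defenders <- c(50, 58, 63, 70, 85, 88, 90, 95, 100, 110)
players_bypassed(start_x = 55, end_x = 72, defender_x = defenders)
```

Summing this over a player’s actions in a possession would give an alternative weighting to the simple pass counts above.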

xG Chain and xG Buildup are both intuitive and simple metrics that assign contributions to players that don’t get directly involved in taking shots or assists but are frequently involved in preceding actions to these events. On their own they can already highlight players that seem to contribute well under the ‘eye-test’ when you watch them, but they can be misleading and provide many false positives since all actions are considered equal under xG Chain.

@TLMAnalytics

Credit to Statsbomb and Thom Lawrence for introducing the concepts and providing clear explanations and examples. They even provide free data sets for the FAWSL and the 2018 FIFA World Cup if anyone wants to try themselves. Check them out here:

https://statsbomb.com/

#10 Match Report: Man City 2 – 1 Liverpool

Liverpool head into their first game of 2019 still unbeaten and 7 points clear of arguably the best ever Premier League side, reigning champions Manchester City. Manchester City were on course for another incredible year, and still are by anyone else’s standards; however, losing at home to Crystal Palace and then away to Leicester in 2 of their last 3 games was not in the script for their next documentary.

Up to early December, City had been unbeaten too, sitting top of the league and having already played all of the other ‘Top 6’ sides away from home. It was looking like the question was whether City could go unbeaten, with Liverpool doing amazingly just to keep up. A severe dip in form, a key injury and some incredible shooting against them saw City relinquish the lead in the title race, with Liverpool showing no sign of slowing down.

A Liverpool win at the Etihad would make the gap 10 points, and arguably the title race would be over barring a Liverpool collapse (not impossible). A draw would maintain the 7-point gap but would also give Liverpool hope that they can continue their excellent season, since the champions couldn’t beat them at their own ground. A win for City would reduce the gap to 4 points, which means City would still be relying on Liverpool slipping up, but Liverpool would no longer be untouchable and City would have put doubt in their minds.

Considering City finished as champions 25 points ahead of Liverpool and won 5-0 in this fixture last season, it would be surprising, to say the least, if I told you that this was the most even game of the season so far. That it was indeed the case shows how far Liverpool have come in such a short space of time: this game was incredibly even, and almost any result could have happened if it were repeated.

City did end up winning 2-1, however the Expected Goals (xG) from Understat suggest it wasn’t an easy win. The xG score was City 1.18 – 1.38 Liverpool, suggesting that Liverpool would arguably win this game more often than City if it were repeated, with a draw the most likely result. For a game between two of the highest scoring teams in the league, there weren’t many shots or chances created: only 9 for City and 7 for Liverpool. This low shot volume adds to the variance in the xG numbers and emphasises that the game was more likely to be decided by individual finishing skill or luck than by an overwhelming inevitability that someone would score.

Figure 1: Size of bubble = Expected Goals (xG), Location = Location of shot, Stars = Goals
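To make the ‘if repeated’ reasoning concrete, a common approach is to treat each shot as an independent chance to score with probability equal to its xG and work out the distribution of goals for each side. The sketch below does this exactly by convolution rather than by simulation; independence between shots is a simplifying assumption, and the single-shot input is a toy example, not this match’s shot data.

```r
# Sketch: win/draw/loss probabilities from shot-level xG, treating each shot
# as an independent Bernoulli trial (a simplifying assumption).
goal_dist <- function(xg) {
  p <- 1  # P(0 goals) before any shots are taken
  for (q in xg) p <- c(p * (1 - q), 0) + c(0, p * q)
  p  # p[k + 1] = P(exactly k goals)
}

outcome_probs <- function(home_xg, away_xg) {
  h <- goal_dist(home_xg)
  a <- goal_dist(away_xg)
  joint <- outer(h, a)  # joint[i, j] = P(home scores i - 1, away scores j - 1)
  c(home = sum(joint[lower.tri(joint)]),
    draw = sum(diag(joint)),
    away = sum(joint[upper.tri(joint)]))
}

# Toy example: one 0.5 xG shot each
outcome_probs(home_xg = 0.5, away_xg = 0.5)
```

Feeding in each team’s full list of shot xG values (for example from Understat) would give win/draw/loss estimates for the match itself.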

In terms of finishing, Liverpool were not very clinical, but they did create the best chance of the game: a lovely cross-field pass followed by a first-time cross across the box for a tap-in to an empty goal. They also had a ball cleared off the line by centimetres following a scramble after a rebound off the post. Other than those two, Liverpool were limited to shots through crowds of bodies. City managed to manufacture some chances through counter attacks, and also capitalised on the fact that Sergio Aguero is an incredible finisher from tight angles. While Liverpool scored with their highest xG chance (0.62), City missed both of their highest xG chances (0.49, 0.32) and scored from two lower xG chances (0.06, 0.05), which suggests that City’s finishing when it mattered was the difference.

Since not many chances were being created, most of the game and its interesting plays happened between the two boxes. There are three players I’d like to highlight, all playing in central midfield: Fernandinho, Bernardo Silva and James Milner. It’s hard to quantify the effect these players had on the game, but all three were excellent at denying the opposition any space or progression up the pitch.

No player had more ball recoveries than Silva with 10; Fernandinho had 9, and Milner, despite only being on the pitch for about an hour, had 7. In and of itself the number of ball recoveries doesn’t mean much; however, especially for City players, it’s the area of the pitch where they win the ball back that’s so impressive.

Figure 2: 25/65 Man City recoveries in Liverpool’s half, 10/56 Liverpool recoveries in Man City’s half

https://thelastmananalytics.home.blog/2018/11/06/3-are-man-city-better-without-the-ball-defensive-analysis/

5 out of the 10 ball recoveries for Silva and 4 out of 9 for Fernandinho were in Liverpool’s half, which suggests that City were winning the ball back high up the pitch and not allowing Liverpool to progress much further. Compared to other players with high recoveries, this is significant. Not only recoveries, but Silva also completed 3 tackles on the halfway line out of 8 (!) attempts and made 4 interceptions in Liverpool’s half. As you can imagine Silva got around the pitch a lot this game and managed to cover 13.7km which is the most in a game this season. I don’t usually like those kinds of stats since they don’t suggest anything about a player’s involvement in a game but maybe suggest that they’re just out of position recovering for the whole game. However, Silva was definitely involved and sometimes that extra effort you put in makes others do the same.

A lot of Fernandinho’s work is done off the ball, in ways that aren’t quantifiable by tackles, interceptions or distance covered. It’s clear how large an impact he has in City’s midfield, since City lost both of the games he has missed through injury so far this season. Fernandinho deserves more than a paragraph of one match report to highlight his skills, and he’ll be the focus of an upcoming post. But City need to find a replacement for him quickly, or find a way of playing that doesn’t rely so heavily on him sweeping up behind the front 5’s press.

It’s a shame that James Milner had to be the one to come off early in the second half. Milner plays similarly to Bernardo Silva when Liverpool play three in midfield, and he was as effective as Silva defensively until he was taken off. Moving to 4-2-3-1 because they needed to score was probably sensible; however, needing a goal while leaving Jordan Henderson on the pitch alongside Fabinho (a better version of Henderson) doesn’t always end well. It worked out, since Liverpool scored an amazing team goal, but they might have been more of a threat with Milner alongside Fabinho. Pushing Wijnaldum out to the left wing with several wingers sitting on the bench didn’t help either, but hey.

Come the end of the season, this game will be regarded as a turning point whatever happens. Whether Liverpool collapse and City come back to win their second title in a row, or Liverpool brush it off and continue in the same manner, we will find out, but Manchester City have shown their hand and they are here to stay until the end of the season. We have our first real title race in years; take it in and enjoy it.

Thanks to @StatsZone and Understat for images, stats and xG numbers.

@TLMAnalytics

#9 Defensive Metrics [Transitions]

No matter how good a team are at maintaining possession, it is inevitable that they will lose the ball at some point. That’s okay, it happens, and it’s not something you can prevent. What you can do something about is how you decide to react. This is where transitions come into play, and they happen so frequently that it’s important to get them right. Every time possession is turned over, the attacking team need to change their mentality and positioning to reflect the fact that they are now defending. When considering defensive transitions, I will look at how quickly a team can make that change and what that change could actually be.

Usually an attacking team try to set up to maximise the space on the pitch, with players positioned high and wide to open up space between defenders. The defending team, meanwhile, usually try to set up more compactly, to deny space to the attacking team and keep them away from key areas on the pitch such as the penalty area and goal. Moving between these two mindsets efficiently throughout a match decides games, and the best teams are capable of seamlessly navigating between the two. Defensive transitions are the move from an attacking state to a defensive state: from having possession with the intent to score a goal, to not having possession with the intent to win it back and not concede.

As Figure 1 and Figure 2 show, even Football Manager 2019 now acknowledges transitions when creating tactics, adding to its realism.

When you lose possession, there are two main ways of trying to win the ball back. You can try to win the ball back immediately, this is commonly known as ‘counter pressing’ or ‘gegenpress’ popularised by Jurgen Klopp at Borussia Dortmund and now at Liverpool. If this isn’t an option then you will revert straight to a defensive set up that aims to deny space to the attacking team in key areas, this may be in your half or just approaching the penalty area. This is usually the default option for a team, especially for lesser teams against more threatening opponents. I’ll take a look at the benefits and drawbacks of each option and when each should be used.

When trying to win the ball back immediately, it usually requires a high burst of energy in the short amount of time after losing possession to swarm the opponent. An example of this is Barcelona under Pep Guardiola and their 6-second rule, where within 6 seconds after losing the ball the Barcelona players blitz the ball and opponent with the aim of forcing an error and retaining possession as quickly as possible.

The clear benefit to this style is that if it works, you minimise the amount of time that the opposition has possession and you maximise the time that you have possession since you win the ball back so quickly. Unfortunately, that’s under the condition that you do win the ball back. If you don’t win the ball back quickly, it’s hard to maintain such a high intensity of effort and pressure so you are forced into the second option and revert to a designated structure.

In that short space of time the main aim is to win the ball back, so defensive structure may be neglected. Again, if you can’t win the ball back quickly, it may take you longer to revert to a designated structure, and a team may capitalise on this extended transition period while players are out of position. Because of this potential neglect of structure, this type of press is usually only used when losing possession in the opposition’s half, which gives you more time to revert to a defensive structure if you fail to win the ball back.
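The 6-second idea can be turned into a simple measure: of the times a team loses the ball, how often does it win it back within the window, split by whether the ball was lost in the opposition’s half? The `losses` table below (turnover location, time lost, time regained) is hypothetical, with x running towards the opponent’s goal at 120 as in StatsBomb’s coordinates.

```r
# Sketch: share of turnovers won back within a counter-pressing window,
# split by where the ball was lost. 'losses' is a hypothetical table:
# x (attacking towards 120), loss_time and regain_time in seconds
# (regain_time is NA if the ball wasn't won back in that passage of play).
counterpress_rate <- function(losses, window_s = 6, halfway_x = 60) {
  regained_fast <- !is.na(losses$regain_time) &
    (losses$regain_time - losses$loss_time) <= window_s
  in_opp_half <- losses$x > halfway_x
  c(opp_half = mean(regained_fast[in_opp_half]),
    own_half = mean(regained_fast[!in_opp_half]))
}

losses <- data.frame(
  x           = c(80, 90, 70, 30, 45),
  loss_time   = c(100, 250, 400, 600, 800),
  regain_time = c(104, NA, 415, 603, 830)
)
counterpress_rate(losses)
```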

When you fail to win the ball back immediately, or choose not to try, you need a defensive structure that you move into every time your opponent has possession. The main aim is not to concede a goal, and one way to achieve that with a structure is to deny the opposition space in key areas. What you define as a key area can depend on the match, but generally a structure is built to deny space in your penalty area and anywhere within shooting range of goal. Where you lose the ball can limit the options that you have. Losing the ball in the opposition’s half allows you to try to deny ball progression into your own half before the opponent gets anywhere near your goal. Once they manage to get into your half, you can then attempt to stop them progressing towards your penalty area. Whereas if you lose the ball in your own half, you need to immediately assume a structure that denies space near your penalty area.

Compared to turtling, a gegenpress requires a certain type of player who is capable of frequent, short, high-intensity bursts. Not every team has those players, so for some teams that style isn’t even an option. The potential drawbacks of failing to win the ball back and neglecting defensive structure put more emphasis on one-on-one defending, so it is used more by teams with a higher quality of individual players. When you turtle and drop deep to deny space, you rely on the short distances between players to cover for each other; as a result you don’t need high quality individuals, but you do need those individuals to work together. Whether or not to use the gegenpress is a tactical decision, and teams that are expected to win will use it as a way to gain an advantage with little risk.

@TLMAnalytics

Credit to Football Manager for acknowledging transitions in their tactics page, love the development

#8 Player Partnerships and Compatibility

Throughout the years there have been many partnerships between players that stand out and are more memorable than others: the classic Yorke and Cole strike partnership in Man Utd’s treble winning season, Xavi and Iniesta passing rings round everyone for fun, and Ferdinand and Vidic, who became arguably the Premier League’s best defensive pairing. The main thing these players have in common is that they were successful, playing at the highest level of the game for years, and so were the best of their eras. As they were successful, they must have been pretty good. Individually you need to be very good to make it at the top; however, since football is a game that requires eleven players all working towards the same goal, it’s how well you fit into a system alongside other talented players that can make a good team a great team. The aim of a team is to be greater than the sum of its parts, and when that happens very good things tend to follow, e.g. Leicester City in 2015/16.

Every football club faces the problem of how to fit all of its best, most talented players into the same team. If you can get those players working well together then your team will be working near the optimal level it can achieve, and that’s the aim. However, this frequently isn’t the case. Say you have two star players who both play as strikers, but each prefers to lead the line on his own, so when you force them to play together both of their performances drop and, as a result, the team’s performance drops. It feels like a lose-lose situation: either you drop one of your star players, who will then be unhappy, or you play them both and suffer bad performances. The ideal solution, as suggested, is to get them both playing together; however, sometimes players are incompatible, and the next best solution is to work out which of those players is more compatible with your team and offload the other.

On a football pitch there are eleven players and so you want to get optimal performance from the whole team rather than just having your right full back and right winger link up well every now and then. Each player has a partnership to some degree with every other player on their team, those players that are in frequent contact and close proximity will usually have stronger partnerships since they interact more. These are pairings such as the two centre backs, two centre midfielders, full back and their respective winger and two strikers. This of course will depend on formations, if you are playing three central midfielders then it’s how well those three can play together which is important.

Figure 1 – Football Manager represents player partnerships with green lines

One way to measure the compatibility of all eleven players would be to compare the results and performances of each combination of eleven players on the pitch. So one starting line-up would have a set of results, while if you changed just one player you would treat that as a completely different starting line-up with a separate set of results. It’s debatable how much of an impact changing one player has; it depends on how influential the replaced player was. The problem with this approach is that there won’t be a large enough sample, since the exact same players rarely play every week because of injuries or tactical rotation.
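Grouping results by exact line-up is straightforward, and a small sketch shows why the sample size problem appears so quickly. The `matches` table is hypothetical, with each line-up shortened to three names for readability; sorting the names makes the same XI produce the same key whatever order the players are listed in.

```r
# Sketch: results grouped by exact starting line-up. 'matches' is a
# hypothetical table with a comma-separated XI and points won per match.
lineup_key <- function(xi) {
  vapply(strsplit(as.character(xi), ","),
         function(p) paste(sort(trimws(p)), collapse = ","),
         character(1))
}

lineup_summary <- function(matches) {
  key <- lineup_key(matches$xi)
  groups <- split(matches$points, key)
  data.frame(lineup = names(groups),
             games  = vapply(groups, length, integer(1)),
             ppg    = vapply(groups, mean, numeric(1)),
             row.names = NULL)
}

matches <- data.frame(
  xi     = c("A,B,C", "C,B,A", "A,B,D", "A,B,C"),
  points = c(3, 1, 3, 0)
)
lineup_summary(matches)
```

With a full eleven names per key, many line-ups end up with only one or two games, which is exactly the sample size issue described above.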

Another option would be to compare the results and performances of specific partnerships you are interested in, such as matches where your central defenders are the same or where your strikers are the same. Looking at specific partnerships also lets you focus on the areas of performance most relevant to them. For example, when comparing two central midfield partnerships, you may want to compare how much possession you had or how many passes you made in those matches, whereas when comparing two striker partnerships, you may want to compare how many goals were scored in those games.
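As a rough sketch of the partnership comparison, the base R snippet below keys each match by its pair of starting strikers and averages goals scored per pair. The match data are hypothetical, and goals scored is used only because it is the example metric suggested for striker partnerships; for a midfield pair you would swap in possession or passes.

```r
# Hypothetical match data: the two starting strikers and goals scored.
# Names and numbers are invented for illustration only.
matches <- data.frame(
  striker_a = c("ST1", "ST2", "ST1", "ST3", "ST2"),
  striker_b = c("ST2", "ST1", "ST3", "ST1", "ST3"),
  goals_for = c(3, 1, 0, 2, 1),
  stringsAsFactors = FALSE
)

# Order-independent key, so ST1 + ST2 and ST2 + ST1 count as one partnership
matches$st_pair <- mapply(function(a, b) paste(sort(c(a, b)), collapse = " & "),
                          matches$striker_a, matches$striker_b,
                          USE.NAMES = FALSE)

# Average goals scored per striker partnership
pair_summary <- aggregate(goals_for ~ st_pair, data = matches, FUN = mean)

# How many matches each partnership started together (small samples need caution)
pair_summary$matches <- as.vector(table(matches$st_pair)[pair_summary$st_pair])

print(pair_summary)
```

Keying on a pair rather than the full XI keeps far more matches per group, which is the main practical advantage of looking at partnerships over whole line-ups.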

As mentioned earlier, partnerships exist not only between players in neighbouring positions but between players all across the pitch. Maybe a right full back has an understanding with the left winger and likes to play a long diagonal pass to him, or maybe the winger likes to come inside and receive passes from the central defenders.

So far, we have looked at how to compare partnerships with each other using metrics such as goals or passes as proxies for compatibility, since they are seen as productive outputs. Perhaps without realising it, we have only considered on-the-ball metrics. There is nothing wrong with looking at those; however, they should be only part of the answer. Every player spends most of the game off the ball, so it’s arguably more important to assess their performance off the ball than on it.

When assessing a partnership’s off-the-ball performance, as I’ve discussed in my Defensive Metrics posts, it’s more about being in the right place at the right time and the decisions you make. If, when playing with a certain player, you are collectively able to maintain the correct distances, cover for each other’s mistakes and, as a result, force the opposition into worse decisions, then you are more compatible with that player than with another. The problem is that these things are hard to quantify. I have outlined a few ways to start getting some insight in previous posts, but it’s still very early. If we can effectively measure players’ performance off the ball, and the interactions between players that don’t involve the ball, then we will get more insight into why some partnerships work so well and others don’t.

There are many examples of passing networks in football, built from specific matches and showing the distribution of passes between players. They capture one aspect of the partnerships between players; it would be interesting to see other networks whose connecting lines instead represent how compatible two players are. How exactly we could measure that is up for debate.

@TLMAnalytics

#7 Defensive Metrics [Decision Making]

“If I have to make a tackle then I have already made a mistake.”

Paolo Maldini

It’s a famous quote you’ve probably heard, and you can almost hear the penny drop for anyone hearing it for the first time. One of the best defenders (if not the best) ever to play the game couldn’t be wrong, could he? Yet defenders and defensive players are judged mainly on statistics such as the number of tackles or blocks they make, and tackles and blocks are usually last-ditch attempts to stop an opponent from progressing.

Defending is a constant, ongoing process that happens throughout a football match, no matter who has the ball or where it is on the pitch. Collectively and individually, every player is moving into positions that fit a defensive structure, with the aim of conceding as few goals as possible. Each player contributes by performing defensive actions, usually recorded as tackles or blocks. However, a tackle or block first requires the opposition to have the ball in a potentially dangerous area, or rather first requires you to allow the opposition to have the ball there. More important, and harder to quantify, are the actions and ability that prevent a forward from getting the ball in dangerous areas in the first place.

It doesn’t seem a stretch to suggest that preventing shots from being taken in the first place is better than blocking every shot on goal.

When a forward has the ball, they will have an aim in mind for their possession. There will be a hierarchy of aims, ranging from scoring a goal down to retaining possession of the ball. The defender, meanwhile, also has an aim in mind, and their hierarchy is roughly the reverse of the forward’s, ranging from not conceding a goal to winning the ball back. The immediate aim of both the forward and the defender will depend on factors such as their location on the pitch, the time in the game, the game state and how each player perceives the other’s abilities.

For example, if the striker has the ball in the penalty area then their primary aim may be to take a shot to score a goal, whilst the defender’s primary aim may be to not concede a goal.

If the fullback has the ball in their own half, their primary aim probably won’t be to score straight away, but rather to progress the ball up the pitch, either through midfield or down the line to the winger. If neither of those options is available, they may need to drop their aim down to maintaining possession and recycle the ball back to the goalkeeper or centre backs. In this case, the defender may be a striker or a winger who has closed the ball down, and their primary aim may be to prevent the ball from progressing towards a more dangerous position.

Figure 1: Davies’ decision making options v Chelsea

These thought processes go back and forth between players throughout a match. Even when nowhere near the ball, players need to weigh up these considerations, perhaps at a more granular level. Extending the example above with the fullback and winger, the fullback’s aim is ball progression and the winger’s aim is to prevent it. If possible, the fullback would play the ball straight into the striker to move it as far up the pitch as quickly as possible, so the defence collectively needs to take that option away. Perhaps the defending centre back marks the striker tightly while the defending central midfielder screens the direct pass, just enough that the fullback no longer sees the pass into the striker as a viable option.

Figure 2: Chelsea unable to prevent Davies from progressing the ball

If the defending team can prevent efficient progress into dangerous areas of the pitch, their job becomes much easier. As we can see in Figure 1 and Figure 2, Chelsea were unable to prevent ball progression; as a result, they were left to defend a more dangerous situation and even had to resort to tackling or blocking (!).

The decisions available to each player, defender or forward, aren’t limited to marking or blocking passing options on one side and passing or shooting on the other. Forwards may want to dribble past players, cross the ball from wide or, off the ball, make runs into space to receive it. Each of these decisions forces the defenders to react, and how well they deal with the questions asked by the forwards depends on the abilities of the team and the players in question.

It would be interesting to look at the decision making of defenders and forwards in different situations by counting how often each decision is made overall, and whether that frequency depends on who they are facing or where they are on the pitch. A decision for a forward here would be a simple action such as attempting a shot, attempting a dribble, passing the ball up the line or retaining possession. Since a defensive decision depends on the forward’s decision, it would be interesting to see whether players change their decisions significantly when playing against certain opponents. It could be a way to measure how far a defender can force a forward into uncomfortable positions and into unfavourable decisions, or decisions lower down the forward’s hierarchy of aims.
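One way the counting idea could start to look in practice is the base R sketch below. It assumes a hypothetical log of a forward’s on-ball decisions, each tagged with the defender they were facing, and it assumes a numeric ranking of the hierarchy of aims (the post orders aims from scoring down to retaining possession, but the exact scores, and where dribbling sits, are my own assumption). It is a sketch of the concept, not an established metric.

```r
# Hypothetical log of a forward's decisions on the ball, tagged with the
# defender they were facing (all values invented for illustration).
decisions <- data.frame(
  defender = rep(c("DEF_A", "DEF_B"), each = 4),
  decision = c("shot", "dribble", "pass_forward", "retain",
               "retain", "retain", "pass_forward", "retain"),
  stringsAsFactors = FALSE
)

# Assumed ranking of the forward's hierarchy of aims: higher = more ambitious.
# The exact scores (especially where dribbling sits) are an assumption.
aim_level <- c(retain = 1, pass_forward = 2, dribble = 3, shot = 4)
decisions$decision <- factor(decisions$decision, levels = names(aim_level))

# Share of each decision against each defender (each row sums to 1)
decision_share <- prop.table(table(decisions$defender, decisions$decision),
                             margin = 1)

# Average position in the hierarchy per defender: a lower value suggests the
# defender pushed the forward towards less ambitious options
mean_level <- tapply(aim_level[as.character(decisions$decision)],
                     decisions$defender, mean)

print(round(decision_share, 2))
print(mean_level)
```

Location could be handled the same way by adding a hypothetical pitch-zone column and cross-tabulating it against the decision instead of, or as well as, the defender.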

As always, any feedback or questions are welcome. These are early-stage ideas, intended to provoke thinking about football analytics from a different perspective.
