#26 Written Summary – Invincibles Defending with StatsBomb Events


Arsenal’s Invincibles season is unique in the Premier League: they are the only team to have gone a complete 38-game season without losing (a feat that has been achieved in other leagues). They’re considered to be one of the best teams ever to play in the Premier League. Going a whole season without losing suggests they were at least decent at defending, enough to reduce or completely remove the chance of bad luck ruining their perfect record. This post looks into how Arsenal’s defence managed this.

Using StatsBomb’s public event data for Arsenal’s 03/04 season* (33 games), I take a look at where Arsenal’s defensive actions took place and how opponents attempted to progress the ball and create chances against them.

Event data records all on-ball actions from a match, which makes it more granular than high-level team and player totals. It is well suited to offensive events such as passing and shooting, since these are usually on-ball actions. On-ball defensive actions such as tackles, interceptions and recoveries are also well captured by event data, and these are the defensive events that I will be using.
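As a rough illustration of pulling on-ball defensive events out of event data, here is a minimal sketch. It assumes events are dictionaries shaped like StatsBomb’s open-data JSON (a nested type name, team name and location). The set of event types treated as defensive is my own illustrative choice, not necessarily the one used in the linked repository, and the function names are hypothetical.

```python
# Event type names assumed to count as on-ball defensive actions.
# This set is an illustrative guess, not the author's confirmed list.
DEFENSIVE_TYPES = {"Pressure", "Ball Recovery", "Interception", "Block", "Clearance"}


def is_defensive(event):
    """Return True if an event dict looks like an on-ball defensive action."""
    type_name = event.get("type", {}).get("name")
    if type_name in DEFENSIVE_TYPES:
        return True
    # Assumption: tackles are stored as duels whose duel type is "Tackle".
    if type_name == "Duel":
        return event.get("duel", {}).get("type", {}).get("name") == "Tackle"
    return False


def defensive_locations(events, team):
    """Collect (x, y) locations of one team's defensive events."""
    return [
        tuple(e["location"])
        for e in events
        if e.get("team", {}).get("name") == team
        and "location" in e
        and is_defensive(e)
    ]


# Made-up sample events, for illustration only.
sample_events = [
    {"type": {"name": "Pass"}, "team": {"name": "Arsenal"}, "location": [60.0, 40.0]},
    {"type": {"name": "Interception"}, "team": {"name": "Arsenal"}, "location": [25.0, 10.0]},
    {"type": {"name": "Duel"}, "duel": {"type": {"name": "Tackle"}},
     "team": {"name": "Arsenal"}, "location": [30.0, 70.0]},
    {"type": {"name": "Duel"}, "duel": {"type": {"name": "Aerial Lost"}},
     "team": {"name": "Arsenal"}, "location": [15.0, 40.0]},
    {"type": {"name": "Pressure"}, "team": {"name": "Leeds United"}, "location": [80.0, 35.0]},
]

arsenal_defending = defensive_locations(sample_events, "Arsenal")
```

Only the interception and the tackle survive the filter here: the pass is not defensive, the aerial duel is not a tackle, and the pressure belongs to the other team.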

We can see where these events frequently occur on the pitch for Arsenal compared to their opposition. The key nuance is that the defensive events recorded are not all of the defensive plays that take place on the pitch. The hard part about defensive analysis is that a large proportion of defending consists of non-events, which won’t be captured by event data.

Event data still provides insight when Arsenal’s defensive events are combined with the offensive events of their opponents. Arsenal’s defensive events show where their on-ball defending took place. Their opponents’ offensive events show how they approached attacking Arsenal, which may reflect the attacking team getting their way or Arsenal’s defence forcing opponents to play in a certain way. Given Arsenal’s success, it was likely the latter the majority of the time.

Please find the underlying code and methodology here: https://github.com/ciaran-grant/StatsBomb-Invincibles

Arsenal’s Defensive Events

The below plot is a combination of a 2D histogram grid and marginal distribution plots across each axis. We can see that the frequency of defensive actions is evenly spread left to right and more heavily skewed to their own half.

More specifically, the highest action areas are in front of their own goal and out wide in the full back areas above the penalty area. Defensive actions in their own penalty area are expected, as that is the area closest to your goal and where crosses into the box are dealt with.

The full backs seem to be proactive in making defensive actions before the opponent gets closer to the byline. Passes and cutbacks from areas close to the byline and penalty area usually generate high quality shooting chances, so minimising the opponent’s ability to get there is great.

Figure 1: Marginal Distribution Plot of Arsenal’s Defensive Events
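The counts behind a plot like Figure 1 can be sketched without any plotting library. This is a minimal sketch, assuming StatsBomb’s 120 by 80 pitch coordinates and an arbitrary 6 by 4 grid; the bin sizes, sample points and function names are illustrative assumptions, not the settings used for the figure.

```python
PITCH_LENGTH, PITCH_WIDTH = 120.0, 80.0  # assumed StatsBomb pitch units


def histogram_2d(points, x_bins=6, y_bins=4):
    """Count points per grid cell; grid[row][col], with rows across the pitch width."""
    grid = [[0] * x_bins for _ in range(y_bins)]
    for x, y in points:
        # Clamp so a point exactly on the far edge falls in the last bin.
        col = min(int(x / PITCH_LENGTH * x_bins), x_bins - 1)
        row = min(int(y / PITCH_WIDTH * y_bins), y_bins - 1)
        grid[row][col] += 1
    return grid


def marginals(grid):
    """Return (counts along the pitch length, counts across the pitch width)."""
    along_length = [sum(col) for col in zip(*grid)]
    across_width = [sum(row) for row in grid]
    return along_length, across_width


# Made-up defensive event locations, for illustration only.
points = [(10, 40), (15, 38), (30, 5), (32, 75), (70, 40), (120, 80)]
grid = histogram_2d(points)
along_length, across_width = marginals(grid)
```

The two marginal lists are what the distribution plots along each axis summarise: one shows how far up the pitch the actions happen, the other how they are spread from one touchline to the other.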

The below density grid compares Arsenal’s defensive events with all defensive events in their games. Areas where Arsenal had relatively more events than the overall total are in red, and areas where they had fewer are in blue. The darkest red areas are again the full back areas, suggesting that Arsenal’s full backs performed more on-ball defensive actions than their opponents. By contrast, they defended their own penalty area about as often as their opponents did, and defended less frequently in the opponent’s half.

Figure 2: 2D Histogram of Arsenal’s events relative to all defensive events
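One possible reading of this comparison is the difference between Arsenal’s share of events in each grid cell and the overall share in that cell. The sketch below follows that reading. It is an assumption on my part rather than the confirmed calculation behind Figure 2, and it assumes both teams’ locations are expressed in the same attacking direction. The grid size and sample points are made up.

```python
from collections import Counter


def cell_shares(points, x_bins=6, y_bins=4, length=120.0, width=80.0):
    """Map each (row, col) grid cell to its share of the points."""
    counts = Counter(
        (min(int(y / width * y_bins), y_bins - 1),
         min(int(x / length * x_bins), x_bins - 1))
        for x, y in points
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def relative_density(team_points, all_points, **grid):
    """Team share minus overall share per cell.

    Positive values mean the team was relatively more active in that cell
    than the overall total (red in a diverging colour map), negative less (blue).
    """
    team = cell_shares(team_points, **grid)
    overall = cell_shares(all_points, **grid)
    return {c: team.get(c, 0.0) - overall.get(c, 0.0) for c in set(team) | set(overall)}


# Made-up locations, for illustration only.
arsenal = [(25, 5), (28, 8), (30, 75), (10, 40)]
opponents = [(10, 40), (12, 42), (15, 38), (25, 5)]
diff = relative_density(arsenal, arsenal + opponents)
```

Because both sets of shares sum to one, the differences sum to zero, so the grid only shows where a team’s defending is concentrated relative to the overall picture, not how much defending it did in total.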

Opponent’s Ball Progressions

By taking a look at opponents’ ball progressions we can see things from the opponents’ point of view. Do Arsenal’s full back areas have so many defensive events because Arsenal are ‘funneling’ their opponents there as they see it as a strength, or because Arsenal’s opponents target and exploit their full back areas?

These progressions have been grouped into approximate phases of play through the thirds and into the penalty area.
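A grouping like this can be sketched by labelling each progression with the third it starts in, plus an extra label when it ends inside the penalty area. The boundaries below assume StatsBomb’s 120 by 80 coordinates with the attacking team playing towards x = 120. The exact thresholds and the overlapping labels are my assumptions, not the repository’s definitions.

```python
def in_penalty_area(x, y):
    """True if a location is inside the box being attacked (assumed 120x80 units)."""
    return x >= 102 and 18 <= y <= 62


def phases_of(start, end):
    """Return the phase labels a progression falls under; labels may overlap."""
    x = start[0]
    if x < 40:
        labels = ["own third"]
    elif x < 80:
        labels = ["middle third"]
    else:
        labels = ["final third"]
    # A progression that enters the box also counts towards the penalty area group.
    if in_penalty_area(*end) and not in_penalty_area(*start):
        labels.append("penalty area")
    return labels


# Made-up (start, end) pairs, for illustration only.
progressions = [
    ((20, 40), (45, 30)),
    ((60, 10), (85, 8)),
    ((95, 70), (110, 45)),
]
labelled = [phases_of(start, end) for start, end in progressions]
```

Allowing overlapping labels means a cross from the final third into the box appears in both the final third and penalty area groups.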

The progressions have been grouped into similar types by comparing KMeans and Agglomerative Clustering methods. Reassuringly, both methods produced a similar number of clusters, but KMeans appeared to perform better by grouping passes that were similar in both start and end locations. Further details can be found here: https://github.com/ciaran-grant/StatsBomb-Invincibles
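To show the idea of clustering progressions on both start and end locations, here is a minimal hand-rolled k-means (Lloyd’s algorithm) in plain Python. It is a stand-in for illustration only, not the library implementation or settings used in the repository, and it does not cover the agglomerative comparison. Each progression is assumed to be a tuple of (start_x, start_y, end_x, end_y), and the sample values are made up.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k of the data points
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Made-up short progressions down each flank: (start_x, start_y, end_x, end_y).
one_flank = [(30, 10, 50, 8), (32, 12, 52, 10), (28, 9, 49, 7)]
other_flank = [(30, 70, 50, 72), (31, 68, 53, 71), (29, 71, 48, 73)]
vectors = one_flank + other_flank
labels, centroids = kmeans(vectors, k=2)
```

Clustering on all four coordinates at once is what lets progressions that start and end in similar places fall into the same group, rather than being matched on start or end location alone.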

Own Third

Figure 3: Clusters of opponent’s ball progressions from their own third

We find that the most frequent ball progressions from their own third are shorter progressions into the middle third. There are short progressions centrally in clusters 1 and 7, with wider progressions in clusters 9 and 6. Longer progressions were less frequent.

Shorter progressions are easier to complete and less risky, so it is not surprising that they are the most frequent. This says nothing about how good or sustainable these progressions are, but adds to the idea that most of the play appears to be out wide.

Middle Third

Figure 4: Clusters of opponent’s ball progressions from the middle third

Similar to the progressions from their own third, the most frequent progressions from the middle third are shorter down the wide areas in clusters 2 and 9, with some longer progressions in clusters 7 and 5. Clusters 8 and 4 suggest a number of progressions do make it into the centre of Arsenal’s own half.

Final Third

Figure 5: Clusters of opponent’s ball progressions from their final third

Again, there are lots of short progressions in the wide full back areas in the final third, seen in clusters 6 and 2. The two least frequent clusters appear to be deep crosses into the penalty area. These are the only consistent progressions into the penalty area, but they usually create lower quality chances through headers or contested shots.

Penalty Area

Figure 6: Clusters of opponent’s ball progressions into Arsenal’s penalty area

There are fewer groups here due to fewer progressions into the penalty area, which is expected. There are more progressions from each wide area, with a large proportion coming from much shorter progressions in clusters 2 and 1. It’s more difficult to successfully progress the ball long distances into the penalty area. The few clusters here are pretty broadly grouped. Human analysts could intuitively create these groupings pretty quickly, which suggests the clustering hasn’t helped much here.

Shot Assists

Figure 7: Clusters of opponent’s shot assists

When looking at only shot assists, there are lots that are received outside the penalty area. This suggests that the resulting shots are likely from further out and, without further context, of lower quality. Because of the added restriction of requiring a shot at the end of these passes, there is likely more variance in them, which makes it harder to identify clear patterns.
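The share of shot assists received outside the penalty area can be checked with a short filter. This is a minimal sketch, assuming pass events carry a nested pass object with an end location and boolean shot-assist or goal-assist flags, as in StatsBomb’s open-data JSON. Treating either flag as a shot assist is my assumption, and the sample events are made up.

```python
def in_penalty_area(location):
    """True if a location is inside the box being attacked (assumed 120x80 units)."""
    x, y = location[0], location[1]
    return x >= 102 and 18 <= y <= 62


def shot_assists(events):
    """Passes flagged as leading directly to a shot (or a goal)."""
    return [
        e for e in events
        if e.get("type", {}).get("name") == "Pass"
        and (e.get("pass", {}).get("shot_assist") or e.get("pass", {}).get("goal_assist"))
    ]


def share_received_outside_box(assists):
    """Fraction of shot assists whose end location is outside the penalty area."""
    if not assists:
        return 0.0
    outside = [a for a in assists if not in_penalty_area(a["pass"]["end_location"])]
    return len(outside) / len(assists)


# Made-up events, for illustration only.
events = [
    {"type": {"name": "Pass"}, "pass": {"end_location": [95.0, 40.0], "shot_assist": True}},
    {"type": {"name": "Pass"}, "pass": {"end_location": [108.0, 40.0], "goal_assist": True}},
    {"type": {"name": "Pass"}, "pass": {"end_location": [100.0, 20.0], "shot_assist": True}},
    {"type": {"name": "Pass"}, "pass": {"end_location": [60.0, 40.0]}},
    {"type": {"name": "Shot"}, "location": [100.0, 40.0]},
]

assists = shot_assists(events)
share_outside = share_received_outside_box(assists)
```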


Across ball progressions throughout the whole pitch, the most frequent were shorter and in the wide areas. This is expected as shorter progressions are lower risk and wide areas are less important to defend, so are areas the defending team are willing to concede.

Progressions from the middle third appear to be longer than those from their own third or in the final third. Context is needed in each individual circumstance, but this may be due to a lower risk of failure when further away from your own goal, or to teams trying to take advantage of a short window of opportunity to quickly progress the ball longer distances into the final third.
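A comparison of progression lengths by third can be made concrete by averaging the straight-line distance of each progression, grouped by where it starts. The sketch below assumes StatsBomb’s 120 by 80 coordinates and thirds split at x = 40 and x = 80. The sample progressions are made up and do not reproduce the figures in this post.

```python
import math
from collections import defaultdict


def third_of(x):
    """Label an x-coordinate (0-120) by third of the pitch."""
    if x < 40:
        return "own third"
    if x < 80:
        return "middle third"
    return "final third"


def mean_length_by_third(progressions):
    """Average straight-line length of progressions, grouped by starting third."""
    lengths = defaultdict(list)
    for start, end in progressions:
        lengths[third_of(start[0])].append(math.dist(start, end))
    return {third: sum(values) / len(values) for third, values in lengths.items()}


# Made-up (start, end) pairs, for illustration only.
progressions = [
    ((20, 40), (23, 44)),
    ((30, 10), (36, 18)),
    ((50, 40), (80, 80)),
    ((90, 30), (93, 34)),
]
means = mean_length_by_third(progressions)
```

Straight-line distance is only a proxy: a carry that bends around a defender covers more ground than its start and end points suggest.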

When in the final third, the progressions shorten again. This is not due to lower risk, but likely due to a more densely populated area. The majority of the players on the pitch will be within a single half, so navigating through there requires precision and patience from the offensive team.

Any completed progression into the penalty area is a success for the offensive team. There is a high chance you will create a shot and, if you do, it’s likely to be a higher quality shot than one from outside the penalty area. Not all completed progressions into the penalty area are created equal, though. If it’s a completed carry into the penalty area then awesome, you likely have the ball under control and can get a pretty good shot or extra pass. If it’s a completed pass then it depends how the player receives it: whether it arrives aerially or on the ground makes a difference to the shot quality. Aerial passes are harder to control and headers are of lower quality than shots with the feet, however aerial passes are usually easier to complete into the penalty area. So aerial passes offer higher quantity but lower quality than ground passes or carries.

Shot assists rely on there being a shot at the end of them. The reasoning is circular: they are ‘good’ progressions because they created a shot, and they created a shot so they are ‘good’ progressions. As we can see, they are much more scattered, which makes it harder to understand without context why they created shooting opportunities, since the locations alone don’t tell us much. Although the context is available within StatsBomb’s data, I haven’t taken a further look here.

When considering how these progressions affect Arsenal’s defensive events, remember that the majority of their defensive events were performed out wide in the full back areas and in their own penalty area. Arsenal defended out wide in the full back areas more than other teams, whilst their defensive events within their own penalty area were around the same as other teams.

At each third of the pitch, the most frequent ball progressions were out wide, which places the ball frequently in the full back areas. Due to the nature of defensive events, the only events recorded would be the on-ball actions that were defined including pressures, ball recoveries, tackles etc. The ball needs to be close to you to be able to perform these actions and get them recorded as events, so the opponent ball progression frequently going out wide combined with Arsenal’s defensive events in their defensive wide positions fits together well.

What this doesn’t tell us is whether these are causally linked or merely correlated. I would suggest there are more ball progressions made out wide than centrally in all of football, because defences are more willing to concede that space, so this doesn’t necessarily tell us much about Arsenal specifically. Though in Arsenal’s matches, they do perform defensive events in their full back areas more than other teams, which may suggest that there is something more than just correlation.

If it is a specified game plan to funnel the ball out wide and perform defensive events there, then Arsenal have done a great job of executing it. It’s a robust defensive plan if you can get it to work: the wider the ball is, the harder it is to score immediately from there. When defending it’s often useful to utilise the edges of the pitch as an ‘extra’ defender, which makes it easier to overwhelm offensive players.

If you’ve made it this far, thanks for your attention. Please find the code in previous blog posts and at https://github.com/ciaran-grant/StatsBomb-Invincibles

