#18: Space & Structure: Attempting to quantify (Pt. 2)


Following on from the previous post, where I tried to talk through why space and structure matter in football, this post will try (emphasis on try) to quantify those concepts. The inspiration for this way of quantifying structure comes from looking at event data and trying to attach a notion of structure to each event.

Picture each event as a point in time during a match at which the location of every player is known. Each one is a snapshot carrying a lot of information about how both teams are organised at that moment. Events can be plotted individually on a pitch to give a visual representation and some context. If you do plot each event individually then, as you'd expect, every plot looks different as players move around the pitch. This difference is what I would like to quantify. Each event has a set of player locations relative to the pitch and the ball: is there a way to quantify that specific set of locations? If so, can we see how that quantity changes from one event to the next?

Suppose a team is defending and has a certain quantity attached to its structure during a specific event. If the attacking team then does something that changes the defensive structure massively and creates a goal-scoring chance, I'd expect this quantity to change significantly to reflect the new situation on the pitch.

In the brief research I've done looking for something that fits these criteria, network graphs seemed a sensible place to start. Their nodes can represent players and their edges can be weighted by the distances between players at each event. Networks have been used in football before, built from passing data and average positions to quantify and qualify different styles of play. Those are usually aggregated views of whole matches, whereas the idea here is to create a time series of individual, sequential events, each represented by a network of distances, to describe how a match unfolds. Each event in a match would have its own network, which allows a distance metric to be calculated between events to represent how significant an event was in changing the structure or immediate flow of the game. Since networks don't inherently encode locations, I have compensated by adding nodes for the centre of the pitch, each corner and each goal, so that movement around the pitch is reflected.
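A minimal base R sketch of that representation, assuming an event is simply a ball position plus a two-column matrix of player coordinates on the 120 × 80 pitch used later in the post. The helpers pitch_landmarks() and event_adjacency() are hypothetical names introduced for illustration; the post itself builds the same kind of matrix step by step with dist().

```r
# Hypothetical helper: the seven fixed reference points used in this post
pitch_landmarks <- function(xmin = 0, xmax = 120, ymin = 0, ymax = 80) {
  rbind(
    centre       = c((xmin + xmax) / 2, (ymin + ymax) / 2),
    bottom_left  = c(xmin, ymin),
    top_left     = c(xmin, ymax),
    bottom_right = c(xmax, ymin),
    top_right    = c(xmax, ymax),
    left_goal    = c(xmin, (ymin + ymax) / 2),
    right_goal   = c(xmax, (ymin + ymax) / 2)
  )
}

# Hypothetical helper: one event -> weighted adjacency matrix of a complete graph.
# Nodes are the landmarks, the ball and every player; edge weights are the
# Euclidean distances between them.
event_adjacency <- function(ball_xy, player_xy) {
  rownames(player_xy) <- paste0("player_", seq_len(nrow(player_xy)))
  nodes <- rbind(pitch_landmarks(), ball = ball_xy, player_xy)
  as.matrix(dist(nodes))
}

# Usage: two players and the ball on the centre spot
players <- rbind(c(10, 40), c(20, 15))
adj <- event_adjacency(c(60, 40), players)
dim(adj)                     # 10 10
adj["centre", "right_goal"]  # 60
```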

For example, a pass between centre backs intended to keep possession and recycle the ball may not change either team's structure much from before the pass to after it. A through ball that cuts through the defence and plays a forward in on goal, on the other hand, should produce a much larger difference between the structures before and after the pass. This is the kind of distinction I hope to quantify.
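To make the intuition concrete, here is a toy version using a deliberately simple stand-in measure: the Frobenius norm of the difference between the before and after distance matrices. This is an assumption for illustration only, not one of the network distances used later in the post, and the coordinates are made up.

```r
# Stand-in measure (an assumption for illustration): Frobenius norm of the
# difference between the two events' pairwise distance matrices
structure_change <- function(xy_before, xy_after) {
  a <- as.matrix(dist(xy_before))
  b <- as.matrix(dist(xy_after))
  sqrt(sum((a - b)^2))
}

# Rows: ball, left centre back, right centre back, forward (made-up positions)
before <- rbind(c(20, 30), c(20, 30), c(20, 50), c(80, 40))

# Recycling pass: the ball moves across to the other centre back
recycle <- before
recycle[1, ] <- c(20, 50)

# Through ball: the ball and the forward both end up near the penalty area
through <- before
through[1, ] <- c(100, 40)
through[4, ] <- c(99, 40)

structure_change(before, recycle)  # 40
structure_change(before, through)  # about 170
```

Because the measure only sees distances between nodes, shifting every node by the same amount would leave it at zero, which is why the post anchors each network with fixed pitch landmarks.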

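# Libraries ---------------
library(SBpitch)          # create_Pitch()
library(ggplot2)          # geom_point(), scale_colour_manual(), theme()
library(dplyr)            # %>% and select()
library(igraph)           # graph_from_adjacency_matrix()
library(NetworkDistance)  # nd.csd(), nd.dsd(), nd.gdd(), nd.wsd()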


Enough theory, let’s try and see a simplified extreme example. Since for each event all player locations are necessary and that data is both hard and expensive to come by, I have created an extremely unrealistic event which hopefully shows the concept.

A reminder of how to plot a pitch using the ‘SBpitch’ library.

goaltype = "box"
grass_colour = "#202020"
line_colour =  "#797876"
background_colour = "#202020" 
goal_colour = "#131313"

ymin <- 0 # minimum width
ymax <- 80 # maximum width
xmin <- 0 # minimum length
xmax <- 120 # maximum length

blank_pitch <- create_Pitch(
  goaltype = goaltype,
  grass_colour = grass_colour,
  line_colour = line_colour,
  background_colour = background_colour,
  goal_colour = goal_colour,
  padding = 0
)
blank_pitch


[Plot: blank pitch]

Reference locations on the pitch are fixed, so they are the same for every event.

# Pitch Locations
centre_circle <- c((xmin+xmax)/2, (ymin+ymax)/2)
bottom_left_corner <- c(xmin, ymin)
top_left_corner <- c(xmin, ymax)
bottom_right_corner <- c(xmax, ymin)
top_right_corner <- c(xmax, ymax)
left_goal <- c(xmin, (ymin+ymax)/2)
right_goal <- c(xmax, (ymin+ymax)/2)

Event 1

The first event is a pass with both teams in relatively neutral positions, almost as if from a kick-off. The ball is on the halfway line.

# Ball Location
ball_xy_1 <- centre_circle

# Player Locations
player_1_position_1 <- c(10,40)
player_2_position_1 <- c(20, 15)
player_3_position_1 <- c(20, 30)
player_4_position_1 <- c(20, 50)
player_5_position_1 <- c(20, 65)
player_6_position_1 <- c(50, 15)
player_7_position_1 <- c(50, 30)
player_8_position_1 <- c(59, 40)
player_9_position_1 <- c(50, 65)
player_10_position_1 <- c(80, 30)
player_11_position_1 <- c(80, 50)

player_12_position_1 <- c(120-10, 80-40)
player_13_position_1 <- c(120-20, 80-15)
player_14_position_1 <- c(120-20, 80-30)
player_15_position_1 <- c(120-20, 80-50)
player_16_position_1 <- c(120-20, 80-65)
player_17_position_1 <- c(120-50, 80-15)
player_18_position_1 <- c(120-50, 80-30)
player_19_position_1 <- c(120-50, 80-50)
player_20_position_1 <- c(120-50, 80-65)
player_21_position_1 <- c(120-80, 80-30)
player_22_position_1 <- c(120-80, 80-50)

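# Complete location data frame: 7 pitch points, then the ball, then 22 players
event_1_player_locations <- as.data.frame(cbind(
  centre_circle, bottom_left_corner, top_left_corner, bottom_right_corner,
  top_right_corner, left_goal, right_goal, ball_xy_1,
  player_1_position_1, player_2_position_1, player_3_position_1, player_4_position_1,
  player_5_position_1, player_6_position_1, player_7_position_1, player_8_position_1,
  player_9_position_1, player_10_position_1, player_11_position_1, player_12_position_1,
  player_13_position_1, player_14_position_1, player_15_position_1, player_16_position_1,
  player_17_position_1, player_18_position_1, player_19_position_1, player_20_position_1,
  player_21_position_1, player_22_position_1
))
event_1_xy <- as.data.frame(t(event_1_player_locations))
names(event_1_xy)[1] <- "location_x"
names(event_1_xy)[2] <- "location_y"
# One label and point size per row (30 rows): pitch points, ball, team 1, team 2
teams <- c(rep("pitch", 7), "ball", rep("1", 11), rep("2", 11))
player_size <- c(rep(1, 7), 1.5, rep(2, 22))
team_colours <- c("1" = "cyan", "2" = "red", "ball" = "white", "pitch" = "green")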

# Plot event 1 on the pitch
blank_pitch +
  geom_point(data = event_1_xy, aes(x=location_x, y = location_y, colour = teams),  size = player_size) +
  scale_colour_manual(values = team_colours) +
  theme(legend.position = "none")

[Plot: event 1 on the pitch]

There is no immediate goal-scoring chance for the team in possession. We can calculate a distance matrix from the distances between every pair of players, the ball and the reference points on the pitch. That matrix is then used to build a network which looks entirely unappealing but contains all the information we want.

# Create distance matrix (weighted adjacency matrix)
dist_event_1 <- as.matrix(dist(event_1_xy %>% select(c("location_x", "location_y"))))

# create Network for all
graph_event_1 <- graph_from_adjacency_matrix(dist_event_1,
                                             mode = "undirected",
                                             weighted = TRUE,
                                             diag = FALSE)
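As a quick check on what dist() produces here, a 3-4-5 triangle gives a symmetric matrix with a zero diagonal and one positive weight per pair of nodes, which is what makes it usable as the weighted adjacency matrix of a complete graph. With 7 pitch points, the ball and 22 players, each event network has up to choose(30, 2) = 435 weighted edges. One consequence worth knowing: graph_from_adjacency_matrix() only creates edges for non-zero entries, so in event 1, where the ball sits exactly on the centre point, that single pair gets no edge in graph_event_1.

```r
# Three nodes forming a 3-4-5 right-angled triangle
xy <- rbind(a = c(0, 0), b = c(3, 0), c = c(3, 4))
w <- as.matrix(dist(xy))

w["a", "c"]               # 5: the hypotenuse
isSymmetric(w)            # TRUE: undirected edges
all(diag(w) == 0)         # TRUE: no self-loops carry weight
sum(w[upper.tri(w)] > 0)  # 3: every pair of distinct nodes gets a non-zero weight

# Maximum edge count for one event in this post: 7 pitch points + ball + 22 players
choose(7 + 1 + 22, 2)     # 435
```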

Event 2

The second event comes immediately after the first: a pass has been made to the forward, who receives the ball in a much more advanced position. Amazingly, there has been no attempt to block him or deny him space; if anything, the defence has moved completely out of the way. This is pretty poor defending, leaving a pretty poor defensive structure after the pass.

# Ball Location
ball_xy_2 <- c(100, 40) # Edge of the right penalty area

# Player Locations
player_1_position_2 <- c(10+10, 40)
player_2_position_2 <- c(20+19, 15)
player_3_position_2 <- c(20+19, 30)
player_4_position_2 <- c(20+19, 50)
player_5_position_2 <- c(20+19, 65)
player_6_position_2 <- c(50+19, 15)
player_7_position_2 <- c(50+19, 30)
player_8_position_2 <- c(50+19, 50)
player_9_position_2 <- c(50+19, 65)
player_10_position_2 <- c(80+19, 30)
player_11_position_2 <- c(80+19, 40)

player_12_position_2 <- c(120-10, 80-40)
player_13_position_2 <- c(120-30, 80-10)
player_14_position_2 <- c(120-20, 80-25)
player_15_position_2 <- c(120-20, 80-55)
player_16_position_2 <- c(120-30, 80-70)
player_17_position_2 <- c(120-50, 80-15)
player_18_position_2 <- c(120-50, 80-30)
player_19_position_2 <- c(120-50, 80-50)
player_20_position_2 <- c(120-50, 80-65)
player_21_position_2 <- c(120-80, 80-30)
player_22_position_2 <- c(120-80, 80-50)

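# Complete location data frame: 7 pitch points, then the ball, then 22 players
event_2_player_locations <- as.data.frame(cbind(
  centre_circle, bottom_left_corner, top_left_corner, bottom_right_corner,
  top_right_corner, left_goal, right_goal, ball_xy_2,
  player_1_position_2, player_2_position_2, player_3_position_2, player_4_position_2,
  player_5_position_2, player_6_position_2, player_7_position_2, player_8_position_2,
  player_9_position_2, player_10_position_2, player_11_position_2, player_12_position_2,
  player_13_position_2, player_14_position_2, player_15_position_2, player_16_position_2,
  player_17_position_2, player_18_position_2, player_19_position_2, player_20_position_2,
  player_21_position_2, player_22_position_2
))
event_2_xy <- as.data.frame(t(event_2_player_locations))
names(event_2_xy)[1] <- "location_x"
names(event_2_xy)[2] <- "location_y"
# One label and point size per row (30 rows): pitch points, ball, team 1, team 2
teams <- c(rep("pitch", 7), "ball", rep("1", 11), rep("2", 11))
player_size <- c(rep(1, 7), 1.5, rep(2, 22))
team_colours <- c("1" = "cyan", "2" = "red", "ball" = "white", "pitch" = "green")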

# Plot event 2 on the pitch
blank_pitch +
  geom_point(data = event_2_xy, aes(x=location_x, y = location_y, colour = teams),  size = player_size) +
  scale_colour_manual(values = team_colours) +
  theme(legend.position = "none")

[Plot: event 2 on the pitch]

Again we can create a distance matrix, which gives an equally unappealing network representation of each player's location relative to the other players, the ball and the points on the pitch.

# Create distance matrix (weighted adjacency matrix)
dist_event_2 <- as.matrix(dist(event_2_xy %>% select(c("location_x", "location_y"))))

# Create Network for all
graph_event_2 <- graph_from_adjacency_matrix(dist_event_2,
                                             mode = "undirected",
                                             weighted = TRUE,
                                             diag = FALSE)

Computing Network Differences

We now have two events: one before a pass, with the defence in a seemingly comfortable position, and one after a brilliantly executed pass that made the defence immediately abort the mission. There is a clear difference in the players' locations and the scenario in the pitch views; hopefully comparing the networks can reflect this intuitive difference.

The NetworkDistance package has several metrics that aim to compute how different networks are from each other using their adjacency matrices. The distance matrices we have calculated are weighted adjacency matrices, so they should be appropriate inputs. It's worth stating the obvious: these are fully connected, undirected and weighted networks, so not every measure will be appropriate. I am no expert in quantifying networks, but some research into the measures led me to the intuition that the best candidates for representing the differences seen on the pitch won't involve counting or replacing edges, or node-specific measures. Every network is fully connected, so the only difference between them is the weighting of the edges, and for now we are interested in overall differences rather than individual nodes (though there may be some merit in looking at those too).
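The argument for skipping edge-counting measures can be checked directly. In this minimal sketch with three made-up nodes (ball, a midfielder and the forward) that move between events, binarising the two distance matrices, as an unweighted Hamming-style comparison effectively would, finds no difference at all because both graphs are complete, while the weights clearly change. This is a hand-rolled illustration, not nd.hamming() itself.

```r
# Made-up positions: ball, a midfielder and the forward, before and after a pass
before <- rbind(c(60, 40), c(50, 50), c(80, 40))
after  <- rbind(c(100, 40), c(69, 50), c(99, 40))

w_before <- as.matrix(dist(before))
w_after  <- as.matrix(dist(after))
off_diag <- upper.tri(w_before)

# Unweighted view: an edge exists wherever the weight is non-zero
edge_changes <- sum(xor(w_before[off_diag] > 0, w_after[off_diag] > 0))

# Weighted view: total absolute change in edge weights
weight_change <- sum(abs(w_before[off_diag] - w_after[off_diag]))

edge_changes       # 0: both graphs are complete, so no edge appears or disappears
weight_change > 0  # TRUE
```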

The distance measures that seem to fit these criteria are the (weighted) spectral distances, which compare the eigenvalues (the spectra) of matrices derived from each network, such as the adjacency or Laplacian matrix, and the graph diffusion distance, which is based on heat diffusion over each network and so has an element of time built in.
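A from-scratch sketch of both ideas in base R, written to show what the measures compare rather than to reproduce NetworkDistance exactly (its normalisations and defaults may differ). The spectral distance here is the Euclidean distance between the sorted eigenvalues of each graph Laplacian L = D - W; the diffusion distance takes the largest Frobenius-norm gap between the heat kernels exp(-tL) over a grid of times t. All function names are hypothetical.

```r
# Combinatorial Laplacian L = D - W, where D holds each node's total edge weight
laplacian <- function(w) diag(rowSums(w), nrow = nrow(w)) - w

# Spectral distance: compare the sorted Laplacian eigenvalues (the spectra)
spectral_distance <- function(w1, w2) {
  ev <- function(w) sort(eigen(laplacian(w), symmetric = TRUE, only.values = TRUE)$values)
  sqrt(sum((ev(w1) - ev(w2))^2))
}

# Heat kernel exp(-time * L) via the eigendecomposition L = V diag(lambda) V^T
heat_kernel <- function(w, time) {
  e <- eigen(laplacian(w), symmetric = TRUE)
  e$vectors %*% diag(exp(-time * e$values), nrow = nrow(w)) %*% t(e$vectors)
}

# Diffusion distance: largest Frobenius gap between the two heat kernels over
# the time grid; both matrices must list the same nodes in the same order
diffusion_distance <- function(w1, w2, times) {
  gaps <- vapply(times, function(s) {
    sqrt(sum((heat_kernel(w1, s) - heat_kernel(w2, s))^2))
  }, numeric(1))
  max(gaps)
}

# Usage: the same three nodes with every distance doubled
w1 <- as.matrix(dist(rbind(c(0, 0), c(3, 0), c(3, 4))))
w2 <- 2 * w1
spectral_distance(w1, w1)  # 0: identical structures
spectral_distance(w1, w2)  # sqrt(294), about 17.15
diffusion_distance(w1, w2, times = c(0.01, 0.05, 0.1)) > 0  # TRUE
```

Doubling every distance doubles every Laplacian eigenvalue, so both measures respond to the overall scale of the weights. For the diffusion sketch this also means the time grid has to suit that scale: for a connected graph the heat kernel tends towards the same averaging matrix as t grows, so with weights of tens of yards the gap between two events can shrink towards zero quickly.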

The measures are computed below, though I don't have an intuition for the scale these numbers should be on, as there's no real situation to compare against. The results are split into the pairwise distances between the networks and a matrix representing their respective spectra. In this case there are only two networks, one per event, so there is a single pairwise distance.

# Calculate distance between events (can do all events at once)
events_list <- list(dist_event_1, dist_event_2)  # weighted adjacency (distance) matrices, one per event

# Centrality
# event_centrality_close <- nd.centrality(events_list, out.dist = TRUE, mode = "Close", directed = FALSE)
# event_centrality_degree <- nd.centrality(events_list, out.dist = TRUE, mode = "Degree", directed = FALSE)
# event_centrality_btness <- nd.centrality(events_list, out.dist = TRUE, mode = "Between", directed = FALSE)

# L2 Distance of Continuous Spectral Densities
event_csd <- nd.csd(events_list, out.dist = TRUE, bandwidth = 1)

# Discrete Spectral Distance
event_dsd_adj <- nd.dsd(events_list, out.dist = TRUE, type = "Adj")
event_dsd_lap <- nd.dsd(events_list, out.dist = TRUE, type = "Lap")
event_dsd_slap <- nd.dsd(events_list, out.dist = TRUE, type = "SLap")
event_dsd_nlap <- nd.dsd(events_list, out.dist = TRUE, type = "NLap")

# Edge Difference Distance
# event_edd <- nd.edd(events_list, out.dist = TRUE)

# Extremal Distance with top-k Eigenvalues
# event_extremal <- nd.extremal(events_list, out.dist = TRUE, k = ceiling(nrow(events_list)/5))

# Graph Diffusion Distance
event_gdd <- nd.gdd(events_list, out.dist = TRUE, vect = seq(from = 0.1, to = 1, length.out = 10))

# Hamming Distance
# event_hamming <- nd.hamming(events_list, out.dist = TRUE)

# HIM Distance
# event_him <- nd.him(events_list, out.dist = TRUE, xi = 1, ntest = 10)

# Network Flow Distance
# event_nfd <- nd.nfd(events_list, order = 0, out.dist = TRUE, vect = seq(from = 0, to = 10, length.out = 1000))

# Distance with Weighted Spectral Distribution
event_wsd <- nd.wsd(events_list, out.dist = TRUE, K = 50, wN = 4)

# Combine Relevant Distances
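events_distance <- data.frame(
  csd      = as.numeric(event_csd$D),
  dsd_adj  = as.numeric(event_dsd_adj$D),
  dsd_lap  = as.numeric(event_dsd_lap$D),
  dsd_slap = as.numeric(event_dsd_slap$D),
  dsd_nlap = as.numeric(event_dsd_nlap$D),
  gdd      = as.numeric(event_gdd$D),
  wsd      = as.numeric(event_wsd$D)
)
events_distance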
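With two events there is only one pairwise distance. Over a full match the same calculation would give an N × N matrix, and the entries just above its diagonal would form the time series of event-to-event changes described at the start of the post. A base R sketch of that bookkeeping follows; pairwise_event_distances() is a hypothetical helper, and any symmetric measure taking two adjacency matrices can be plugged in (here a simple Frobenius norm, purely as a placeholder).

```r
# Hypothetical helper: symmetric matrix of distances between every pair of events
pairwise_event_distances <- function(adj_list, measure) {
  n <- length(adj_list)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Fill the upper triangle and mirror it, assuming the measure is symmetric
      if (i < j) d[i, j] <- d[j, i] <- measure(adj_list[[i]], adj_list[[j]])
    }
  }
  d
}

# Placeholder measure: Frobenius norm of the difference between two matrices
frobenius <- function(a, b) sqrt(sum((a - b)^2))

# Three made-up events: ball, a midfielder and the forward
events <- list(
  rbind(c(60, 40), c(50, 50), c(80, 40)),
  rbind(c(70, 40), c(50, 50), c(80, 40)),
  rbind(c(100, 40), c(69, 50), c(99, 40))
)
adj_list <- lapply(events, function(xy) as.matrix(dist(xy)))
d <- pairwise_event_distances(adj_list, frobenius)

# Change from each event to the next: the first super-diagonal
n <- length(adj_list)
consecutive <- d[cbind(seq_len(n - 1), seq_len(n - 1) + 1)]
consecutive
```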


Still to come

There are of course some factors that haven’t been considered here such as different areas of the pitch being more important than others. Future ideas would ideally include a weighting that is applied depending on where the ball is located on the pitch. A further improvement to this would be to separate each team for the same event and evaluate their network differences separately alongside the overall difference.
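The per-team idea could be prototyped by restricting the same comparison to a subset of nodes. A minimal sketch, assuming each node carries a group label ("pitch", "ball", "1" or "2") and reusing a Frobenius-norm stand-in measure; subset_change() is a hypothetical helper and the coordinates are made up.

```r
# Hypothetical helper: structure change using only the nodes flagged in `keep`
subset_change <- function(xy_before, xy_after, keep) {
  a <- as.matrix(dist(xy_before[keep, , drop = FALSE]))
  b <- as.matrix(dist(xy_after[keep, , drop = FALSE]))
  sqrt(sum((a - b)^2))
}

# Node labels: two pitch landmarks, the ball, two players per team
group  <- c("pitch", "pitch", "ball", "1", "1", "2", "2")
before <- rbind(c(60, 40), c(120, 40), c(60, 40), c(50, 30), c(80, 40), c(100, 30), c(100, 50))
after  <- rbind(c(60, 40), c(120, 40), c(100, 40), c(69, 30), c(99, 40), c(101, 28), c(101, 52))

# Each team is compared together with the pitch landmarks and the ball
keep_1 <- group %in% c("pitch", "ball", "1")
keep_2 <- group %in% c("pitch", "ball", "2")

changes <- c(
  overall = subset_change(before, after, rep(TRUE, length(group))),
  team_1  = subset_change(before, after, keep_1),
  team_2  = subset_change(before, after, keep_2)
)
changes
```

Since the overall comparison includes every pair of nodes used by either team, under this stand-in measure it can never be smaller than either team's value.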

This concept is still a work in progress, but I thought it was worth throwing the idea out there. Any questions, suggestions or errors, please give me a shout!

