Asymmetries in football: The pass-goal paradox

We investigate the relation between the number of passes made by a football team and the number of goals it scores. We analyze the 380 matches of a complete season of the Spanish national league "LaLiga" (2018/2019). We observe that the number of goals scored is positively correlated with the number of passes made by a team. Accordingly, teams at the top (bottom) of the ranking at the end of the season make more (fewer) passes than the rest of the teams. However, we observe a strong asymmetry when the two parts of the match are analyzed separately. Interestingly, fewer passes are made in the second part of a match while, at the same time, more goals are scored. This paradox appears in the majority of teams, and it is independent of the number of passes made. These results indicate that goals in the first part of matches are more "costly" in terms of passes than those scored in the second part.


Introduction
Year after year, the analysis of actions and patterns occurring in a football match is becoming more complex [1,2,3]. Technology is the main driver of the avalanche of new kinds of datasets that analysts and data scientists working in football clubs have to deal with [4]. Every action occurring on the pitch is recorded and categorized, from passes to goals, but also tackles, shots, fouls, corners and possessions. At the same time, the position of all players (including the referees) and the ball is recorded at rates of up to 25 frames per second, which gives not only the position of players in real time but also their speeds, accelerations, and total distances covered.
The availability of these datasets has resulted in a diversity of new methodologies and metrics to understand what is happening on the pitch. New points of view have arisen, such as evaluating the control of the pitch [5], measuring the area covered by the convex hull [6] or tracking the evolution of the passing networks between players [1]. Furthermore, new metrics have been defined to quantify the performance of specific actions, such as the expected goal (xG) parameter [7,8], which quantifies the quality of a shot, or the post-shot expected goals (PSxG), defined for evaluating goalkeepers [9].
However, despite the increasing complexity of the analysis in football, there are still fascinating conclusions to be drawn from a closer inspection of the classical football indicators [10]. For example, Lago-Peñas et al. analyzed the final result of a match when the home (or away) team scored first [11]. They showed that teams that scored first ended the match scoring around twice as many goals as their opponents. Furthermore, home teams scored first in around 60% of the matches. Another approach is to count the number of passes. In [12], the authors counted the passes made before goals during the 1990 FIFA World Cup finals, showing that successful teams scored more goals after longer passing sequences. In a more recent study analyzing the 2004 European Championship, Yiannakos and Armatas showed the existence of a high percentage of long passes before goals but, more importantly, they reported a higher percentage of goals in the second part (57.4%) than in the first part of the match (42.6%) [13], a fact also observed in other studies [14,15].
Redwood-Brown went one step further and investigated the number and accuracy of passes before and after scoring a goal [16]. Interestingly, he observed that, during the five minutes before a goal, the number of passes was higher than the average. On the contrary, during the five minutes after a goal, the number of passes was lower. Furthermore, the accuracy of passes was also related to scoring, with teams showing a higher percentage of successful passes before scoring a goal and a lower percentage during the following five minutes [16].
In this paper, we investigated the relation between the number of passes made by a team and the number of goals. We analyzed the 380 matches of the 2018/2019 season of the Spanish national football league "LaLiga". Our analysis focused on two issues. First, we wanted to confirm the results presented by Redwood-Brown [16], which suggested that increasing the number of passes could be related to increasing the probability of scoring a goal. Second, we investigated the differences between the first and second parts of a match, intending to find analogies and discrepancies between them. Our results show that there is indeed a relation between the number of passes and scored goals, although the correlation between both variables (passes and goals) was not as high as we expected. However, we found an interesting paradox when looking at the differences between parts: although passes and goals are positively correlated, second parts have a lower number of passes while, at the same time, the number of goals is higher. As a consequence, the number of passes required to score a goal is much higher in the first part of a match, making passes in the second part more efficient.

More passes, more goals
The datasets we analyzed consisted of the number of passes and goals made by each of the 20 teams participating in the Spanish national football league ("LaLiga" Santander). Specifically, we have a total of N = 357724 completed passes and M = 983 goals. We also considered the temporal information (minute and second) of both types of events, which allowed us to separate the first and second halves of the match. Figure 1 shows the number of completed passes made by each team vs. the number of scored goals. The solid red line is the linear regression of the data, which had a correlation coefficient of r = 0.6724. There is thus a positive correlation between both variables, although its value is rather low. Since this result is not conclusive, we carried out an alternative analysis to shed more light on the interplay between passes and goals.
Table 1 shows the average number of passes for three groups of teams defined by their final ranking: (1) the top four teams, (2) the teams in the middle of the table and (3) the three teams that were relegated (RE) to the second division. We can observe that the top four teams have the highest average number of passes, followed by the teams in the middle of the table and, finally, the relegated teams. In the second column of Tab. 1, we show the average number of goals for each group. Comparing both columns, we can observe that the higher the number of passes of a group, the higher the number of scored goals and, furthermore, the higher the position in the final ranking.
Are these results statistically significant? To answer this question we considered the variables "pass" and "goal" obtained for all matches of teams belonging to each group. We randomly sampled 114 values from each group, since groups have different numbers of observations and we were limited by the number of observations of the smallest group. Then, we ran a 1-way ANOVA to compare the passes of the three groups and a 1-way Kruskal-Wallis (KW) test to compare their goals. The latter is a non-parametric counterpart of the former, used because the number of goals per match is very low and thus cannot be expected to follow a normal distribution. Next, we compared groups in pairs, to check whether they have equal means/medians. Finally, to ensure that the statistical analysis was unbiased, we repeated this process 1000 times (sampling, general test, pair-wise comparisons), correcting the p-values for multiple comparisons with False Discovery Rate, adjusted for α = 0.01 [18].
Tables 2 and 3 show the results of the group comparisons in passes and goals, respectively. From left to right: (i) average difference (standard deviation) between groups, (ii) average p-value associated with it, and (iii) percentage of iterations (out of 1000) in which we can safely state that there are statistical differences between groups. Note that all p-values shown hereafter have already been corrected for multiple comparisons. As we can see in Tabs. 2-3, differences between relegated and middle-ranking teams are not statistically significant, no matter the variable used to compare them (goals/passes). On the contrary, top 4 teams are clearly different from the other two groups in terms of passes (statistically significant differences were found in 100% of cases after correcting for multiple comparisons). Differences are one order of magnitude higher in these cases. Concerning the number of goals (Tab. 3), differences are not as evident, but some of them fulfill the [...]
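The resampling procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the analysis: it applies the Kruskal-Wallis test to every pair of groups (the ANOVA branch used for passes is omitted), it assumes that the False Discovery Rate correction is the Benjamini-Hochberg procedure applied to the pair-wise p-values of each iteration, and all function names are hypothetical. The chi-square p-values use the closed forms valid for one and two degrees of freedom, so only the standard library is needed.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected H statistic and p-value for 2 or 3 groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all observations identical: no evidence at all
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(h / 2))  # chi-square survival, 1 d.o.f.
    elif df == 2:
        p = math.exp(-h / 2)  # chi-square survival, 2 d.o.f.
    else:
        raise ValueError("this sketch only supports 2 or 3 groups")
    return h, p


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def repeated_pairwise_tests(groups, sample_size=114, iterations=1000,
                            alpha=0.01, seed=0):
    """Percentage of iterations in which each pair of groups differs.

    groups: dict mapping a group name to a list of per-match values.
    """
    rng = random.Random(seed)
    names = sorted(groups)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    significant = {pair: 0 for pair in pairs}
    for _ in range(iterations):
        # Subsample every group down to the size of the smallest one.
        sample = {name: rng.sample(groups[name], sample_size) for name in names}
        raw = [kruskal_wallis([sample[a], sample[b]])[1] for a, b in pairs]
        for pair, p in zip(pairs, benjamini_hochberg(raw)):
            if p < alpha:
                significant[pair] += 1
    return {pair: 100.0 * c / iterations for pair, c in significant.items()}
```

Calling `repeated_pairwise_tests` with a dictionary that maps each group to its list of goals per match would return, for every pair of groups, a percentage comparable in spirit to the last column of Tab. 3. Subsampling without replacement keeps the three groups balanced, which is why the sample size is bounded by the smallest group.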

Asymmetries between the parts of the match
Next, we investigated whether the results observed during the whole match were maintained when the two parts of the match were analyzed independently. In other words, we were interested in finding asymmetries between both halves of a match, in case they exist. With this aim, we first analyzed how the number of passes was distributed between the two parts of a match. In Fig. 2 we show, for each team i, the number of passes in the first and second parts, $n_1(i)$ and $n_2(i)$, respectively. As we can observe, there is a strong decrease in the number of passes in the second part of matches. In Fig. 2, teams are ordered, from left to right, according to their position at the end of the season. We can observe that 17 teams out of 20 had a lower number of passes in the second part, with Atlético de Madrid and F.C. Barcelona being the teams whose decrease was most pronounced. Only three teams did not display this behavior: Alavés, Levante and Huesca.
At this point, a natural question arises: How is the reduction in the number of passes related to the number of goals? To answer this question, we first show in Fig. 3 the goals scored in each part by all teams, i.e. $m_1(i)$ and $m_2(i)$, respectively. As previously reported in the literature [13,14,15], the number of goals increased in the second part. This increase was especially pronounced for Sevilla and Real Betis, and it was observed in 17 teams. Only Athletic Club, Girona and Rayo Vallecano showed a decrease in the number of goals in the second part. Interestingly, Girona and Rayo Vallecano were relegated at the end of the season. Thus, although teams completed fewer passes in the second half, they scored more goals, which may seem counterintuitive.
Next, in Fig. 4, we divided, for each team, the total number of passes made in each part by the total number of goals scored in that part. This ratio is an indicator of how "efficient" passes in each part are or, conversely, how "costly" a goal is in terms of the number of passes. Interestingly, we can observe that goals required more passes in the first part of the match for the majority of teams (17 out of 20). Real Betis was the team with the largest difference between parts. The reason is the high number of passes required to score goals in the first parts of its matches. On the other hand, only three teams deviated from the general behavior: Athletic Club, Girona and Rayo Vallecano. Finally, it is worth mentioning that Getafe was the team requiring the lowest number of passes to score a goal. This team has a particular style of play characterized by intense pressure high up the field, leading to ball recoveries close to the opponent's goal and, probably, reducing the number of passes before scoring.
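The passes-per-goal ratio of each part can be illustrated with a short Python sketch. The function names and the numbers below are hypothetical and are not taken from the LaLiga dataset; they only show how the ratio of passes to goals in each part identifies whether a team follows the pass-goal paradox.

```python
def passes_per_goal(passes, goals):
    """Passes made per goal scored; None when no goal was scored."""
    return passes / goals if goals > 0 else None


def first_part_is_costlier(n1, n2, m1, m2):
    """True when a goal 'costs' more passes in the first part.

    n1, n2: passes in the first and second parts.
    m1, m2: goals in the first and second parts.
    Returns None if the ratio is undefined for either part.
    """
    r1 = passes_per_goal(n1, m1)
    r2 = passes_per_goal(n2, m2)
    if r1 is None or r2 is None:
        return None
    return r1 > r2


# Made-up season totals (n1, n2, m1, m2), for illustration only.
teams = {
    "Team A": (9000, 8000, 20, 30),
    "Team B": (7000, 7200, 25, 20),
}

for name, (n1, n2, m1, m2) in teams.items():
    print(name,
          passes_per_goal(n1, m1),
          passes_per_goal(n2, m2),
          first_part_is_costlier(n1, n2, m1, m2))
```

In this toy example, Team A needs 450 passes per goal in the first part and fewer in the second, so it follows the paradox, whereas Team B behaves in the opposite way.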

Conclusions
Passes and goals are two of the most relevant actions in football. Here, we investigated the interplay between them, showing that there is a strong asymmetry in both the number of passes and the number of goals in each part of a match. The analysis of the 20 teams playing in the first division of the Spanish national league showed that there is a moderate correlation between the number of completed passes and the number of scored goals. When teams were grouped according to their ranking at the end of the season, we observed that the top 4 teams were those making more passes and scoring more goals while, on the contrary, relegated teams had, on average, a lower number of passes and goals. Thus, the first conclusion of our analysis is rather intuitive: Teams making more passes score more goals and, ultimately, occupy a higher position at the end of the season. However, a paradox arises when looking at the distribution of goals between the two parts of a match: While more passes were made during the first half of a match, fewer goals were scored. This fact makes goals more "costly" in terms of the number of passes during the first part. A possible explanation of this paradox is twofold. On the one hand, as discussed in [15], the decrease in the physical performance of players could be related to a higher probability of making mistakes, which would increase the probability of scoring for either of the two teams. In turn, fatigue could also be responsible for tactical disorganization. On the other hand, the proximity of the end of the game could be a reason for taking more risks in order to change the final result, leading again to an increase in the probability of scoring.
Although we observed that the pass-goal paradox was present in most teams, we must also note that a few of them did not fulfill it (3 teams out of 20 in our case). Therefore, further studies should be carried out (i) to investigate why some teams escape from this paradox, (ii) to evaluate its generality by applying a similar analysis to datasets coming from other football leagues and (iii) to validate the results presented here with larger datasets. Finally, other variables, such as playing at home or away, have been shown to influence the total number of passes and goals during a match [17], and they should also be included in the "to-do" list.