Mathematical Models to Measure the Variability of Nodes and Networks in Team Sports

Pattern analysis is a widely researched topic in team sports performance analysis, using information theory as a conceptual framework. Bayesian methods are also used in this research field, but the association between the two is still being developed. The aim of this paper is to present new mathematical concepts that are based on information and probability theory and can be applied to network analysis in team sports. These results are based on the transition matrices of the Markov chain associated with the adjacency matrices of a network with n nodes, and they allow for a more robust analysis of the variability of interactions in team sports. The proposed models refer to individual and collective rates and indexes of total variability between players and teams, as well as the overall passing capacity of a network, all of which are demonstrated using the UEFA 2020/2021 Champions League Final.


Introduction
Pattern analysis is a well-established research topic in team sports performance analysis, with dyadic interactions between players of the same team being represented in the form of passes [1][2][3][4]. A common goal in the many metrics used to describe the individual and group behavior in team sports is the identification of the most influential players or a depiction of the team's behavior in terms of passing interactions, allowing for a characterization of the team's organization along with the identification of the roles each player has in the network [1,2,5,6].
Metrics such as degree centrality, closeness, betweenness and the eigenvector are often used to analyse social interactions in a wide variety of activities, ranging from the analysis of social media networks [7,8] to match performance analysis in team sports such as volleyball [9,10], handball [11], rugby [12] or football [2,5,6,13,14,15], among others.
Broadly, these metrics attempt to identify the central elements in a group, the elements with more connections within it or the most influential ones [5,7]. In team sports such as football, knowing which player is the most influential, or which player has the most connections with the remaining elements of the team, is crucial, as the opposing team may create strategies to prevent the ball from reaching that player, thereby creating difficulties for the team in possession [1,2,3,5,6].
The application of the concept of entropy to model the interactions of players passing the ball in football has also been used [16][17][18], pointing to higher entropy values leading to greater chances of creating goal-scoring opportunities [17]. The importance of entropy in football lies in the recognition that variability in the behavioral patterns of interaction between players is beneficial for the team, as the system is more unpredictable. Here, unpredictability leads to greater difficulties for the opposing team to interrupt the passing sequences.
An alternative approach to the analysis of these passing sequences in football has also been used, where the predictability of the passes being made is determined using Markov chains [19,20] and, associated with these, node and network entropy mathematical models [21]. These have attempted to explain the degree of variability associated with conditional probabilities, adding a non-linear perspective to a Bayesian approach to match performance analysis and providing a prediction of the variability of occurrence of passes. However, the individual and collective capacity of variability is not given, nor is it possible to assess the level of variability as a comparable ratio and index. Knowing only how much variability a node (player) or a system (team) shows, which is situational and specific to each condition (game), allows neither a clear view of the level of performance that may be attainable nor a comparison between different conditions within or between games.
Based on a mixed Bayesian and non-linear approach to team sports analysis, this work aims to introduce new individual and collective rates and indexes of variability within the team, providing new insights into the passing and receiving capacity of a given team. These new mathematical models allow for a bounded analysis of the passing and receiving capacity of variability and for comparisons between different players, between games or even between different tactical formations during the game, providing valuable information about individual and team performance in various contexts of the game.

Variability of Nodes and Networks
Based on information theory and probability theory, this section presents mathematical concepts that can be applied to networks that are considered weighted digraphs or weighted directed networks. The elementary concepts of nodes and networks used in this paper are presented in [5,22,23]. We assume that $A^w_D$ is the weighted adjacency matrix of a weighted digraph with $n$ nodes, $G^w_D$. We also use the following concepts presented in [21]: the transition matrix of the Markov chain, $M_T$, associated with $A^w_D$ of $G^w_D$; the k-step node transition; and the probability of all nodes in the network after k time steps.
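As an illustration of the objects named above, the following is a minimal sketch that assumes $M_T$ is obtained by dividing each row of the weighted adjacency matrix by its row sum, and that k-step transitions are given by the k-th matrix power. The construction actually used in this paper is the one described in [21]; the function names and the pass counts below are hypothetical.

```python
import numpy as np

def transition_matrix(adjacency):
    """Row-normalise a weighted adjacency matrix into a Markov transition matrix.

    Assumption: entry (i, j) of the result is w_ij divided by the total
    weight leaving node i.
    """
    A = np.asarray(adjacency, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    # Rows with no outgoing passes stay all-zero instead of dividing by zero.
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

def k_step(M, k):
    """k-step transition probabilities, computed as the k-th power of M."""
    return np.linalg.matrix_power(M, k)

# Hypothetical pass counts between three players (row = passer, column = receiver).
A = np.array([[0, 4, 1],
              [2, 0, 2],
              [3, 1, 0]])
M = transition_matrix(A)
print(M)
print(k_step(M, 2))
```

Leaving rows without outgoing passes at zero is a design choice of this sketch; such rows are not probability distributions and would need separate handling in a full analysis.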
Considering the mathematical concept of the entropy of a random variable, defined on the sample space of a random experiment and taking on a finite number of values [24] (p. 51), [25] and [26] (p. 19), and a weighted digraph with $n$ nodes, we define the concept of the rate of passing of a node, which is the rate of possible variability in passing from this node to all other nodes of a network.
Definition 1. Given a weighted digraph, $G^w_D$, with $n$ nodes, $R^{out}_i(n_i)$ is called the rate of passing of a node $n_i$, and it is the rate of possible variability of a node in passing from node $n_i$ to all other nodes $n_j$. It is determined by: where $m_{ij}$ are the elements of $M_T$ associated with $A^w_D$ of $G^w_D$, $i, j = 1, \ldots, n$.

Remark 1.
In Definition 1, if we replace $w_{ij}$ by $w_{ji}$ and $m_{ij}$ by $m_{ji}$, we obtain the concept of the rate of reception, $R^{in}_i(n_i)$, i.e., the capacity of the variability of a node when a reception by $n_i$ from all other nodes $n_j$ can occur.
Based on the mathematical concept of the rate of transmission [25] and a weighted digraph with $n$ nodes, we can propose the concept of the index of rate of passing of a node.
Proposition 1. Given a weighted digraph, $G^w_D$, with $n$ nodes, $IndR^{out}_i(n_i)$, the index of rate of passing of a node $n_i$, is determined by:
Proof. Consider that the index of rate of passing of a node $n_i$ is the ratio between $R^{out}_i(n_i)$ and the maximum that $R^{out}_i(n_i)$ can assume. Thus, By Shannon [25] (p. 394), the maximum value occurs when $m_{ij} = \frac{1}{n}$. Then, Therefore,
Remark 2. In Proposition 1, if we consider $R^{in}_i(n_i)$, we obtain the concept of the index of rate of reception of a node $n_i$, $IndR^{in}_i(n_i)$.
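The idea of dividing a variability measure by its maximum, attained when $m_{ij} = \frac{1}{n}$, can be illustrated with the textbook Shannon entropy of one row of a transition matrix. This is a minimal sketch under that assumption: it is not claimed to reproduce the exact rate of Definition 1 or the index of Proposition 1, and the function names and matrix values are hypothetical.

```python
import numpy as np

def row_entropy(M, i):
    """Shannon entropy (in bits) of the transition probabilities out of node i."""
    p = np.asarray(M, dtype=float)[i]
    p = p[p > 0]  # the term 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def entropy_index(M, i):
    """Row entropy divided by its maximum, log2(n), reached when m_ij = 1/n.

    Assumes n >= 2 so that the maximum is non-zero.
    """
    n = np.asarray(M).shape[0]
    return row_entropy(M, i) / np.log2(n)

# Hypothetical 4-node transition matrix (each row sums to 1).
M = np.array([[0.25, 0.25, 0.25, 0.25],   # uniform passing
              [1.00, 0.00, 0.00, 0.00],   # always passes to the same node
              [0.50, 0.50, 0.00, 0.00],
              [0.00, 0.50, 0.25, 0.25]])

for i in range(M.shape[0]):
    print(i, row_entropy(M, i), entropy_index(M, i))
```

Under this assumption, the index is bounded between 0 (a node that always passes to the same teammate) and 1 (a node that passes to every node with equal probability), which is what makes values comparable across nodes.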
Considering the mathematical concept of the rate of passing of a node and the rate of variability when a pass can occur from this node to all other nodes of a network, we define the concept of the rate of passing of a network.
Definition 2. Given a weighted digraph, $G^w_D$, with $n$ nodes, $R^{out}_N$ is called the rate of passing of a network, and it is the rate of variability of a network when a pass from node $n_i$ to all other nodes $n_j$ can occur. It is determined by: where $m_{ij}$ are the elements of $M_T$ associated with the $A^w_D$ of $G^w_D$, $i, j = 1, \ldots, n$.

Remark 3.
In Definition 2, if we replace $w_{ij}$ with $w_{ji}$ and $m_{ij}$ with $m_{ji}$, we obtain the concept of the rate of reception, $R^{in}_N$, i.e., the capacity of the variability of a network when a reception can occur by $n_i$ from all other nodes $n_j$.
Based on the mathematical concept of the rate of reception [25] and a weighted digraph with $n$ nodes, we can propose the concept of the index of rate of passing of a network.
Proposition 2. Given a weighted digraph, $G^w_D$, with $n$ nodes, $IndR^{out}_N$, the index of rate of passing of a network, is determined by:
Proof. Consider that the index of rate of passing of a network is the ratio between $R^{out}_N$ and the maximum that $R^{out}_N$ can assume. Thus, By Shannon [25] (p. 394), the maximum value occurs when all probabilities are equal. Therefore,

Remark 4.
In Proposition 2, if we consider $R^{in}_N$, we obtain the concept of the index of rate of reception of a network, $IndR^{in}_N$.
Based on the mathematical concept of the entropy of a random variable X with marginal distribution of the joint distribution (X,Y) of the Markov chain associated with the weighted adjacency matrix of a weighted digraph with $n$ nodes [25] (p. 395), we can define the concept of the rate of total variability between nodes that performed the passing in a network.
Proposition 3. Given a weighted digraph, $G^w_D$, with $n$ nodes and two random variables X and Y such that the pair of transmitter and receiver (X,Y) is the joint distribution of the Markov chain associated with $A^w_D$ of $G^w_D$, $E^{out}_N$ is called the rate of total out-entropy of a network. It is the rate of total variability between nodes that performed the passing in a network, determined by: where $w_{ij}$ are the elements of $A^w_D$ and $L^w_D = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ji}$, $i, j = 1, \ldots, n$.
Proof. By the notion of marginal distribution in X of a joint distribution [25] (p. 395), we obtain
Remark 5. In Proposition 3, if we consider the mathematical concept of the entropy of a random variable Y with marginal distribution of the joint distribution (X,Y) of the Markov chain associated with the weighted adjacency matrix of a weighted digraph with $n$ nodes [25] (p. 395), we can obtain the concept of the rate of total in-entropy of a network, $E^{in}_N$.
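The marginal distributions referred to in the proof can be made concrete with a short sketch. It assumes that the joint distribution of (X,Y) is obtained by dividing each weight $w_{ij}$ by the total weight $L^w_D$, and it computes the textbook Shannon entropy of each marginal. The rates $E^{out}_N$ and $E^{in}_N$ defined in this section may differ from these quantities; the function names and pass counts are hypothetical.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # the term 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def marginal_entropies(adjacency):
    """Entropies of the passer (X) and receiver (Y) marginals.

    Assumption: the joint distribution is p_ij = w_ij / L, where L is the
    sum of all weights in the adjacency matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    P = A / A.sum()
    p_x = P.sum(axis=1)  # marginal over passers (row sums)
    p_y = P.sum(axis=0)  # marginal over receivers (column sums)
    return shannon_entropy(p_x), shannon_entropy(p_y)

# Hypothetical pass counts: every player passes equally often,
# but most passes are received by player 0.
A = np.array([[0, 2, 2],
              [4, 0, 0],
              [4, 0, 0]])
h_out, h_in = marginal_entropies(A)
print(h_out, h_in)
```

In this example the passer marginal is uniform while the receiver marginal is concentrated on one node, so the receiving side is the less variable of the two.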
Based on the mathematical concept of the rate of mutual information between two random variables X and Y such that the pair of transmitter and receiver (X,Y) is the joint distribution of the Markov chain associated with the weighted adjacency matrix of a weighted digraph with $n$ nodes [26] (p. 31), [27] (p. 246), we can propose the concept of the capacity of passing of a network when we know information about the receivers.
Proposition 4. Given a weighted digraph, $G^w_D$, with $n$ nodes and two random variables X and Y such that the pair of transmitter and receiver (X,Y) is the joint distribution of the Markov chain associated with $A^w_D$ of $G^w_D$, $C^{out}_N$ is the capacity of passing of a network and is determined by: where $i, j = 1, \ldots, n$.
Proof. Consider two random variables X and Y such that the pair of transmitter and receiver (X,Y) is the joint distribution of the Markov chain associated with $A^w_D$ of $G^w_D$, $i, j = 1, \ldots, n$.
By the concept of rate of transmission presented in Shannon [25] (p. 407), we obtain
Remark 6. Similarly to Proposition 4, considering the concept of the capacity of reception [25] and a weighted digraph with $n$ nodes, we can propose the concept of the capacity of reception of a network, $C^{in}_N$.
Based on the mathematical concept of the rate of transmission [25] and a weighted digraph with $n$ nodes, we can propose the concept of the index of capacity of passing of a network.
Proposition 5. Given a weighted digraph, $G^w_D$, with $n$ nodes and two random variables X and Y such that the pair of transmitter and receiver (X,Y) is the joint distribution of the Markov chain associated with $A^w_D$ of $G^w_D$, $IndC^{out}_N$ is the index of capacity of passing of a network and is determined by: where $m_{ij}$ are the elements of $M_T$ associated with $A^w_D$, $i, j = 1, \ldots, n$.
Proof. Consider that the index of capacity of passing of a network is the ratio between $C^{out}_N$ and the maximum that $C^{out}_N$ can assume, and consider two random variables X and Y such that the pair of transmitter and receiver (X,Y) is the joint distribution of the Markov chain associated with $A^w_D$ of $G^w_D$, $i, j = 1, \ldots, n$. Thus, By Shannon [25] (p. 417), we obtain So,
Remark 7. In Proposition 5, if we consider $C^{in}_N$, we obtain the index of capacity of reception of a network, $IndC^{in}_N$.
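The mutual information between passer and receiver, on which the capacity concepts above are based, can be sketched as follows. The sketch assumes the joint distribution $p_{ij} = w_{ij} / L^w_D$ and uses the textbook definition of mutual information. Dividing by $\log_2 n$, which is an upper bound of the mutual information between two variables with $n$ outcomes, is an assumption made only for illustration; the normalisation in Proposition 5 may differ. All function names and data are hypothetical.

```python
import numpy as np

def mutual_information(adjacency):
    """Mutual information I(X;Y) in bits between passer X and receiver Y.

    Assumption: the joint distribution is p_ij = w_ij / L, where L is the
    sum of all weights in the adjacency matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    P = A / A.sum()
    p_x = P.sum(axis=1, keepdims=True)   # column vector of passer marginals
    p_y = P.sum(axis=0, keepdims=True)   # row vector of receiver marginals
    mask = P > 0                         # skip zero cells (0 * log 0 = 0)
    ratio = P[mask] / (p_x @ p_y)[mask]  # p_ij / (p_i * p_j)
    return float((P[mask] * np.log2(ratio)).sum())

def normalised_mutual_information(adjacency):
    """Mutual information divided by log2(n); illustrative bound, n >= 2."""
    n = np.asarray(adjacency).shape[0]
    return mutual_information(adjacency) / np.log2(n)

# Two players who only ever pass to each other: the receiver is fully
# determined by the passer, so I(X;Y) equals 1 bit.
print(mutual_information([[0, 5], [5, 0]]))

# Receiver independent of passer: I(X;Y) is (numerically) zero.
print(mutual_information(np.outer([1, 2], [3, 1])))
```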

Experimental Results
The 2020/2021 Champions League final was used to demonstrate and interpret the proposed mathematical models. The two opposing teams were Manchester City (MC) and Chelsea (CH). The match, originally scheduled to be played in Istanbul, Turkey, took place in Porto, Portugal, due to the COVID-19 pandemic restrictions at the time. In a very even match, CH won 1-0, with Kai Havertz scoring the winning goal. For the notational analysis of each transition between nodes, i.e., passing sequence, the uPATO platform was used [22,28]. The adjacency matrices were computed to calculate all transition state matrices, as described in [21].
The rate of passing variability and receiving variability of each player on both teams (CH and MC, respectively) is presented in Figures 1 and 2, and in Table A1 of Appendix A. The players of both teams are relatively uniform and show little capacity of variability, showing that the teams have well-established passing patterns in the game and that no player stands out in this regard.

When looking at the capacity of receiving variability, Player 1 of each team is among the least variable. Both are goalkeepers, and the obtained values are in accordance with what is expected in a football game: passing to the goalkeeper is not common and is very often regarded as an extraordinary occasion. Alternatively, if this player takes part in the organization of the attack, they receive the ball from a very limited number of players, usually the center backs. In fact, this kind of option is taken in MC's strategy when the team incorporates the goalkeeper in the passing sequences. Naturally, variability here is minimal. It is also worth noting that field players with little capacity of receiving variability, such as CH's Player 4 (Thiago Silva) or the substitute Players 13 and 14 (Kovacic and Pulisic), may show this characteristic due to the time each player played. Thiago Silva played only 39 min due to a sudden injury, and Kovacic and Pulisic entered late in the game. As for Player 9 (Timo Werner), his role as a forward/striker, the early lead by CH in the game, or the team's tactical setup may explain the low value.
Regarding MC's metrics, and apart from Player 1, Players 7, 9 and 10 and the substitutes 12, 13 and 14 had low receiving variability capacity. Interestingly, these were the players that were substituted during the match. This may show that the technical staff were right to be dissatisfied with the players' performance and attempted to change the course of the game.
In practical terms, a higher rate of variability may indicate that a player is more unpredictable, making it more difficult to predict where the ball is going to go. The opposing team will face greater challenges when playing against unpredictable players. On the other hand, variability must also be seen within the group interactions. For example, teammates may have difficulties in creating synergies with highly variable players due to their intrinsically higher unpredictability. There is, therefore, a fine balance between what is predictable and brings stability and reliability to the game, and what is variable, unpredictable and creative.
It is important to note that the units of the rate of variability metric are arbitrary and are related exclusively to each node's maximum capacity, making comparisons between nodes (players) difficult. For a more comparable analysis, the index of rate of variability normalizes all values and places them between 0 (least variability) and 1 (highest variability).
The index of rate of passing and receiving for each player that participated in the game are presented in Figures 3 and 4. The values of each player are presented in Table A2 of the Appendix A.
The low index values presented by the players are in accordance with the rate of variability presented in Figures 1 and 2, as well as in Table A1. Here, a comparative analysis may be made between players, taking into consideration that the values shown are always between 0 and 1, the least and most variability possible, respectively. In this example, both teams keep relatively similar values for the index of passing variability. However, the index of receiving variability shows more dissimilar values, with CH's Player 3 presenting the highest team value. CH's Player 4 was substituted early in the game and shows a low index value. An analysis of the opposing team shows that, apart from MC's goalkeeper (Player 1), Players 7 and 9, both in the starting eleven and both substituted around minute 60, showed low indexes of reception variability. This may have happened due to the effective defensive strategy of the opposing team, which reduced the ways of getting the ball to them, or due to an ineffective capacity to create receiving opportunities in that game.
When analyzing the teams' global rate of variability (Table 1), CH shows higher values than MC for both the passing (0.256 vs. −0.112) and the receiving (0.407 vs. 0.293) actions. It is important to note that MC's negative rate of total entropy depicts the team's tendency to follow more stable patterns between players when passing (out-entropy) than when receiving (in-entropy).

Regarding both teams' index of capacity of passing (0.067 vs. 0.029) and receiving (0.107 vs. 0.077), the differences become clearer. CH showed a greater passing and receiving variability index. This suggests that the team was more unpredictable in its actions, and thus more difficult for the opponent to defend against, possibly creating better conditions to control the game and chances of winning it.

Conclusions
To summarize, using a mixed Bayesian and non-linear approach to network analysis, the proposed mathematical models allow us to better display the rate of variability within the passing sequences in the football game, increasing the tools available for match performance analysis. Higher rates of variability are associated with more unpredictability in the patterns, posing greater challenges to the opposing team. Additionally, an index that allows one to analyze performance on a finite scale provides analysts with a clearer understanding of the team's or player's level of variability. Further studies should focus on applying these metrics to compare, for example, the effect of different tactical formations on the rate and index of variability of each player or the whole team. Additionally, comparing how these models behave against other metrics may be worthwhile, as the more tangible values given may be of greater value to the community. Finally, we hope to extend the uPATO platform to include these new mathematical models, making them available to the community and spreading their use and interpretation.

Funding:
This research was funded by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under the project UIDB/50008/2020. The APC was funded by IPC/ESEC-UNICID.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.