Node and Network Entropy—A Novel Mathematical Model for Pattern Analysis of Team Sports Behavior

Abstract: Pattern analysis is a well-established topic in team sports performance analysis, and is usually centered on the analysis of passing sequences. Taking a Bayesian approach to the study of these interactions, this work presents novel entropy mathematical models for Markov chain-based pattern analysis in team sports networks, with Relative Transition Entropy and Network Transition Entropy applied to both passing and reception patterns. To demonstrate their applicability, these mathematical models were used in a case study in football: the 2015/2016 Champions League final, in which both teams were analyzed. The results show that the winning team, Real Madrid, presented greater values for both individual and team transition entropies, which indicates that greater levels of unpredictability may bring teams closer to victory. In conclusion, these metrics may provide information to game analysts, allowing them to supply coaches with accurate and timely information about the key players of the game.


Introduction
Social Network Analysis (SNA) has increasingly gained attention in the research community over the past few years, emerging as a method for the analysis of intra-team passing networks in team sports [1–5]. In football, passing networks between players (nodes) reflect how the entities are connected and shed light on how each entity within that group interacts with the others [1,2,4,6]. Group and individual metrics are used to describe the team's interactive behaviors, using different degree measures to depict how connected the team is, as well as to identify the influential players within the team. By applying degree centrality, betweenness, and closeness metrics to the global interaction of players, it is possible to assess the intermediary role of each player in distributing the ball during the game, allowing the identification of players with more relevant roles within the team [1,3,7–10].
uPATO is a dedicated software tool for network analysis in team sports [11,12]. In the last few years, network analysis tools have been applied to team sports to understand how collective and individual performance may be optimized. These tools consider the network of passes between players during a game, which defines an adjacency matrix. This matrix can then be analyzed based on graph theory, which allows us to know which nodes (i.e., players) are more important and more involved in the game.
Another approach to studying the ball-passing networks created during the game is to use a Markov chain. Under this approach, the transition probability P(j|i) represents a directed edge from player i to player j, and all the edges between possible players are given in advance. Here, every pass is independent of the previous sequence of passes [13,14]. Yamamoto [13], for example, analyzed passing sequences in football using a Markov chain approach to estimate the predictability of certain passing courses and to determine when there is a chance of error in the information system, i.e., when a sequence of passes fails and the team loses possession.
Determining the degree of entropy of the probability matrices associated with Markov chains has not yet been explored in team sports analysis. According to entropy theory, higher levels of entropy indicate greater unpredictability in how the system behaves. In our case, this means that the player (or the team) has a passing pattern that is more difficult to predict, and therefore more difficult to defend against.
The aim of this paper is to present several novel mathematical models for pattern analysis in team sports, particularly football, to analyze the level of entropy in passing networks using a Markov chain approach. The novelty of this study lies in an innovative approach that estimates the likelihood of one player passing to, or receiving from, any other player in the weighted directed network, as well as the level of unpredictability of that player. In practical terms, higher levels of unpredictability will be reflected in a greater difficulty in predicting where to intercept the pass. Conversely, players with more predictable destinations will be more easily defended and countered.

Novel Mathematical Models for Entropy of Nodes and Weighted Directed Networks
In this section, we present mathematical concepts, based on information theory and probability theory, that can be applied to networks whenever such networks can be considered weighted digraphs or weighted directed networks. The relations between the nodes of a network presented in this paper are described, in graph theory, by weighted digraphs [12,15,16]. Therefore, we consider that A_wD is the weighted adjacency matrix of a weighted digraph with n nodes, G_wD. Based on the mathematical concept of conditional probability involving two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the weighted adjacency matrix, A_wD, of a weighted digraph with n nodes, G_wD [17] (p. 13), [18,19], [20] (p. 156), we can define the concept of the Markov chain transition matrix.

Definition 1. The n × n real matrix of order n, M_T = [m_ij] ∈ R^(n×n), is called the transition matrix of the Markov chain associated with the A_wD of a G_wD with n nodes. Each element of M_T is obtained by:

m_ij = P(Y = x_j | X = x_i) = w_ij / Σ_{k=1}^{n} w_ik,

where the two random variables X and Y are such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD, and i, j = 1, . . . , n.
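As a minimal sketch of Definition 1, the transition matrix can be obtained by row-normalizing the weighted adjacency matrix. The function name and the pass counts below are illustrative, not taken from the case study:

```python
# Row-normalize a weighted adjacency matrix A_wD into the Markov
# transition matrix M_T of Definition 1: m_ij = w_ij / sum_k w_ik.

def transition_matrix(adjacency):
    """Return the Markov transition matrix for a weighted digraph."""
    result = []
    for row in adjacency:
        total = sum(row)  # total passes made by this node
        result.append([w / total if total else 0.0 for w in row])
    return result

# Hypothetical pass counts among three players (rows = passer).
A = [[0, 3, 1],
     [2, 0, 2],
     [4, 0, 0]]
M = transition_matrix(A)
# Each row of M sums to 1, e.g. M[0] == [0.0, 0.75, 0.25].
```

Each row of the result is a probability distribution over the possible receivers of the next pass.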
Considering the mathematical concept of n-step transition probabilities [20] (p. 157) and a weighted digraph with n nodes, we can define the concept of k-step node transition.

Definition 2.
Given M_T = [m_ij] ∈ R^(n×n), the transition matrix of the Markov chain associated with the weighted adjacency matrix, A_wD, of a weighted digraph, G_wD, with n nodes, each element m_ij^(k) of the matrix (M_T)^k is called the k-step node transition, since it gives the probability of a transition from x_i to x_j in k time steps, where (M_T)^k is the k-th power of the matrix M_T, i, j = 1, . . . , n, and k ≥ 1.

Remark 1.
In Definition 2, if k = 1, then m_ij^(1) = m_ij, and so m_ij is called the node transition.
Based on Theorem 8.3 [20] (p. 156) and a weighted digraph with n nodes, we can define the probability of each node after k time steps in the network.

Definition 3. Given M_T = [m_ij] ∈ R^(n×n), the transition matrix of the Markov chain associated with the A_wD of a G_wD with n nodes, the probability of all the nodes in the network after k time steps is obtained by:

p(k) = p(0) (M_T)^k,

where p(0) = [p_i(0)] ∈ R^(1×n) and p_i(0) = P(X = x_i), i = 1, . . . , n.
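Definitions 2 and 3 together say that repeated multiplication by the transition matrix propagates the ball's location distribution forward in time. A sketch in pure Python (the transition matrix below is hypothetical):

```python
# Propagate an initial node distribution p(0) through k steps of the
# Markov chain (Definitions 2 and 3): p(k) = p(0) (M_T)^k.

def step_distribution(p0, M, k):
    p = list(p0)
    for _ in range(k):
        # row vector times matrix: new p_j = sum_i p_i * m_ij
        p = [sum(p[i] * M[i][j] for i in range(len(M)))
             for j in range(len(M))]
    return p

# Hypothetical 3-node transition matrix; the ball starts at node 0.
M = [[0.0, 0.75, 0.25],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
p2 = step_distribution([1.0, 0.0, 0.0], M, 2)
# p2 is the probability of the ball being at each node after 2 passes.
```

The result after any number of steps is still a probability distribution, since every row of M sums to 1.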
Considering the mathematical concept of the entropy of a random variable, defined on the sample space of a random experiment and taking on a finite number of values [17] (p. 51), [21] (p. 19), and a weighted digraph with n nodes, we define the concept of node out-entropy.

Definition 4. Given a weighted digraph, G_wD, with n nodes, E_out(n_i) is called the node out-entropy of a node n_i, and is determined by:

E_out(n_i) = -Σ_{j=1}^{n} m_ij log2 m_ij,

where m_ij are the elements of M_T associated with the A_wD of G_wD, and w_ij ∈ A_wD, i, j = 1, . . . , n.

Remark 2.
In Definition 4, if we replace m_ij with m_ji, we obtain the concept of node in-entropy, E_in(n_i).
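The node out-entropy of Definition 4 is the Shannon entropy of one row of the transition matrix. A minimal sketch (function name and data are illustrative); the in-entropy of Remark 2 is obtained the same way from the in-transition matrix, built from passes received rather than made:

```python
import math

# Node out-entropy (Definition 4): E_out(n_i) = -sum_j m_ij log2 m_ij,
# where row i of M holds the out-transition probabilities of node i.

def node_out_entropy(M, i):
    """Shannon entropy (base 2) of row i of a transition matrix."""
    return -sum(m * math.log2(m) for m in M[i] if m > 0)

# Hypothetical out-transition rows for two players.
M = [[0.5, 0.5, 0.0],   # player 0 splits passes evenly between two targets
     [1.0, 0.0, 0.0]]   # player 1 always passes to player 0
# node_out_entropy(M, 0) == 1.0 (maximally unpredictable over 2 targets)
# node_out_entropy(M, 1) == 0.0 (fully predictable)
```

Terms with zero probability are skipped, following the usual convention 0 log 0 = 0.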
Based on the mathematical concept of conditional entropy [21] (p. 22) [22] (p. 70) and two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the weighted adjacency matrix of a weighted digraph with n nodes, we can define the concept of node transition out-entropy.
Definition 5. Given a weighted digraph, G_wD, with n nodes, E_T^out(n_i) is called the node transition out-entropy of a node n_i, and is determined by:

E_T^out(n_i) = (Σ_{j=1}^{n} w_ij / L_wD) E_out(n_i),

where L_wD = Σ_{i=1}^{n} Σ_{j=1}^{n} w_ij is the total number of links of G_wD, and E_out(n_i) is the node out-entropy of node n_i, i, j = 1, . . . , n.

Remark 3.
In Definition 5, if we replace w_ij with w_ji, and m_ij with m_ji, we obtain the concept of node transition in-entropy, E_T^in(n_i).
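A sketch of Definition 5, under the reading that each node's out-entropy is weighted by that node's share of the total links (the function name and pass counts are illustrative assumptions):

```python
import math

# Node transition out-entropy (Definition 5), read as
# E_T_out(n_i) = (sum_j w_ij / L_wD) * E_out(n_i).

def node_transition_out_entropy(A, i):
    L = sum(sum(row) for row in A)          # total links L_wD
    out_i = sum(A[i])                       # links leaving node i
    if out_i == 0:
        return 0.0
    row = [w / out_i for w in A[i]]         # row i of M_T
    e_out = -sum(m * math.log2(m) for m in row if m > 0)
    return (out_i / L) * e_out

# Hypothetical pass counts: node 0 makes 4 of the 6 passes in the network.
A = [[0, 2, 2],
     [1, 0, 0],
     [0, 1, 0]]
```

Under this weighting, a player who touches the ball often contributes proportionally more to the team-level entropy than a rarely involved one with the same per-pass unpredictability.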
Considering the mathematical concept of the relative entropy between two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the weighted adjacency matrix of a weighted digraph with n nodes [21] (p. 248), [23] (p. 25), we can define the concept of the relative out-entropy between two nodes.

Proposition 1. Given a weighted digraph, G_wD, with n nodes and two distributions, p_X(x_k) = {m_ik : i, k = 1, . . . , n} and q_X(x_k) = {m_jk : j, k = 1, . . . , n}, with two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD, the relative out-entropy between two nodes n_i and n_j, E_R^out(n_i ‖ n_j), is obtained by:

E_R^out(n_i ‖ n_j) = -E_out(n_i) - Σ_{k=1}^{n} m_ik log2 m_jk,

where E_out(n_i) is the node out-entropy of a node n_i, and m_ik and m_jk are the elements of M_T associated with the A_wD of G_wD, i, j = 1, . . . , n.
Proof. The relative entropy between two distributions [23] (p. 247) is defined as:

D(p_X ‖ q_X) = Σ_{k=1}^{n} p_X(x_k) log2 (p_X(x_k) / q_X(x_k)).

Considering p_X(x_k) = {m_ik : i, k = 1, . . . , n} and q_X(x_k) = {m_jk : j, k = 1, . . . , n}, we can write:

E_R^out(n_i ‖ n_j) = Σ_{k=1}^{n} m_ik log2 (m_ik / m_jk) = Σ_{k=1}^{n} m_ik log2 m_ik - Σ_{k=1}^{n} m_ik log2 m_jk.

By expression (3), we can write:

E_R^out(n_i ‖ n_j) = -E_out(n_i) - Σ_{k=1}^{n} m_ik log2 m_jk. □

Remark 4. In Proposition 1, if we replace ik with ki and jk with kj, we obtain the concept of relative in-entropy, E_R^in(n_i ‖ n_j).
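The relative out-entropy of Proposition 1 is the Kullback-Leibler divergence between the out-transition distributions of two nodes. A sketch (function name and data are illustrative); note that it is only finite when m_jk > 0 wherever m_ik > 0, which the add-one smoothing used later in the paper guarantees:

```python
import math

# Relative out-entropy between nodes n_i and n_j (Proposition 1):
# the Kullback-Leibler divergence sum_k m_ik log2(m_ik / m_jk).

def relative_out_entropy(M, i, j):
    return sum(p * math.log2(p / q)
               for p, q in zip(M[i], M[j]) if p > 0)

# Hypothetical out-transition rows: a balanced passer vs. a one-sided one.
M = [[0.5, 0.5],
     [0.9, 0.1]]
# relative_out_entropy(M, 0, 0) == 0.0 (a distribution vs. itself)
```

The divergence of a distribution against itself is zero, and it is positive whenever the two distributions differ.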
Considering the mathematical concept of conditional entropy between two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the weighted adjacency matrix of a weighted digraph with n nodes [21] (p. 22), [22] (p. 70), we can define the concept of network transition out-entropy.

Proposition 2.
Given a weighted digraph, G_wD, with n nodes and two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD, E_NT^out is the network transition out-entropy of G_wD and is determined by:

E_NT^out = Σ_{i=1}^{n} E_T^out(n_i),

where E_T^out(n_i) is the node transition out-entropy of a node n_i, i = 1, . . . , n.
Proof. Consider two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD. The conditional entropy [22] (p. 70) is defined as:

H(Y|X) = Σ_{i=1}^{n} P(X = x_i) H(Y | X = x_i).

Since P(X = x_i) = Σ_{j=1}^{n} w_ij / L_wD and H(Y | X = x_i) = E_out(n_i), each term of the sum is the node transition out-entropy E_T^out(n_i), and therefore H(Y|X) = E_NT^out. □

Remark 5. In Proposition 2, if we replace ij with ji, we obtain the concept of network transition in-entropy, E_NT^in.
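Proposition 2 can be sketched directly from the weighted adjacency matrix, under the reading E_NT^out = Σ_i P(X = x_i) E_out(n_i), i.e., the conditional entropy H(Y|X) of the chain (function name and data are illustrative):

```python
import math

# Network transition out-entropy (Proposition 2): the conditional
# entropy H(Y|X), summing each node's out-entropy weighted by its
# share of the total links.

def network_transition_out_entropy(A):
    L = sum(sum(row) for row in A)
    total = 0.0
    for row in A:
        out_i = sum(row)
        if out_i == 0:
            continue
        e_out = -sum((w / out_i) * math.log2(w / out_i)
                     for w in row if w > 0)
        total += (out_i / L) * e_out
    return total

# Hypothetical network: every node splits its passes evenly between
# the other two, so each node contributes 1 bit, weighted by 1/3.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
```

For this fully symmetric example the network transition out-entropy equals 1 bit, the per-node entropy over two equally likely targets.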
Considering the mathematical concept of joint entropy [19] (p. 394), [21] (p. 22), [22] (p. 69) and a weighted digraph with n nodes, we can define the concept of total network entropy.

Proposition 3. Given a weighted digraph, G_wD, with n nodes and two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD, the total network entropy, E_N, is obtained by:

E_N = -Σ_{i=1}^{n} Σ_{j=1}^{n} (w_ij / L_wD) log2 (w_ij / L_wD),

where L_wD = Σ_{i=1}^{n} Σ_{j=1}^{n} w_ij is the total number of links of G_wD, i, j = 1, . . . , n.
Proof. Consider two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD. The joint entropy [19] (p. 236) is defined as:

H(X, Y) = -Σ_{i=1}^{n} Σ_{j=1}^{n} P(X = x_i, Y = x_j) log2 P(X = x_i, Y = x_j).

Since P(X = x_i, Y = x_j) = w_ij / L_wD, it follows that E_N = H(X, Y). □

Considering the mathematical concept of the relative entropy between two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the weighted adjacency matrix of a weighted digraph with n nodes [21] (p. 247), [23] (p. 25), we can define the concept of network relative out-entropy.

Proposition 4. Given a weighted digraph, G_wD, with n nodes and two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD, the network relative out-entropy, E_NR^out, is obtained by:

E_NR^out = Σ_{i=1}^{n} Σ_{j=1}^{n} E_R^out(n_i ‖ n_j),

where i, j = 1, . . . , n.
Proof. Consider two distributions, p_X(x_k) = {m_ik : i, k = 1, . . . , n} and q_X(x_k) = {m_jk : j, k = 1, . . . , n}, with two random variables X and Y, such that the pair of transmitter and receiver (X, Y) is the joint distribution of the Markov chain associated with the A_wD of G_wD. By the concept of relative entropy [21] (p. 249), summing the relative out-entropies E_R^out(n_i ‖ n_j) over all pairs of nodes yields E_NR^out. □

Remark 6. In Proposition 4, if we consider E_R^in(n_i ‖ n_j), we obtain the concept of network relative in-entropy, E_NR^in.
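A sketch of the total network entropy of Proposition 3, the joint entropy of the (transmitter, receiver) pair computed from the normalized link weights (function name and data are illustrative):

```python
import math

# Total network entropy (Proposition 3): joint entropy of the pair
# (transmitter, receiver), using P(X = x_i, Y = x_j) = w_ij / L_wD.

def total_network_entropy(A):
    L = sum(sum(row) for row in A)
    return -sum((w / L) * math.log2(w / L)
                for row in A for w in row if w > 0)

# Hypothetical network with 4 links of equal weight: each link has
# probability 1/4, so the joint entropy is exactly 2 bits.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
```

In general, the more evenly the passing volume is spread across the possible player pairs, the higher this value, which is the sense in which it measures the unpredictability of the team as a whole.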

A Case Study
The 2015/2016 Champions League final is presented as the case study for the presentation and interpretation of the proposed metrics. The two opposing teams were Real Madrid (RM) and Atletico Madrid (AM). After a 1-1 draw during regular time, RM became champion with a 5-3 win in the penalty shoot-out. For the notational analysis of each pass sequence (transition between nodes), the uPATO platform was used [11,12]. From this, adjacency matrices were computed to calculate the Markov chains, represented in Appendix A as the state out-transition probability matrices, i.e., the probability of each player (node) passing (transitioning information) to any other player (node) present in the game (network). Note that, before computing the Markov chains, Laplace smoothing [18] was performed on the weighted adjacency matrices. This smoothing technique avoids assigning a probability of zero to transitions that simply were not observed in this game. In our case, we used the special case of Laplace smoothing in which 1 is added to every connection between two distinct nodes, also called add-one smoothing.
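The add-one smoothing step described above can be sketched as follows (function name and pass counts are illustrative, not the case-study data):

```python
# Add-one (Laplace) smoothing: 1 is added to every connection between
# two distinct nodes before the transition probabilities are computed,
# so that unobserved passes are not assigned a probability of zero.

def add_one_smoothing(A):
    n = len(A)
    return [[A[i][j] + (1 if i != j else 0) for j in range(n)]
            for i in range(n)]

# Hypothetical raw pass counts; player 0 never passed to player 2.
raw = [[0, 5, 0],
       [3, 0, 1],
       [0, 2, 0]]
smoothed = add_one_smoothing(raw)
# smoothed[0] == [0, 6, 1]: the 0 -> 2 transition is now possible.
```

Self-loops (the diagonal) are left at zero, since a player cannot pass the ball to himself.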
The RM starting 11 were the following: player 1: Keylor Navas (1). Tables A1 and A2, presented in the Appendix A section, show RM and AM's state out-transition probability matrices; in them, there are two node pairs with probabilities over 0.3. For RM, transitioning from player 2 to player 10 had a 0.33 chance of occurring, and there was a 0.34 chance from player 3 to player 5. This might mean that when the ball is with player 2, the game is vertical through the center, in search of a creative midfielder. Conversely, player 3 searches for a supported, winged play. In AM's game, passing from player 2 to player 8 had a chance of 0.30, and from player 5 to player 11 a chance of 0.37.
The state in-transition probabilities were also calculated (Tables A3 and A4), representing the probability of each player receiving the ball from each other player. Contrary to the out-transition probabilities, these tables should be read column-wise. Here, the reception probabilities in RM's game only show player 5 receiving the ball from player 3, with a 0.32 probability of occurring. In AM's game, the same happened in only one case. In practical terms, for RM's game, a key interaction between two players (players 3 and 5) may have been identified. Specifically, player 5 predominantly receives the ball from player 3, and the latter predominantly passes to player 5. In AM's game, it is player 6 that has a greater chance of receiving the ball from player 11.
Having this information might be useful for the opposing team-for example, in order to adapt their defensive tactics to reduce the chances of interaction between players. Knowing how team players interact with each other may identify the prime pathways to feed the attacking players and eventually reduce the chances of creating goal opportunities. This type of empirical observations requires, in our opinion, further analysis in order to produce ratios and cut-off values.
Applying a nonlinear approach to state transition matrices in football has not yet been done. Theoretically, higher levels of entropy reflect greater variability of node transitions. Node Transition Entropy was calculated for each node of the team to represent the degree of transition variability of each player, based on the probability of each transition occurring. The more positive a value is, the more that node contributes to the overall entropy of the network when compared to the other nodes.
In the examples below (Tables A5 and A6), we selected a cut-off value of |0.90| for the Relative Out-Transition Entropies. RM showed more values above our stipulated limit, indicating that there were more players with more chaotic passing probabilities. Regarding the AM relative entropy values, only two players showed more chaotic behavior in their passing probabilities, and in both cases the comparison was against the goalkeeper (node 1). This shows that AM players were more consistent, and therefore more predictable, in their passing patterns. It is important to note that the defined Relative Out/In-Transition Entropies can have both negative and positive values. Negative values indicate that the first node in the pair is more chaotic, i.e., E_R^out(n_i ‖ n_j) < 0 means that n_i is more chaotic than n_j. The inverse also applies: if E_R^out(n_i ‖ n_j) > 0, n_i is less chaotic than n_j. In the case of E_R^out = 0, both nodes have the same chaoticity, meaning that they add the same value to the network as a whole.
Similarly, the in-transition entropy reflects the degree of variability in receiving the ball (Tables A7 and A8). Applying the same criteria, we can note that, for RM, two players (players 5 and 8) showed chaoticity in the probability of receiving passes from their teammates. In line with what was previously observed, AM players also had smaller entropy values for this metric, again showing a more predictable behavior. In this case, no AM player presented values above the selected cut-off value.
The final step for this case study is to calculate team metrics for each team, presented in Table 1. Network Relative Out-Entropy (NROE) and Network Relative In-Entropy (NRIE) show the consistency of the network. Here, higher values of entropy reflect a smaller consistency of interactions between players or, in other terms, a more unpredictable passing pattern. In our football example, both values were higher for Real Madrid, which means that the team was more unpredictable both in passing and in receiving the ball. It has been shown that unpredictable passing patterns may create more goal-scoring opportunities and contribute to ball possession [1,24]. This may be because the team has more players involved in each passing sequence, increasing the unpredictability of each play. Alternatively, success might be attributed to the number of passes or to the time in possession rather than to higher entropy values [25]; that was not the case here, however, as RM made fewer passes than AM. We may regard entropy, therefore, as a positive feature in the game. Real Madrid's unpredictability in passing may have increased the chances of goal-scoring opportunities, potentially contributing to the win. The concept of entropy has already been used to model social interactions [26–28]. Newman and Vilenchik (2019) used the concept of relative entropy to model the interactions of players passing the ball in football, finding that, when comparing two opposing teams, higher entropy values lead to more chances of creating goal-scoring opportunities [27]. The authors limit their analysis to whether an interaction between players occurred, without defining the type of interaction (passing or receiving the ball); in other words, direction did not matter. In football, however, variability in passing patterns may come from these two actions, and they may lead to different conclusions: a team can be more unpredictable in passing than in receiving the ball.
Fewer players passing the ball to many players is different from many players passing the ball to a few. Concomitantly, game strategy must take this into account and adapt accordingly.
The division between the Network Transition Out-Entropy (NTOE) and the Network Transition In-Entropy (NTIE) separates the teams' out and in interactions. NTOE, therefore, shows the entropy of the network in sending information to other nodes. Conversely, NTIE shows the entropy of receiving information from other nodes. For our football example, NTOE and NTIE are the entropy levels of the players' probabilities of passes and receptions, respectively. NTOE and NTIE were slightly higher for Real Madrid, showing again that this team was more unpredictable in its interactions. Against an opponent, higher levels of entropy may prevent the defending team from learning the other team's patterns. It is worth noting that the division into out- and in-entropies allows, in our opinion, a deeper understanding of the team's dynamics by extending the network analysis beyond the existence (or not) of interactions between nodes, taking into account the direction of information and its probability of occurring.
Total Network Entropy (TNE) reflects the degree of variability, and therefore the unpredictability, of the team as a whole. Real Madrid was slightly more unpredictable than Atletico, with a TNE of 6.44 vs. 6.22. The unpredictability of passing patterns, however, does not occur in isolation from other actions; it is, rather, dependent on the interactions between rivals and teammates, space occupation, time, tactics, and game situation [10,29]. This leads to the need to constantly analyze how the team behaves during the different phases of the game, as it may behave differently in each. For instance, if, when winning, ball possession is kept using the same passing patterns or variability, the unpredictability decreases.

Conclusions
In summary, the proposed metrics present a new method to analyze the passing interactions in a football match. From a Bayesian perspective, knowing the probability of one player passing to another may provide the opposing team with the information it needs to anticipate and create more opportunities for stealing ball possession. The innovation of this approach lies in the division between ball-passing and reception probability patterns. An entropy-based analysis of the in- and out-transition matrices attempts to merge the Bayesian and nonlinear approaches into one, increasing the tools available to match analysts. With this new approach, football analysts may also start providing information to the coaching staff during the game, helping identify the players for whom passing or receiving is more likely to occur, as well as the degree of unpredictability of each of these actions. Future studies should provide ratios and cut-off values for the state transition probabilities, helping decide whether a passing pattern is predictable or not.

Funding: This research was funded by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under the project UIDB/50008/2020. The APC was funded by IPC/ESEC-UNICID.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.