Article

A Forward-Looking Approach to Compare Ranking Methods for Sports

by Peter Juma Ochieng 1,†, András London 1,2,† and Miklós Krész 3,4,5,*

1 Institute of Informatics, University of Szeged, 2 Árpád tér, H-6720 Szeged, Hungary
2 Department of Operations Research and Mathematical Economics, Institute of Informatics and Quantitative Economics, Poznań University of Economics and Business, 61-875 Poznań, Poland
3 InnoRenew CoE, Livade 6, 6310 Izola, Slovenia
4 Andrej Marušič Institute, University of Primorska, Muzejski trg 2, 6000 Koper, Slovenia
5 Department of Applied Informatics, University of Szeged, Boldogasszony sgt. 6, H-6725 Szeged, Hungary
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Information 2022, 13(5), 232; https://doi.org/10.3390/info13050232
Submission received: 16 March 2022 / Revised: 22 April 2022 / Accepted: 27 April 2022 / Published: 3 May 2022
(This article belongs to the Special Issue Predictive Analytics and Data Science)

Abstract
In this paper, we provide a simple forward-looking approach to compare rating methods with respect to their stability over time. Given a rating vector of the entities involved in the comparison and the ranking induced by the rating, the stability of a method is measured by the change in the rating vector and in the ranks of the entities over time, from a forward-looking perspective. We investigate various linear algebraic rating methods and use the Euclidean distance and the Kendall tau rank correlation to measure their rating and ranking stability, respectively. The investigations are based on both rolling and expanding window approaches. We apply the methodology to sports as a widely known ranking and rating environment. The results suggest that the PageRank and Massey rating methods provide better rating and ranking stability than simple methods, such as winning percentage, and more advanced ones, such as Colley’s least squares and Keener’s eigenvector-based method. Finally, a simple way to examine the potential predictive power of the rating methods is also provided.

1. Introduction

Rating items is a fundamental task that aims at providing a ranking and making decisions according to it. For instance, in sports, the ranking of players or teams is provided by some scoring system, such as ‘three points for a win and one for a draw’ in soccer, or by more complex systems such as Élő in chess or the ATP ranking in men’s tennis. For a comprehensive book on ratings in sports, see, e.g., [1].
Many different rating methods have been developed, all of them based on assumptions or formal axioms that the rating has to satisfy; see, e.g., [2,3,4,5]. In sports, rating methods are also considered key elements of single-game outcome prediction; see, e.g., [6].
Although the literature on rating and rating-based prediction in sports is vast, only a few papers address the problem of evaluating and comparing the stability and robustness of rating methods over a season in a round-robin-like system, such as soccer or the US major sports; see, e.g., [5,7,8]. Our study differs from these, as outlined in the following paragraph.
The paper [5] focuses on general properties of sports rankings, including the Colley, win–loss, Elo, and Markov methods, and evaluates these methods with respect to properties such as Opponent Strength, Incentive to Win, and Sequence of Matches. In contrast, we propose a new comparison method based on a forward-looking approach to evaluate the ranking and rating stability of selected common ranking methods. In [7], the authors empirically evaluate the predictive power of eight sports ranking methods. Although we evaluate some of the same methods, such as PageRank, Winning Percentage, Rating Percentage Index, and Keener, our comparison differs mainly in that it relies on stability measures computed from a forward-looking perspective. Our investigations may be considered a meta-analysis for predictive power studies: we hypothesize that there is a relation between predictive power and stability/robustness, and we consider our study an initial step in this direction. Finally, in [8], the authors analyze the sensitivity of the rating vectors of three linear-algebra-based ranking methods, namely the Colley, Massey, and Markov methods. To measure sensitivity, they reverse-engineer a perfect season from a simple input ranking vector and examine the output rating vectors produced by the three methods. This technique also differs from our approach.
The stability problem has also been addressed in the network science literature, especially for centrality measures; see, e.g., [9,10,11]. Since many rating methods can be interpreted as network centrality measures, investigating the stability of ratings in the sports domain is a natural next step in this direction.
In this study, we propose a simple forward-looking approach to compare rating and ranking methods with respect to their robustness and stability over time. Informally, a rating (or ranking) method is considered stable over time if the differences between the rating (or ranking) vectors obtained for consecutive time periods are consistently small, as measured by a suitable difference function. Our approach is forward-looking in the sense that stability is measured from a future perspective: if the rating ‘at present’ is close to the rating obtained at some future time point, this indicates stability. We evaluate and compare ratings and rankings by dynamically modifying the dataset used to calculate the ratings, using rolling and expanding window simulations.
The rest of this paper is organized as follows. In Section 2, we formally describe several commonly used rating and ranking methods that we use in our simulations. In Section 3, we describe the evaluation framework and comparison methods. In Section 4, we present the simulation results on English Premier League data. We discuss the results in Section 5, and finally, we conclude and outline some future research directions in Section 6.

2. Rating and Ranking Methods

In this section, we give a short description of the ranking methods we use. For a more detailed introduction to ranking methods, see [12,13].
Let $V = \{1, \dots, n\}$ be the set of $n$ teams to be rated, and let $R$ be the number of rounds in a competition among the teams in $V$. After round $r$ ($r = 1, \dots, R$), a rating function $\phi^r: V \to \mathbb{R}$ assigns a score to each team, which we may call its quantitative ‘strength’. A ranking $\sigma^r: V \to V$ is an ordering of the teams obtained from a rating on $V$ by sorting. For rating the teams, we consider only the final scores of the games played.
We define the $n \times n$ matrices $W$ and $D$ as
$$W_{ij} = \#\{i \text{ won against } j\},$$
and
$$D_{ij} = \#\{\text{draws between } i \text{ and } j\}.$$
The score matrix $S \in \mathbb{R}^{n \times n}$ is defined as
$$S_{ij} = \#\{\text{points } i \text{ scored against } j\}.$$
To avoid fully zero rows in $S$, we set $S_{ij} = S_{ji} = 1/2$ if the outcome of a game is 0:0.
Using the matrix $W$, the entries of the vectors $w = W\mathbf{1}$, $l = W^\top\mathbf{1}$, $d = D\mathbf{1}$, and $t = (W + W^\top + D)\mathbf{1}$ are the numbers of wins, losses, draws, and total games played by team $i$ ($i = 1, \dots, n$), respectively, where $\mathbf{1}$ is the $n$-element vector with all entries equal to one. Since each game is either a win, a loss, or a draw, $t = w + l + d$. We define $T = \operatorname{diag}(t_i)$, the diagonal matrix with entries $T_{ii} = t_i$ and $T_{ij} = 0$ if $i \neq j$ ($i, j = 1, \dots, n$). Similarly, we define the vectors $s = S\mathbf{1}$ and $u = S^\top\mathbf{1}$, containing the total scores of team $i$ against its opponents and of the opponents against team $i$, respectively.
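To make the notation concrete, here is a minimal pure-Python sketch that builds $W$, $D$, and $S$ from a list of game results and derives the vectors $w$, $l$, $d$, $t$, $s$, and $u$. The game record `(i, j, goals_i, goals_j)` with teams indexed from 0, and the function names `result_matrices` and `summary_vectors`, are hypothetical choices for illustration, not taken from the authors' code.

```python
# Hypothetical game record: (team_i, team_j, goals_i, goals_j); teams are 0..n-1.
Game = tuple[int, int, float, float]


def result_matrices(games: list[Game], n: int):
    """Build the win matrix W, draw matrix D and score matrix S."""
    W = [[0.0] * n for _ in range(n)]
    D = [[0.0] * n for _ in range(n)]
    S = [[0.0] * n for _ in range(n)]
    for i, j, gi, gj in games:
        if gi > gj:
            W[i][j] += 1          # i won against j
        elif gj > gi:
            W[j][i] += 1          # j won against i
        else:
            D[i][j] += 1          # draws are counted symmetrically
            D[j][i] += 1
        if gi == 0 and gj == 0:
            S[i][j] += 0.5        # 0:0 convention, avoids all-zero rows in S
            S[j][i] += 0.5
        else:
            S[i][j] += gi
            S[j][i] += gj
    return W, D, S


def summary_vectors(W, D, S):
    """Return w = W1, l = W^T 1, d = D1, t = (W + W^T + D)1, s = S1, u = S^T 1."""
    n = len(W)
    w = [sum(W[i]) for i in range(n)]
    l = [sum(W[j][i] for j in range(n)) for i in range(n)]
    d = [sum(D[i]) for i in range(n)]
    t = [w[i] + l[i] + d[i] for i in range(n)]  # equals (W + W^T + D)1
    s = [sum(S[i]) for i in range(n)]
    u = [sum(S[j][i] for j in range(n)) for i in range(n)]
    return w, l, d, t, s, u


# Example: three teams, three games.
games = [(0, 1, 2, 1), (1, 2, 0, 0), (2, 0, 3, 1)]
W, D, S = result_matrices(games, 3)
w, l, d, t, s, u = summary_vectors(W, D, S)
print(w, l, d, t, s, u)
```

Plain lists keep the example dependency-free; with NumPy the same vectors would be simple row and column sums of arrays.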

2.1. Winning Percentage (WP)

The winning percentage of team $i$ after round $r$ is defined as $\phi_{WP}^r(i) = (w_i + \kappa d_i)/t_i$, where $\kappa \in [0, 1]$ is a parameter that can be interpreted as the ‘value’ of a draw. For example, $\kappa = 1/3$ means that a draw is worth one third of a win. The vector of winning percentages after round $r$ can be computed as
$$\phi_{WP}^r = T^{-1}(w + \kappa d).$$
Using the score matrix $S$, a similar quantity can be calculated as $\phi_{WP(S)}^r = T^{-1}s$.
Observe that this method does not take into account the strength of the opponents; only the outcomes of the games count.
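The winning percentage formulas translate directly into code. The sketch below uses hypothetical function names and plain Python lists, assumes every team has played at least one game (so that $t_i > 0$), and uses $\kappa = 1/3$ as in the example above.

```python
def winning_percentage(w, d, t, kappa=1 / 3):
    """phi_WP = T^{-1}(w + kappa*d); kappa is the value of a draw (assumes t_i > 0)."""
    return [(wi + kappa * di) / ti for wi, di, ti in zip(w, d, t)]


def winning_percentage_scores(s, t):
    """Score-based variant phi_WP(S) = T^{-1} s, i.e. points scored per game."""
    return [si / ti for si, ti in zip(s, t)]


# Example vectors for three teams: wins, draws, games played, points scored.
w, d, t, s = [1, 0, 1], [0, 1, 1], [2, 2, 2], [3, 1.5, 3.5]
print(winning_percentage(w, d, t))
print(winning_percentage_scores(s, t))
```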

2.2. Rating Percentage Index (RPI)

The Rating Percentage Index takes into account the WP of the team’s opponents and the WP of their opponents’ opponents [14]. The average winning percentage of team $i$’s opponents after round $r$ is calculated as
$$\frac{1}{t_i}\sum_{j} (W_{ij} + W_{ji} + D_{ij})\,\phi_{WP}^r(j),$$
where the average is taken over the team’s previous opponents after round $r$, weighted by the number of games played against each of them. The vector of the average opponents’ winning percentages is $T^{-1}(W + W^\top + D)\,\phi_{WP}^r$. Applying the same averaging operator once more gives the winning percentages of the opponents’ opponents, $\left(T^{-1}(W + W^\top + D)\right)^2\phi_{WP}^r$. After round $r$, the RPI vector is calculated as the weighted average
$$\phi_{RPI}^r = \tfrac{1}{4}\phi_{WP}^r + \tfrac{1}{2}T^{-1}(W + W^\top + D)\,\phi_{WP}^r + \tfrac{1}{4}\left(T^{-1}(W + W^\top + D)\right)^2\phi_{WP}^r,$$
and similarly, given the score matrix $S$, as
$$\phi_{RPI}^r = \tfrac{1}{4}\phi_{WP(S)}^r + \tfrac{1}{2}T^{-1}(S + S^\top)\,\phi_{WP(S)}^r + \tfrac{1}{4}\left(T^{-1}(S + S^\top)\right)^2\phi_{WP(S)}^r.$$
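A minimal sketch of the $W$-based RPI follows. It computes the opponents’ winning percentage with the averaging operator $T^{-1}(W + W^\top + D)$ and, as our reading of ‘opponents’ opponents’, applies the same operator a second time. The function name `rpi` and the three-team example data are illustrative assumptions.

```python
def rpi(W, D, phi_wp):
    """RPI = 1/4 WP + 1/2 (opponents' WP) + 1/4 (opponents' opponents' WP)."""
    n = len(W)
    # A = W + W^T + D: number of games played between each pair of teams.
    A = [[W[i][j] + W[j][i] + D[i][j] for j in range(n)] for i in range(n)]
    t = [sum(row) for row in A]  # games played, i.e. the diagonal of T

    def opponent_average(v):
        # T^{-1} A v: average of v over each team's opponents, weighted by games
        return [sum(A[i][j] * v[j] for j in range(n)) / t[i] for i in range(n)]

    owp = opponent_average(phi_wp)  # opponents' winning percentage
    oowp = opponent_average(owp)    # opponents' opponents' winning percentage
    return [0.25 * a + 0.5 * b + 0.25 * c for a, b, c in zip(phi_wp, owp, oowp)]


W = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]   # team 0 beat 1, team 2 beat 0
D = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]   # teams 1 and 2 drew
phi_wp = [0.5, 1 / 6, 2 / 3]            # WP with kappa = 1/3
print(rpi(W, D, phi_wp))
```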

2.3. Massey’s Least Squares Method (M)

The only statistics used in Massey’s least squares method [15] are the numbers of wins and losses of each team. The rating $\phi_M^r$ of the teams after round $r$ is obtained as the solution of the linear system
$$M\phi_M^r = w - l,$$
where $M = T - W - W^\top - D$ contains the total number of games played by each team on the diagonal, while $M_{ij}$ ($i \neq j$) is $-1$ times the number of games played between teams $i$ and $j$. The method naturally incorporates draws: a draw between teams $i$ and $j$ increases $M_{ii}$ and $M_{jj}$ by one and decreases $M_{ij}$ and $M_{ji}$ by one, while the right-hand side $w - l$ remains unchanged. Since every row of $M$ sums to zero, $\operatorname{rank}(M) < n$, and the linear system does not have a unique solution. One way to handle this problem is to replace any row of $M$ with a row of ones and the corresponding entry of $w - l$ with zero, which forces the ratings to sum to zero.
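The sketch below assembles $M = T - W - W^\top - D$ and the right-hand side $w - l$, replaces the last row as described above, and solves the system with a small Gaussian elimination routine, written out so that the example needs only the standard library. It assumes the results graph is connected, so that the modified system is nonsingular; `massey` and `solve` are hypothetical names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]  # augmented
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def massey(W, D):
    """Massey ratings from M phi = w - l, with the last row replaced by ones."""
    n = len(W)
    w = [sum(W[i]) for i in range(n)]
    l = [sum(W[j][i] for j in range(n)) for i in range(n)]
    d = [sum(D[i]) for i in range(n)]
    # Off-diagonal: minus the number of games between i and j; diagonal: t_i.
    M = [[-(W[i][j] + W[j][i] + D[i][j]) if i != j else w[i] + l[i] + d[i]
          for j in range(n)] for i in range(n)]
    rhs = [w[i] - l[i] for i in range(n)]
    # rank(M) < n: replace the last row by ones and its right-hand side by 0,
    # which forces the ratings to sum to zero.
    M[-1] = [1] * n
    rhs[-1] = 0
    return solve(M, rhs)


W = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]   # team 0 beat 1, team 2 beat 0
D = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]   # teams 1 and 2 drew
print(massey(W, D))
```

In practice a library solver such as NumPy's would replace `solve`; the hand-written version only keeps the example self-contained.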

2.4. Colley’s Least Squares Method (C)

The Colley method is also a modification of the least squares method, based on an observation called Laplace’s rule of succession (see [16], p. 148), which states that if one has observed $k$ successes out of $m$ attempts, then $(k+1)/(m+2)$ is a better estimate of the probability that the next attempt is a success than $k/m$. The rating vector $\phi_C^r$ of the teams is the solution of the linear system
$$C\phi_C^r = b,$$
where $C = M + 2I$ (here, $I$ is the identity matrix) and $b = \mathbf{1} + \tfrac{1}{2}(w - l)$. The linear system has a unique solution, since $C$ is symmetric and strictly diagonally dominant with a positive diagonal and is therefore positive definite.
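Because $C = M + 2I$ has diagonal entries $t_i + 2$ and off-diagonal entries whose absolute values sum to $t_i$ in each row, it is strictly diagonally dominant, so a simple Gauss–Seidel iteration converges to the unique solution. The following sketch uses that iterative solver as a convenient substitute for a direct solver; it is an illustration under that choice, not necessarily how the authors computed the ratings, and `colley` is a hypothetical name.

```python
def colley(W, D, iters=1000, tol=1e-12):
    """Solve C phi = b with C = M + 2I and b = 1 + (w - l)/2 by Gauss-Seidel."""
    n = len(W)
    # A_ij = number of games between i and j (zero diagonal assumed).
    A = [[W[i][j] + W[j][i] + D[i][j] for j in range(n)] for i in range(n)]
    t = [sum(row) for row in A]
    w = [sum(W[i]) for i in range(n)]
    l = [sum(W[j][i] for j in range(n)) for i in range(n)]
    b = [1 + 0.5 * (w[i] - l[i]) for i in range(n)]
    phi = [0.5] * n
    for _ in range(iters):
        change = 0.0
        for i in range(n):
            # Row i of C phi = b reads (t_i + 2) phi_i - sum_{j != i} A_ij phi_j = b_i.
            new = (b[i] + sum(A[i][j] * phi[j] for j in range(n) if j != i)) / (t[i] + 2)
            change = max(change, abs(new - phi[i]))
            phi[i] = new
        if change < tol:
            break
    return phi


W = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]   # team 0 beat 1, team 2 beat 0
D = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]   # teams 1 and 2 drew
print(colley(W, D))
```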

2.5. Keener Method (K)

Keener’s method [17] is a so-called spectral rating method that uses the Perron–Frobenius eigenvector for the rating; after round $r$, it is given by the solution of the eigenvalue equation
$$T^{-1}(W + \kappa D)\,\phi_K^r = \lambda\,\phi_K^r,$$
where $\lambda$ is the dominant eigenvalue of the matrix $T^{-1}(W + \kappa D)$. By the Perron–Frobenius theorem, such an eigenvalue exists for a matrix with non-negative entries, and if the matrix is irreducible and primitive, every other eigenvalue is strictly smaller in absolute value. The corresponding eigenvector, called the Perron–Frobenius eigenvector, has non-negative entries and provides the rating of the teams. Originally, the method was defined for the score matrix $S$. The Keener matrix, also based on Laplace’s rule of succession, is defined as
$$K_{ij} = h\!\left(\frac{S_{ij} + 1}{S_{ij} + S_{ji} + 2}\right),$$
where $h$ is a skewing function that helps to reduce the difference between the upper and lower ends of the rating. We use the original function defined by Keener, namely,
$$h(x) = \frac{1}{2} + \frac{1}{2}\operatorname{sgn}\!\left(x - \frac{1}{2}\right)\sqrt{|2x - 1|}.$$
The Keener rating vector $\phi_{K(S)}^r$ of the teams is given by the solution of the equation
$$T^{-1}K\,\phi_{K(S)}^r = \lambda\,\phi_{K(S)}^r.$$
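The sketch below builds the Keener matrix and finds the Perron–Frobenius eigenvector of $T^{-1}K$ by power iteration. It assumes the Laplace-style ratio $(S_{ij}+1)/(S_{ij}+S_{ji}+2)$ inside $h$, the square-root form of Keener’s skewing function, and $K_{ii} = 0$ (a team does not play itself). Power iteration converges only if the matrix is primitive, which holds in the small example but is an assumption in general; the function names are hypothetical.

```python
import math


def keener_h(x):
    """Keener's skewing function h(x) = 1/2 + sgn(x - 1/2) * sqrt(|2x - 1|) / 2."""
    return 0.5 + math.copysign(1.0, x - 0.5) * math.sqrt(abs(2 * x - 1)) / 2


def keener_matrix(S):
    """K_ij = h((S_ij + 1) / (S_ij + S_ji + 2)); the diagonal is set to 0 (assumption)."""
    n = len(S)
    return [[0.0 if i == j else keener_h((S[i][j] + 1) / (S[i][j] + S[j][i] + 2))
             for j in range(n)] for i in range(n)]


def keener(S, t, iters=1000, tol=1e-12):
    """Power iteration for T^{-1} K phi = lambda phi; returns (phi, lambda), sum(phi) = 1."""
    n = len(S)
    K = keener_matrix(S)
    phi, lam = [1.0 / n] * n, 0.0
    for _ in range(iters):
        y = [sum(K[i][j] * phi[j] for j in range(n)) / t[i] for i in range(n)]
        lam = sum(y)                 # since sum(phi) = 1, sum(y) estimates lambda
        new = [v / lam for v in y]   # renormalize to sum 1
        done = max(abs(a - b) for a, b in zip(new, phi)) < tol
        phi = new
        if done:
            break
    return phi, lam


S = [[0, 2, 1], [1, 0, 0.5], [3, 0.5, 0]]   # scores from a three-game example
t = [2, 2, 2]                                # games played by each team
phi, lam = keener(S, t)
print(phi, lam)
```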

2.6. PageRank Method (PR)

The PageRank method [18] was originally designed to rank web pages based on their position in the WWW network. The idea behind it comes from the basic properties of Markov chains (see, e.g., [12], Chapter 4). In the context of sports, the rating of the teams is calculated iteratively using the recursion formula
$$\mathrm{PR}(i) = \frac{\lambda}{n} + (1 - \lambda)\sum_{j \in N^{+}(i)} \frac{\mathrm{PR}(j)}{l_j},$$
where $N^{+}(i)$ is the set of teams defeated by team $i$ at least once, $l_j$ is the total number of losses of team $j$ (its out-degree in the results graph), and $\lambda \in [0, 1]$ is a parameter (usually 0.1 or 0.2) that guarantees convergence.
To see the relationship between the PageRank formula and the theory of Markov chains, we may write the above equation in vector form as
$$\mathrm{PR} = \frac{\lambda}{n}\left[I - (1 - \lambda)\,S D^{-1}\right]^{-1}\mathbf{1},$$
where the vector $\mathrm{PR}$ contains the PageRank values of the teams, $D = \operatorname{diag}\!\left(\sum_{\ell=1}^{n} S_{\ell i}\right)_{i=1}^{n}$ is the diagonal matrix of the column sums of $S$ (not to be confused with the draw matrix), and $I$ is the $n \times n$ identity matrix. Assuming $\mathbf{1}^\top\mathrm{PR} = 1$, this implies $\mathrm{PR} = G\,\mathrm{PR}$ with $G = \frac{\lambda}{n}\mathbf{1}\mathbf{1}^\top + (1 - \lambda)\,S D^{-1}$. This shows that $\mathrm{PR}$ is an eigenvector of $G$ for eigenvalue one, which is the largest eigenvalue of $G$ as a consequence of the Perron–Frobenius theorem, since $G$ is a column-stochastic matrix with positive entries. The rating vector $\phi_{PR(S)}^r$ of the teams after round $r$ can be calculated using, for instance, the power iteration method.
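PageRank can be computed by power iteration on the vector equation above. This sketch takes a generic non-negative results matrix `A` (for example $S$) and normalizes by its column sums, as the diagonal matrix of column sums does; it assumes every column sum is positive, since a zero column (for example, an unbeaten team when $A = W$) would need extra handling. The damping value 0.2 is one of the values mentioned in the text, and the function name is hypothetical.

```python
def pagerank(A, lam=0.2, iters=1000, tol=1e-12):
    """Power iteration for PR = lam/n * 1 + (1 - lam) * A D^{-1} PR (D = column sums of A)."""
    n = len(A)
    col = [sum(A[i][j] for i in range(n)) for j in range(n)]  # D_jj = sum_i A_ij
    if any(c == 0 for c in col):
        # a zero column (e.g. an unbeaten team when A = W) would need extra handling
        raise ValueError("every column of A must have a positive sum")
    pr = [1.0 / n] * n
    for _ in range(iters):
        new = [lam / n + (1 - lam) * sum(A[i][j] * pr[j] / col[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, pr)) < tol:
            return new
        pr = new
    return pr


S = [[0, 2, 1], [1, 0, 0.5], [3, 0.5, 0]]   # S_ij = points i scored against j
pr = pagerank(S)
print([round(x, 4) for x in pr])
```

Because $A D^{-1}$ is column-stochastic, each iteration preserves the total mass, so the result sums to one without explicit renormalization.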

2.7. Graph Representation of the Methods

We emphasize that all of the methods defined above have a graph-theoretical interpretation. Using the game results, one can define a directed multigraph whose nodes represent players/teams and whose edges represent the outcomes of the games they played. Each edge is directed from the losing team to the winning team. If ties are also considered, they can be represented by two directed edges with opposite directions and half (or some other fractional) weight. In this case, the matrix $W$ encodes the directed multigraph ($W_{ij}$ counts the edges from $j$ to $i$), and $w$ and $l$ contain the in- and out-degrees of the nodes, respectively. From a network science perspective, Massey’s matrix $M$ is the graph Laplacian of the undirected multigraph obtained by ignoring edge directions. The Massey rating vector $\phi_M$ is then equivalent to the potential vector over a resistor network defined by $W$ with supply vector $w - l$ [19]. The PageRank method is a simple modification of the classic PageRank algorithm, performed on the results graph.

3. Evaluation and Comparison of Rating Methods

In this section, we present the applied simulation approaches and the definitions of the stability of ratings and rankings as well as the rating error. To deal with the dynamic nature of sport competitions, we perform rolling window (RW) and expanding window (EW) simulations, described as follows.

3.1. Rolling Window Approach

Let $W^t$ (or $S^t$) be the results matrix generated just after $t$ games (here $t = 50, 60, 70, \dots$), and let $\phi_{RW}^t$ be the rating vector after $t$ games played. We then generate the results matrix $W_{RW}^{(\Delta t,\, t+\Delta t)}$ from the games $\Delta t + 1, \dots, t + \Delta t$, i.e., a window of the same fixed length shifted forward by $\Delta t$ games, and calculate the rating $\phi_{RW}^{(\Delta t,\, t+\Delta t)}$ for the new matrix using the same rating method. For example, if $\Delta t = 10$, then games 1 to 50, 11 to 60, etc., are used to create the results matrices and ratings.
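The window bookkeeping is easy to get off by one, so here is a short sketch. Windows are expressed as 0-based Python slice bounds of a fixed length shifted by a step, so with a length of 50 and a step of 10 the slices `(0, 50)`, `(10, 60)`, and so on correspond to games 1–50, 11–60, and so on in the text. `rate` stands for any rating function with the hypothetical signature `rate(games, n)`.

```python
def rolling_windows(num_games, length=50, step=10):
    """Yield 0-based slice bounds (start, end) of fixed-length windows shifted by step."""
    start = 0
    while start + length <= num_games:
        yield start, start + length
        start += step


def rolling_ratings(games, n, rate, length=50, step=10):
    """Apply a rating function rate(games, n) to every rolling window."""
    return [rate(games[a:b], n) for a, b in rolling_windows(len(games), length, step)]


# Slices (0, 50), (10, 60), ... correspond to games 1-50, 11-60, ... (1-based).
print(list(rolling_windows(80)))
```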

3.2. Expanding Window Approach

In the expanding window case, let $W_{EW}^{(T,\Delta t)}$ (or $S_{EW}^{(T,\Delta t)}$) be the results matrix generated from an increasing number of games, starting from the first $T$ games and growing by $\Delta t$ games at each step. For instance, with $T = 50$ and $\Delta t = 10$, $W_{EW}^{(T,\,\Delta t = 10)}$ is the results matrix generated from the first 50, 60, 70, etc., games from the beginning of the competition. The team rating after game $t$ is denoted by $\phi_{EW}^t$.
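Analogously, here is a minimal sketch of the expanding windows, again using 0-based slice bounds and a hypothetical `rate(games, n)` callable: every window starts at the first game and grows by `step` games.

```python
def expanding_windows(num_games, first=50, step=10):
    """Yield slice bounds (0, end) for the first `first`, `first + step`, ... games."""
    end = first
    while end <= num_games:
        yield 0, end
        end += step


def expanding_ratings(games, n, rate, first=50, step=10):
    """Apply a rating function rate(games, n) to every expanding window."""
    return [rate(games[a:b], n) for a, b in expanding_windows(len(games), first, step)]


print(list(expanding_windows(75)))
```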

3.3. Rating Stability

To measure the stability of the considered methods, we compute the Euclidean distance between consecutive rating vectors obtained by either the rolling or the expanding window approach with specified $\Delta t$ values [20]. Formally, we calculate
$$d_{RW}^2(t) = \left\| \phi_{RW}^{(k\Delta t,\, t+\Delta t)} - \phi_{RW}^{t} \right\|_2^2,$$
where $\|\cdot\|_2$ denotes the Euclidean norm. Averaging $d_{RW}^2(t)$ over all $t = 50, 60, \dots$ with $k = 1, 2, \dots$ yields a single value representing the average stability of the rating method over the whole competition or up to a given round. Stability in the expanding window approach is measured similarly.
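The averaged stability value can be sketched as follows, assuming the rating vectors from successive windows are already available as a list and that ‘consecutive’ means each vector is compared with the one from the previous window. As in the formula, the squared Euclidean norm is used; the function names are illustrative.

```python
def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||_2^2 between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_rating_stability(ratings):
    """Average d^2 over consecutive pairs in a sequence of rating vectors."""
    if len(ratings) < 2:
        raise ValueError("need at least two rating vectors")
    dists = [squared_distance(prev, cur) for prev, cur in zip(ratings, ratings[1:])]
    return sum(dists) / len(dists)


# Hypothetical ratings of three teams in three successive windows.
ratings = [[0.5, 0.2, 0.3], [0.4, 0.3, 0.3], [0.4, 0.35, 0.25]]
print(mean_rating_stability(ratings))
```

Lower values mean that the ratings change less from window to window, i.e., higher rating stability.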

3.4. Ranking Stability

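To measure the stability of the rankings generated by the applied rating methods, we compute rank correlations using the Kendall tau method [21]. Given two consecutive rankings $\sigma_{RW}^t = \sigma^1$ and $\sigma_{RW}^{(t+\Delta t)} = \sigma^2$, the Kendall tau distance is defined as
$$\tau_{RW}^t = \frac{\#\left\{(i, j),\ i < j : \left(\sigma_i^1 > \sigma_j^1 \text{ and } \sigma_i^2 < \sigma_j^2\right) \text{ or } \left(\sigma_i^1 < \sigma_j^1 \text{ and } \sigma_i^2 > \sigma_j^2\right)\right\}}{\frac{1}{2}n(n-1)},$$
where $\sigma_i^1$ and $\sigma_i^2$ are the ranks of team $i$ in the rankings $\sigma_{RW}^t$ and $\sigma_{RW}^{(k\Delta t,\, t+\Delta t)}$ ($t = 50, 60, \dots$, $k = 1, 2, \dots$), respectively. This normalized distance is the fraction of discordant team pairs and lies in $[0, 1]$; when there are no ties, the Kendall tau rank correlation coefficient, which lies in $[-1, 1]$, equals $1 - 2\tau_{RW}^t$.
We can average $\tau_{RW}^t$ over all $t = 50, 60, \dots$ to obtain a single value representing the mean stability of the ranking method over the whole competition or up to a given round. Stability in the expanding window approach can be measured similarly.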
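A direct $O(n^2)$ implementation of the normalized Kendall tau distance is short enough for league-sized inputs. The sketch assumes rankings without ties, given as rank positions per team (1 = best), and also shows the conversion to the correlation coefficient $1 - 2 \times \text{distance}$. The helper `ranks_from_ratings` is a hypothetical convenience that turns a rating vector into such ranks.

```python
from itertools import combinations


def ranks_from_ratings(phi):
    """Rank positions (1 = highest rating) for each team; ties broken by index."""
    order = sorted(range(len(phi)), key=lambda i: -phi[i])
    ranks = [0] * len(phi)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks


def kendall_tau_distance(rank1, rank2):
    """Fraction of team pairs ordered differently by the two rankings."""
    n = len(rank1)
    discordant = sum(
        1 for i, j in combinations(range(n), 2)
        if (rank1[i] - rank1[j]) * (rank2[i] - rank2[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)


def kendall_tau_correlation(rank1, rank2):
    """Kendall tau rank correlation in [-1, 1] for rankings without ties."""
    return 1 - 2 * kendall_tau_distance(rank1, rank2)


print(kendall_tau_correlation([1, 2, 3, 4], [2, 1, 3, 4]))
```

For rankings without ties, this correlation coincides with the tau-b value returned by SciPy's `scipy.stats.kendalltau`.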

3.5. Rating Error

We also estimate the potential predictive power of the rating methods in a simple way. Each dataset is divided into two subsets: a training set and a test set. For the training set, a rating $\phi^t$ is calculated from $t$ games ($t = 50$ fixed in the rolling window approach, while $t = 50, 60, \dots$ in the expanding window approach). The test set consists of the next $\Delta t$ games ($\Delta t = 10$ in our simulations). We define the prediction error $E_\phi^t$ of a rating method $\phi$ as the proportion of games in the test set in which the lower-rated team beat the higher-rated one, plus the proportion of drawn games in the test set between teams with different ratings, i.e.,
$$E_\phi^t = \frac{\#\{\text{team } i \text{ beats } j \text{ in the test set and } \phi_i < \phi_j\}}{\#\{\text{games in the test set}\}} + \frac{\#\{\text{teams } i \text{ and } j \text{ tie in the test set and } \phi_i \neq \phi_j\}}{\#\{\text{games in the test set}\}}.$$
The total error is calculated as the average of the errors obtained over all training and test set samples.
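The rating error can be sketched as below, using hypothetical game records `(i, j, goals_i, goals_j)`. A decisive game is an error when the winner is rated lower; following the second term of the formula, a draw is counted as an error when the two ratings differ, which is our reading of the tie condition and should be treated as an assumption. The function names are illustrative.

```python
def prediction_error(phi, test_games):
    """Share of test games whose outcome contradicts the rating phi."""
    errors = 0
    for i, j, gi, gj in test_games:
        if gi == gj:
            # draw: counted as an error when the ratings differ (assumed tie rule)
            if phi[i] != phi[j]:
                errors += 1
        elif gi > gj and phi[i] < phi[j]:
            errors += 1   # i won although it is rated lower
        elif gj > gi and phi[j] < phi[i]:
            errors += 1   # j won although it is rated lower
    return errors / len(test_games)


def mean_prediction_error(samples):
    """Average error over (rating, test_games) pairs, one pair per train/test split."""
    return sum(prediction_error(p, g) for p, g in samples) / len(samples)


phi = [0.5, 0.2, 0.3]   # hypothetical ratings from a training window
test_games = [(0, 1, 1, 0), (1, 2, 2, 1), (0, 2, 0, 1), (2, 1, 2, 0), (0, 2, 1, 1)]
print(prediction_error(phi, test_games))
```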

4. Results

We performed our experiments using English Premier League datasets (source: https://www.kaggle.com/datasets/saife245/english-premier-league (accessed on 15 March 2022)). The datasets contain the date of each game, the names of the teams, the home and away scores, and the total points of the teams during the competition. To generate the results matrices (graphs), we used the matrix $W$ for the PageRank, Massey, Colley, WP, and RPI methods, and the score matrix $S$ for the Keener method. We performed rolling window (RW) and expanding window (EW) simulations to analyze ranking and rating stability using the Kendall tau and Euclidean distance measures, respectively. The results are presented in the tables and plots in this section.

4.1. Comparison of Top-5 Teams Ranking by Rolling Window Approach

First, we compared the rankings and ratings of the top-5 teams using the rolling window approach. Here, we considered the standard deviation of the ratings of the top-5 teams in different windows (games). Table 1 summarizes the rolling window results. In all investigated windows (10–60, 20–70, and 30–80 games), Man. City was rated and ranked as the best of the top-5 teams by PageRank (sd ± 0.0522; ± 0.0116; ± 0.0125), Massey (sd ± 0.0333; ± 0.0409; ± 0.0418), and Keener (sd ± 0.0328; ± 0.0434; ± 0.0482), while the Massey and Keener methods rated and ranked Man. United second among the top five. Similarly, WP (sd ± 0.2418; ± 0.2256; ± 0.2097) rated and ranked Man. City and Man. United first and second, respectively, in all windows. In general, in the rolling window approach, PageRank, Massey, and Keener perform relatively better than the other investigated ranking methods (see Table A1 in Appendix A): these three methods recorded relatively small standard deviations. A small standard deviation across windows implies small variation in team ratings and hence rank–rate stability, while a large standard deviation implies the opposite.

4.2. Comparison of Top-5 Teams Ranking by Expanding Window Approach

Next, we compared the rank–rate of the top-5 teams using the expanding window approach. Table 2 summarizes the results. After 60 and 70 games, Man. City and Man. United were rated and ranked as the best of the top-5 teams by PageRank (sd ± 0.0165; ± 0.0174; ± 0.0176), Massey (sd ± 0.0242; ± 0.0226; ± 0.0210), and WP (sd ± 0.0971; ± 0.0865; ± 0.0737). Both WP and PageRank rated and ranked Man. City as the best of the top-5 teams in all windows, while Colley (sd ± 0.0773; ± 0.0754; ± 0.0775) rated and ranked Arsenal as the best team in all windows. In general, in the expanding window approach, PageRank and Massey perform relatively better than the other investigated ranking methods (see Table A2 in Appendix A), as they recorded relatively small standard deviations. PageRank was the most stable in ranking among the investigated methods. As mentioned in Section 4.1, a small standard deviation of team ratings implies small variation in team ranking and hence rank–rate stability.

4.3. Rating Stability

We evaluate rating stability based on the Euclidean distance measure described in Section 3.3. In this analysis, we compute the Euclidean distance between two consecutive rating vectors obtained by the rolling and expanding window approaches to measure how much they deviate from each other. The mean stability of each rating method is based on the average distances $d_{RW}^2(t)$ and $d_{EW}^2(t)$ ($t = 50, 70, \dots$). The lower the value of $d^2(t)$, the more stable the rating method.

4.3.1. Evaluation by Rolling Window Approach

We measure the distance $d_{RW}^2(t)$ between two consecutive rating vectors. According to the results in Figure 1, in the rolling window simulation, the distance values $d_{RW}^2(t)$ vary over time (i.e., from window to window). PageRank and Massey generally recorded low average values of $d_{RW}^2 = 0.025$ and $d_{RW}^2 = 0.029$, respectively. In contrast, Colley, Keener, WP, and RPI recorded higher distance values, with an average of $d_{RW}^2 \geq 0.035$. This implies that these methods have lower rating stability, due to larger deviations (i.e., lower similarity) between consecutive rating vectors.

4.3.2. Evaluation by Expanding Window Approach

We also evaluated the rating stability of the investigated methods using the expanding window approach. Similarly, we measure the distance $d_{EW}^2(t)$ between two consecutive rating vectors at increasing window sizes (i.e., after 50, 60, 70, … games), as described in Section 3.3. The results in Figure 2 suggest that in the expanding window simulation, the distance values $d_{EW}^2(t)$ increase over time (i.e., from window to window). Again, PageRank and Massey recorded low average distance values, ranging between $d_{EW}^2 = 0.025$ and $d_{EW}^2 = 0.03$. A low $d_{EW}^2$ value implies low deviation (i.e., high similarity) between rating vectors and hence high rating stability. Colley, Keener, WP, and RPI recorded slightly higher average distance values, ranging between $d_{EW}^2 = 0.035$ and $d_{EW}^2 = 0.040$, which indicates lower rating stability due to larger deviations (i.e., lower similarity) between rating vectors.

4.4. Ranking Stability

As mentioned in Section 3.4, we compared the ranking stability of the investigated methods in the rolling and expanding window approaches using the Kendall tau method. Here, we consider the rank correlation coefficient $\tau$, which takes values between $-1$ and $+1$ and characterizes the degree of ranking stability (i.e., the agreement between two rank lists). Statistically, $\tau$ compares the numbers of concordant and discordant pairs in two rank lists. A value of $\tau = +1$ indicates the highest possible ranking stability, i.e., the two rank lists are identical, while $\tau = -1$ indicates the lowest, i.e., one rank list is the exact reverse of the other. A value of $\tau = 0$ indicates no association between the two rank lists, as expected on average when one is a random reordering of the other.

4.4.1. Evaluation by Rolling Window Approach

According to the results in Figure 3, PageRank and Massey recorded the highest rank correlations, $\tau_{RW} \geq 0.60$ and $\tau_{RW} \geq 0.80$, respectively. Colley and Keener both recorded a rank correlation of $\tau_{RW} \approx 0.60$, whereas WP and RPI recorded low rank correlations, i.e., $\tau_{RW} < 0.60$. In general, PageRank, Colley, and Massey show relatively stable ranking performance compared with Keener, WP, and RPI, which tend to be unstable over time (i.e., across windows/numbers of games). $\tau_{RW} > 0$ implies that all six investigated ranking methods show ranking stability in the rolling window approach.

4.4.2. Evaluation by Expanding Window Approach

We further compared the ranking stability of all investigated methods using the expanding window approach. According to the results in Figure 4, the PageRank, Colley, Massey, and Keener methods recorded higher rank correlation values of $\tau_{EW} \geq 0.60$, with PageRank recording the highest values, $\tau_{EW} \geq 0.70$. WP and RPI recorded relatively low rank correlation values of $\tau_{EW} < 0.60$. Overall, the analysis indicates that as the window size (i.e., the number of games) increases, ranking stability tends to increase over time.

4.5. Rating Error

As described in Section 3.5, we evaluated the predictive power of the rating methods using a very simple and intuitive approach. For the training set, we considered a fixed number of games (50) in the rolling window simulation and an increasing number of games (50, 70, …) in the expanding window simulation. For the test set, a fixed number of games (10 in our case), played right after the games in the training set, was used.
The resulting rating errors are shown in Table 3. PageRank and Massey had low average rating errors, $E_\phi \approx 0.2568$ and $E_\phi \approx 0.2819$, respectively. This leads to the hypothesis that the PageRank and Massey rankings have higher predictive power than the others. A more detailed comparison of rating errors is given in Figure A1 and Figure A2 in Appendix B.

5. Discussions

To gain deeper insight into how some widely used rating systems work, we compared the rating and ranking performance of six rating methods, applying a forward-looking approach to evaluate their rating and ranking stability. In our experimental investigations, we used the 2014 English Premier League dataset for the simulations (similar to the NFL data used in a related study [12], or the US major sports data used in [7]). Our approach provides a simple tool to compare and evaluate the stability of team ratings and rankings obtained by different methods.
We used a distance-based approach to compare rating stability using the Euclidean distance, which measures the difference between consecutive rating vectors. Rating methods with small deviation measures tend to have higher rating stability [22,23]. According to the results in Figure 1 and Figure 2, PageRank generally recorded low deviation measures in both the rolling window and expanding window simulations.
The results of the evaluation of ranking stability by the rolling and expanding window approaches are presented in Section 4. For the six methods we examined (PageRank, Colley, Massey, Keener, WP, and RPI), we observed differences in the ranking results across time windows and window sizes using the Kendall tau rank correlation. Some rating methods, such as WP and RPI, produce rankings that are more similar to each other than to those of the other methods. In the round-robin tournament considered here, the rank correlation coefficient changes irregularly over time for different window sizes.
We also compared rank–rate performance, which provides some new insights into how rating systems function (see Table A1 and Table A2, Appendix A).
When we considered an expanding time window (growing by a constant number of games), we observed that the Kendall tau rank correlation stabilized over time. This implies that the overall ranking generally becomes more stable towards the end of the competition.
According to the prediction error results in Table 3, in the rolling window simulation, the PageRank and Massey methods recorded low mean prediction errors of 0.257 and 0.282, respectively, whereas WP (0.472) and RPI (0.481) recorded higher prediction errors. The prediction errors in the expanding window approach show a similar trend. However, PageRank and Massey recorded slightly higher prediction errors in this case, namely 0.283 and 0.324, respectively, while Colley, Keener, WP, and RPI recorded slightly lower prediction errors than in the rolling window case; that is, these four methods tended to predict better in the expanding window approach (see Appendix B).
We also observed that the prediction error is related to the rating and ranking stability of the methods: stable rating methods tend to record lower prediction errors than less stable methods, in agreement with the findings in [24]. In general, the findings of this study, in agreement with the related literature, suggest that PageRank is a more stable and robust rating method in the sports domain than the other five methods. PageRank, which was originally developed for search engines [18], has been applied in various other domains, including sports. For example, a time-dependent PageRank was used for ranking sports tournaments [25,26], and PageRank was applied to randomized sports data to rank teams and individual players [27]. Our findings generally agree with these studies, showing the distinguished capability and performance of PageRank in rating and ranking compared to most of the other approaches.

6. Conclusions

This study presents a forward-looking approach to compare and evaluate six basic rating methods in two different simulation scenarios, namely a rolling window and an expanding window approach. The rank–rate comparison indicates that the PageRank and Massey methods are consistent and robust in rating and ranking teams in both the rolling and the expanding forward-looking approaches. The evaluation of ranking stability using Kendall tau correlation coefficients shows that PageRank has a high rank correlation coefficient, which indicates its stability in ranking over time. Similarly, the evaluation of rating stability by the Euclidean distance measure indicates that both the PageRank and Massey methods show only small changes in the distance measure in both simulation setups, and hence high rating stability in general. The evaluation of rating error suggests that PageRank has high predictive power in both the rolling and expanding window simulations. In general, the PageRank and Massey methods performed well in both the rolling and expanding window tests. Nevertheless, further comparisons may be needed to test their rating stability and robustness in other applications.

Author Contributions

Conceptualization, P.J.O., A.L. and M.K.; methodology, A.L.; software, P.J.O.; validation, P.J.O.; formal analysis, P.J.O.; investigation, A.L. and M.K.; resources, M.K.; data curation, P.J.O.; writing—original draft preparation, P.J.O. and A.L.; writing—review and editing, M.K.; visualization, P.J.O.; supervision, M.K.; project administration, M.K.; funding acquisition, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

M.K. gratefully acknowledges the European Commission for funding the InnoRenew CoE project (Grant Agreement no. 739574) under the Horizon2020 Widespread-Teaming program and the Republic of Slovenia (Investment funding of the Republic of Slovenia and the European Union of the European Regional Development Fund). He is also grateful for the support of the Slovenian Research Agency (ARRS) through grant N2-0171. András London was supported by National Research, Development and Innovation Office—NKFIH Fund No. SNN-135643.

Data Availability Statement

The code and data used for the experimental simulation and the data supporting the reported results can be found at https://github.com/peter26jumaochieng (accessed on 15 March 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Rank–Rate Comparison of Top-5 Teams by Rolling Window Approach

Table A1 extends the rank–rate comparison, together with the standard deviations, for the top-5 teams under the rolling window approach, covering games 10–60, 20–70, 30–80, 40–90, 50–100, and 60–110.

Appendix A.2. Rank–Rate Comparison of Top-5 Teams by Expanding Window Approach

Table A2 extends the rank–rate comparison, together with the standard deviations, for the top-5 teams under the expanding window approach after 50, 60, 70, 80, 90, and 100 games.
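The window boundaries in Tables A1 and A2 follow a simple pattern:

- Rolling windows span 50 games and slide forward by 10 games (10–60 through 60–110).
- Expanding windows start at the first game and grow by 10 games (after 50 through after 100).

A minimal sketch that generates these boundaries is shown below. It assumes zero-based, half-open game indices over a chronologically ordered game list; the original indexing convention is not stated here. The helper names are hypothetical.

```python
def rolling_windows(start, width, step, last):
    """Fixed-width windows (lo, hi), sliding forward by `step` while hi <= last."""
    if step <= 0 or width <= 0:
        raise ValueError("width and step must be positive")
    windows = []
    lo = start
    while lo + width <= last:
        windows.append((lo, lo + width))
        lo += step
    return windows


def expanding_windows(first_size, step, last):
    """Windows anchored at game 0 whose end grows by `step` up to `last`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [(0, hi) for hi in range(first_size, last + 1, step)]


def games_in(games, window):
    """Slice a chronologically ordered game list to one half-open window."""
    lo, hi = window
    return games[lo:hi]


if __name__ == "__main__":
    print(rolling_windows(10, 50, 10, 110))
    # [(10, 60), (20, 70), (30, 80), (40, 90), (50, 100), (60, 110)]
    print(expanding_windows(50, 10, 100))
    # [(0, 50), (0, 60), (0, 70), (0, 80), (0, 90), (0, 100)]
```

Under these assumptions, each rating method would be fitted on `games_in(games, w)` for every window `w`, and the resulting ratings of consecutive windows would then be compared.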
Table A1. Detailed comparison of top-5 teams rate–rank by rolling window approach.
Each cell gives team, rank, and rating.

| Method | 10–60 Games | 20–70 Games | 30–80 Games | 40–90 Games | 50–100 Games | 60–110 Games |
|---|---|---|---|---|---|---|
| PageRank | Man. City, 1, 0.0696 | Man. City, 1, 0.0894 | Man. City, 1, 0.0895 | Man. City, 1, 0.0894 | Man. City, 1, 0.0944 | Man. City, 1, 0.0002 |
| | Man. United, 2, 0.0067 | Arsenal, 2, 0.0815 | Arsenal, 2, 0.083 | Arsenal, 2, 0.0805 | Arsenal, 2, 0.0820 | Man. United, 2, 0.0000 |
| | Arsenal, 3, 0.1564 | Man. United, 3, 0.0671 | Man. United, 3, 0.068 | Man. United, 3, 0.0683 | Man. United, 3, 0.0716 | Arsenal, 3, 0.0004 |
| | Tottenham, 4, 0.0544 | Newcastle, 4, 0.0612 | Newcastle, 4, 0.0655 | Newcastle, 4, 0.0644 | Newcastle, 4, 0.0661 | Tottenham, 4, 0.0002 |
| | Chelsea, 5, 0.1344 | Tottenham, 5, 0.0635 | Tottenham, 5, 0.0639 | Tottenham, 5, 0.0625 | Tottenham, 5, 0.0651 | Chelsea, 5, 0.0004 |
| | sd ± 0.0522 | sd ± 0.0116 | sd ± 0.0125 | sd ± 0.0122 | sd ± 0.0251 | sd ± 0.0185 |
| Colley | Man. City, 1, 0.1923 | Man. United, 1, −0.042 | Arsenal, 1, −0.0368 | Tottenham, 1, −0.0198 | Tottenham, 1, −0.0187 | Man. City, 1, 0.1847 |
| | Man. United, 2, −0.0138 | Man. City, 2, −0.0579 | Man. City, 2, 0.057 | Man. City, 2, 0.0489 | Man. City, 2, 0.0472 | Man. United, 2, −0.0291 |
| | Arsenal, 3, −0.1511 | Newcastle, 3, 0.1823 | Tottenham, 3, 0.2564 | Man. United, 3, 0.2119 | Man. United, 3, 0.2121 | Arsenal, 3, −0.1408 |
| | Tottenham, 4, −0.0149 | Arsenal, 4, 0.0206 | Man. United, 4, −0.1074 | Arsenal, 4, −0.0829 | Arsenal, 4, −0.0811 | Tottenham, 4, −0.0114 |
| | Newcastle, 5, 0.061 | Tottenham, 5, −0.0821 | Newcastle, 5, 0.1348 | Newcastle, 5, 0.1111 | Newcastle, 5, 0.1131 | Newcastle, 5, −0.0030 |
| | sd ± 0.1104 | sd ± 0.1267 | sd ± 0.1333 | sd ± 0.1303 | sd ± 0.1135 | sd ± 0.0797 |
| Massey | Man. City, 1, −0.0696 | Man. City, 1, −0.1379 | Man. City, 1, −0.1053 | Man. City, 1, −0.0961 | Man. City, 1, −0.0002 | Man. City, 1, −0.0961 |
| | Man. United, 2, −0.0067 | Man. United, 2, −0.0599 | Man. United, 2, −0.2243 | Tottenham, 2, 0.0494 | Man. United, 2, 0.0000 | Tottenham, 2, 0.0494 |
| | Arsenal, 3, −0.1564 | Arsenal, 3, 0.7011 | Arsenal, 3, 0.8573 | Man. United, 3, 0.4733 | Tottenham, 4, −0.0002 | Arsenal, 3, −0.0004 |
| | Tottenham, 4, −0.0544 | Tottenham, 4, −0.3316 | Tottenham, 4, −0.314 | Arsenal, 4, −0.2376 | Arsenal, 5, 0.4678 | Man. United, 3, 0.4733 |
| | Newcastle, 5, 0.1344 | Newcastle, 5, 0.4368 | Chelsea, 5, 0.4633 | Chelsea, 5, 0.4678 | Newcastle, 5, 0.0004 | Fulham, 4, −0.2376 |
| | sd ± 0.0333 | sd ± 0.0409 | sd ± 0.0418 | sd ± 0.0427 | sd ± 0.0412 | sd ± 0.05328 |
| Keener | Man. City, 1, 0.1843 | Man. City, 1, 0.1612 | Man. City, 1, 0.1617 | Man. City, 1, 0.1589 | Man. City, 1, 0.1589 | Man. City, 1, 0.2211 |
| | Man. United, 2, 0.2164 | Man. United, 2, 0.2157 | Man. United, 2, 0.2145 | Man. United, 2, 0.2164 | Man. United, 2, 0.2164 | Man. United, 2, 0.2231 |
| | Arsenal, 3, 0.2415 | Tottenham, 3, 0.2238 | Tottenham, 3, 0.2258 | Tottenham, 3, 0.2237 | Everton, 3, 0.2237 | Arsenal, 3, 0.2248 |
| | Tottenham, 4, 0.2348 | Arsenal, 4, 0.2496 | Arsenal, 4, 0.2465 | Arsenal, 4, 0.2347 | Arsenal, 4, 0.2347 | Tottenham, 4, 0.2244 |
| | Newcastle, 5, 0.2065 | Newcastle, 5, 0.1889 | Newcastle, 5, 0.1759 | Newcastle, 5, 0.1768 | Newcastle, 5, 0.1768 | Newcastle, 5, 0.2222 |
| | sd ± 0.0328 | sd ± 0.0434 | sd ± 0.0482 | sd ± 0.0428 | sd ± 0.0306 | sd ± 0.0515 |
| WP | Man. United, 1, 0.0513 | Man. United, 1, 0.0511 | Man. United, 1, 0.0509 | Man. United, 1, 0.0508 | Man. United, 1, 0.0506 | Man. City, 1, 0.0487 |
| | Man. City, 2, 0.0484 | Man. City, 2, 0.0485 | Man. City, 2, 0.0484 | Man. City, 2, 0.0486 | Man. City, 2, 0.0485 | Man. United, 2, 0.0505 |
| | Arsenal, 3, 0.0514 | Arsenal, 3, 0.0518 | Arsenal, 3, 0.0513 | Arsenal, 3, 0.0520 | Arsenal, 3, 0.0519 | Arsenal, 3, 0.0518 |
| | Tottenham, 4, 0.0509 | Tottenham, 4, 0.0502 | Tottenham, 4, 0.0508 | Tottenham, 4, 0.0512 | Tottenham, 4, 0.0511 | Tottenham, 4, 0.0512 |
| | Newcastle, 5, 0.0531 | Newcastle, 5, 0.0519 | Newcastle, 5, 0.0515 | Swansea City, 5, 0.0504 | Swansea City, 5, 0.0505 | Chelsea, 5, 0.0488 |
| | sd ± 0.2418 | sd ± 0.2256 | sd ± 0.02097 | sd ± 0.1941 | sd ± 0.2024 | sd ± 0.2668 |
| RPI | Man. City, 1, 0.0489 | Man. City, 1, 0.0484 | Man. City, 1, 0.0483 | Man. City, 1, 0.0485 | Man. United, 1, 0.0508 | Tottenham, 1, 0.0515 |
| | Man. United, 2, 0.0490 | Man. United, 2, 0.0510 | Man. United, 2, 0.0514 | Tottenham, 2, 0.0515 | Man. City, 2, 0.0486 | Man. City, 2, 0.0486 |
| | Arsenal, 3, 0.0526 | Arsenal, 3, 0.0510 | Arsenal, 3, 0.0521 | Man. United, 3, 0.0510 | Newcastle, 3, 0.0523 | Man. United, 3, 0.0508 |
| | Tottenham, 4, 0.0482 | Tottenham, 4, 0.0514 | Tottenham, 4, 0.0508 | Arsenal, 4, 0.0512 | Arsenal, 4, 0.0523 | Arsenal, 4, 0.0523 |
| | Newcastle, 5, 0.0523 | Newcastle, 5, 0.0533 | Chelsea, 5, 0.0491 | Chelsea, 5, 0.0492 | Tottenham, 5, 0.0515 | Newcastle, 5, 0.0523 |
| | sd ± 0.2122 | sd ± 0.2819 | sd ± 0.3418 | sd ± 0.3435 | sd ± 0.1752 | sd ± 0.1769 |
Table A2. Detailed comparison of top-5 teams rate–rank by expanding window approach.
Each cell gives team, rank, and rating.

| Method | After 50 Games | After 60 Games | After 70 Games | After 80 Games | After 90 Games | After 100 Games |
|---|---|---|---|---|---|---|
| PageRank | Man. City, 1, 0.1217 | Man. City, 1, 0.1226 | Man. City, 1, 0.1237 | Man. City, 1, 0.1249 | Man. City, 1, 0.1263 | Man. City, 1, 0.1566 |
| | Chelsea, 2, 0.0889 | Chelsea, 2, 0.0898 | Chelsea, 2, 0.0907 | Chelsea, 2, 0.0919 | Chelsea, 2, 0.0932 | Chelsea, 2, 0.1211 |
| | Man. United, 3, 0.0757 | Man. United, 3, 0.0758 | Man. United, 3, 0.0759 | Man. United, 3, 0.0759 | Man. United, 3, 0.0760 | Man. United, 3, 0.0781 |
| | Newcastle, 4, 0.0670 | Arsenal, 4, 0.0668 | Arsenal, 4, 0.0666 | Arsenal, 4, 0.0663 | Arsenal, 4, 0.0659 | Arsenal, 4, 0.0706 |
| | Tottenham, 5, 0.0640 | Tottenham, 5, 0.0641 | Tottenham, 5, 0.0643 | Tottenham, 5, 0.0646 | Tottenham, 5, 0.0648 | Tottenham, 5, 0.0669 |
| | sd ± 0.0165 | sd ± 0.0174 | sd ± 0.0176 | sd ± 0.0179 | sd ± 0.0213 | sd ± 0.0244 |
| Colley | Arsenal, 1, 0.1923 | Arsenal, 1, 0.1815 | Arsenal, 1, 0.1846 | Arsenal, 1, 0.1848 | Arsenal, 1, 0.1847 | Arsenal, 1, 0.1847 |
| | Man. City, 2, −0.0138 | Man. City, 2, −0.0205 | Man. City, 2, −0.0290 | Man. City, 2, −0.0291 | Man. City, 2, −0.0292 | Man. City, 2, −0.0291 |
| | Man. United, 3, −0.1511 | Man. United, 3, −0.1462 | Man. United, 3, −0.1436 | Man. United, 3, −0.1408 | Man. United, 3, −0.1437 | Man. United, 3, −0.1408 |
| | Tottenham, 4, −0.0149 | Tottenham, 4, −0.0085 | Tottenham, 4, −0.0113 | Tottenham, 4, −0.0113 | Tottenham, 4, −0.0114 | Tottenham, 4, −0.0114 |
| | Newcastle, 5, 0.0610 | Chelsea, 5, 0.0030 | Chelsea, 5, 0.0001 | Chelsea, 5, −0.0029 | Chelsea, 5, −0.0030 | Newcastle, 5, −0.0030 |
| | sd ± 0.0773 | sd ± 0.0754 | sd ± 0.0775 | sd ± 0.0747 | sd ± 0.0783 | sd ± 0.0778 |
| Massey | Man. City, 1, −0.0696 | Man. City, 1, −0.0569 | Man. United, 1, 0.1560 | Man. City, 1, −0.0380 | Man. City, 1, −0.0310 | Man. City, 1, −0.0002 |
| | Arsenal, 2, −0.0067 | Man. United, 2, −0.0055 | Man. City, 2, −0.0509 | Man. United, 2, −0.0037 | Man. United, 2, −0.0030 | Man. United, 2, 0.0000 |
| | Man. United, 3, −0.1564 | Arsenal, 3, −0.1278 | Arsenal, 3, −0.1044 | Arsenal, 3, −0.0853 | Arsenal, 3, −0.0697 | Arsenal, 3, −0.0004 |
| | Tottenham, 4, −0.0544 | Tottenham, 4, −0.0444 | Newcastle, 4, 0.0121 | Tottenham, 4, −0.0297 | Tottenham, 4, −0.0242 | Tottenham, 4, −0.0002 |
| | Newcastle, 5, 0.1344 | Newcastle, 5, 0.1098 | Chelsea, 5, 0.0464 | Newcastle, 5, 0.0733 | Newcastle, 5, 0.0599 | Newcastle, 5, 0.0004 |
| | sd ± 0.0242 | sd ± 0.0226 | sd ± 0.0210 | sd ± 0.0518 | sd ± 0.0337 | sd ± 0.0296 |
| Keener | Man. United, 1, 0.1843 | Man. United, 1, 0.1870 | Man. United, 1, 0.1897 | Man. United, 1, 0.1923 | Man. United, 1, 0.1948 | Man. United, 1, 0.2211 |
| | Man. City, 2, 0.2164 | Man. City, 2, 0.2169 | Man. City, 2, 0.2173 | Man. City, 2, 0.2178 | Man. City, 2, 0.2183 | Man. City, 2, 0.2231 |
| | Arsenal, 3, 0.2415 | Arsenal, 3, 0.2403 | Arsenal, 3, 0.2391 | Arsenal, 3, 0.2380 | Arsenal, 3, 0.2369 | Arsenal, 3, 0.2248 |
| | Tottenham, 4, 0.2348 | Tottenham, 4, 0.2341 | Tottenham, 4, 0.2333 | Tottenham, 4, 0.2326 | Tottenham, 4, 0.2319 | Tottenham, 4, 0.2244 |
| | Newcastle, 5, 0.2065 | Newcastle, 5, 0.2073 | Newcastle, 5, 0.2081 | Swansea City, 5, 0.2090 | Swansea City, 5, 0.2098 | Newcastle, 5, 0.2222 |
| | sd ± 0.0326 | sd ± 0.0408 | sd ± 0.0418 | sd ± 0.0394 | sd ± 0.0314 | sd ± 0.0416 |
| WP | Man. City, 1, 0.0555 | Man. City, 1, 0.0553 | Man. City, 1, 0.0550 | Man. City, 1, 0.0486 | Man. City, 1, 0.0487 | Arsenal, 1, 0.0511 |
| | Man. United, 2, 0.0490 | Man. United, 2, 0.0490 | Man. United, 2, 0.0490 | Man. United, 2, 0.0511 | Man. United, 2, 0.0510 | Man. City, 2, 0.0488 |
| | Arsenal, 3, 0.0483 | Tottenham, 3, 0.0497 | Tottenham, 3, 0.0497 | Tottenham, 3, 0.0508 | Everton, 3, 0.0504 | Man. United, 3, 0.0509 |
| | Tottenham, 4, 0.0497 | Arsenal, 4, 0.0483 | Arsenal, 4, 0.0484 | Arsenal, 4, 0.0512 | Arsenal, 4, 0.0511 | Tottenham, 4, 0.0507 |
| | Newcastle, 5, 0.0502 | Newcastle, 5, 0.0501 | Newcastle, 5, 0.0501 | Newcastle, 5, 0.0526 | Newcastle, 5, 0.0525 | Chelsea, 5, 0.0489 |
| | sd ± 0.0971 | sd ± 0.0865 | sd ± 0.0737 | sd ± 0.0626 | sd ± 0.0885 | sd ± 0.0571 |
| RPI | Man. City, 1, 0.0573 | Man. United, 1, 0.0497 | Man. United, 1, 0.0497 | Arsenal, 1, 0.0491 | Arsenal, 1, 0.0491 | Arsenal, 1, 0.0508 |
| | Man. United, 2, 0.0497 | Man. City, 2, 0.0570 | Man. City, 2, 0.0567 | Man. City, 2, 0.0563 | Man. City, 2, 0.0559 | Man. City, 2, 0.0487 |
| | Arsenal, 3, 0.0490 | Arsenal, 3, 0.0490 | Arsenal, 3, 0.0490 | Man. United, 3, 0.0497 | Man. United, 3, 0.0497 | Man. United, 3, 0.0508 |
| | Tottenham, 4, 0.0494 | Newcastle, 4, 0.0499 | Tottenham, 4, 0.0495 | Tottenham, 4, 0.0495 | Tottenham, 4, 0.0496 | Tottenham, 4, 0.0511 |
| | Newcastle, 5, 0.0499 | Chelsea, 5, 0.0479 | Swansea City, 5, 0.0503 | Chelsea, 5, 0.0481 | Chelsea, 5, 0.0482 | Chelsea, 5, 0.0487 |
| | sd ± 0.0506 | sd ± 0.0831 | sd ± 0.0843 | sd ± 0.0662 | sd ± 0.0706 | sd ± 0.0805 |

Appendix B

This appendix gives a more detailed illustration of the rating errors in the different tests. Samples were obtained with both the rolling and the expanding window approaches.
Figure A1. Rating error at different window times for the PageRank, Colley, Massey, Keener, WP, and RPI methods under the rolling window approach. $E_{\phi}^{t}$ measures the spread of the team ratings. A lower $E_{\phi}^{t}$ indicates higher predictive power and better rating performance, whereas a larger $E_{\phi}^{t}$ indicates lower predictive power and hence poorer rating performance.
Figure A2. Rating error at different window times for the PageRank, Colley, Massey, Keener, WP, and RPI methods under the expanding window approach. A lower $E_{\phi}^{t}$ indicates higher predictive power and better rating performance, whereas a larger $E_{\phi}^{t}$ indicates lower predictive power and hence poorer rating performance.

References

1. Langville, A.N.; Meyer, C.D. Google’s PageRank and Beyond: The Science of Search Engine Rankings; Princeton University Press: Princeton, NJ, USA, 2011.
2. Rubinstein, A. Ranking the participants in a tournament. SIAM J. Appl. Math. 1980, 38, 108–111.
3. Bouyssou, D.; Perny, P. Ranking methods for valued preference relations: A characterization of a method based on leaving and entering flows. Eur. J. Oper. Res. 1992, 61, 186–194.
4. Chebotarev, P.Y.; Shamis, E. Characterizations of scoring methods for preference aggregation. Ann. Oper. Res. 1998, 80, 299–332.
5. Vaziri, B.; Dabadghao, S.; Yih, Y.; Morin, T.L. Properties of sports ranking methods. J. Oper. Res. Soc. 2018, 69, 776–787.
6. Constantinou, A.C.; Fenton, N.E.; Neil, M. Pi-football: A Bayesian network model for forecasting association football match outcomes. Knowl.-Based Syst. 2012, 36, 322–339.
7. Barrow, D.; Drayer, I.; Elliott, P.; Gaut, G.; Osting, B. Ranking rankings: An empirical comparison of the predictive power of sports ranking methods. J. Quant. Anal. Sports 2013, 9, 187–202.
8. Chartier, T.P.; Kreutzer, E.; Langville, A.N.; Pedings, K.E. Sensitivity and stability of ranking vectors. SIAM J. Sci. Comput. 2011, 33, 1077–1102.
9. Kardos, O.; London, A.; Vinkó, T. Stability of network centrality measures: A numerical study. Soc. Netw. Anal. Min. 2020, 10, 1–17.
10. Segarra, S.; Ribeiro, A. Stability and continuity of centrality measures in weighted graphs. IEEE Trans. Signal Process. 2015, 64, 543–555.
11. Costenbader, E.; Valente, T.W. The stability of centrality measures when networks are sampled. Soc. Netw. 2003, 25, 283–307.
12. Langville, A.N.; Meyer, C.D. Who’s #1? The Science of Rating and Ranking; Princeton University Press: Princeton, NJ, USA, 2012.
13. Jiang, X.; Lim, L.H.; Yao, Y.; Ye, Y. Statistical ranking and combinatorial Hodge theory. Math. Program. 2011, 127, 203–244.
14. Pickle, D.; Howard, B. Computer to Aid in Basketball Championship Selection. NCAA News, 1981, Volume 4. Available online: https://scholar.google.com/scholar?q=Pickle%2C+D.%3B+Howard%2C+B+Computer+to+aid+in+basketball+championship+selection.+NCAA+News%2C+1981%3B+Volume+4.&hl=en&as_sdt=0%2C5&as_ylo=&as_yhi= (accessed on 15 March 2022).
15. Massey, K. Statistical Models Applied to the Rating of Sports Teams. Unpublished Bachelor’s Thesis, Bluefield College, Bluefield, VA, USA, 1997.
16. Colley, W. Colley’s Bias Free College Football Ranking Method; Princeton University: Princeton, NJ, USA, 2002. Available online: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Colley%2C+W.+Colley+%C3%A2s+Bias+Free+College+Football+Ranking+Method%3B+2002.&btnG= (accessed on 15 March 2022).
17. Keener, J.P. The Perron-Frobenius Theorem and the Ranking of Football Teams. SIAM Rev. 1993, 35, 80–93.
18. Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web; Technical Report; Stanford InfoLab: Stanford, CA, USA, 1999.
19. Franceschet, M.; Bozzo, E. The Massey’s method for sport rating: A network science perspective. arXiv 2017, arXiv:1701.03363.
20. Liberti, L.; Lavor, C.; Maculan, N.; Mucherino, A. Euclidean distance geometry and applications. SIAM Rev. 2014, 56, 3–69.
21. Kendall, M. A new measure of rank correlation. Biometrika 1938, 30, 81–93.
22. David, H.A. Ranking the players in a round robin tournament. Rev. Int. Stat. Inst. 1971, 39, 137–147.
23. Borodin, A.; Roberts, G.O.; Rosenthal, J.S.; Tsaparas, P. Link analysis ranking: Algorithms, theory, and experiments. ACM Trans. Internet Technol. 2005, 5, 231–297.
24. Lasek, J.; Szlávik, Z.; Bhulai, S. The predictive power of ranking systems in association football. Int. J. Appl. Pattern Recognit. 2013, 1, 27–46.
25. London, A.; Németh, J.; Németh, T. Time-dependent network algorithm for ranking in sports. Acta Cybern. 2014, 21, 495–506.
26. Avron, H.; Horesh, L. Community Detection Using Time-Dependent Personalized PageRank. In Proceedings of the International Conference on Machine Learning; PMLR, 2015; pp. 1795–1803. Available online: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Avron%2C+H.%3B+Horesh%2C+L.+Community+detection+using+time-dependent+personalized+pagerank.+In+International+Conference+on+Machine+Learning%3B+PMLR%3A+2015%3B+pp.+1795%E2%80%931803&btnG= (accessed on 15 March 2022).
27. Zhou, Y.; Wang, R.; Zhang, Y.C.; Zeng, A.; Medo, M. Improving PageRank using sports results modeling. Knowl.-Based Syst. 2022, 2022, 108168.
Figure 1. Rating stability based on the Euclidean distance under the rolling window approach. The figure shows how the distance value $d_{RW}^{2}(t)$ for the PageRank, Colley, Massey, Keener, WP, and RPI methods changes over time at a fixed window size (i.e., 10 games per window). A smaller value indicates higher rating stability.
Figure 2. Rating stability based on the Euclidean distance under the expanding window approach. The figure shows that the distance value $d_{EW}^{2}(t)$ for the PageRank, Colley, Massey, Keener, WP, and RPI methods increases over time as the window size grows (i.e., by 10 games per window). Smaller distance values indicate higher rating stability.
Figure 3. Rank stability of PageRank, Colley, Massey, Keener, WP, and RPI methods using rolling window simulations.
Figure 4. Rank stability of PageRank, Colley, Massey, Keener, WP, and RPI methods using expanding window simulations.
Table 1. Comparison of top-5 teams rate–rank by rolling window approach.
Each cell gives team, rank, and rating.

| Method | 10–60 Games | 20–70 Games | 30–80 Games |
|---|---|---|---|
| PageRank | Man. City, 1, 0.0696 | Man. City, 1, 0.0894 | Man. City, 1, 0.0895 |
| | Man. United, 2, 0.0067 | Arsenal, 2, 0.0815 | Arsenal, 2, 0.083 |
| | Arsenal, 3, 0.1564 | Man. United, 3, 0.0671 | Man. United, 3, 0.068 |
| | Tottenham, 4, 0.0544 | Newcastle, 4, 0.0612 | Newcastle, 4, 0.0655 |
| | Chelsea, 5, 0.1344 | Tottenham, 5, 0.0635 | Tottenham, 5, 0.0639 |
| | sd ± 0.0522 | sd ± 0.0116 | sd ± 0.0125 |
| Colley | Man. City, 1, 0.1923 | Man. United, 1, −0.042 | Arsenal, 1, −0.0368 |
| | Man. United, 2, −0.0138 | Man. City, 2, −0.0579 | Man. City, 2, 0.057 |
| | Arsenal, 3, −0.1511 | Newcastle, 3, 0.1823 | Tottenham, 3, 0.2564 |
| | Tottenham, 4, −0.0149 | Arsenal, 4, 0.0206 | Man. United, 4, −0.1074 |
| | Newcastle, 5, 0.061 | Tottenham, 5, −0.0821 | Newcastle, 5, 0.1348 |
| | sd ± 0.1104 | sd ± 0.1267 | sd ± 0.1333 |
| Massey | Man. City, 1, −0.0696 | Man. City, 1, −0.1379 | Man. City, 1, −0.1053 |
| | Man. United, 2, −0.0067 | Man. United, 2, −0.0599 | Man. United, 2, −0.2243 |
| | Arsenal, 3, −0.1564 | Arsenal, 3, 0.7011 | Arsenal, 3, 0.8573 |
| | Tottenham, 4, −0.0544 | Tottenham, 4, −0.3316 | Tottenham, 4, −0.314 |
| | Newcastle, 5, 0.1344 | Newcastle, 5, 0.4368 | Chelsea, 5, 0.4633 |
| | sd ± 0.0333 | sd ± 0.0409 | sd ± 0.0418 |
| Keener | Man. City, 1, 0.1843 | Man. City, 1, 0.1612 | Man. City, 1, 0.1617 |
| | Man. United, 2, 0.2164 | Man. United, 2, 0.2157 | Man. United, 2, 0.2145 |
| | Arsenal, 3, 0.2415 | Tottenham, 3, 0.2238 | Tottenham, 3, 0.2258 |
| | Tottenham, 4, 0.2348 | Arsenal, 4, 0.2496 | Arsenal, 4, 0.2465 |
| | Newcastle, 5, 0.2065 | Newcastle, 5, 0.1889 | Newcastle, 5, 0.1759 |
| | sd ± 0.0328 | sd ± 0.0434 | sd ± 0.0482 |
| WP | Man. United, 1, 0.0513 | Man. United, 1, 0.0511 | Man. United, 1, 0.0509 |
| | Man. City, 2, 0.0484 | Man. City, 2, 0.0485 | Man. City, 2, 0.0484 |
| | Chelsea, 3, 0.0514 | Arsenal, 3, 0.0518 | Chelsea, 3, 0.0513 |
| | Arsenal, 4, 0.0509 | Tottenham, 4, 0.0502 | Arsenal, 4, 0.0508 |
| | Tottenham, 5, 0.0531 | Newcastle, 5, 0.0519 | Tottenham, 5, 0.0515 |
| | sd ± 0.2418 | sd ± 0.2256 | sd ± 0.2097 |
| RPI | Man. City, 1, 0.0489 | Man. United, 1, 0.0484 | Man. United, 1, 0.0483 |
| | Man. United, 2, 0.0490 | Chelsea, 2, 0.0510 | Chelsea, 2, 0.0514 |
| | Arsenal, 3, 0.0526 | Man. City, 3, 0.0510 | Man. City, 3, 0.0527 |
| | Tottenham, 4, 0.0482 | Arsenal, 4, 0.0514 | Arsenal, 4, 0.0508 |
| | Newcastle, 5, 0.0523 | Tottenham, 5, 0.0533 | Tottenham, 5, 0.0491 |
| | sd ± 0.2122 | sd ± 0.2819 | sd ± 0.3418 |
Table 2. Comparison of top-5 teams rate–rank by expanding window approach.
Each cell gives team, rank, and rating.

| Method | After 60 Games | After 70 Games | After 80 Games |
|---|---|---|---|
| PageRank | Man. City, 1, 0.1217 | Man. City, 1, 0.1226 | Man. City, 1, 0.1237 |
| | Chelsea, 2, 0.0889 | Chelsea, 2, 0.0898 | Chelsea, 2, 0.0907 |
| | Man. United, 3, 0.0757 | Man. United, 3, 0.0758 | Man. United, 3, 0.0759 |
| | Newcastle, 4, 0.0670 | Arsenal, 4, 0.0668 | Arsenal, 4, 0.0666 |
| | Tottenham, 5, 0.0640 | Tottenham, 5, 0.0641 | Tottenham, 5, 0.0643 |
| | sd ± 0.0165 | sd ± 0.0174 | sd ± 0.0176 |
| Colley | Arsenal, 1, 0.1923 | Arsenal, 1, 0.1815 | Arsenal, 1, 0.1846 |
| | Man. City, 2, −0.0138 | Man. City, 2, −0.0205 | Man. City, 2, −0.0290 |
| | Man. United, 3, −0.1511 | Man. United, 3, −0.1462 | Man. United, 3, −0.1436 |
| | Tottenham, 4, −0.0149 | Tottenham, 4, −0.0085 | Tottenham, 4, −0.0113 |
| | Newcastle, 5, 0.0610 | Chelsea, 5, 0.0030 | Chelsea, 5, 0.0001 |
| | sd ± 0.0773 | sd ± 0.0754 | sd ± 0.0775 |
| Massey | Man. City, 1, −0.0696 | Man. City, 1, −0.0569 | Man. United, 1, 0.1560 |
| | Arsenal, 2, −0.0067 | Man. United, 2, −0.0055 | Man. City, 2, −0.0509 |
| | Man. United, 3, −0.1564 | Arsenal, 3, −0.1278 | Arsenal, 3, −0.1044 |
| | Tottenham, 4, −0.0544 | Tottenham, 4, −0.0444 | Newcastle, 4, 0.0121 |
| | Newcastle, 5, 0.1344 | Newcastle, 5, 0.1098 | Chelsea, 5, 0.0464 |
| | sd ± 0.0242 | sd ± 0.0226 | sd ± 0.0210 |
| Keener | Man. United, 1, 0.1843 | Man. United, 1, 0.1870 | Man. United, 1, 0.2382 |
| | Man. City, 2, 0.2164 | Man. City, 2, 0.2169 | Man. City, 2, 0.2150 |
| | Arsenal, 3, 0.2415 | Arsenal, 3, 0.2403 | Chelsea, 3, 0.2109 |
| | Tottenham, 4, 0.2348 | Tottenham, 4, 0.2341 | Arsenal, 4, 0.1791 |
| | Newcastle, 5, 0.2065 | Newcastle, 5, 0.2073 | Tottenham, 5, 0.1947 |
| | sd ± 0.0326 | sd ± 0.0408 | sd ± 0.0418 |
| WP | Man. City, 1, 0.0555 | Man. City, 1, 0.0553 | Man. City, 1, 0.0550 |
| | Man. United, 2, 0.0490 | Man. United, 2, 0.0490 | Man. United, 2, 0.0490 |
| | Arsenal, 3, 0.0483 | Tottenham, 3, 0.0497 | Tottenham, 3, 0.0497 |
| | Tottenham, 4, 0.0497 | Arsenal, 4, 0.0483 | Arsenal, 4, 0.0484 |
| | Newcastle, 5, 0.0502 | Newcastle, 5, 0.0501 | Newcastle, 5, 0.0501 |
| | sd ± 0.0971 | sd ± 0.0865 | sd ± 0.0737 |
| RPI | Man. City, 1, 0.0573 | Man. United, 1, 0.0497 | Man. United, 1, 0.0497 |
| | Man. United, 2, 0.0497 | Man. City, 2, 0.0570 | Man. City, 2, 0.0567 |
| | Arsenal, 3, 0.0490 | Arsenal, 3, 0.0490 | Arsenal, 3, 0.0490 |
| | Tottenham, 4, 0.0494 | Newcastle, 4, 0.0499 | Tottenham, 4, 0.0495 |
| | Newcastle, 5, 0.0499 | Chelsea, 5, 0.0479 | Swansea, 5, 0.0503 |
| | sd ± 0.0506 | sd ± 0.0831 | sd ± 0.0843 |
Table 3. Average rating error for rolling and expanding window simulations.
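| Window | PageRank RMSE | PageRank sd | Colley RMSE | Colley sd | Massey RMSE | Massey sd | Keener RMSE | Keener sd | WP RMSE | WP sd | RPI RMSE | RPI sd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rolling (RW) | 0.2568 | 0.0188 | 0.4133 | 0.1156 | 0.2819 | 0.0318 | 0.4067 | 0.0435 | 0.4722 | 0.0927 | 0.4809 | 0.2220 |
| Expanding (EW) | 0.2826 | 0.0193 | 0.4025 | 0.0765 | 0.3237 | 0.0172 | 0.3859 | 0.0573 | 0.4446 | 0.0489 | 0.4425 | 0.0692 |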
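Table 3 summarizes each method by an average RMSE and its standard deviation across windows. The main text defines how outcome predictions are derived from ratings, and that step is not restated here. The sketch below therefore covers only the aggregation step, assuming that each window supplies equal-length sequences of predicted and observed outcomes. This section does not say whether the reported sd is a sample or a population standard deviation. The use of `statistics.stdev` (sample sd) is therefore an assumption, as are the function names.

```python
import math
import statistics


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def summarize_windows(per_window):
    """Mean and sample standard deviation of the per-window RMSE.

    per_window: list of (predicted, observed) pairs, one per window;
    at least two windows are needed for the standard deviation.
    """
    errors = [rmse(pred, obs) for pred, obs in per_window]
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    windows = [([1.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [1.0, 1.0])]
    mean_rmse, sd_rmse = summarize_windows(windows)
    print(round(mean_rmse, 4), round(sd_rmse, 4))
```

A method with both a low mean RMSE and a low sd, such as PageRank in both rows of Table 3, errs little on average and varies little from window to window.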
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
