1. Introduction
In competitive sports, ranking systems and tournament structures play a critical role in ensuring the procedural legitimacy of outcomes. Whether used for playoff qualification, seed allocation, or athlete selection, rankings are expected to reflect actual team or player performance while minimizing the influence of random variation and structural imbalance. When a ranking system fails to align with observed performance, it can undermine the transparency, credibility, and predictability of the competition framework. The inaccurate evaluation of team strength distorts the interpretation of comparative performance and limits the utility of rankings as a basis for competition outcomes [1]. Particularly in team sports at the school level, the robustness and consistency of ranking models carry broader implications, influencing regional athletic development, resource allocation, and the design of sports governance policies [2,3].
In most contemporary baseball tournaments, team rankings are typically determined by statistical indicators, such as the number of wins, win percentage (win rate), or run differential. These methods are widely adopted across various levels of competition due to their computational simplicity, interpretability, and institutional transparency [4]. Among them, the most commonly used approach is win percentage, which calculates a team’s success rate as the proportion of games won. This method is intuitive and operationally straightforward, offering a basic performance metric. However, it depends critically on a complete and balanced schedule. When the competition structure is highly incomplete or asymmetric, win percentage fails to capture qualitative differences in the opponents faced and outcomes achieved [5]. From a representational standpoint, teams that face stronger opponents may be penalized in the rankings, while those with weaker schedules may receive disproportionate advantages.
Another traditional method is run differential, which quantifies overall team performance by subtracting runs allowed from runs scored across games [6]. Although this approach provides a finer-grained metric of offensive and defensive capability, it is highly sensitive to outliers, such as one-sided games or unpredictable results. In short seasons or elimination-based formats, these fluctuations may distort rankings and place resource-constrained teams, which often lack the opportunity to accumulate consistent long-term results, at a disadvantage.
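As a minimal illustration of the two indicators described above, the following sketch computes win percentage and run differential from a handful of game records. The team labels, scores, and the `summarize` helper are hypothetical and are not drawn from the tournament data.

```python
from collections import defaultdict

def summarize(games):
    """Return {team: (win_pct, run_diff)} from (team_a, team_b, runs_a, runs_b) tuples."""
    wins = defaultdict(int)
    played = defaultdict(int)
    run_diff = defaultdict(int)
    for team_a, team_b, runs_a, runs_b in games:
        played[team_a] += 1
        played[team_b] += 1
        run_diff[team_a] += runs_a - runs_b
        run_diff[team_b] += runs_b - runs_a
        if runs_a > runs_b:
            wins[team_a] += 1
        elif runs_b > runs_a:
            wins[team_b] += 1
    return {t: (wins[t] / played[t], run_diff[t]) for t in played}

# Hypothetical records: A wins three close games; D wins one blowout and loses to A.
games = [
    ("A", "B", 3, 2),
    ("A", "C", 4, 3),
    ("D", "E", 15, 0),
    ("D", "A", 1, 2),
]
stats = summarize(games)
print(stats["A"])  # (win percentage, run differential) for the undefeated team
print(stats["D"])  # a single one-sided game dominates D's run differential
```

In this toy schedule the undefeated team ends with a smaller run differential than a team it beat, which shows how one lopsided result can dominate the metric when few games are played.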
To address the limitations of purely statistical ranking systems, mathematical models such as the Colley and Massey methods have been proposed. These models incorporate opponent strength to mitigate schedule imbalance. The Colley matrix uses systems of linear equations based on pairwise outcomes to generate strength estimates and is noted for its transparency and objectivity [7]. The Massey method minimizes the squared error between observed and predicted score margins, aiming to produce rankings that reflect relative performance more equitably [8]. While both methods have demonstrated efficacy in professional or collegiate baseball leagues with long and relatively complete seasons, their application in high school tournaments or single-elimination settings remains limited due to sparse data and potential ranking distortions [9,10,11].
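To make the linear-system idea behind the Colley method concrete, the sketch below builds the standard Colley system (diagonal entries of 2 plus games played, off-diagonal entries of minus the number of meetings, right-hand side of 1 plus half the win-loss difference) and solves it with a small Gaussian elimination. It is an illustrative sketch only: the function names and the three-team example are assumptions, and it is not code from any of the cited studies.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def colley_ratings(teams, results):
    """results: list of (winner, loser) pairs. Returns {team: Colley rating}."""
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    b = [1.0] * n
    for winner, loser in results:
        w, l = idx[winner], idx[loser]
        C[w][w] += 1.0  # each game adds one to both teams' diagonal entries
        C[l][l] += 1.0
        C[w][l] -= 1.0  # and subtracts one from the pairwise entries
        C[l][w] -= 1.0
        b[w] += 0.5
        b[l] -= 0.5
    return dict(zip(teams, solve(C, b)))

# Hypothetical chain of results: A beat B, and B beat C.
ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(ratings)
```

Note that B, with one win and one loss, sits exactly at the neutral rating of 0.5, while A and C are placed symmetrically around it even though they never met.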
Although these traditional methods remain widely accepted in practice for their simplicity and intuitive logic, they often struggle to account for latent performance differentials and structural biases under incomplete schedules or highly uneven matchups. Thus, there is a growing need to develop or adapt ranking models that can generate more representative and robust outcomes under real-world tournament constraints. The empirical evaluation of such models in authentic competitive environments remains an important direction for further research.
This challenge is especially acute in school-level competitions, where structural limitations are more severe. For example, Taiwan’s largest high school baseball tournament—the CTBC Black Panther Cup National High School Baseball Tournament (hereafter referred to as the Black Panther Cup)—draws nearly 200 teams annually. Due to its scale and logistical constraints, the tournament adopts a single-elimination format. While efficient, this structure introduces substantial difficulties in evaluating team performance and constructing representative rankings. Matchups are highly unbalanced, and strong teams may be prematurely eliminated due to unfavorable draws. Traditional ranking criteria based on win totals or win percentage fail to account for the relative difficulty of opponents, treating all victories as equivalent regardless of competitive context.
This study evaluates the applicability of the PageRank algorithm in modeling team strength within such constrained environments. While PageRank has been applied in various sports contexts [12,13,14], prior research has primarily focused on predictive performance in professional settings, with limited attention to its capacity to generate representative rankings under structurally imbalanced school-level competitions.
To address this gap, this study analyzes data from Taiwan’s Black Panther Cup (2013–2023), applying the PageRank algorithm to estimate team strength under incomplete, single-elimination schedules. The objectives of this research are threefold: (1) to assess the validity of PageRank in capturing relative team performance; (2) to examine its consistency with official rankings across seasons; and (3) to evaluate its feasibility as a low-data tool for predicting match outcomes by leveraging historical data in grassroots sports contexts. The findings aim to inform tournament design and policy development by providing an alternative evaluation framework that does not rely on individual-level statistics.
3. Methodology
3.1. Research Design
This study employs an empirical framework structured in three stages to evaluate the applicability and stability of the PageRank algorithm in ranking teams under incomplete, single-elimination tournament structures. First, a competition network is constructed from historical match outcomes, and PageRank scores are computed to estimate relative team strength. Second, the consistency and temporal stability of the resulting rankings are assessed using statistical correlation metrics across multiple seasons. Third, the predictive validity of the rankings is examined through match-level simulations, where outcomes are forecasted based on pre-match PageRank scores and compared against actual results. The analysis further explores how the length of the historical data horizon used to compute PageRank affects prediction accuracy.
3.2. Data Collection and Processing
This study utilizes data from the CTBC Black Panther Cup National High School Baseball Tournament in Taiwan, covering the period from its inaugural season in 2013 to 2023. The dataset comprises comprehensive match records from the preliminary rounds to the championship finals, totaling 1914 games. Data were obtained primarily from the official website of the Black Panther Cup (https://blackpanthercup.tw/, accessed on 4 May 2025) and supplemented by earlier historical records provided by the Chinese Taipei Baseball Association (http://www.ctba.org.tw/, accessed on 4 May 2025). These sources do not indicate explicit license restrictions for academic use, and all data were utilized exclusively for non-commercial academic research purposes in accordance with fair use principles.
The collected records were processed using standard data-cleaning procedures, including the removal of duplicate entries, imputation of missing values, and correction of inconsistencies to ensure overall data integrity. The cleaned data were then organized into a structured database format to facilitate subsequent analysis. The finalized dataset includes key variables, such as tournament-stage classifications, match schedules, team names for each matchup, game scores, and match outcomes.
3.3. PageRank Modeling
This study employs the PageRank algorithm [22] to model team strength. As detailed in Section 2.3, its network-based approach is well-suited for sports. For this research on single-elimination high school tournaments, PageRank was chosen for its ability to handle sparse data and unbalanced schedules and to assess the quality of victories by considering opponent strength. Team rankings are calculated recursively from match outcomes in a directed network.
The PageRank score $r(i)$ for team $i$ is computed as follows:

$$r(i) = \frac{1-\alpha}{n} + \alpha \sum_{j \in B_i} \frac{r(j)}{N_j}$$

where $B_i$ is the set of teams defeated by team $i$, $N_j$ is the number of teams defeated by team $j$, $n$ represents the total number of teams, and $\alpha$ denotes the damping factor, conventionally set at 0.85. This formulation ensures that victories against stronger opponents contribute more significantly to a team’s final ranking, capturing both direct and indirect relationships within the tournament structure.
For implementation in this study, match outcomes from the dataset were first structured into a directed competition network, with teams as nodes and match results represented as edges directed from the losing team to the winning team. Using this network, a transition probability matrix based on a Markov chain model was constructed to represent team interactions. The SciPy library (v1.13.1) under Python (v3.12.3) was subsequently employed to perform eigenvector computations via the power iteration method, thereby deriving the steady-state distribution of the transition matrix and calculating PageRank scores for each team. These scores served as the foundation for both team ranking and match outcome prediction in the subsequent stages of analysis.
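The procedure described above can be sketched with a plain power iteration. This is an illustrative sketch rather than the study's code: it uses pure Python instead of SciPy to stay self-contained, and the function name, tolerance, and four-team bracket are assumptions. Two modeling choices are also assumed here because the text does not specify them: each losing team's score is split evenly across its losses (the out-degree of that team in the loser-to-winner network, following the usual PageRank convention), and the score held by undefeated teams, which have no outgoing edges, is redistributed uniformly.

```python
def pagerank(teams, results, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a loser -> winner network.

    results: list of (winner, loser) pairs, one per match.
    """
    n = len(teams)
    idx = {t: k for k, t in enumerate(teams)}
    # out_links[j] lists the teams that defeated j (one entry per loss).
    out_links = [[] for _ in range(n)]
    for winner, loser in results:
        out_links[idx[loser]].append(idx[winner])

    r = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Assumption: mass held by undefeated (dangling) teams is spread uniformly.
        dangling = sum(r[j] for j in range(n) if not out_links[j])
        new = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for j in range(n):
            if out_links[j]:
                share = alpha * r[j] / len(out_links[j])
                for i in out_links[j]:
                    new[i] += share
        converged = sum(abs(a - b) for a, b in zip(new, r)) < tol
        r = new
        if converged:
            break
    return dict(zip(teams, r))

# Hypothetical four-team bracket: A beat B, C beat D, then A beat C in the final.
scores = pagerank(["A", "B", "C", "D"], [("A", "B"), ("C", "D"), ("A", "C")])
for team, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(team, round(score, 4))
```

In this toy bracket the champion A ranks first and the finalist C second, while the two first-round losers B and D receive identical scores because neither defeated anyone.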
3.4. Evaluation Procedures
To evaluate the performance of the PageRank rankings, three empirical tests were conducted:
To quantify the ordinal agreement between the PageRank output and the official tournament standings, Kendall’s tau rank correlation coefficient (hereafter, Kendall’s tau) was employed. As an established and widely adopted non-parametric metric, Kendall’s tau is well suited to this analysis because it captures pairwise ranking reversals and provides a robust indicator of structural similarity between rankings. Higher Kendall’s tau values indicate stronger consistency, with values approaching 1 reflecting near-perfect concordance. The coefficient is calculated as follows:

$$\tau = \frac{C - D}{n(n-1)/2}$$

where $C$ and $D$ are the numbers of concordant and discordant pairs among $n$ teams. Statistical significance is assessed via a two-tailed $p$-value, with $p < 0.05$ indicating a statistically significant correlation.
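The pair-counting definition of Kendall’s tau can be illustrated with a short sketch. It assumes rankings without ties (the tau-a form); tied standings would call for a tie-corrected variant such as tau-b, which is the default in `scipy.stats.kendalltau`. The two example rankings are hypothetical.

```python
from itertools import combinations

def kendall_tau(rank_x, rank_y):
    """Kendall's tau-a for two equal-length rank lists without ties."""
    n = len(rank_x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        if sign > 0:
            concordant += 1  # the pair is ordered the same way in both rankings
        elif sign < 0:
            discordant += 1  # the pair is reversed between the rankings
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical top-eight rankings that differ by two adjacent swaps.
official = [1, 2, 3, 4, 5, 6, 7, 8]
model = [1, 2, 4, 3, 5, 6, 8, 7]
tau = kendall_tau(official, model)
print(round(tau, 3))
```

With eight teams there are 28 pairs, so each reversed pair lowers the coefficient by 2/28; the two swaps in this example leave 26 concordant and 2 discordant pairs.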
Forecast accuracy is tested by simulating each season’s matches. For every game, the team with the higher pre-match PageRank score is predicted to win. Forecast accuracy is then computed as follows:

$$\text{Accuracy} = \frac{\text{number of correctly predicted matches}}{\text{total number of predicted matches}} \times 100\%$$

This metric is calculated on a per-season basis to assess the model’s out-of-sample predictive reliability.
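The prediction rule and accuracy metric can be sketched as follows. For brevity, the sketch uses a single dictionary of pre-match scores for all games, whereas the study recomputes scores as the tournament progresses; breaking score ties in favor of the first-listed team is also an assumption, as are the team labels and score values.

```python
def forecast_accuracy(matches, scores):
    """Percentage of matches in which the higher-scored team actually won.

    matches: list of (team_a, team_b, winner) tuples.
    scores:  pre-match PageRank score for each team.
    """
    correct = 0
    for team_a, team_b, winner in matches:
        # Assumption: ties in score are resolved in favor of team_a.
        predicted = team_a if scores[team_a] >= scores[team_b] else team_b
        correct += predicted == winner
    return 100.0 * correct / len(matches)

# Hypothetical pre-match scores and results.
scores = {"A": 0.4, "B": 0.1, "C": 0.3, "D": 0.2}
matches = [
    ("A", "B", "A"),  # favorite wins
    ("C", "D", "D"),  # upset: lower-scored team wins
    ("A", "D", "A"),
    ("B", "C", "C"),
]
accuracy = forecast_accuracy(matches, scores)
print(accuracy)
```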
To evaluate the predictive utility of PageRank in estimating team strength, a series of empirical tests was conducted using historical match data. An n-step-ahead prediction framework was implemented, in which PageRank scores were calculated prior to each game based on match results from the current season and up to five preceding seasons.
Each tournament season follows a single-elimination format comprising four stages: preliminary (P), quarterfinals (Q), semifinals (S), and final (F). After the preliminary round, eight teams advance to four quarterfinal matchups, followed by two semifinal games (three in 2013 and 2014 due to format variations) and culminating in the championship final. To illustrate the n-step-ahead prediction framework, a one-year historical window was employed to forecast the 2015 quarterfinal outcomes. PageRank scores were computed using all matches from the 2014 season (including the preliminary, quarterfinal, semifinal, and final stages), along with the 2015 preliminary results. Similarly, to predict the semifinals, both the complete 2014 season data and the 2015 preliminary and quarterfinal matches were used to estimate PageRank scores prior to those games.
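The windowing rule in the 2015 example can be expressed as a small selection function. This is a sketch under stated assumptions: the record layout (dictionaries with `season`, `stage`, `winner`, and `loser` keys), the function name, and the stage-level granularity are illustrative choices, and the sample records are hypothetical.

```python
STAGE_ORDER = {"P": 0, "Q": 1, "S": 2, "F": 3}

def training_matches(matches, season, stage, window):
    """Select the matches available before `stage` of `season`.

    window: number of preceding seasons to include (0 = current season only).
    """
    selected = []
    for m in matches:
        in_history = season - window <= m["season"] < season
        earlier_this_season = (
            m["season"] == season
            and STAGE_ORDER[m["stage"]] < STAGE_ORDER[stage]
        )
        if in_history or earlier_this_season:
            selected.append(m)
    return selected

# Hypothetical records spanning three seasons.
matches = [
    {"season": 2013, "stage": "F", "winner": "A", "loser": "B"},
    {"season": 2014, "stage": "P", "winner": "A", "loser": "C"},
    {"season": 2014, "stage": "F", "winner": "B", "loser": "A"},
    {"season": 2015, "stage": "P", "winner": "C", "loser": "D"},
    {"season": 2015, "stage": "Q", "winner": "A", "loser": "C"},
    {"season": 2015, "stage": "S", "winner": "A", "loser": "B"},
]
for_quarterfinals = training_matches(matches, 2015, "Q", window=1)
for_semifinals = training_matches(matches, 2015, "S", window=1)
print([(m["season"], m["stage"]) for m in for_quarterfinals])
print([(m["season"], m["stage"]) for m in for_semifinals])
```

With a one-year window, the quarterfinal forecast draws on the full 2014 season plus the 2015 preliminaries, and the semifinal forecast additionally includes the 2015 quarterfinals, matching the example in the text.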
To evaluate temporal robustness, two complementary analyses were performed using data from the 2013 to 2023 seasons. First, Kendall’s tau was calculated between PageRank rankings and official standings in consecutive years to assess the year-to-year consistency of estimated team strengths. Second, predictive accuracy was evaluated across different historical data horizons, as defined in the simulation framework. These assessments are designed to gauge the stability and reliability of the PageRank rankings across tournament seasons.
4. Empirical Results and Discussion
4.1. Overview of PageRank Rankings
Based on data from the 2013 to 2023 tournaments, average PageRank scores were computed for all teams across seasons. Figure 1 presents the top 10 teams by their average PageRank, along with their corresponding win percentages (win rate), which are defined as wins divided by the total games played.
Teams such as Pingzhen High School (HS), Kupao Home Economics and Commerce HS, and Kaoyuan Vocational HS consistently ranked among the top three across both metrics. Their PageRank values more clearly distinguish them from other teams, reflecting sustained competitive dominance. Notably, Pingzhen HS leads with a long-term average PageRank of 0.609 and secured the most Black Panther Cup titles over the study period.
Several teams exhibit notable discrepancies between their PageRank and win-rate rankings. For instance, Taoyuan Agricultural & Industrial HS ranks fourth in PageRank but fifth by win rate, suggesting stronger opponents defeated and a higher performance value that is not reflected in the raw win percentage. In contrast, Yingge Vocational HS ranks higher by win rate but lower in PageRank, indicating that their success may have come against weaker teams.
Similarly, Shiyuan HS ranks 6th in PageRank but only 10th in win rate, likely due to impactful wins over high-strength opponents despite a modest overall record. Conversely, Mailiao HS ranks 6th in win rate but 10th in PageRank, reflecting inflated results from favorable matchups. These cases demonstrate PageRank’s capacity to capture performance quality beyond win–loss totals, offering a more nuanced measure of team strength in structurally imbalanced tournaments.
To further analyze longitudinal performance patterns, all 17 teams that reached the quarterfinals at least once during the 2013–2023 seasons were identified, and their annual PageRank scores were visualized in a heatmap (see Figure 2).
The heatmap illustrates a clear contrast between teams exhibiting sustained excellence and those demonstrating episodic success. Teams such as Pingzhen HS, Kupao Home Economics & Commerce HS, and Kaoyuan Vocational HS consistently display high PageRank values across multiple seasons, underscoring their long-term dominance and structural centrality within the tournament network. These stable patterns reflect not only frequent advancement into later rounds but also victories over strong opponents, which PageRank effectively captures through its recursive weighting mechanism.
Conversely, teams such as Pumen HS, Yingge Vocational HS, and NCHU Affiliated Agricultural & Industrial Vocational HS show marked spikes in PageRank scores during isolated years, suggesting that transient competitive surges are likely tied to cohort effects, favorable draws, or exceptional individual players. This episodic behavior contrasts with their generally lower presence in other seasons, indicating weaker overall program stability.
Furthermore, the heatmap suggests that only about 3 to 4 teams consistently attain high PageRank scores across multiple years, while most other teams remain low-ranked. This concentration indicates a relatively uneven competitive landscape, with dominance clustered among a few programs. Nevertheless, the occasional emergence of 6 to 8 different teams reaching high scores in select years also reflects a degree of mobility and turnover, characteristic of youth-level tournaments where talent development and cohort variability play significant roles.
4.2. Ranking Consistency Analysis
Kendall’s tau rank correlation coefficient was employed to evaluate the relationship between PageRank rankings and the official final standings. Using the top eight teams recognized by the tournament’s official awards each year as the reference sample, Kendall’s tau was computed for each season from 2013 to 2023, as shown in Table 1. In 10 out of the 11 seasons, Kendall’s tau values exceeded 0.70, and in most cases, the corresponding p-values were below 0.05, indicating statistically significant correlations. Notably, the values for 2014 and 2018 reached 0.929 and 0.905, respectively, suggesting near-perfect agreement between PageRank rankings and official results. The lowest correlation occurred in 2019, with a Kendall’s tau of 0.643.
Overall, the consistently high Kendall’s tau values demonstrate the reliability and validity of PageRank as an objective ranking method. In particular, the method proves to be useful in single-elimination formats where official rankings are not assigned beyond the top eight, offering a robust and interpretable estimate of relative team strength in the lower ranks as well.
4.3. Match Outcome Prediction
The predictive performance of PageRank rankings was evaluated using six temporal horizons, ranging from the current season alone to the inclusion of up to five preceding seasons. Table 2 summarizes the results. When relying solely on current-season data, prediction accuracy was low (41.8%). In contrast, incorporating one to five years of prior results substantially improved performance, with accuracy ranging from 76.2% to 83.7%. Figure 3 illustrates how accuracy varied by tournament stage. For quarterfinal predictions, using data from the current season and two previous years yielded the best results. Semifinal predictions required a broader historical window of three to five years to achieve optimal accuracy. Moreover, we observed that in high-stakes rounds such as the semifinals and final, predictive accuracy consistently increased with longer historical input.
The decline in accuracy when using 5-year historical data (76.2%) compared to 4-year data (83.7%) may reflect several external influences that affect the tournament’s competitive dynamics over time. These include tournament format changes (such as the additional semifinal match in the 2013–2014 seasons and the elimination of seeded bracket protections in 2020), the temporal decay of information due to natural program evolution and player turnover cycles typical in high school sports, and COVID-19 impacts on the 2020–2021 tournaments, where key players were unable to participate due to infection or quarantine protocols. Separately, the exceptionally high Kendall’s tau coefficients in 2014 (0.929) and 2018 (0.905) suggest periods of greater competitive stability, where PageRank rankings aligned particularly well with tournament outcomes, possibly indicating more predictable competitive hierarchies during these seasons.
This analysis suggests that 3–4 years represents the optimal balance between historical depth and temporal relevance for PageRank-based predictions in this context. The deterioration in predictive performance beyond this window indicates that older data may introduce noise rather than signal, particularly when structural changes to the tournament format or external disruptions alter the underlying competitive dynamics. In the later knockout stages, where competition is more selective and match outcomes are harder to infer, multi-year performance data provide more reliable signals of underlying team strength. The inclusion of longitudinal data mitigates the sparsity inherent in single-elimination structures and enhances the model’s ability to distinguish relative competitiveness. This underscores the importance of historical depth when applying network-based ranking methods to incomplete or asymmetric competitions.
5. Conclusions and Suggestions
5.1. Summary of Findings
This study examined the utility of the PageRank algorithm as a ranking and predictive tool in high school baseball tournaments characterized by incomplete and single-elimination formats. Drawing on eleven seasons of empirical data from Taiwan’s Black Panther Cup (2013–2023), this study assessed whether PageRank can validly estimate team strength, align with official results, and offer forecasting capabilities using minimal input data.
The findings show that PageRank successfully identified top-performing teams and provided rankings highly consistent with official outcomes, as evidenced by strong Kendall’s tau correlations in 10 out of 11 seasons. Furthermore, the model yielded credible match predictions, with accuracy reaching as high as 92.9% when historical data from preceding seasons were incorporated.
While these results are derived from a single tournament context, they provide valuable insights into PageRank’s behavior under structurally constrained conditions typical of school-level competitions. The case-based approach allows for the detailed examination of algorithmic performance within a specific competitive framework, offering methodological contributions that may inform applications in similar educational or resource-limited settings.
Overall, the findings support PageRank as a consistent and data-efficient approach for ranking and forecasting in incomplete or asymmetric competitions. Its low data requirements and structural flexibility make it particularly well-suited for school-level tournaments, where detailed statistics are often unavailable. More broadly, these results underscore the potential of network-based methods to generate interpretable and stable assessments of relative team strength in constrained competition environments.
5.2. Contribution of the Work
This study contributes to the field of sports analytics by extending the application of the well-established PageRank algorithm beyond traditional benchmarking toward evaluating team strength in incomplete tournament environments. Rather than proposing methodological innovations, the research demonstrates how existing network-based techniques can be effectively adapted and validated in data-sparse, structurally constrained educational sports settings. Specifically, this study highlights PageRank’s utility as a longitudinal assessment tool for high school baseball tournaments, addressing the unique analytical challenges posed by unbalanced match schedules and limited performance data.
The research establishes empirically grounded temporal parameters for youth sports contexts, identifying 3–4 years as the optimal historical window for PageRank-based predictions. This finding reflects the unique characteristics of high school athletics, including rapid player turnover and program evolution, where extending historical windows beyond four years introduces noise rather than signal due to structural changes and external disruptions.
These contributions distinguish this work from prior studies that primarily focus on inter-model comparisons. This study prioritizes algorithm interpretability and stakeholder acceptance—considerations that are crucial for real-world implementation in educational settings but often overlooked in comparative algorithmic studies.
5.3. Practical Implications
The application of PageRank in this study presents several practical implications for sports administrators and tournament organizers. First, the algorithm provides a low-cost and transparent method for ranking teams in the absence of detailed player-level statistics or complete round-robin results, thereby contributing to greater perceived fairness in youth-level competitions. This is particularly relevant for school-level tournaments, where budget and data infrastructure are often limited. Second, the approach can support more equitable seeding procedures and post-tournament evaluations by capturing the quality of victories rather than relying solely on their quantity. Finally, the identification of long-term strong programs through PageRank trends may aid in talent development and resource allocation across regions or institutions.
Additionally, the interpretability of PageRank rankings provides practical value for non-technical stakeholders such as coaches and school administrators. For coaches, the rankings can inform pre-match preparation by providing a data-based assessment of opponent strength, allowing for more strategic resource allocation, roster planning, and training focus. In high-stakes matchups, the rankings may also serve as a motivational tool.
For school administrators, these rankings provide a transparent and quantifiable measure of team performance that can support internal evaluations, justify athletic resource planning, and inform decisions related to coaching support or program development. The longitudinal use of such rankings can also help identify performance trends across years and guide institutional strategies for interscholastic sports development.
5.4. Limitations and Future Research
This study also has several limitations. The PageRank model used here is based solely on binary win–loss outcomes, without incorporating contextual factors such as the margin of victory, home-field advantage, or player-level variables. Furthermore, this study’s empirical scope is confined to a single tournament in Taiwan, which may limit the generalizability of its conclusions to other sports contexts or educational systems. Cross-validation against alternative sports or geographical contexts would strengthen external validity and broaden the applicability of these findings.
Additionally, this study did not conduct formal statistical robustness testing such as sensitivity analysis, bootstrap validation, or perturbation studies. While our findings demonstrate consistent performance across multiple seasons and temporal horizons, formal robustness evaluation remains an important direction for future research to establish PageRank’s reliability under data uncertainty in tournament settings.
Future studies may expand upon this work in several directions. Methodologically, incorporating weighted metrics such as score differentials or recency-weighted outcomes may enhance the sensitivity of PageRank in capturing real-time team dynamics. Time-decay variants or dynamic PageRank adaptations may also better accommodate performance evolution throughout a season. Additionally, comparative evaluations with alternative ranking models, including Elo, Colley, or Massey, would help clarify the strengths and trade-offs of different algorithms. Finally, applying these methods across different sports, competition levels, or international school systems can further test the scalability and versatility of network-based ranking approaches in sports analytics.