4.2.1. Dynamic Feature Space Reconstruction
DFSR aims to map the source domain (data-rich countries) and target domain (data-scarce countries) to a shared feature space, making knowledge transfer more effective. Considering the multi-level nature of Olympic prediction, DFSR learns specific feature mapping functions for each prediction level.
Given source domain data $\mathcal{D}_S$ and target domain data $\mathcal{D}_T$, for prediction level $l \in \{$country level, sport type level, event level$\}$, DFSR learns a feature mapping function $\phi_l$ that maps original features $x$ to new features $\phi_l(x)$. The mapping function is determined by optimizing the following objective:

$$\phi_l^{*} = \arg\min_{\phi_l}\; \mathcal{L}_{\mathrm{MMD}} + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{struct}}\,\mathcal{L}_{\mathrm{struct}} + \lambda_{\mathrm{reg}}\,\Omega(\phi_l) + \lambda_{\mathrm{cons}}\,\mathcal{L}_{\mathrm{cons}}$$
where $\mathcal{L}_{\mathrm{MMD}}$ is the Maximum Mean Discrepancy loss, which reduces distribution differences between source and target domains:

$$\mathcal{L}_{\mathrm{MMD}} = \frac{1}{n_s^2}\sum_{i,i'} k\big(\phi_l(x_i^s), \phi_l(x_{i'}^s)\big) + \frac{1}{n_t^2}\sum_{j,j'} k\big(\phi_l(x_j^t), \phi_l(x_{j'}^t)\big) - \frac{2}{n_s n_t}\sum_{i,j} k\big(\phi_l(x_i^s), \phi_l(x_j^t)\big)$$

We use a Gaussian RBF kernel $k(u, u') = \exp\!\big(-\|u - u'\|^2 / (2\sigma^2)\big)$, with the bandwidth $\sigma$ set via the median heuristic on pairwise distances in each mini-batch.
$\mathcal{L}_{\mathrm{adv}}$ is the adversarial loss, implemented via a domain classifier $D$ to encourage domain-invariant features:

$$\mathcal{L}_{\mathrm{adv}} = -\,\mathbb{E}_{x \sim \mathcal{D}_S}\big[\log D(\phi_l(x))\big] - \mathbb{E}_{x \sim \mathcal{D}_T}\big[\log\big(1 - D(\phi_l(x))\big)\big]$$
$\mathcal{L}_{\mathrm{struct}}$ is the structure preservation loss, ensuring that the mapping preserves structural relationships from the original feature space:

$$\mathcal{L}_{\mathrm{struct}} = \sum_{i,j} \Big( \big\|\phi_l(x_i) - \phi_l(x_j)\big\| - \big\|x_i - x_j\big\| \Big)^2$$
To set the corresponding weight, we grid-searched $\lambda_{\mathrm{struct}}$: for each candidate value, models were trained on 1896–2016 data and evaluated on the 2020 validation set, and the setting that minimized the target-domain MAE was used thereafter. Here, $\lambda_{\mathrm{struct}}$ is the weight on the structure-preserving term $\mathcal{L}_{\mathrm{struct}}$ in the objective above. Similarly, we grid-searched $\lambda_{\mathrm{reg}}$ for the regularization term $\Omega(\phi_l)$ (implemented as L2 weight decay) and $\lambda_{\mathrm{cons}}$ for the consistency loss, using the same training–validation setup and selecting the values that minimized the MAE. For the adversarial term, we grid-searched $\lambda_{\mathrm{adv}}$ and selected its value using the same criterion.
Finally, $\Omega(\phi_l)$ is a complexity regularization term preventing overfitting, and $\mathcal{L}_{\mathrm{cons}}$ is the hierarchical consistency loss, ensuring consistent mapping relationships between different levels:

$$\mathcal{L}_{\mathrm{cons}} = \sum_{l' \in \mathcal{N}(l)} \big\| \phi_{l'}(x) - T_{l \to l'}\big(\phi_l(x)\big) \big\|^2$$

where $\mathcal{N}(l)$ represents the levels adjacent to level $l$, and $T_{l \to l'}$ is a level transformation function.
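For concreteness, the following is a minimal sketch of the MMD term with a Gaussian RBF kernel and median-heuristic bandwidth as described above; it illustrates the computation only, and the tensor and function names are ours rather than those of the released implementation.

```python
import torch

def gaussian_mmd(source_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    """Empirical MMD^2 between mapped source/target features with a Gaussian RBF kernel.

    Inputs are (n_s, d) and (n_t, d) tensors, e.g. phi_l(x) for one mini-batch.
    The bandwidth sigma is set by the median heuristic on pairwise distances.
    """
    feats = torch.cat([source_feats, target_feats], dim=0)
    dists = torch.cdist(feats, feats)                      # pairwise Euclidean distances
    sigma = dists[dists > 0].median()                      # median-heuristic bandwidth
    kernel = torch.exp(-dists.pow(2) / (2 * sigma ** 2))   # Gaussian RBF kernel matrix

    n_s = source_feats.size(0)
    k_ss = kernel[:n_s, :n_s].mean()                       # source-source term
    k_tt = kernel[n_s:, n_s:].mean()                       # target-target term
    k_st = kernel[:n_s, n_s:].mean()                       # cross term
    return k_ss + k_tt - 2 * k_st
```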
We adopt MMD-based alignment with structural regularization and an adversarial term to explicitly reduce cross-domain distribution shift while preserving task-relevant geometry. This combination is well suited to transferring knowledge from data-rich to data-scarce countries—MMD provides an architecture-agnostic discrepancy reduction [
11], while adversarial training promotes domain-invariant features with minimal overhead [
8]; the cross-level consistency term keeps the learned space coherent across country/sport/event granularities.
To facilitate understanding, we use the United States (source domain) and Nigeria (target domain) as examples to illustrate the working process of DFSR. In the original feature space, these two countries show significant differences in dimensions such as GDP and participation history, making direct comparison difficult. Through DFSR mapping, we obtain new feature representations that make the two countries more comparable regarding relative advantage event patterns and sports development trajectories. For example, the proportion of track and field events in global medals is about 4.2% for the United States and about 2.1% for Nigeria; after mapping, the distance between the two countries in the “relative sports event layout” dimension is reduced from 0.87 to 0.31, achieving feature space alignment.
We also created visualizations to intuitively understand this process, as shown in Figure 3.
Figure 3 uses t-SNE dimensionality reduction to project the high-dimensional features into a two-dimensional space, comparing feature distributions before reconstruction (left) and after reconstruction (right). Blue points represent the source domain (data-rich countries), and red points represent the target domain (data-scarce countries). In Figure 3a, which shows the original feature space, source- and target-domain countries are clearly separated, with data-rich countries clustered in the upper-right region and data-scarce countries scattered in the left and lower regions. In Figure 3b, which shows the feature space after DFSR reconstruction, the two groups of countries overlap substantially, with the distribution difference between source and target domains markedly reduced, especially in the dimensions of event layout and development trajectory. The figure marks the position changes of typical countries, such as the United States (US), China (CN), Nigeria (NG), and Cambodia (KH), allowing observation of their relative positions in the feature space before and after reconstruction. While t-SNE provides an intuitive visualization, we acknowledge that it is a non-linear technique that may distort global distances; to complement it, we quantitatively assess alignment via the MMD metric, which decreased from 0.45 to 0.12 after reconstruction, confirming effective domain alignment.
To implement the adversarial component as part of the optimization, we introduced the Gradient Reversal Layer (GRL) and Domain-Specific Batch Normalization (DSBN) techniques.
Gradient Reversal Layer: The GRL acts as an identity function in forward propagation but reverses the gradient and multiplies it by a coefficient $\lambda$ during backpropagation. By adding a domain classifier D and a GRL, we can implement adversarial training, encouraging the feature extractor to learn domain-invariant features.
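A minimal PyTorch-style sketch of such a gradient reversal layer is given below; the class and function names are illustrative rather than taken from our codebase.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)          # identity mapping

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing back into the feature extractor.
        return grad_output.neg() * ctx.lambda_, None

def grad_reverse(x, lambda_=1.0):
    """Convenience wrapper: features pass through unchanged, gradients are reversed."""
    return GradientReversal.apply(x, lambda_)
```

In training, the domain classifier D is applied to grad_reverse(phi_l(x)); minimizing the domain classification loss then pushes the feature extractor toward domain-invariant representations.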
Domain-Specific Batch Normalization: Unlike traditional batch normalization that uses the same scaling and shifting parameters for all data, DSBN learns independent normalization parameters for source and target domains:
$$\mathrm{DSBN}_d(x) = \gamma_d \cdot \frac{x - \mu_d}{\sqrt{\sigma_d^2 + \epsilon}} + \beta_d$$

where $d$ indicates the domain identity, $\mu_d$ and $\sigma_d$ are the mean and standard deviation within the batch, respectively, and $\gamma_d$ and $\beta_d$ are learnable scaling and shifting parameters for domain $d$.
The GRL encourages domain-invariant representations without altering the forward pass [
8], and DSBN mitigates covariate shift by decoupling normalization statistics across domains [
9]. Both are lightweight yet effective additions that stabilize transfer under heterogeneous country distributions.
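A minimal sketch of DSBN, assuming two domains indexed 0 (source) and 1 (target); the class and argument names here are illustrative.

```python
import torch
import torch.nn as nn

class DomainSpecificBatchNorm1d(nn.Module):
    """Keeps separate batch-norm statistics and affine parameters per domain."""

    def __init__(self, num_features: int, num_domains: int = 2):
        super().__init__()
        self.bns = nn.ModuleList(nn.BatchNorm1d(num_features) for _ in range(num_domains))

    def forward(self, x: torch.Tensor, domain: int) -> torch.Tensor:
        # Route the batch through the normalizer belonging to its domain.
        return self.bns[domain](x)
```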
4.2.2. Hierarchical Adaptive Transfer Strategy
The HATS is a core component of the MG-TLC and is responsible for implementing knowledge transfer in the reconstructed feature space. Our HATS method draws inspiration from the bidirectional knowledge transfer framework of Jiang et al. [
13] to optimize predictions for data-scarce countries. The HATS is based on two key ideas: first, different country groups should have different transfer strategies; and second, transfer learning should span multiple levels, including the country level, sport type level, and event level.
The HATS first clusters countries (from both source and target domains) into $K$ groups, each representing a typical Olympic development pattern, such as the “all-around type,” “single-event-specialization type,” etc. Clustering is performed in the reconstructed feature space using the spectral clustering algorithm on the normalized graph Laplacian:

$$L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}, \qquad D_{ii} = \sum_{j} W_{ij}$$

where $W_{ij}$ is the similarity between countries $i$ and $j$ in the reconstructed feature space.
Spectral clustering leverages the affinity graph in the reconstructed space to capture non-linear country similarities; it is well suited to heterogeneous Olympic development patterns and does not presuppose convex clusters. Grouping both source and target countries in this space provides a natural scaffold for targeted transfer within each cluster.
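A minimal sketch of this grouping step, assuming the reconstructed country features are available as a NumPy array and using a Gaussian similarity; the variable names and the use of scikit-learn are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

def cluster_countries(reconstructed_feats: np.ndarray, n_clusters: int = 5) -> np.ndarray:
    """Group source and target countries in the reconstructed feature space.

    reconstructed_feats: (n_countries, d) array of phi_l(x) vectors.
    Returns an array of cluster labels, one per country.
    """
    dists = cdist(reconstructed_feats, reconstructed_feats)   # pairwise distances
    sigma = np.median(dists[dists > 0])                       # bandwidth via median heuristic
    affinity = np.exp(-dists ** 2 / (2 * sigma ** 2))         # similarity matrix W_ij

    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed", random_state=0)
    return model.fit_predict(affinity)
```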
The HATS establishes two models for each cluster $C_k$: a source domain base model $f_k$ and a cross-domain adjustment model $g_k$. The source domain base model is trained directly on source domain data:

$$f_k = \arg\min_{f} \sum_{(x_i, y_i) \in \mathcal{D}_S \cap C_k} \ell\big(f(x_i), y_i\big)$$

The cross-domain adjustment model learns corrections from source domain predictions to target domain true values:

$$g_k = \arg\min_{g} \sum_{(x_j, y_j) \in \mathcal{D}_T \cap C_k} \ell\big(g(f_k(x_j)), y_j\big)$$

During prediction, for a country $v$ in the target domain, the HATS first determines its cluster $C_{k(v)}$ and then calculates the predicted value through the source domain base model and the cross-domain adjustment model:

$$\hat{y}_v = g_{k(v)}\big(f_{k(v)}(x_v)\big)$$
A single transfer model is insufficient for the diverse Olympic landscape. Clustering countries into groups with similar development patterns (e.g., all-around vs. single-event specialists) enables targeted transfer. The source-base model captures well-estimated signals from data-rich sources, while the cross-domain adjustment corrects cluster-specific distributional shifts and biases, allowing data-scarce targets to borrow strength from matched sources without washing out heterogeneity.
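A simplified sketch of this per-cluster two-stage scheme is shown below, assuming generic scikit-learn-style regressors; the specific model classes and function names are illustrative only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

def fit_cluster_models(X_src, y_src, X_tgt, y_tgt):
    """Fit the source-domain base model f_k and the cross-domain adjustment model g_k
    for the countries assigned to one cluster C_k."""
    base = GradientBoostingRegressor().fit(X_src, y_src)      # f_k: trained on source data
    src_preds = base.predict(X_tgt).reshape(-1, 1)
    adjust = LinearRegression().fit(src_preds, y_tgt)         # g_k: maps f_k outputs to target truths
    return base, adjust

def predict_target(base, adjust, X_new):
    """Prediction for target-domain countries in this cluster: g_k(f_k(x))."""
    return adjust.predict(base.predict(X_new).reshape(-1, 1))
```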
For example, consider Georgia (target domain), whose feature pattern is similar to that of “single-event-specialization-type” countries like Bulgaria and Hungary. The HATS assigns Georgia to this cluster and applies the specific transfer strategy for this type of country. In the 2024 prediction, the source domain base model gave an initial prediction of 6.3 medals, and the cross-domain adjustment model corrected it to 8.1 medals based on historical performance, close to the actual result (seven medals).
Figure 4 shows radar charts of feature patterns for five country types in the HATS.
Each radar chart represents the feature pattern of a country type, with axes including economic strength, historical performance, event diversity, gender balance, participation stability, and event specialization index. (a) “All-around-type” countries (e.g., the United States, China) are strong across all dimensions, especially economic strength and historical performance; (b) “single-event-specialization-type” countries (e.g., Georgia, Hungary) have high event specialization indices but low event diversity; (c) “team-event-dominant-type” countries (e.g., Serbia, Croatia) excel at team events; (d) “emerging development-type” countries (e.g., Qatar, Azerbaijan) have strong economic strength but weak historical performance; and (e) “initial exploration-type” countries (e.g., Cambodia, Andorra) are weak across all dimensions but improving in participation. Each type is annotated with 2–3 representative countries as examples.
4.2.3. Multi-Level Prediction Consistency Guarantee Mechanism
To achieve multi-level prediction consistency, we employ a three-stage optimization strategy:
1. Independent Training Stage: Separately train country-level, sport type-level, and event-level prediction models to obtain initial prediction results $\hat{y}^{\mathrm{country}}$, $\hat{y}^{\mathrm{sport}}$, and $\hat{y}^{\mathrm{event}}$.
2. Consistency Optimization Stage: Introduce an explicit hierarchical consistency loss function to optimize the models at all three levels jointly:

$$\mathcal{L}_{\mathrm{joint}} = \sum_{l} \mathcal{L}_{\mathrm{pred}}^{(l)} + \alpha \,\Big\| \hat{y}^{\mathrm{country}} - \sum_{s} \hat{y}^{\mathrm{sport}}_{s} \Big\|^2 + \beta \sum_{s} \Big\| \hat{y}^{\mathrm{sport}}_{s} - \sum_{e \in s} \hat{y}^{\mathrm{event}}_{e} \Big\|^2$$

where $\alpha$ and $\beta$ are tuned via a 2D grid search on the 2020 validation set. The selection criterion is a weighted combination of the event-level MAE and the L2 residuals of cross-level consistency (i.e., $\| \hat{y}^{\mathrm{country}} - \sum_{s} \hat{y}^{\mathrm{sport}}_{s} \|$ and $\| \hat{y}^{\mathrm{sport}}_{s} - \sum_{e \in s} \hat{y}^{\mathrm{event}}_{e} \|$), and the values minimizing this criterion are used thereafter.
3. Post-Processing Calibration Stage: To enforce strict hierarchical consistency, we perform a final calibration on the optimized prediction results. We adopt a top-down, importance-weighted adjustment method, where higher-level predictions are assumed to be more robust and are used to calibrate lower-level estimates. Event-level predictions are rescaled so that their total matches the country-level anchor:

$$\tilde{y}^{\mathrm{event}}_{e} = \hat{y}^{\mathrm{event}}_{e} \cdot \frac{\hat{y}^{\mathrm{country}}}{\sum_{e'} \hat{y}^{\mathrm{event}}_{e'}}$$

Then, sport type-level predictions are recalculated based on the adjusted event-level results:

$$\tilde{y}^{\mathrm{sport}}_{s} = \sum_{e \in s} \tilde{y}^{\mathrm{event}}_{e}$$

The country-level prediction, serving as the anchor, remains unchanged:

$$\tilde{y}^{\mathrm{country}} = \hat{y}^{\mathrm{country}}$$

Through this three-stage approach, we ensure that the final predictions strictly satisfy the hierarchical consistency constraints:

$$\tilde{y}^{\mathrm{country}} = \sum_{s} \tilde{y}^{\mathrm{sport}}_{s} = \sum_{s} \sum_{e \in s} \tilde{y}^{\mathrm{event}}_{e}$$
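A minimal sketch of this top-down calibration is given below, assuming the per-level point predictions for one country are held in NumPy arrays and events are mapped to sports by an integer index; all names are illustrative.

```python
import numpy as np

def calibrate_top_down(country_pred: float,
                       event_preds: np.ndarray,
                       event_to_sport: np.ndarray):
    """Rescale event-level predictions to the country-level anchor, then rebuild sport totals.

    event_preds: per-event medal predictions for one country.
    event_to_sport: integer sport index for each event.
    Returns (country, per-sport, per-event) predictions satisfying the hierarchy.
    """
    scale = country_pred / event_preds.sum()                  # top-down scaling factor
    event_adj = event_preds * scale                           # adjusted event-level predictions
    n_sports = event_to_sport.max() + 1
    sport_adj = np.bincount(event_to_sport, weights=event_adj, minlength=n_sports)
    return country_pred, sport_adj, event_adj

# Example: a country predicted 8.0 medals overall, with four event-level estimates.
country, sports, events = calibrate_top_down(8.0,
                                              np.array([2.5, 1.5, 3.0, 2.0]),
                                              np.array([0, 0, 1, 1]))
assert np.isclose(country, sports.sum()) and np.isclose(country, events.sum())
```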
We acknowledge that this deterministic post hoc scaling represents a design trade-off. While it enforces logical consistency—a critical feature for the usability of the forecast—it may alter the probabilistic calibration of the initial, unconstrained model outputs. However, this trade-off is justified, as the process empirically improves the overall accuracy of the point estimates. On the validation set, after adopting this mechanism, the MAE of event-level predictions decreased by a significant 15.2%. This suggests that the information from more stable, higher-level forecasts provides a valuable constraint that refines the more granular predictions, leading to a net improvement in predictive performance. A detailed analysis of the impact on probabilistic calibration is left for future work.
4.2.4. Olympic Development Stage Feature Analysis
In addition to horizontal knowledge transfer, the MG-TLC also considers vertical development stage features. By establishing an Olympic development stage model, different countries can learn from the experiences of countries at similar stages.
We divide a country’s Olympic development into three stages (initial, growth, and mature) and extract specific feature patterns for each stage:
Initial Stage: Feature patterns include the participation event count, athlete count, GDP growth rate, and other composite features.
Growth Stage: Feature patterns include the medal growth rate, event diversification index, and other composite features.
Mature Stage: Feature patterns include the medal stability index, event specialization index, and other composite features.
Through time series analysis, we identify which development stage a country is in and apply corresponding feature patterns and transfer strategies. For example, for a target domain country in the initial stage, we can focus on learning from source domain countries that were once in the initial stage and successfully transitioned to the growth stage.
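As a purely illustrative sketch, stage identification from a country’s medal time series could look like the following; the thresholds and features below are hypothetical placeholders, not the calibrated criteria used in our analysis.

```python
import numpy as np

def identify_stage(medal_history: np.ndarray) -> str:
    """Assign a development stage from a country's per-Games medal counts (oldest to newest).

    Thresholds are illustrative placeholders; the actual analysis relies on composite
    features such as participation counts, growth rates, and stability indices.
    """
    recent = medal_history[-5:]                                  # last five Games
    if recent.sum() <= 2:                                        # barely any medals yet
        return "initial"
    growth = np.polyfit(np.arange(len(recent)), recent, 1)[0]   # linear trend of recent medals
    stability = recent.std() / (recent.mean() + 1e-9)           # coefficient of variation
    if growth > 0.5:                                             # clear upward trend
        return "growth"
    if stability < 0.3:                                          # consistently high, stable output
        return "mature"
    return "growth"
```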