Tennis Game Dynamic Prediction Model Based on Players’ Momentum

Wang, Lechuan; Chen, Puning; Sabir, Qurat Ul An

doi:10.3390/appliedmath5030077

Open Access Article

Lechuan Wang ^1,2, Puning Chen ^3 and Qurat Ul An Sabir ^2,*

^1 School of Statistics and Data Science, Capital University of Economics and Business, Beijing 100026, China

^2 Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA

^3 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an 710049, China

^* Author to whom correspondence should be addressed.

AppliedMath 2025, 5(3), 77; https://doi.org/10.3390/appliedmath5030077

Submission received: 17 May 2025 / Revised: 17 June 2025 / Accepted: 20 June 2025 / Published: 26 June 2025

Abstract

Psychological momentum in tennis has long attracted interest, but measuring its impact remains difficult. In this paper, we present an approach to quantify momentum that combines real-time winning probabilities, leverage, and an exponentially weighted moving average (EWMA). We apply the method to a high-profile match between Carlos Alcaraz and Novak Djokovic, demonstrating how changes in leverage drive momentum. Furthermore, we use time-series feature extraction methods to derive momentum-related characteristics, which serve as key inputs to an eXtreme Gradient Boosting (XGBoost) binary classification model that predicts game winners. The model achieves an average accuracy of 84% and provides real-time predictions of each player’s chances of winning the match. Our findings indicate that momentum is a moderately relevant factor in forecasting match outcomes, highlighting its potential value for improving match prediction systems.

Keywords:

tennis; momentum; quantification; match prediction; XGBoost

1. Introduction

Professional tennis matches demand a combination of technical skill, tactics, endurance, and psychological strength. Even highly skilled players can lose matches because of hidden factors. It is therefore important to develop models that help players and coaches formulate game strategies and improve their winning rate.

Momentum in sports refers to the positive or negative shifts in cognition, emotions, physiology, and behavior that result from an event or series of events, potentially influencing performance and competition outcomes [1]. This concept is often described as the “Hot Hand” or being “In the Zone”.

Previous studies have shown that both tactical momentum and psychological momentum exist in tennis matches, especially at clutch points. Tactical momentum (TM) is the advantage a player or team gains from a strategic or tactical adjustment during a match. Psychological momentum (PM), on the other hand, refers to psychological energy that can affect players’ performance positively or negatively [2,3]. When a player is controlling the match, positive PM provides a physiological boost involving a positive change in activity level, rhythm, posture, or frequency [1]. PM is harder to quantify than TM.

Previous research has mainly focused on demonstrating the existence of PM. Dietl and Nesseler used a binary approach with OLS and logit models to demonstrate the effect of PM [2]. By analyzing break-serve points and serve points, Moss and O’Donoghue used statistical hypothesis tests to assess the significance of differences in players’ performance before and after key points [4]. Based on a theoretical model together with direct and indirect tests, Depken et al. concluded that the historical score has some influence on the current winning percentage [5].

Other studies have focused on predicting tennis match outcomes before the match. Gu et al. employed the Analytic Network Process (ANP) to combine tangible and intangible variables; their model’s overall prediction accuracy reached 84%, well above the roughly 70% accuracy of traditional statistical methods [6].

However, little research has combined real-time momentum with match outcome prediction. Addressing this gap can help players develop their tactics, mindset, and performance.

In this paper, we focus mainly on PM. First, we develop a methodology for quantifying tennis momentum and apply it to predict match outcomes. From the changes in winning and losing probabilities, we calculate momentum and visualize it, which is central to analyzing a match.

For match outcome prediction, we build a machine-learning classification model to predict the game winner. We also calculate several evaluation metrics to assess the model’s accuracy and to determine whether momentum is an important factor in the prediction.

In this study, we address the gap in combining real-time momentum quantification with match outcome prediction in professional tennis. The thesis of our research is that psychological momentum, when quantified dynamically using real-time match data, significantly enhances the accuracy of game outcome predictions. The object of the study is the dynamic flow of tennis matches, focusing on men’s and women’s singles at major tournaments (Wimbledon 2023 and Paris 2024 Olympics). The subject is the psychological momentum of players, measured through winning probabilities, leverage, and an Exponentially Weighted Moving Average (EWMA). Our work includes (1) developing a methodology to quantify momentum, (2) creating visualizations to analyze momentum shifts, (3) training an eXtreme Gradient Boosting (XGBoost) model to predict game winners based on momentum and other features, and (4) evaluating the model’s performance across multiple matches. The purpose is to provide a robust, real-time prediction system that aids players and coaches in strategic decision-making, thereby improving performance and contributing to the advancement of professional tennis analytics.

This approach offers practical applications for players and coaches, enabling them to analyze momentum trends, adjust tactics during matches, and prepare for opponents. By integrating real-time data with machine learning, our study aims to enhance the understanding of psychological factors in tennis and promote data-driven strategies in competitive sports.

2. Data and Methods

2.1. Data Collection

Source 1: Wimbledon 2023

The data were collected from the point-by-point records of the men’s singles at Wimbledon 2023. The tournament produced many surprises, especially the final between Carlos Alcaraz and Novak Djokovic, whose dramatic flow seemed to suggest that momentum exists in tennis. We therefore use data from this tournament to analyze players’ momentum.

The dataset contains 31 matches and records detailed information on every point of each match in the tournament [7]. Because Colino et al. found that the court surface can significantly affect players’ performance, our analysis applies mainly to grass courts [8].

Source 2: Paris 2024 Olympics

To further illustrate the model and test whether it applies to women’s matches, we collected data from the match between Zheng Qinwen and Donna Vekić (Table 1). We recorded the data manually by watching the match and entering each point into Excel spreadsheets [9].

2.2. Methods

Our approach to quantifying momentum is based on the probabilities of winning on serve and on return. To reflect the dynamic flow of a match, we introduce leverage and use it to compute momentum [10]. The resulting momentum graph reflects psychological and tactical changes during the match. Finally, we extract and select features to build an eXtreme Gradient Boosting (XGBoost) classification model that predicts each player’s winning rate in the current game (Figure 1) [11].

2.2.1. Real-Time Winning Probability

Because of the structure of tennis, the server has a significant advantage in winning the current point, and the model should reflect this [12].

Following the experiments of Depken et al. [5], we use a window size of 20 as the denominator. Each player’s winning probability at a given point is computed from the serve and receive success counts over the current point and the previous 19 points. When player A is serving, there are two possible outcomes:

P_{1} = P_{A,\mathrm{SrvWin}} = P_{B,\mathrm{RcvLose}} = \frac{n_{A,\mathrm{PreviousSrvWin}} + 1}{20}

(1)

P_{2} = P_{A,\mathrm{SrvLose}} = P_{B,\mathrm{RcvWin}} = \frac{n_{B,\mathrm{PreviousRcvWin}} + 1}{20},

(2)

where n is the number of wins, P is the probability, and P_{1} + P_{2} = 1.
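The window computation in Eqs. (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ code: we assume that n in Eq. (1) counts the points, among the previous 19, that the current server both served and won, and we set P2 = 1 − P1 so that the constraint P1 + P2 = 1 holds exactly. The `Point` record and the function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

WINDOW = 20  # window size used as the denominator in Eqs. (1) and (2)


@dataclass(frozen=True)
class Point:
    """Hypothetical point record: who served and who won the point."""
    server: str  # "A" or "B"
    winner: str  # "A" or "B"


def serve_win_probability(points: list[Point], t: int, window: int = WINDOW) -> tuple[float, float]:
    """Return (P1, P2) at point t: the server wins (P1) or loses (P2) the point.

    Assumption: n counts earlier points (at most window - 1 of them) that the
    current server both served and won; P2 is set to 1 - P1.
    """
    server = points[t].server
    previous = points[max(0, t - (window - 1)):t]  # up to 19 earlier points
    n = sum(1 for p in previous if p.server == server and p.winner == server)
    p1 = (n + 1) / window  # the "+1" follows Eq. (1) as written
    return p1, 1.0 - p1


if __name__ == "__main__":
    # A serves every point, wins the first five and loses the next three.
    match = [Point("A", "A")] * 5 + [Point("A", "B")] * 3
    print(serve_win_probability(match, 7))  # n = 5, so P1 = 6/20
```

Because at most 19 earlier points are counted, P1 never exceeds 1 under this reading of Eq. (1).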

2.2.2. Leverage Based on Counterfactual Prediction Framework

The counterfactual prediction framework captures the dynamic flow of a match through a quantity called leverage, which measures how much the winning probability changes depending on whether a point is won or lost [10]. We define leverage as a non-negative quantity: a player gains leverage only when he wins the current point.

Mathematically, the player’s leverage is as follows:

L_{t} = \begin{cases} l_{t}, & \text{if } l_{t} > 0 \\ 0, & \text{if } l_{t} \leq 0 \end{cases},

(3)

where

l_{t} = \begin{cases} P_{\mathrm{win}}(t) - P_{\mathrm{lose}}(t), & \text{if the player wins the point} \\ 0, & \text{if the player loses the point} \end{cases}.

(4)
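Equations (3) and (4) translate directly into code. The sketch below is a literal transcription that treats P_win(t) and P_lose(t) as inputs (for example, the probabilities from Eqs. (1) and (2) seen from the player’s side); how those probabilities are paired with each player is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def leverage(p_win: float, p_lose: float, won_point: bool) -> float:
    """Leverage L_t for one player at one point, following Eqs. (3) and (4)."""
    l_t = p_win - p_lose if won_point else 0.0  # Eq. (4): zero if the point is lost
    return l_t if l_t > 0 else 0.0              # Eq. (3): clip non-positive values to zero


def leverage_series(probs: list[tuple[float, float]], won: list[bool]) -> list[float]:
    """Apply Eqs. (3) and (4) point by point; probs holds (P_win(t), P_lose(t))."""
    return [leverage(pw, pl, w) for (pw, pl), w in zip(probs, won)]
```

For instance, `leverage_series([(0.7, 0.3), (0.4, 0.6), (0.8, 0.2)], [True, True, False])` gives leverage of about 0.4 for the first point and 0 for the other two: the second point is won but l_t is negative, and the third point is lost.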

2.2.3. Momentum and Visualization

To convert discrete leverage values into a continuous momentum flow, we apply an Exponentially Weighted Moving Average (EWMA) [13] to the leverage gained by a player. The momentum is defined as follows:

M_{X}(t) = \frac{L_{t} + (1 - \alpha) L_{t-1} + (1 - \alpha)^{2} L_{t-2} + \dots + (1 - \alpha)^{t-1} L_{1}}{1 + (1 - \alpha) + (1 - \alpha)^{2} + \dots + (1 - \alpha)^{t-1}},

(5)

where L_{t} is the leverage at point t and α is the smoothing factor. A larger α places more weight on recent points and therefore smooths the momentum curve less. A player’s momentum rises when he gains leverage by winning points and decays when he loses points, because recent points are weighted more heavily in the momentum calculation [13].
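Equation (5) is a normalized exponentially weighted average, so it can be computed in one pass with two running sums. The sketch below assumes the usual EWMA range 0 < α ≤ 1, in which the weights (1 − α)^k decay with the age of the point; the α = 3.4 used for visualization in Section 3.1 lies outside this range, so the paper’s exact parameterization may differ from this sketch. The function name is hypothetical. Within 0 < α ≤ 1, `pandas.Series.ewm(alpha=α, adjust=True).mean()` computes the same normalized average and can serve as a cross-check.

```python
from __future__ import annotations


def momentum(leverages: list[float], alpha: float) -> list[float]:
    """Momentum M(t) for t = 1..T following Eq. (5).

    num_t = L_t + (1 - alpha) * num_{t-1} and den_t = 1 + (1 - alpha) * den_{t-1}
    expand to the numerator and denominator of Eq. (5).
    """
    if not 0 < alpha <= 1:
        raise ValueError("this sketch assumes 0 < alpha <= 1")
    decay = 1.0 - alpha
    num = den = 0.0
    result = []
    for lev in leverages:
        num = lev + decay * num  # weighted sum of leverage, newest point has weight 1
        den = 1.0 + decay * den  # sum of the same weights
        result.append(num / den)
    return result
```

With α = 0.5, one won point followed by two lost points, `momentum([1.0, 0.0, 0.0], 0.5)`, gives 1, 1/3, and 1/7: momentum decays once the player stops gaining leverage.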

3. Momentum Quantification Results

This section presents the results of applying our momentum quantification methodology to the Wimbledon 2023 final (Carlos Alcaraz vs. Novak Djokovic, 3:2) and the Paris 2024 Olympics women’s singles match (Zheng Qinwen vs. Donna Vekić, 2:0).

3.1. Wimbledon 2023 Final

Based on our calculation, the winning probability of each player is shown in Figure 2.

In Figure 2, the blue line represents Player 1, Carlos Alcaraz, and the orange line represents Player 2, Novak Djokovic. When one player’s winning probability rises, the other’s falls, so the figure shows at a glance which player is more likely to win the current point. Momentum, however, should reflect changes in this flow. For example, if a trailing player suddenly wins several crucial points in a row, the shift can give him a psychological boost even while he remains behind in the overall score. This is why we introduce leverage.

For visualization, we plot the two players’ leverage with opposite signs (one positive, one negative). Figure 3 shows the leverage gained by each player: the blue and orange bars are the leverage of Alcaraz and Djokovic, respectively, and the line L_t = 0 marks the turning point.

As shown in Figure 4, we start with α = 1. As α increases, the momentum graph shows more subtle details and level changes of the match; if α is too large, however, the main trend of the match may no longer be visible. After comparing these settings, we use α = 3.4 for visualization. The blue area represents Alcaraz’s momentum, and the orange area represents Djokovic’s. When a player’s momentum is close to, but not exactly, zero, he is not yet in a complete downturn. A momentum swing is more likely when one player shows positive momentum while the other’s momentum approaches zero; in that case, the player with positive momentum faces less pressure from his rival and has a better chance of winning or swinging the match.
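The swing condition described above, one player clearly positive while the other is near zero, can be flagged automatically. In the sketch below, the thresholds `high` and `low` are hypothetical values that would need tuning, and the inputs are the two players’ non-negative momentum series before the sign flip used for plotting; the function name is also hypothetical.

```python
from __future__ import annotations


def swing_prone_points(m1: list[float], m2: list[float],
                       low: float = 0.01, high: float = 0.1) -> list[tuple[int, int]]:
    """Return (point index, leading player) where a swing looks likely.

    A point is flagged when one player's momentum exceeds `high` while the
    other's is below `low`. Both thresholds are hypothetical defaults.
    """
    flagged = []
    for t, (a, b) in enumerate(zip(m1, m2)):
        if a > high and b < low:
            flagged.append((t, 1))  # Player 1 leads, Player 2 near zero
        elif b > high and a < low:
            flagged.append((t, 2))  # Player 2 leads, Player 1 near zero
    return flagged
```

Flagged points could then be compared with the actual turning points marked on the momentum graph.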

The momentum graph is broadly consistent with the course of the match. The correspondence between score and momentum in each set is as follows.

Djokovic’s Dominance

In the first set, Djokovic seemed destined to win easily, dominating 6–1. The orange area from points 0 to 45 shows that Djokovic’s momentum, despite fluctuations, clearly outweighed Alcaraz’s.

First Swing

The second set was tense and finally won by Alcaraz in a tie-breaker 7–6. The curve section from points 46 to 139 shows the momentum of both players fluctuating frequently towards the end, reflecting the intensity of the situation.

Alcaraz’s Fight Back

In the third set, Djokovic’s momentum stayed low while Alcaraz sustained his high momentum. From points 140 to 209, after some initial crossings, the young Spaniard’s momentum clearly leads Djokovic’s, mirroring the actual scoreline.

Second Swing

The final two sets show the model at its best. In the fourth set, the young Spaniard seemed in total control at the start, but Djokovic took over and won the set 6–3. The model reflects this as the orange curve overtakes the blue.

Alcaraz’s Victory

Carrying his edge from the fourth set, Djokovic seemed poised to stay ahead, but the momentum shifted and Alcaraz took control, winning the set 6–4. In the model, the blue curve overtakes the orange in the middle of the set and eventually surpasses Djokovic’s curve.

Overall, momentum broadly reflects the match’s scoreline, and with our quantification it shows the trend more clearly than the running score does. It can therefore be used to analyze a player and his rival in a tennis match.

The momentum graph also supports further analysis; for example, we can examine the factors that drive momentum swings (turning points).

We find that break-serve points affect momentum significantly, as shown in Figure 5. The blue points mark Alcaraz’s break points and the orange points mark Djokovic’s; set and game boundaries are also marked in the plot. When a player wins a break-serve point, his momentum tends to increase rapidly, especially for Alcaraz. Break-serve points therefore play an important role in a player’s success over the next few points.
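One way to quantify the effect in Figure 5 is to average the change in a player’s momentum from just before each won break-serve point to a few points afterwards. The sketch below is an illustrative analysis, not the procedure used for Figure 5; the look-ahead `horizon` and the function name are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def mean_momentum_change(momentum: list[float], break_points_won: list[int],
                         horizon: int = 5) -> Optional[float]:
    """Average of M(t + horizon) - M(t - 1) over the won break-serve points t.

    Points without a preceding point or without `horizon` later points are
    skipped; returns None if no break-serve point qualifies.
    """
    changes = [momentum[t + horizon] - momentum[t - 1]
               for t in break_points_won
               if t >= 1 and t + horizon < len(momentum)]
    return sum(changes) / len(changes) if changes else None
```

Comparing this average with the same quantity computed at ordinary points would indicate whether break-serve points stand out.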

Such a momentum shift matters because it affects not only the current game but also the overall match dynamics. Alcaraz’s recent performance might make Djokovic more cautious, prompting him to adjust his strategy and potentially leading to a different outcome than the current score alone would suggest.

3.2. 2024 Paris Olympics Tennis Women’s Singles

To demonstrate that the method also applies to women’s singles, we recorded the data by watching the match and entering each point into Excel spreadsheets. The momentum quantification model again performs well (Figure 6), and the momentum graph can be interpreted as follows.

First Set Opening Sparks: Zheng Qinwen’s Momentum

Right from the start, Zheng Qinwen took control of the match, showing remarkable momentum. The momentum graph in the first 20 points clearly reveals a significant rise in the blue area, marking Zheng’s early dominance. Her serves were powerful, and her baseline returns were precise, launching a series of continuous attacks. Meanwhile, Donna Vekić (represented by the orange area) was in a negative momentum state, struggling to respond to Zheng’s intense aggression and falling into a defensive position. Vekić’s momentum briefly showed a slight increase but quickly dipped again, illustrating her struggle in the opening phase. Zheng’s consistency and proactive gameplay allowed her to control the pace entirely in the early phase, while Vekić found it difficult to break through, frequently on the defensive.

Mid-Set Battle: Vekić’s Brief Counterattack

As the match progressed, Donna Vekić gradually stabilized, and some momentum fluctuations appeared between points 30 and 40. Although Zheng Qinwen still held the advantage, Vekić resisted more strongly in certain rallies, and her momentum rose slightly as she tried to shift the tide. During this stage, both players engaged in several intense baseline exchanges, and Vekić managed to counterattack on key points. However, Vekić often struggled to recover quickly whenever her momentum dipped, which frequently disrupted her rhythm. In contrast, Zheng Qinwen demonstrated a remarkable ability to regain control after brief momentum drops, showing better adaptability and emotional resilience.

Late in the First Set: Zheng Qinwen’s Momentum Rebound

In the closing phase of the first set (around points 45 to 60), Zheng Qinwen’s momentum showed a noticeable rebound. During this period, she gradually regained her rhythm, demonstrating a strong ability to adjust and respond. Around point 45, Zheng’s momentum curve began to rise from a low point, signaling her effort to recalibrate and recover her dominance toward the end of the first set.

Meanwhile, Donna Vekić’s momentum declined. Although she had shown some dominance earlier, she lost control during this phase. Vekić seemed somewhat strained on the court, unable to sustain the momentum she had built, which allowed Zheng to gradually overtake her at crucial moments. Although Zheng’s momentum did not reach its peak, the recovery was sufficient to help her consolidate her advantage by the end of the first set, laying a solid foundation for the entire match.

Opening of the Second Set: Zheng Qinwen Extends Her Advantage

Entering the second set, Zheng Qinwen’s momentum remained strong. Between points 60 and 80, her momentum curve consistently stayed in the positive range, indicating that she continued to control the pace in the second set. Meanwhile, Donna Vekić struggled to break through; her momentum remained low, hovering in the negative range without any significant improvement. In this phase, Zheng demonstrated excellent stability, leveraging her powerful serves and precise baseline shots to once again put Vekić on the back foot.

Tense Rally: Vekić Throws Her Racket in Frustration

At points 110 to 111, the match reached a highlight with an intense rally that captivated everyone’s attention. Both players engaged in a long baseline exchange, each shot filled with power and accuracy. Zheng Qinwen and Donna Vekić repeatedly traded attacks, neither willing to concede. Spectators held their breath as they watched the fierce back-and-forth. In the end, Zheng clinched the point with an aggressive backhand shot. Frustrated, Donna Vekić lost her cool, furiously throwing her racket on the ground to vent her disappointment. This episode not only highlighted the pressure Vekić felt during crucial points but also showed her difficulty in regaining momentum after it dropped. In contrast, Zheng thrived in such intense moments, showcasing her resilience and composure.

End of the Second Set: Zheng Qinwen’s Steady Victory

After this intense exchange, Zheng Qinwen’s momentum gradually recovered and continued to rise. From points 100 to 120, her momentum consistently remained positive, securing her control of the match. Although Donna Vekić tried to counterattack during this period, she could not break Zheng’s dominance. With calmness and precision, Zheng closed out the match in a stable and decisive manner, ultimately claiming a clear victory.

4. Game Winner Prediction

We selected XGBoost over other forecasting models, such as logistic regression, random forest, and support vector machines (SVM), because it handles complex, non-linear relationships well and is robust to noisy, high-dimensional data [11]. XGBoost’s gradient boosting framework iteratively improves predictions by minimizing a loss function, which gives it an advantage over logistic regression, a model that assumes linear relationships. Compared with random forest, XGBoost offers greater flexibility in hyperparameter tuning and can exploit the sequential patterns encoded in windowed features, which is critical for time-series-like tennis data. In a preliminary benchmark on our dataset using five-fold cross-validation, XGBoost achieved an accuracy of 85.2%, outperforming random forest (82.1%), logistic regression (78.3%), and SVM (80.5%). In addition, XGBoost models can be interpreted through feature importance scores and SHAP values, which makes it possible to analyze momentum’s role in the predictions.

4.1. Model Training

4.1.1. Training and Testing Set Splitting

The target variable for the XGBoost binary classification model is the game winner, encoded as 0 if Player 1 wins the game and 1 if Player 2 (the rival) wins it. To assess the class distribution, we analyzed the dataset of 31 matches. The classes were approximately balanced, with 52% of games won by Player 1 and 48% by Player 2, so there is no significant class imbalance that would bias performance metrics such as accuracy and F1 score.

All 31 matches are used to build the model. As shown in Figure 7, we use 28 matches for training and 3 for testing. Within the training set, we use five-fold cross-validation to tune the parameters.
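Splitting at the match level keeps all points of a match on the same side of every split, so the model is never validated on points from a match it was trained on. The sketch below assumes that the 3 test matches are drawn at random and that the cross-validation folds are also formed from whole matches; the paper does not state how either choice was made, and the function names are hypothetical.

```python
from __future__ import annotations

import random


def split_matches(match_ids, n_test: int = 3, seed: int = 0):
    """Shuffle the distinct match IDs and hold out n_test matches for testing."""
    ids = sorted(set(match_ids))
    random.Random(seed).shuffle(ids)
    return ids[n_test:], ids[:n_test]  # (training matches, test matches)


def grouped_folds(train_ids: list, k: int = 5):
    """Yield (fit_ids, val_ids) so that each training match is validated exactly once."""
    for i in range(k):
        val_ids = train_ids[i::k]
        val_set = set(val_ids)
        fit_ids = [m for m in train_ids if m not in val_set]
        yield fit_ids, val_ids
```

Each (fit, validation) pair of match IDs would then be expanded back to point-level rows before training.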

4.1.2. Feature Engineering

To optimize the XGBoost model’s performance, we selected a subset of variables based on domain knowledge and statistical analysis, focusing on features that capture match dynamics and player performance. The selected variables contain:

Momentum: M(t) = M₁(t) − M₂(t);
Distance Run Difference (D_DR): p1_distance_run − p2_distance_run;
Served Score (SrvScr): the cumulative points won by p1 on serve in the current game;
Received Score (RcvScr): the cumulative points won by p1 while receiving in the current game;
Score Difference (D_Scr): p1_score − p2_score;
Games Won Difference (D_Gm): p1_game − p2_game;
Others: Serve (Srv), Set number (St), Game number (Gm), Point number (Pt), and Point Victor (PtVct).
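The difference and cumulative features in the list above can be computed per point as sketched below. The dictionary keys `M1` and `M2` (the two players’ momentum) are hypothetical, the other keys mirror the column names in the list, all fields are assumed to be numeric already, and the function names are hypothetical.

```python
from __future__ import annotations


def difference_features(row: dict) -> dict:
    """Momentum and difference features for one point (Player 1 minus Player 2)."""
    return {
        "M": row["M1"] - row["M2"],  # M(t) = M1(t) - M2(t)
        "D_DR": row["p1_distance_run"] - row["p2_distance_run"],
        "D_Scr": row["p1_score"] - row["p2_score"],
        "D_Gm": row["p1_game"] - row["p2_game"],
    }


def serve_receive_scores(game_points: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Cumulative (SrvScr, RcvScr) for Player 1 within one game.

    game_points: (server, point_victor) pairs with players encoded as 1 or 2.
    """
    srv = rcv = 0
    out = []
    for server, victor in game_points:
        if victor == 1:
            if server == 1:
                srv += 1  # Player 1 won a point on his own serve
            else:
                rcv += 1  # Player 1 won a point while receiving
        out.append((srv, rcv))
    return out
```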

These variables were chosen for their relevance to momentum and match outcomes, as informed by prior studies [5,10] and correlation analysis with the target variable (game winner).

To evaluate the impact of variable selection, we compared the model’s performance using all available variables (e.g., including ace, break point won, double fault, and rally count) versus the selected subset. Using a five-fold cross-validation, the model with all variables achieved an accuracy of 82.5%, while the selected subset improved accuracy to 85.2% (Table 2). This improvement suggests that the selected variables better capture relevant patterns without introducing noise from less impactful features.

The time series decomposition was applied after initial variable selection to capture temporal patterns in the selected variables, such as momentum fluctuations and score trends, which are critical for modeling dynamic match flow. This two-stage process—selecting raw variables followed by deriving temporal features—is a standard approach in feature engineering to ensure that only relevant variables are processed, reducing computational complexity and noise. To maintain fairness and avoid bias, we used time series decomposition with a fixed window size of five points to generate features like mean, kurtosis, skewness, numerical derivative mean, and numerical integral. Subsequently, we applied feature selection using XGBoost’s feature importance scores to retain only the most predictive features, ensuring an unbiased selection process.
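The windowed features named above can be computed with the standard library. In this sketch, skewness and kurtosis use population moments with excess (Fisher) kurtosis, the derivative is the mean first difference with unit spacing, and the integral uses the trapezoidal rule; these conventions, the zero fallback for a constant window, and the function names are assumptions, since the paper does not specify them. Each window ends at the current point, so no future information is used.

```python
from __future__ import annotations


def window_features(x: list[float]) -> dict:
    """Mean, skewness, kurtosis, derivative mean and integral of one window."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # population central moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 > 0:
        skew = m3 / m2 ** 1.5
        kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis
    else:
        skew = kurt = 0.0  # assumed fallback for a constant window
    diffs = [b - a for a, b in zip(x, x[1:])]
    return {
        "mean": mean,
        "skewness": skew,
        "kurtosis": kurt,
        "derivative_mean": sum(diffs) / len(diffs) if diffs else 0.0,
        "integral": sum((a + b) / 2 for a, b in zip(x, x[1:])),  # trapezoidal rule
    }


def rolling_features(series: list[float], window: int = 5) -> list[dict]:
    """Features at each point t >= window - 1, using the window ending at t."""
    return [window_features(series[t - window + 1:t + 1])
            for t in range(window - 1, len(series))]
```

The same function would be applied to each selected variable (momentum, score difference, and so on), with the resulting feature names suffixed by variable.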

We performed parameter optimization for the XGBoost model using Bayesian optimization (BO) [14] to identify the best hyperparameter configuration. The optimized parameters included learning rate, maximum tree depth, minimum child weight, and subsample ratio, which were tuned to maximize model accuracy on the validation set. We combined BO results with domain expertise to finalize the parameter settings, ensuring robust performance across the five-fold cross-validation process.

We then obtained the best parameter combination and the selected features. The feature engineering process is outlined in the flowchart in Figure 8.

As shown in Figure 9, at each step we calculate real-time features and long-term features over a window of 5 points, which is why we call the model dynamic. This allows the model to take into account subtle changes in a match. The next subsection evaluates which features are most important.

4.2. Model Accuracy

4.2.1. Training Set Accuracy

We compute the accuracy, precision, recall, F1 score, and AUC [15,16] using five-fold cross-validation (CV) [17]. The results in Table 2 show an accuracy of 85.2026%, with all other metrics above 80%, indicating good performance on the training set.

To analyze the model’s accuracy, we generate the confusion matrix, as shown in Figure 10. In the confusion matrix, “0” stands for Player 1 winning, and “1” stands for Player 2 (rival) winning. The True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) sample numbers are all shown in the confusion matrix. The number of TP cases is similar to TN cases, but the FP number is slightly higher than FN, which is acceptable.
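The metrics reported in Table 2 follow from the four confusion-matrix counts. The sketch below uses the paper’s label encoding, with class 1 (Player 2 wins) taken as the positive class; that choice of positive class is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def confusion_counts(y_true: list[int], y_pred: list[int], positive: int = 1):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```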

In the right panel of Figure 10, the dashed line represents a random classifier and the green line is the ROC curve. Ideally, the ROC curve lies as far above the dashed line as possible, toward the upper-left corner. Our ROC curve is smooth and close to the upper-left corner, which indicates that the model performs well.
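The area under the ROC curve (AUC) equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise sketch below computes it directly from predicted probabilities; it is quadratic in the number of samples and is meant for illustration rather than large datasets. The function name is hypothetical.

```python
from __future__ import annotations


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC via pairwise comparison of positive (1) and negative (0) scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5  # ties count as half a correct ordering
    return total / (len(pos) * len(neg))
```

For example, labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8] give an AUC of 0.75, since three of the four positive/negative pairs are ordered correctly.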

After training the model, we analyze feature importance. As shown in the SHapley Additive exPlanations (SHAP) [18] plot (Figure 11), momentum is the fifth most important feature in the XGBoost classification model, with a SHAP value of 0.215892. The serve, however, is the most important factor for the game winner, with the dominant SHAP value.

Momentum is therefore a moderately important factor in winning a game. The most important features are the serve, score difference, and point victor, which allow the model to distinguish serve wins from receive wins. Serving and receiving are thus the key factors that determine the outcome; psychological factors are not dominant, but they play a role and contribute to the prediction.

4.2.2. Testing Set Accuracy

On the testing set, the model achieves an accuracy of 79.1367% and a recall of 86% (Table 2). A high recall indicates that the model is good at detecting positive samples, which can help coaches make conservative decisions. However, the accuracy is lower than on the validation set, suggesting that some factors are not captured by the model (Figure 12).

Unexpected events also introduce uncertainty into the predictions. One such event is the psychological reversal described by Depken et al. [5]: “Psychological Reversal (PR)” refers to a significant change in an athlete’s mental state during a competition that results in performance contrary to the previous trend.

More specifically, as Figure 13 suggests, when the two players’ performances differ substantially, their momentum reflects their future performance. When PR occurs, however, the model’s performance drops. In most cases, the model corrects its earlier incorrect predictions by the end of the game, so it becomes more accurate as the game progresses.

To assess the robustness of the model, we divided the data into training, validation, and testing sets using five different splits, computed the evaluation metrics for each split, and averaged them. The results in Table 3 suggest a high degree of reliability.

Figure 14 visualizes these metrics; the blue and red bars represent the mean values on the validation and test sets, respectively. Accuracy, F1 score, and AUC are similar, all around 84%. For precision and recall, the validation and test means differ by only 1%. The validation means appear less consistent across metrics, whereas the test means remain stable at 84%. This difference may be attributable to match selection, since closely contested and one-sided matches can both influence overall performance. In summary, the model scores between 83% and 84% on average across all measures.

5. Discussion

We made progress in several areas of our analysis. First, our quantification method offers improvements over earlier approaches: compared with linear and logit models [2], our probability-based momentum quantification represents players’ real-time psychological state more explicitly, dynamically, and intuitively. In addition, the visualization revealed that winning break-serve points increases momentum, which is consistent with Moss and O’Donoghue’s research [4] and supports the effectiveness of our approach. The method can therefore help tennis coaches better understand their players’ psychological state and develop their mentality.

For match prediction, our model uses momentum to predict the winner and thereby accounts for the dynamic aspects of a match. Unlike typical pre-match forecasts, which rely solely on historical data [6], our system evaluates players’ in-game performance in real time. It can therefore track the chance of winning more accurately during the game and give a more nuanced understanding of how small changes affect the outcome.

Although our game-winning prediction model shows good accuracy, some misclassifications remain because of unaccounted factors. Future research can focus on optimizing the momentum quantification system and further improving prediction accuracy. Specifically, combining historical data with the characteristics of both players may be helpful, and analyzing specific players may yield better predictions than relying solely on a general model.

6. Conclusions

This study introduces a novel approach to quantifying real-time psychological momentum in tennis matches by integrating winning probabilities, leverage, and an Exponentially Weighted Moving Average (EWMA). The resulting momentum graphs provide an intuitive visualization of players’ psychological and tactical shifts, with break-serve points identified as critical drivers of momentum swings. By incorporating momentum and other match-related features into an XGBoost binary classification model, we achieved an average accuracy of 84% in predicting game winners, demonstrating the model’s effectiveness in capturing dynamic match flow. The SHAP analysis revealed that momentum is a moderately important predictor, complementing key factors like serve performance and score differences.

Our approach advances existing methods by offering real-time predictions, unlike traditional pre-match models that rely solely on historical data. This dynamic system can assist players and coaches in analyzing performance, adjusting strategies during matches, and preparing for opponents based on momentum trends. However, the model’s performance is limited by occasional misclassifications, potentially due to unaccounted factors like psychological reversal or player-specific traits.

The proposed methodology, which quantifies momentum through real-time probabilities and leverages machine learning for prediction, is adaptable to other sports with point-based or momentum-driven dynamics. Sports like badminton, table tennis, or volleyball, where players alternate serves and experience momentum swings, share structural similarities with tennis, making our approach applicable with minor adjustments (e.g., recalibrating leverage calculations for sport-specific scoring systems). For team sports like basketball, the methodology could be extended by aggregating individual player momentum into team-level metrics. However, sports with continuous play (e.g., soccer) may require significant modifications to account for their fluid, uninterrupted game flow.
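As an illustration of what recalibrating leverage for a different scoring system might involve, the sketch below replaces tennis game rules with rally-point scoring. The rules encoded are badminton (games to 21 with a two-point lead, capped at 30) and table tennis (games to 11 with a two-point lead, no cap). It assumes, as a simplification of our own, that player A wins each rally independently with a fixed probability `p` regardless of who serves; the function names are hypothetical.

```python
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=None)
def rally_game_win_prob(a: int, b: int, p: float, target: int, cap: Optional[int] = None) -> float:
    """P(player A wins a rally-scoring game) from a score of a-b.

    A game is won by reaching `target` with a two-point lead; if `cap` is set,
    the first player to reach `cap` wins outright.
    """
    q = 1.0 - p
    if cap is not None and (a >= cap or b >= cap):
        return 1.0 if a >= cap else 0.0
    if a >= target and a - b >= 2:
        return 1.0
    if b >= target and b - a >= 2:
        return 0.0
    if cap is None and a == b and a >= target - 1:
        # Uncapped "deuce": A must win two rallies in a row before B does.
        return p * p / (p * p + q * q)
    return (p * rally_game_win_prob(a + 1, b, p, target, cap)
            + q * rally_game_win_prob(a, b + 1, p, target, cap))


def rally_leverage(a: int, b: int, p: float, target: int, cap: Optional[int] = None) -> float:
    """Swing in A's win probability between winning and losing the next rally."""
    return (rally_game_win_prob(a + 1, b, p, target, cap)
            - rally_game_win_prob(a, b + 1, p, target, cap))


print("badminton 20-20:", rally_leverage(20, 20, 0.5, target=21, cap=30))
print("badminton 29-29:", rally_leverage(29, 29, 0.5, target=21, cap=30))
print("table tennis 10-10:", rally_leverage(10, 10, 0.5, target=11))
```

At 29-29 in badminton the next rally decides the game, so leverage reaches its maximum of 1; the momentum step itself (an EWMA over signed leverage) would not need to change.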

Future research could enhance the momentum quantification system by incorporating additional features, such as player fatigue or historical performance against specific opponents. Combining real-time and historical data may further improve prediction accuracy. Additionally, extending the model to other court surfaces or team-based sports could broaden its applicability. Overall, this study underscores the potential of momentum-based models to enhance tennis match analysis and prediction, offering valuable insights for players, coaches, and analysts.

Author Contributions

Conceptualization, L.W. and Q.U.A.S.; Methodology, L.W., P.C. and Q.U.A.S.; Formal analysis, L.W. and P.C.; Investigation, L.W. and P.C.; Data curation, L.W., Q.U.A.S. and P.C.; Writing—original draft, L.W.; Writing—review and editing, Q.U.A.S. and L.W.; Visualization, L.W. and P.C.; Supervision, Q.U.A.S.; Project administration, Q.U.A.S.; Funding acquisition, Q.U.A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

I am grateful for the support and guidance I have received while finishing this paper, which marks the culmination of my undergraduate journey at UA. Apart from my group members, I would like to acknowledge the individuals who contributed significantly to this topic. First, I want to acknowledge my colleague, Puning Chen, for his knowledgeable support and patient companionship. Second, I would like to thank my teammates, Xue Pan and Tianrui Hao, who initiated this topic and put a great deal of effort into the early version of this project. Finally, I am very thankful to my advisor, Qurat Ul An Sabir, for her guidance at every step of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Crust, L.; Nesti, M. A review of psychological momentum in sports: Why qualitative research is needed. Athl. Insight 2006, 8, 1–15.
2. Dietl, H.; Nesseler, C. Momentum in Tennis: Controlling the Match; UZH Business Working Paper Series 365; Department of Business Administration at the University of Zurich: Zürich, Switzerland, 2017.
3. Meier, P.; Flepp, R.; Ruedisser, M.; Franck, E. Separating psychological momentum from strategic momentum: Evidence from men’s professional tennis. J. Econ. Psychol. 2020, 78, 102269.
4. Moss, B.; O’Donoghue, P. Momentum in US Open men’s singles tennis. Int. J. Perform. Anal. Sport 2015, 15, 884–896.
5. Depken, C.A.; Gandar, J.M.; Shapiro, D.A. Set-level strategic and psychological momentum in best-of-three-set professional tennis matches. J. Sports Econ. 2022, 23, 598–623.
6. Gu, W.; Saaty, T.L. Predicting the outcome of a tennis tournament: Based on both data and judgments. J. Syst. Sci. Syst. Eng. 2019, 28, 317–343.
7. Wimbledon. Wimbledon Official Website. 2024. Available online: https://www.wimbledon.com/index.html (accessed on 24 May 2025).
8. Colino, E.; García-Unanue, J.; Felipe, J.L.; Quintana-García-Milla, I. Mechanical properties influencing athlete–surface interaction on tennis court surfaces. Sports Eng. 2024, 27, 18.
9. Paris 2024 Olympics. 2024 Paris Olympics Tennis Women Single. 2024. Available online: https://olympics.com/en/paris-2024 (accessed on 24 May 2025).
10. Seidl, R.; Lucey, P. Live counter-factual analysis in women’s tennis using automatic key-moment detection. In Proceedings of the MIT Sloan Sports Analytics Conference, Boston, MA, USA, 4–5 March 2022.
11. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
12. Gillet, E.; Leroy, D.; Thouvarecq, R.; Stein, J.F. A notational analysis of elite tennis serve and serve-return strategies on slow surface. J. Strength Cond. Res. 2009, 23, 532–539.
13. Hunter, J.S. The Exponentially Weighted Moving Average. J. Qual. Technol. 1986, 18, 203–210.
14. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 2012, 25, 2951–2959.
15. Goutte, C.; Gaussier, E. A probabilistic interpretation of precision, recall, and F-score, with implications for evaluation. In Advances in Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3408.
16. Huang, J.; Ling, C.X. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 2005, 17, 299–310.
17. Browne, M.W. Cross-validation methods. J. Math. Psychol. 2000, 44, 108–132.
18. SHAP. SHAP Documentation. 2024. Available online: https://shap.readthedocs.io/en/latest/api.html (accessed on 24 May 2025).

Figure 1. Flowchart of the Model.

Figure 2. Real-Time Winning Probability for Two Players.

Figure 3. Leverage Gained for Two Players.

Figure 4. Momentum Graph and Illustrations (Alcaraz 3:2 Djokovic).

Figure 5. Interpretation of Dynamic Momentum Graph with Break Points (Alcaraz 3:2 Djokovic).

Figure 6. Momentum Graph (Zheng Qinwen 2:0 Donna Vekić).

Figure 7. Training and Testing Set Splitting.

Figure 8. Feature Engineering Flow Chart.

Figure 9. Computation Demonstration Step-by-Step.

Figure 10. Confusion Matrix and ROC Curve in the Validation Set.

Figure 11. SHAP Value Bar Chart.

Figure 12. Confusion Matrix and ROC Curve in the Testing Set.

Figure 13. Predicted Probability of the XGBoost Classification Model in the Testing Set.

Figure 14. Average Metrics Bar Chart.

Table 1. Variables Description.

Variable	Symbol	Description
sets	St	Number of sets won by Player 1/2
games	Gm	Number of games won by Player 1/2
score	Scr	Scores of Player 1/2
serve	Srv	Server (Player 1 or 2)
points	Pt	Number of points won by Player 1/2
point_victor	PtVct	Winner of the point (Player 1 or 2)
ace	Ace	Ace by Player 1/2
break_pt_won	BPtW	Break points won by Player 1/2
double_fault	DF	Double fault made by Player 1/2
rally_count	Ra	The number of rallies
distance_run	DR	Distance run by Player 1/2 (m)
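One way to hold the point-by-point variables of Table 1 in code is a small record type. The sketch below is hypothetical: field names mirror the Variable column, each per-player quantity is stored as a (Player 1, Player 2) pair, and the helper computes the share of service points a player wins, chosen only as an example of a serve-performance feature, not as one the authors necessarily used.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PointRecord:
    """One point of play with the Table 1 variables (hypothetical layout).

    Pairs hold (Player 1, Player 2) values.
    """
    server: int                                        # Srv: 1 or 2
    point_victor: int                                  # PtVct: 1 or 2
    sets: Tuple[int, int] = (0, 0)                     # St
    games: Tuple[int, int] = (0, 0)                    # Gm
    score: Tuple[str, str] = ("0", "0")                # Scr
    points: Tuple[int, int] = (0, 0)                   # Pt
    ace: Tuple[bool, bool] = (False, False)            # Ace
    break_pt_won: Tuple[bool, bool] = (False, False)   # BPtW
    double_fault: Tuple[bool, bool] = (False, False)   # DF
    rally_count: int = 0                               # Ra
    distance_run: Tuple[float, float] = (0.0, 0.0)     # DR, meters


def serve_points_won_rate(records: Sequence[PointRecord], player: int) -> float:
    """Share of points won by `player` on their own serve (0.0 if they never served)."""
    served = [r for r in records if r.server == player]
    if not served:
        return 0.0
    return sum(r.point_victor == player for r in served) / len(served)


records = [PointRecord(server=1, point_victor=1),
           PointRecord(server=1, point_victor=2),
           PointRecord(server=2, point_victor=2)]
print(serve_points_won_rate(records, 1), serve_points_won_rate(records, 2))
```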

Table 2. Model Performance Metrics for Training and Testing Sets.

Set	Accuracy	Precision	Recall	F1	AUC
Training	0.852026	0.838200	0.866872	0.852295	0.852246
Testing	0.791367	0.737113	0.869301	0.797768	0.795306
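Precision, recall, and F1 in Table 2 follow the standard definitions from confusion-matrix counts, with F1 the harmonic mean of precision and recall. The sketch below writes these definitions out and checks that the reported F1 values agree with the reported precision and recall. The paper reports only the derived metrics here, so any confusion-matrix counts passed to `classification_metrics` are hypothetical.

```python
from typing import Dict


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1(precision, recall)}


# Consistency check against Table 2: (set, precision, recall, reported F1).
for name, p, r, reported in [("Training", 0.838200, 0.866872, 0.852295),
                             ("Testing", 0.737113, 0.869301, 0.797768)]:
    print(name, round(f1(p, r), 6), reported)
```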

Table 3. Sensitivity Analysis Across Different Data Splits (Note: “Training” denotes five-fold cross-validation results on the training sets).

Split	Subset	Accuracy	Precision	Recall	F1	AUC
Set 1	Training	0.831157	0.819723	0.838059	0.82879	0.831323
Set 1	Testing	0.876129	0.878109	0.882500	0.880299	0.875917
Set 2	Training	0.843791	0.829513	0.858286	0.843654	0.844046
Set 2	Testing	0.812048	0.794258	0.825871	0.809756	0.812468
Set 3	Training	0.840432	0.825561	0.851446	0.838304	0.840736
Set 3	Testing	0.818008	0.848921	0.816609	0.832451	0.818176
Set 4	Training	0.835654	0.827597	0.842481	0.834972	0.835742
Set 4	Testing	0.893191	0.881020	0.891117	0.886040	0.893059
Set 5	Training	0.848803	0.834629	0.860853	0.847538	0.849081
Set 5	Testing	0.788827	0.800905	0.778022	0.789298	0.789011
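The 84% average accuracy quoted in the Conclusions is consistent with the mean testing accuracy across the five splits in Table 3 (about 0.838). The sketch below reproduces the per-metric mean and sample standard deviation using only the values in the table; that the authors averaged exactly these five testing rows is our assumption.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

# Testing-set rows of Table 3, Sets 1-5: (Accuracy, Precision, Recall, F1, AUC).
TESTING: List[Tuple[float, ...]] = [
    (0.876129, 0.878109, 0.882500, 0.880299, 0.875917),
    (0.812048, 0.794258, 0.825871, 0.809756, 0.812468),
    (0.818008, 0.848921, 0.816609, 0.832451, 0.818176),
    (0.893191, 0.881020, 0.891117, 0.886040, 0.893059),
    (0.788827, 0.800905, 0.778022, 0.789298, 0.789011),
]
METRICS = ("Accuracy", "Precision", "Recall", "F1", "AUC")


def summarize(rows: Sequence[Tuple[float, ...]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each metric across splits."""
    columns = list(zip(*rows))
    return {name: (mean(col), stdev(col)) for name, col in zip(METRICS, columns)}


summary = summarize(TESTING)
for name, (m, s) in summary.items():
    print(f"{name}: mean {m:.4f}, sd {s:.4f}")
```

The spread across splits (testing accuracy ranges from about 0.79 to 0.89) indicates how much a single train/test split can move the headline figure.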


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, L.; Chen, P.; Sabir, Q.U.A. Tennis Game Dynamic Prediction Model Based on Players’ Momentum. AppliedMath 2025, 5, 77. https://doi.org/10.3390/appliedmath5030077

AMA Style

Wang L, Chen P, Sabir QUA. Tennis Game Dynamic Prediction Model Based on Players’ Momentum. AppliedMath. 2025; 5(3):77. https://doi.org/10.3390/appliedmath5030077

Chicago/Turabian Style

Wang, Lechuan, Puning Chen, and Qurat Ul An Sabir. 2025. "Tennis Game Dynamic Prediction Model Based on Players’ Momentum" AppliedMath 5, no. 3: 77. https://doi.org/10.3390/appliedmath5030077

APA Style

Wang, L., Chen, P., & Sabir, Q. U. A. (2025). Tennis Game Dynamic Prediction Model Based on Players’ Momentum. AppliedMath, 5(3), 77. https://doi.org/10.3390/appliedmath5030077
