1. Introduction
The prediction of sports outcomes is typically treated as a classification problem in which one of three outcomes is predicted: win, lose, or draw [1]. As high-level basketball, such as the EuroLeague, increasingly relies on data, researchers have focused on predictive models built on in-game performance statistics, underscoring the value of data-driven insights [2]. By examining metrics such as points scored, rebounds, assists, turnovers, and shooting efficiency and distribution, coaches and analysts can gain valuable insights into the factors that influence game outcomes [3]. These game-related statistics are not only descriptive but also predictive, providing a basis for models that can accurately forecast results.
The application of performance indicators to understand the determinants of basketball success has long been a focus in sports science. Early research relied primarily on conventional statistical methods, such as discriminant analysis, correlation, and linear regression, to identify critical variables associated with winning and losing. Rebounding, turnover control, assists, and shooting efficiency have emerged as key indicators in multiple studies [4,5,6,7]. At the EuroLeague level, common findings underscore the predictive value of shooting efficiency, defensive rebounding, assists, and ball control. Ektirici [8] emphasized field goal shooting, defensive rebounds, and assists as performance enhancers, while Mikołajec et al. [9] linked team success to assists, fouls, and made shots. Similarly, Özmen [10] highlighted defensive rebounds and shooting percentages as decisive performance factors.
With more advanced computational tools, greater data availability, and algorithmic progress, machine learning (ML) approaches to predicting sports outcomes have become a powerful alternative [11]. ML techniques can model non-linear interactions and high-dimensional data, providing a more detailed understanding of how combinations of variables influence game results [12]. Unlike purely linear models, many ML approaches can reveal complex hidden patterns that traditional methods might overlook.
A growing body of research has applied supervised-learning pipelines to predict basketball outcomes, progressing from interpretable regressions to sophisticated ML models. In professional basketball, studies have employed logistic regression [13], tree-based ensembles [14], and support vector machines [15] to predict outcomes. Some studies have identified performance metrics such as effective field goal percentage (eFG%), rebounds, and turnovers as strong predictors [16], while others have examined contextual features (home advantage, fatigue) to enhance predictions [17,18]. Research on basketball outcome prediction using ML spans both the NBA and European leagues, with an increasing focus on model accuracy, feature importance, and tactical implications.
In the NBA context, Zadravec [19] used ML to predict the winner of the 2024 NBA Championship and found that the critical predictors were the team’s quality rating and the players’ performance metrics. Horvat et al. [20] introduced a structured team efficiency index that achieved 78% predictive accuracy. Wang [2] compared several algorithms and found Deep Neural Network (DNN) and Random Forest to be the most effective, particularly when field goal percentage (FG%) was used as the key performance variable. Tsagris et al. [21] further demonstrated the value of halftime data in performance forecasting.
European basketball has also seen notable contributions. Lampis et al. [22] applied ML algorithms to four leagues, improving accuracy by 3–5% using advanced metrics. Plakias et al. [23] compared key performance indicators (KPIs) in the EuroLeague and national leagues, identified context-dependent influences on outcomes, and confirmed the external validity of tree-based feature rankings; they highlighted offensive rating, defensive rebounds, and turnover ratio as key KPIs. Giasemidis [24] showed that gradient boosting with possession-efficiency features achieved 72% accuracy in predicting regular-season winners, outperforming logistic regression. More recently, Foteinakis et al. [25] investigated decision-making under pressure in EuroLeague games and identified shot range, defensive pressure, offensive possession time, and current game status as key determinants of clutch shot success, which in turn affects scoring efficiency and game outcomes.
Methodological advances have further enriched this field. Bunker and Thabtah [12] reviewed Artificial Neural Network-based models and proposed a foundational ML framework. Papageorgiou et al. [26] compared 14 ML models and found that tree-based models had superior predictive capability. Li [27] and Horvat et al. [28] emphasized model robustness and feature engineering using the Shapley Additive Explanations (SHAP) framework, while Ou-Yang et al. [29] focused on the Chinese Basketball Association (CBA) and identified differences in KPI importance, such as the dominance of two-point efficiency and offensive metrics during playoff phases. Collectively, these studies highlight the effectiveness of ML in basketball analytics and illustrate the increasing sophistication of predictive models and feature engineering across leagues and contexts.
Despite these contributions, several gaps remain. First, few studies have conducted comparative analyses of different ML algorithms using only game-related performance statistics. Second, most predictive modeling efforts focus on the NBA, national, and collegiate levels, while the EuroLeague, known for its tactical complexity, distinct rules, and defensive style, remains largely underrepresented in ML research. Third, many existing studies prioritize predictive accuracy over model interpretability, often employing complex ML algorithms without sufficiently clarifying how game-related features affect outcomes [29]. This lack of explainability limits the practical use of such models by coaches and teams, who rely on clear, actionable insights to inform their decisions. Interpretable ML techniques are therefore essential for translating analytical outputs into meaningful operational strategies [30].
To address these gaps, the purpose of the present study was to evaluate and compare four supervised ML algorithms, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB), in predicting the outcomes of EuroLeague basketball games from game-related statistics of the 2024–25 season, while providing interpretable predictions and actionable insights. The study seeks to answer two questions: (a) Which model delivers the highest predictive accuracy? (b) Which game-related variables have the greatest impact on predicted EuroLeague game outcomes across the four ML models analyzed?
2. Materials and Methods
2.1. Data Acquisition and Preparation
We used a dataset of game-related statistics from the publicly available website Hack-a-Stat (https://hackastat.eu/, accessed on 2 June 2025), a repository of data from major European basketball tournaments, including the EuroLeague, in which statistics are calculated from box scores available on the official EuroLeague website (https://www.euroleaguebasketball.net/euroleague/, accessed on 2 August 2025). This repository has been used in previous studies [31,32]. The dataset consisted of a CSV file reporting each EuroLeague game as two observations, one for each team, for a total of 660 observations, which provided sufficient data to explore classification ML algorithms and validate their results. The dataset comprised all 330 games of the 2024–25 EuroLeague season: the complete 306-game regular season and 24 playoff, play-in, and Final Four games completed as of the data access date. The CSV file contained standard box-score team statistics, such as three-point shots made and three-point percentage; advanced composite metrics, such as Offensive Rating (ORTG), eFG%, and True Shooting Percentage (TS%); and spatially derived statistics, including shot zone and shooting distribution. It also included game-specific details, such as the game result (win = 1, loss = 0), which served as the response (target) variable in this study.
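Because each game contributes exactly two rows, one per team, a quick structural check can confirm this layout before modeling: every game identifier should appear twice and carry exactly one win. The sketch below is illustrative only; it assumes hypothetical column names (`game_id`, `win`) and uses a toy frame in place of the real Hack-a-Stat CSV, whose exact schema is not described here.

```python
import pandas as pd


def check_two_rows_per_game(df: pd.DataFrame, game_col: str = "game_id",
                            result_col: str = "win") -> bool:
    """True if every game has exactly two rows (one per team) and exactly
    one of them is coded as a win (win = 1, loss = 0)."""
    rows_per_game = df.groupby(game_col).size()
    wins_per_game = df.groupby(game_col)[result_col].sum()
    return bool((rows_per_game == 2).all() and (wins_per_game == 1).all())


# Toy frame standing in for the real CSV; column names are hypothetical.
toy = pd.DataFrame({
    "game_id": [1, 1, 2, 2, 3, 3],
    "team": ["A", "B", "C", "D", "A", "C"],
    "win": [1, 0, 0, 1, 1, 0],
})
print(check_two_rows_per_game(toy))  # True
```

If the check passes on the full file, the 330 games yield exactly 330 wins and 330 losses, so the two classes are perfectly balanced and a stratified split preserves a 50/50 win/loss ratio.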
Initially, the database comprised 127 raw statistical variables per team. Data preprocessing involved systematic cleaning and quality control: non-numeric entries and placeholders were standardized or converted to missing values, incomplete or inconsistent records were inspected, and redundant or highly correlated variables were reviewed and removed. To avoid information leakage (the inadvertent use of outcome-dependent variables during model training), two independent experts in basketball analytics (P.F. and A.C.) examined all candidate variables and excluded those directly reflecting or derived from the final game outcome (e.g., point differential, Net Rating, and other composite efficiency metrics).
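The cleaning steps can be sketched with pandas as below. This is an illustration, not the authors’ code: the list `LEAKAGE_COLUMNS` is hypothetical (in the study, these exclusions were decided by expert review), and the 0.95 correlation cut-off is an assumed value, since the text does not report the threshold used. Missing values are deliberately left in place because median imputation happens later, inside the cross-validated pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical names of outcome-derived columns; in the study these were
# identified by expert review, not by a fixed list.
LEAKAGE_COLUMNS = ["point_diff", "net_rating"]


def clean_features(df: pd.DataFrame, target: str = "win",
                   corr_threshold: float = 0.95) -> pd.DataFrame:
    """Drop the target and leakage columns, coerce placeholders to NaN, and
    drop the second variable of each pair with |r| above corr_threshold
    (the threshold is an assumed value)."""
    drop_first = [target] + [c for c in LEAKAGE_COLUMNS if c in df.columns]
    X = df.drop(columns=drop_first)
    # Non-numeric placeholders such as "-" or "n/a" become NaN.
    X = X.apply(pd.to_numeric, errors="coerce")
    # Keep only the upper triangle so each pair of variables is inspected once.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    return X.drop(columns=redundant)


toy = pd.DataFrame({
    "2PTM": [20, 18, "-", 15],
    "2PT_PTS": [40, 36, 30, 30],  # equals 2 x 2PTM wherever 2PTM is known
    "TO": [12, 15, 10, 14],
    "point_diff": [5, -5, 8, -8],
    "win": [1, 0, 1, 0],
})
cleaned = clean_features(toy)
print(list(cleaned.columns))  # ['2PTM', 'TO']
```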
After this preprocessing and expert review, the final dataset consisted of 28 predictor variables (Table 1) describing team performance and one binary outcome variable indicating game result (win/loss). This curated feature set provided a balanced and leakage-free foundation for subsequent machine learning analysis.
2.2. Data Splitting, Validation and Reliability Procedure
To ensure the precision and reliability of the analysis, the statistics of 33 randomly selected games (10% of the dataset) were cross-referenced with the official EuroLeague box scores (https://www.euroleaguebasketball.net/euroleague/, accessed on 2 August 2025) to confirm their validity. Minor discrepancies were resolved through a systematic review of the original data. Validated and reliable data are foundational for analytical accuracy, ensuring that derived insights reflect true performance patterns. In sports performance analysis, reliability (consistency) and validity (accuracy) are critical for minimizing errors that can skew predictive modeling and decision-making [33].
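A reproducible way to draw such a 10% audit sample is to fix a random seed, as in the sketch below; the seed value and the integer game identifiers 1 to 330 are illustrative assumptions, not details from the study.

```python
import random


def sample_games_for_audit(game_ids, fraction=0.10, seed=42):
    """Return a sorted, reproducible random subset of game IDs to compare
    against the official box scores (the seed is an arbitrary choice)."""
    ids = list(game_ids)
    k = round(len(ids) * fraction)
    return sorted(random.Random(seed).sample(ids, k))


audit = sample_games_for_audit(range(1, 331))  # hypothetical IDs 1..330
print(len(audit))  # 33
```

Recording the seed alongside the audit log lets a second reviewer regenerate exactly the same list of games.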
All subsequent analyses were conducted in Python (version 3.10) using scikit-learn. To obtain an unbiased estimate of predictive performance while preventing overfitting, a hold-out plus cross-validation strategy was adopted. The dataset was first split into training–validation (80%) and independent test (20%) subsets using stratified random sampling to preserve the win/loss ratio [28]. Within the training–validation set, model development and tuning were performed through stratified 5-fold cross-validation [34]. Each pipeline consisted of (i) median imputation of missing values (SimpleImputer), (ii) standardization of predictors (StandardScaler), (iii) wrapper-based Recursive Feature Elimination (RFE), and (iv) the classification algorithm.
Hyperparameters for both the feature selection (RFE) step and the classifier were jointly optimized using Grid Search Cross-Validation with ROC-AUC as the optimization metric. The best-performing configuration (RFE + classifier hyperparameters) was refitted on the full training–validation portion, and its predictive performance was finally evaluated on the held-out 20% test set. This procedure ensured that feature selection, preprocessing, and tuning were fully nested within the cross-validation loop and that the test data remained unseen until final evaluation.
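The hold-out plus cross-validation design can be sketched in scikit-learn as follows. The data are synthetic (`make_classification`) and the grid is deliberately small: only the structure (80/20 stratified split; median imputation, scaling, RFE, and classifier in one pipeline; `GridSearchCV` scored by ROC-AUC over stratified 5-fold CV) mirrors the text, and the values of `C` and the subset sizes are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 660 x 28 feature matrix (not the real data).
X, y = make_classification(n_samples=660, n_features=28, n_informative=8,
                           random_state=0)

# 80/20 stratified hold-out split; the test set is untouched until the end.
X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("rfe", RFE(estimator=LogisticRegression(max_iter=1000))),
    ("clf", LogisticRegression(max_iter=1000)),  # L2 penalty by default
])

# Small illustrative grid: RFE subset size and classifier C tuned jointly.
param_grid = {
    "rfe__n_features_to_select": [5, 10, 18],
    "clf__C": [0.1, 1.0, 10.0],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_trval, y_trval)  # refit=True (default) refits the best config on all 80%

test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

Because `GridSearchCV` fits a fresh clone of the whole pipeline on each training fold, the imputation medians, scaling parameters, and RFE ranking are estimated without seeing the validation fold or the held-out 20%.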
2.3. Feature Selection
All candidate variables (excluding the outcome) were subjected to a wrapper-based Recursive Feature Elimination (RFE) procedure designed to identify the most informative and non-redundant predictors of game outcome. RFE was executed independently for each classifier, ensuring model-specific and unbiased feature selection.
The number of retained predictors (n_features_to_select) was treated as a hyperparameter and tuned over every subset size from 1 to 28 features (the total number of available predictors) in single-feature increments. This exhaustive search allowed precise identification of the optimal subset size for each classifier and enabled direct comparison of model performance across all feasible feature dimensionalities.
RFE was implemented within each pipeline as follows. Logistic Regression (LR) used an L2-regularized LR as the wrapper estimator, and the selected features were then used to train the final LR classifier. Random Forest (RF) employed a Random Forest wrapper that ranked features by Gini importance, with the same model type serving as the final classifier. The Support Vector Machine (SVM) with a radial basis function (RBF) kernel used a linear SVM (LinearSVC) as the ranking estimator, after which the RBF SVM was trained on the selected subset. For the Gaussian Naïve Bayes (NB) model, which lacks intrinsic feature-importance measures, a Random Forest wrapper was used to rank variables, and the final NB classifier was trained on the features retained by this process.
By embedding RFE directly within each cross-validation fold, the feature-selection step was re-executed during every training iteration, thereby avoiding information leakage. Each model’s selected features were therefore determined solely from the training folds, producing classifier-specific, data-driven subsets that enhance both accuracy and interpretability.
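The wrapper and classifier pairings described above can be expressed as four scikit-learn pipelines. This sketch assumes default hyperparameters for every estimator (the study tuned them), uses `LinearSVC(dual=False)` as the SVM ranking estimator, and names its helper `make_pipeline_for` purely for illustration. scikit-learn’s `RFE` ranks features through an estimator’s `coef_` or `feature_importances_`; `GaussianNB` exposes neither, which is why it needs a separate ranker.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC


def make_pipeline_for(name: str) -> Pipeline:
    """Build the impute -> scale -> RFE -> classifier pipeline for one model,
    pairing each final classifier with the ranking estimator named in the text."""
    pairs = {
        # name: (RFE ranking estimator, final classifier)
        "LR": (LogisticRegression(max_iter=1000), LogisticRegression(max_iter=1000)),
        "RF": (RandomForestClassifier(random_state=0), RandomForestClassifier(random_state=0)),
        "SVM": (LinearSVC(dual=False), SVC(kernel="rbf")),
        "NB": (RandomForestClassifier(random_state=0), GaussianNB()),
    }
    ranker, clf = pairs[name]
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("rfe", RFE(estimator=ranker)),
        ("clf", clf),
    ])


pipelines = {name: make_pipeline_for(name) for name in ("LR", "RF", "SVM", "NB")}
# Subset size searched exhaustively from 1 to 28 in steps of one feature.
rfe_grid = {"rfe__n_features_to_select": list(range(1, 29))}

# Smoke check on synthetic data: the SVM pipeline keeps exactly 5 features.
X_demo, y_demo = make_classification(n_samples=120, n_features=28, random_state=1)
svm_fit = pipelines["SVM"].set_params(rfe__n_features_to_select=5).fit(X_demo, y_demo)
print(svm_fit.named_steps["rfe"].support_.sum())  # 5
```

Each pipeline can then be passed to `GridSearchCV` with `rfe_grid` merged into its classifier-specific grid, so the wrapper is refitted inside every training fold.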
2.4. Machine Learning Algorithms
Four supervised machine learning classifiers were trained to predict game outcomes (win = 1, loss = 0): LR, RF, NB, and SVM with an RBF kernel. All models were implemented in Python using the scikit-learn library and were evaluated through the unified pipeline described above, which incorporated preprocessing, feature selection, and hyperparameter optimization within nested cross-validation to prevent information leakage.
LR is a statistical method that models the relationship between independent variables (features) and a binary dependent variable to estimate the probability that the outcome is a success (in our case, a win). LR has proven effective in sports analytics because it handles binary classification problems [35], such as predicting wins and losses, and provides interpretable results that are essential for understanding feature importance [36]. In this study, LR served as the linear, interpretable baseline model. It estimates the probability of a binary outcome through a logistic function, offering both predictive accuracy and model transparency. An L2-regularized form of LR was adopted to mitigate overfitting, and the inverse regularization parameter C and the number of RFE-selected features were tuned during the cross-validation procedure.
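For reference, the model and an L2-penalized objective can be written as below. The objective is the form documented for scikit-learn’s `LogisticRegression` with the 0/1 labels recoded to ±1; it is included to clarify the role of C, not as a formula reported by the authors.

```latex
P(\text{win} \mid \mathbf{x}) = \sigma\left(\beta_0 + \boldsymbol{\beta}^\top \mathbf{x}\right)
  = \frac{1}{1 + \exp\left(-(\beta_0 + \boldsymbol{\beta}^\top \mathbf{x})\right)}

\min_{\beta_0,\, \boldsymbol{\beta}} \; \frac{1}{2}\lVert \boldsymbol{\beta} \rVert_2^2
  + C \sum_{i=1}^{n} \log\left(1 + \exp\left(-\tilde{y}_i\,(\beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i)\right)\right),
  \qquad \tilde{y}_i = 2y_i - 1 \in \{-1, +1\}
```

Smaller values of C give the penalty term more weight and shrink the coefficients toward zero; larger values let the model fit the training folds more closely.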
NB is a probabilistic algorithm that assumes the features are conditionally independent given the class and is frequently used for classification tasks in sports [37]. It handles small or imbalanced datasets effectively, is computationally efficient, and needs only a small amount of training data to estimate its classification parameters [28]. Because NB does not inherently yield feature importance rankings suitable for RFE, a Random Forest wrapper was used to identify the most informative predictors before fitting the final NB model. This ensured methodological consistency across all algorithms, with wrapper-based RFE embedded inside the cross-validation loop for every classifier.
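The independence assumption enters Gaussian NB through the product over features in its decision rule, where μ_jk and σ²_jk denote the class-k mean and variance of predictor j estimated from the training folds, and P(y = k) is the class prior:

```latex
\hat{y} = \arg\max_{k \in \{0, 1\}} \; P(y = k) \prod_{j=1}^{p}
  \frac{1}{\sqrt{2\pi\sigma_{jk}^2}}
  \exp\left(-\frac{(x_j - \mu_{jk})^2}{2\sigma_{jk}^2}\right)
```

When predictors are strongly correlated (for example, several shooting-efficiency measures), the product counts overlapping evidence more than once, which is one way the assumption can distort the predicted probabilities.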
SVM performs well on high-dimensional data and can define both linear and non-linear decision boundaries through kernel functions. It is often less susceptible to overfitting than other classifiers and has shown robust performance in sports outcome prediction [2,38]. Thus, an SVM classifier with an RBF kernel was employed to model potential non-linear decision boundaries between winning and losing performances. Feature selection for SVM was performed via RFE using a linear SVM (LinearSVC) as the ranking estimator, after which the final RBF-kernel SVM was trained on the selected subset of features. The penalty parameter C, the kernel coefficient γ (which controls the kernel width), and the RFE subset size were tuned jointly within the same nested cross-validation framework to balance margin maximization against generalization.
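The roles of C and γ can be read from the RBF kernel and the resulting decision function, written here in the standard dual form with labels ỹ_i ∈ {−1, +1} and dual coefficients α_i; a team-game row is classified as a win when f(x) > 0:

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert_2^2\right),
\qquad
f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i \tilde{y}_i K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad 0 \le \alpha_i \le C
```

A larger γ makes each kernel more local and the boundary more flexible, while a larger C penalizes margin violations more heavily. Both push the model toward fitting the training data more closely, which is why they are tuned together.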
RF is a powerful ML algorithm suitable for both classification and regression tasks, capable of handling complex datasets and providing feature importance rankings. RF is an ensemble method that combines numerous decision trees, and its final prediction aggregates the predictions of all trees in the forest [2]. It is widely used in classification because it reduces prediction variance and limits overfitting, thereby enhancing overall model accuracy [39]. In this study, RF was used to capture non-linear interactions and complex dependencies between game-related statistics. By averaging the predictions of multiple trees, RF produces robust, low-variance estimates that generalize well to unseen data. RF acted both as a classification model and as the wrapper estimator in the RFE procedure. The number of trees, maximum depth, and node-splitting criterion, together with the optimal feature subset size, were determined using stratified five-fold cross-validation.
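Because the subset size and the RF hyperparameters were searched jointly, the grid grows multiplicatively. The sketch below counts the candidate configurations with scikit-learn’s `ParameterGrid`; the values for `n_estimators`, `max_depth`, and `criterion` are hypothetical, since the text names these hyperparameters but not their ranges.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical value ranges; only the hyperparameter names come from the text.
rf_grid = {
    "rfe__n_features_to_select": list(range(1, 29)),
    "clf__n_estimators": [200, 500],
    "clf__max_depth": [None, 5, 10],
    "clf__criterion": ["gini", "entropy"],
}
n_candidates = len(ParameterGrid(rf_grid))
n_cv_fits = n_candidates * 5  # one pipeline fit per candidate per stratified fold
print(n_candidates, n_cv_fits)  # 336 1680
```

`GridSearchCV` adds one refit of the best configuration on the full training–validation set, and every pipeline fit also runs RFE’s own elimination loop, so the actual number of forest fits is considerably larger than this count.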
These models were selected for their proven effectiveness in sports analytics and their ability to handle binary classification problems [35]. For each classifier, performance was assessed on the independent 20% test set using accuracy, precision, recall, F1-score, and ROC-AUC. Together, these metrics capture both overall classification quality and class-specific discrimination ability.
Specifically, the ROC-AUC assesses the ability to distinguish between classes and summarizes how well a model can generate relative scores that separate positive and negative cases. Class-specific metrics (precision, recall, and F1) complemented this evaluation. Accuracy is the proportion of correct predictions. Precision is the ratio of true positives (TPs) to all positive predictions (TPs plus false positives, FPs), reflecting the ability to correctly identify positive cases. Recall is the ratio of TPs to the total of TPs and false negatives (FNs), showing how often a model correctly detects positive instances among all actual positives. The F1 score is the harmonic mean of precision and recall.
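In terms of the confusion-matrix counts TP, TN, FP, and FN, the metrics above can be written as follows, where s(x⁺) and s(x⁻) are the model scores of a randomly chosen win and a randomly chosen loss:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}

F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\text{AUC} = P\left(s(\mathbf{x}^{+}) > s(\mathbf{x}^{-})\right)
  + \tfrac{1}{2} P\left(s(\mathbf{x}^{+}) = s(\mathbf{x}^{-})\right)
```

Unlike the other four metrics, AUC does not depend on a classification threshold, which is why a model can combine a reasonable AUC with a very unbalanced precision/recall pair.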
2.5. Interpretation
To enhance interpretability, SHAP values were used to analyze the contribution of each feature to the model’s predictions. SHAP is a game-theoretic approach that explains the output of any ML model by assigning each feature an importance value for a specific prediction. It is based on Shapley values from cooperative game theory [40], ensuring fair and consistent attribution of each feature’s contribution to the final prediction. SHAP summary plots illustrated the relative importance and direction of influence of each predictor, providing a transparent interpretation of how model features contributed to win/loss classification. This approach improved the explainability of the predictive models and provided actionable insights into the factors most strongly associated with team success in EuroLeague competition [41].
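The Shapley attribution that underlies SHAP can be computed exactly for a handful of features by enumerating all coalitions. The sketch below is a from-scratch illustration, not the SHAP library’s algorithm: it replaces absent features with a single baseline value (SHAP explainers typically average over a background dataset and use approximations), needs on the order of 2^p model evaluations, and applies a hypothetical toy scoring function.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x.

    Features outside a coalition are set to their baseline value (a
    single-reference simplification). Cost grows as 2^p, so small p only.
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return f(z)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of this size.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical score over three standardized features (DR, TS%, TO).
def toy_model(z):
    dr, ts, to = z
    return 0.8 * dr + 1.2 * ts - 0.5 * to + 0.3 * dr * ts


x = [1.0, 0.5, 2.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
print([round(v, 3) for v in phi])  # [0.875, 0.675, -1.0]
```

The attributions sum to f(x) − f(baseline) (the efficiency property), and the interaction term 0.3·DR·TS% is split equally between the two features involved.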
3. Results
3.1. Model’s Performance
We evaluated four supervised machine learning algorithms, LR, RF, SVM with an RBF kernel, and NB, to predict EuroLeague game outcomes using the selected game-related performance variables. Model performance was assessed using accuracy, precision, recall, F1-score, and AUC. All models achieved accuracies above the 50% chance baseline, demonstrating meaningful discriminatory capacity between winning and losing teams (Table 2).
Among the evaluated models, the SVM (RBF kernel, RFE wrapper) model yielded the most consistent and robust performance across all evaluation metrics. SVM achieved an accuracy of 0.841, AUC = 0.922, precision = 0.836, recall = 0.848, and F1 = 0.842. This indicates that SVM effectively captured non-linear relationships within the dataset and generalized well to unseen data. The SVM model retained 18 key features through recursive feature elimination, confirming the predictive relevance of a compact subset of variables.
The LR (RFE wrapper) model exhibited comparably strong performance, with an accuracy of 0.818 and the highest AUC of 0.933, reflecting its strong discriminative ability in distinguishing between wins and losses. It also achieved precision = 0.828, recall = 0.803, and F1 = 0.815, indicating balanced performance across both sensitivity and positive predictive power.
The RF (RFE wrapper) model achieved an accuracy of 0.758 and AUC = 0.854, with precision = 0.736, recall = 0.803, and F1 = 0.768. Although it performed below the linear and kernel-based models, RF still provided reliable predictive capability and offered valuable insights into feature importance, benefiting from its ensemble nature and robustness against noise.
By contrast, the NB (RFE–RF wrapper) model demonstrated the lowest performance, with an accuracy of 0.652 and AUC = 0.789. Although it attained a relatively high precision of 0.917, a recall of 0.333 and an F1 of 0.489 indicated limited sensitivity in detecting winning cases. These results are consistent with the simplifying independence assumptions of the NB algorithm, which may not fully capture the interdependencies among basketball performance variables.
Overall, the results indicate that SVM and LR offered the best generalization and discriminative performance, followed by Random Forest, while Naïve Bayes underperformed relative to the other models. The ROC curves for all classifiers, presented in Figure 1, further confirm the superior classification and separability achieved by the SVM and LR models.
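The reported metrics can be cross-checked for internal consistency. Assuming the held-out set contains 132 team-game rows (20% of 660) with 66 wins and 66 losses, which follows from a stratified split of perfectly balanced classes, the confusion-matrix counts below reproduce every accuracy, precision, recall, and F1 value above to three decimals. These counts are an inference for illustration, not figures reported by the authors.

```python
# Confusion-matrix counts (TP, FP, TN, FN) INFERRED from the reported metrics,
# assuming a 132-row test set with 66 wins and 66 losses; not reported data.
implied_counts = {
    "SVM": (56, 11, 55, 10),
    "LR": (53, 11, 55, 13),
    "RF": (53, 19, 47, 13),
    "NB": (22, 2, 64, 44),
}
# Accuracy, precision, recall, F1 as reported in Section 3.1.
reported = {
    "SVM": (0.841, 0.836, 0.848, 0.842),
    "LR": (0.818, 0.828, 0.803, 0.815),
    "RF": (0.758, 0.736, 0.803, 0.768),
    "NB": (0.652, 0.917, 0.333, 0.489),
}


def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1


for model, counts in implied_counts.items():
    computed = metrics(*counts)
    # Within rounding error of the reported three-decimal values.
    agrees = all(abs(c - r) < 0.0005 for c, r in zip(computed, reported[model]))
    print(model, [round(c, 3) for c in computed], "consistent" if agrees else "inconsistent")
```

Under this assumption, NB would have predicted a win for only 24 of the 132 rows, which explains how its precision can be high (0.917) while its recall is low (0.333).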
Table 3 presents the means and standard deviations for the 18 most informative game-related performance variables differentiating winning and losing EuroLeague teams. These features were identified through the recursive feature elimination (RFE) process combined with the Random Forest-based feature selection, which together determined the most relevant predictors of game outcomes. The selected variables encompass multiple dimensions of team performance, including shooting efficiency, rebounding, ball security, and spatial shot distribution, offering a comprehensive and data-driven representation of the key performance determinants in elite basketball competition.
3.2. Model Interpretation
To quantify the relative influence of each predictor on game outcomes, SHAP values were computed for the trained models.
Figure 2 illustrates the SHAP summary plot for the SVM (RBF, RFE wrapper) classifier, which achieved one of the highest AUC scores among the evaluated models. Each point in the plot represents an individual game observation; the Y-axis lists the predictive variables ranked by their mean absolute SHAP values (indicating their overall contribution to the model’s predictions), and the X-axis shows the SHAP values themselves, quantifying the direction and magnitude of each variable’s effect on the predicted probability of a win. Positive SHAP values correspond to features that increase win likelihood, while negative values reduce it. The color gradient reflects the original feature values, with red indicating high values and blue indicating low values, thereby allowing simultaneous interpretation of both the feature’s value and its directional effect on predictions.
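The Y-axis ordering of a SHAP summary plot is the mean absolute SHAP value of each feature across all observations, as the short sketch below shows. The SHAP matrix here is made up for illustration and does not reproduce Figure 2.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Order features as a SHAP summary plot does: by mean |SHAP| across
    observations (rows = team-games, columns = features), largest first."""
    importance = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Made-up SHAP matrix for 4 team-game rows and 3 features (illustration only).
shap_matrix = [
    [0.30, -0.10, 0.05],
    [-0.25, 0.20, -0.02],
    [0.40, -0.15, 0.01],
    [-0.35, 0.05, -0.04],
]
for name, score in rank_by_mean_abs_shap(shap_matrix, ["DR", "TS%", "H/A"]):
    print(f"{name}: {score:.3f}")
```

Because the absolute value discards sign, this ranking says nothing about direction; that information comes from the horizontal position and color of the points.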
The SHAP analysis revealed that DR, TS%, and ST were among the most influential predictors of game outcome, with higher values in each strongly associated with an increased probability of winning. These findings emphasize the critical importance of defensive rebounding, shooting effectiveness, and defensive pressure in determining success at the EuroLeague level. Similarly, higher values of 3PTM and total FGM contributed positively to predicted win probabilities, highlighting the role of offensive execution and shot-making consistency.
Conversely, TO and PF exhibited negative SHAP values when high, indicating that excessive ball losses or fouling substantially decreased the likelihood of winning. The H/A indicator had a modest but clear positive contribution, suggesting that home-court advantage provided a consistent, though secondary, boost to win probability. Other variables, such as OR, BLK, and POSS, also contributed meaningfully but with smaller magnitudes, reflecting their contextual influence within overall game dynamics.
Spatial shot distribution features—particularly SHOT_RANGE_MIDDLE and SHOT_RANGE_3P—showed more nuanced effects. A lower reliance on mid-range attempts tended to favor winning predictions, consistent with modern efficiency-driven shot selection strategies, whereas a balanced or increased frequency of three-point attempts was generally associated with higher win probabilities, especially when accompanied by elevated shooting accuracy.
Overall, the SHAP interpretation underscores that shooting efficiency, defensive rebounding, possession control (via steals and turnovers), and shot selection patterns are the principal drivers of game outcome prediction in this EuroLeague model. These results align with established performance analytics in elite basketball, reaffirming that teams maximizing efficient scoring opportunities and minimizing possession errors have the highest likelihood of success.
4. Discussion
Predicting basketball game outcomes is inherently complex, given the multitude of interacting factors influencing team performance—such as tactical choices, player form, contextual circumstances, and psychological and physical readiness. This study focused specifically on team-level game-related statistics, thereby isolating the predictive contribution of in-game performance variables independent of contextual or player-specific influences. Using a supervised machine learning framework with four classifiers, LR, SVM, RF, and NB, we systematically compared the predictive performance, interpretability, and explanatory consistency of these models in forecasting EuroLeague basketball game outcomes.
Among the four classifiers, LR and SVM (RBF kernel) achieved the highest predictive performance, confirming their suitability for structured sports data. SVM achieved an accuracy of 0.841 and an AUC of 0.922, while LR followed closely with an accuracy of 0.818 and an AUC of 0.933. Both models demonstrated high precision and recall, indicating reliable discrimination between winning and losing teams. The strong performance of LR reflects its compatibility with the structured and moderately correlated nature of box-score data, where approximate linear separability often holds. Recent studies [20,41] indicate that when predictors are moderately correlated and the data distribution is balanced, LR can perform similarly to, or even better than, more complex non-linear models, especially when overfitting is a concern. LR’s performance is comparable to that reported in recent studies [2,21,23,29], which explored the KPIs influencing game outcomes across different contexts, including the European, NBA, and Chinese basketball leagues.
SVM’s success, on the other hand, reflects its ability to capture non-linear relationships and subtle interaction effects within multivariate performance data [42]. RF also performed well (accuracy = 0.758, AUC = 0.854) but trailed LR and SVM, possibly because redundancy and collinearity among features dilute ensemble-based feature importance. NB produced the weakest performance (accuracy = 0.652, AUC = 0.789), consistent with the algorithm’s independence assumption, which is rarely satisfied by interdependent team performance statistics.
The use of RFE embedded within stratified cross-validation ensured that each classifier was trained on the most informative and non-redundant predictors. The final 18 selected features represented key dimensions of game performance, including shooting efficiency (2PT%, TS%, 3PTM, FGM), rebounding (DR, OR), possession control (TO, ST), and spatial shot distribution (SHOT_RANGE_MIDDLE, SHOT_RANGE_3P). This comprehensive set of variables aligns with previous literature emphasizing the importance of efficient scoring, turnover minimization, and defensive stability as determinants of success in professional basketball [2,3,43,44].
Model interpretability was enhanced using SHAP, which quantified the contribution of each feature to the predicted probability of winning. The SHAP summary plot indicated that DR, TS%, and ST were the most influential predictors, with higher values in each corresponding to increased win probability. These findings emphasize that successful teams typically combine shooting efficiency, defensive resilience, and turnover creation to control game momentum.
Shooting efficiency was a crucial component of winning strategies in several studies [2,21,43,45,46]. Despite the EuroLeague’s notable rise in three-point shot attempts over recent seasons [47,48], 3PTM retained strong predictive power in this study, consistent with Hu [49], who found that each 1% increase in 3P% correlates with an average increase of nearly 5 percentage points in a team’s winning percentage. This finding highlights not only the direct effect of three-pointers on game results but also a broader strategic trend in which the EuroLeague relies more on perimeter shooting without losing shooting efficiency [50].
Spatial shot selection metrics provided further insight into offensive strategy. A lower reliance on mid-range attempts (SHOT_RANGE_MIDDLE) and a balanced but efficient three-point frequency (SHOT_RANGE_3P) were associated with higher win probabilities. This aligns with modern offensive philosophies that prioritize high-value scoring zones, either near the basket or beyond the arc, while reducing inefficient mid-range shots [45,46,47,48]. Mikołajec et al. [7] emphasized that in closely contested games, high efficiency in short-distance shots is positively correlated with winning outcomes. Paint-area shots, such as layups, dunks, and putbacks, are typically high-percentage opportunities, so teams that prioritize attacking the paint are more likely to benefit from efficient scoring. This observation aligns with core EuroLeague strategies that emphasize inside scoring via cuts, second-chance points, and fast breaks, tactics shown to yield high-efficiency outcomes, particularly during critical game phases [51]. Meanwhile, the frequency of mid-range shot attempts has declined across the EuroLeague in recent years [51]. Moreover, the probability of winning is only minimally affected by the volume of mid-range attempts, reinforcing the notion that such shots offer limited strategic value [52].
DR, ST, and OR have consistently been identified as critical predictors of outcomes in basketball research [3,4,23,45,46,53,54,55]. Their prominent SHAP rankings highlight their dual impact: increasing possession opportunities while limiting opponents’ scoring chances. Winning teams typically exhibit strong defensive capabilities, underscoring the importance of rebounding strategies that prioritize box-out fundamentals and of defensive schemes that create transition opportunities, thereby increasing the likelihood of success. Steals in particular disrupt opponents’ offensive flow and create additional scoring opportunities [3].
Conversely, TO and PF exhibited strong negative effects, underscoring the detrimental impact of possession loss and defensive infractions on team success. Winning teams were found to commit fewer TO, suggesting that reducing TO through improved decision-making enhances offensive efficiency, an essential factor in achieving success [
9,
10]. Recent literature reinforces the view that effective turnover management is a key predictor of team performance, as it helps maintain offensive continuity and defensive positioning [
3,
29]. To mitigate the substantial negative impact of TO on winning probability, coaches should implement targeted decision-making drills and prioritize ball security in high-pressure situations. Lineups should balance high-usage playmakers with low-error role players to stabilize possession. Emphasizing structured offensive sets and situational awareness, and recruiting players who collaborate effectively as a cohesive unit, can further reduce TO and improve overall efficiency [
56].
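Since raw TO counts depend on game pace, turnovers are often normalized per possession. A minimal sketch follows, assuming the common Oliver-style turnover percentage, TOV% = TO / (FGA + 0.44 × FTA + TO); this metric and its 0.44 free-throw weighting are assumptions here, not necessarily features used in the study, and the box-score numbers are hypothetical.

```python
def turnover_pct(tov, fga, fta, ft_weight=0.44):
    """Share of possessions ending in a turnover (Oliver-style TOV%).

    Possessions used are approximated as FGA + ft_weight * FTA + TOV.
    The 0.44 free-throw weight is a common convention, assumed here.
    """
    possessions_used = fga + ft_weight * fta + tov
    if possessions_used <= 0:
        raise ValueError("no possessions recorded")
    return tov / possessions_used


# Hypothetical team box score: 12 turnovers, 60 FGA, 20 FTA.
print(f"TOV% = {turnover_pct(12, 60, 20):.1%}")
```

Expressing TO as a share of possessions makes teams with different tempos comparable, which matters when ball security is used to guide lineup or training decisions.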
These findings also align with tactical analyses indicating that EuroLeague teams increasingly prioritize spacing, pace, and perimeter efficiency as determinants of competitive success [
50]. Beyond stretching the defense, effective three-point shooting improves ball movement and opens driving lanes, a trend particularly prominent in European leagues and aligned with the global shift toward faster, perimeter-oriented play [
47]. Although three-point shooting is high-reward, its impact depends on efficiency. The most effective strategies therefore combine open two-point attempts created through penetration with high-quality three-point attempts from skilled shooters, which is consistent with 3PTM and TS% emerging among the most impactful features in this study.
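TS% rewards exactly this combination because it credits made threes with extra points without counting extra attempts. As a sketch, assuming the widely used definition TS% = PTS / (2 × (FGA + 0.44 × FTA)) (the section does not restate the study's exact formula), with hypothetical box-score values:

```python
def true_shooting_pct(pts, fga, fta, ft_weight=0.44):
    """True shooting percentage.

    TS% = PTS / (2 * (FGA + ft_weight * FTA)); 0.44 is the common
    free-throw weighting and is an assumption here.
    """
    true_shooting_attempts = fga + ft_weight * fta
    if true_shooting_attempts <= 0:
        raise ValueError("no shooting attempts recorded")
    return pts / (2 * true_shooting_attempts)


# Hypothetical box scores with identical attempts (62 FGA, 20 FTA):
# in the second, two made twos are made threes instead (+2 points).
baseline = true_shooting_pct(85, 62, 20)
more_threes = true_shooting_pct(87, 62, 20)
print(f"baseline TS% = {baseline:.1%}, with two extra made threes = {more_threes:.1%}")
```

Holding attempts fixed isolates the point made above: converting made twos into made threes raises TS% directly, so 3PTM and TS% tend to move together.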
The H/A variable showed a positive SHAP contribution, indicating a measurable home-court advantage, consistent with prior EuroLeague research [
57,
58,
59,
60]. Prior work attributes this advantage mainly to improved performance in game-related statistics, driven by factors such as court familiarity, the ability to establish control early and set the tone for the rest of the game, the influence of home supporters (especially in European competitions), and the physical and psychological effects of travel on visiting teams.
Collectively, these results suggest that winning teams in the EuroLeague combine accurate shooting with disciplined defensive play and possession management. The strong performance of comparatively simple, interpretable models such as LR and SVM indicates that, even in complex team sports, a small number of strategically coherent indicators, especially those linked to efficiency and control, can yield reliable predictive insights. This balance between predictive power and interpretability enhances the practical utility of ML models for coaches and analysts, providing actionable evidence to guide tactical decision-making and performance optimization.
4.1. Practical Applications
The results of this study offer a clear, data-driven framework for improving competitive performance in elite basketball. Combining SVM feature importance with SHAP explanations shows that shooting efficiency, rebounding dominance, defensive disruption, and turnover control are the most decisive factors in determining outcomes. Coaches should therefore emphasize efficient shot generation, favoring attempts in the paint and from beyond the three-point arc while minimizing low-efficiency mid-range shots.
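The shot-selection advice follows from simple expected-value arithmetic: points per attempt equal conversion rate times shot value, so a three-point attempt matches a two-point attempt once 3P% reaches two thirds of the corresponding 2P%. The sketch below ignores fouls, and-ones, and offensive rebounds, and every conversion rate in it is hypothetical rather than taken from the study's data.

```python
def expected_points(fg_pct, shot_value):
    """Expected points per attempt, ignoring fouls and offensive rebounds."""
    return fg_pct * shot_value


def breakeven_three_pct(two_pct):
    """3P% at which a three-point attempt matches a two-point attempt."""
    return two_pct * 2 / 3


# Hypothetical conversion rates, for illustration only.
shots = {"paint two": (0.60, 2), "mid-range two": (0.40, 2), "three": (0.35, 3)}
for name, (pct, value) in shots.items():
    print(f"{name:>14}: {expected_points(pct, value):.2f} points per attempt")
print(f"A 40% mid-range two is matched by a {breakeven_three_pct(0.40):.1%} three")
```

Under these assumed rates, the mid-range two is the least valuable attempt, which mirrors the recommendation to favor paint and three-point attempts.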
From a defensive perspective, success is linked to rebounding control and defensive anticipation. Teams that consistently secure defensive rebounds and generate steals can limit opponents' possessions and initiate high-value transition opportunities. Training programs should thus focus on rebounding fundamentals, coordinated defensive rotations, and situational drills designed to increase possession recovery. Offensively, reducing turnovers remains critical: decision-making training, structured offensive sets, and a balance between creators and low-turnover support players can significantly improve efficiency.
The findings also reaffirm the value of home-court advantage. Beyond environmental factors such as crowd support, familiar playing conditions may help teams control tempo and maintain shooting confidence. Collectively, these insights provide a practical roadmap for performance analysts and coaching staff, translating complex statistical modeling into actionable strategies that directly inform game preparation and player development.
4.2. Limitations and Future Research
Despite its robust methodology, this study has several limitations that open avenues for future research. First, the analysis relied exclusively on game-related statistical variables, without incorporating contextual, tactical, and player-level information such as lineup structure, injury status, player workload, or in-game strategic adjustments. These elements may substantially influence outcomes in elite basketball and should be integrated into future models to enhance predictive depth and ecological validity. Additionally, the present work is based on a single EuroLeague season (2024–25); extending the dataset across multiple seasons would enable longitudinal evaluation and improve generalizability.
Another limitation concerns the modeling framework itself. The study intentionally employed well-established machine learning algorithms (LR, RF, SVM, NB) rather than proposing new methodological innovations. While these classical models offer stability, interpretability, and transparency—key strengths in applied basketball analytics—they may not fully capture high-order temporal, spatial, or interaction effects present in elite competition. Recent advances in attention-based architectures and transformer models have demonstrated superior performance in complex sequence-learning tasks across domains, including engineered systems and fault detection (e.g., attention-based automated fault diagnosis [
61] and cross-building transferability studies in air-handling units [
62]). These works illustrate how attention mechanisms can model long-range dependencies and heterogeneous input structures more effectively than traditional models. Applying similar architectures to basketball—such as transformer-based possession modeling, temporal rhythm analysis, or player-interaction graphs—represents a promising direction for future research, provided the challenge of maintaining interpretability is adequately addressed.
Furthermore, while SHAP analysis improved transparency, it provides correlational rather than causal explanations. Integrating causal inference frameworks or hybrid models that combine explainable ML with domain-informed performance analytics may yield deeper insights into the true mechanisms underlying winning strategies.
By addressing these limitations—incorporating richer contextual features, expanding multi-season datasets, and exploring advanced modeling paradigms such as transformers, attention mechanisms, and graph learning—future work can further strengthen predictive accuracy and enhance the practical translational value of ML-based analytics for coaches, scouts, and performance analysts in elite basketball.