From Data to Decisions: Using Explainable Machine Learning to Predict EuroLeague Basketball Outcomes

Foteinakis, Panagiotis F.; Kokkotis, Christos; Karamousalidis, Georgios; Avloniti, Alexandra; Pavlidou, Stefania; Zaras, Nikolaos; Stampoulis, Theodoros; Pantazis, Dimitrios; Aggelakis, Panagiotis; Balampanos, Dimitrios; Liu, Junshi; Laparidis, Konstantinos; Chatzinikolaou, Athanasios

doi:10.3390/app152312401

by Panagiotis F. Foteinakis ¹, Christos Kokkotis ¹, Georgios Karamousalidis ², Alexandra Avloniti ¹, Stefania Pavlidou ¹, Nikolaos Zaras ¹, Theodoros Stampoulis ¹, Dimitrios Pantazis ¹, Panagiotis Aggelakis ¹, Dimitrios Balampanos ¹, Junshi Liu ³, Konstantinos Laparidis ¹ and Athanasios Chatzinikolaou ¹,*

¹ Department of Physical Education and Sport Science, School of Physical Education, Sport Science and Occupational Therapy, Democritus University of Thrace, 69100 Komotini, Greece

² Laboratory of Evaluation of Human Biological Performance, Department of Physical Education and Sports Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

³ Department of Physical Education, Guangdong University of Science and Technology, 99 Xihu Road, Nancheng District, Dongguan 523083, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12401; https://doi.org/10.3390/app152312401

Submission received: 19 October 2025 / Revised: 14 November 2025 / Accepted: 20 November 2025 / Published: 21 November 2025

Featured Application

The study’s approach provides analysts with a repeatable and understandable framework that strikes a balance between statistical power and tactical utility. Coaches can focus on offensive strategies that produce high-quality shots using structured spacing and ball movement and develop decision-making protocols that reduce possession losses. Defensively, strategies must emphasize rebound control, forcing turnovers, and contesting shots. Teams are advised to de-prioritize offensive sets that result in inefficient mid-range attempts. Finally, this blueprint is essential for scouts to identify players who excel in high-percentage shooting, low-turnover decision-making, and disciplined rebounding. The methodology itself offers an interpretable model that successfully bridges data-driven insights with practical game strategy.

Abstract

Predicting basketball game outcomes in elite competitions is a complex task influenced by multiple interacting performance factors. This study applied a supervised machine learning (ML) framework to predict EuroLeague game outcomes using team-level game-related statistics. Four algorithms—Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB)—were trained and compared following recursive feature elimination (RFE) to identify the most informative predictors. The dataset comprised comprehensive in-game statistics describing shooting efficiency, rebounding, ball security, and spatial shot distribution. Model performance was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score, ensuring both discrimination and calibration assessment. Among the four classifiers, SVM (AUC = 0.922, Accuracy = 0.841) and LR (AUC = 0.933, Accuracy = 0.818) achieved the highest predictive performance, outperforming RF and NB. Feature importance analysis using Shapley Additive Explanations (SHAP) on the best-performing SVM classifier revealed that true shooting percentage (TS%), defensive rebounds (DR), steals (ST), and turnovers (TO) were the most influential predictors of game outcomes. Teams that demonstrated higher shooting efficiency, greater rebounding control, and fewer turnovers showed a significantly higher probability of winning. These results confirm that well-validated and interpretable ML models can accurately predict game outcomes in professional basketball using readily available box-score statistics. The integration of RFE-based feature selection and SHAP interpretability provides transparent, evidence-based insights that can inform tactical decisions, enhance scouting accuracy, and support coaches in developing data-driven performance strategies within elite basketball environments.

Keywords: basketball; sports analytics; outcome prediction; performance analysis; machine learning; feature selection; explainability

1. Introduction

The prediction of sports outcomes is typically treated as a classification problem, where one of three outcomes is predicted: win, lose, or draw [1]. As high-level basketball, such as in the EuroLeague, increasingly relies on data, researchers have focused on predictive models based on in-game performance statistics, further emphasizing the importance of data-driven insights [2]. By examining metrics such as points scored, rebounds, assists, turnovers, and shooting efficiency and distribution, coaches and analysts can gain valuable insights into the factors that influence game outcomes [3]. These game-related statistics are not only descriptive but also predictive, providing a basis for developing models that can accurately predict results.

The application of performance indicators to understand the determinants of basketball success has long been a focus in sports science. Early research relied primarily on conventional statistical methods, such as discriminant analysis, correlation, and linear regression, to identify critical variables associated with winning and losing. Rebounding, turnover control, assists, and shooting efficiency have appeared as key indicators in multiple studies [4,5,6,7]. At the EuroLeague level, common findings underscore the predictive value of shooting efficiency, defensive rebounding, assists, and ball control. Ektirici [8] emphasized field goal shooting, defensive rebounds, and assists as performance enhancers, while Mikolajec et al. [9] linked team success to assists, fouls, and made shots. Correspondingly, Özmen [10] highlighted defensive rebounds and shooting percentages as decisive performance factors.

With increasingly advanced computational tools, greater data availability, and algorithmic progress, machine learning (ML) applications for predicting sports outcomes have become a powerful alternative [11]. ML techniques can analyze non-linear interactions and high-dimensional data, providing a more detailed understanding of how variable combinations influence game results [12]. Unlike linear models, ML approaches reveal complex hidden patterns that might be overlooked with traditional methods.

A growing body of research has applied supervised-learning pipelines to predict basketball outcomes, progressing from interpretable regressions to sophisticated ML models. In professional basketball, studies have employed logistic regression [13], tree-based ensembles [14], and support vector machines [15] to predict outcomes. Some studies have identified performance metrics such as effective field goal percentage (eFG%), rebounds, and turnovers as strong predictors [16], while others have examined contextual features (home advantage, fatigue) to enhance predictions [17,18]. Research on basketball outcome prediction using ML spans both the NBA and European leagues, with an increasing focus on model accuracy, feature importance, and tactical implications.

In the NBA context, Zadravec [19] used ML to predict the winner of the 2024 NBA Championship, revealing that the critical predictors were the quality rating of the team and the performance metrics of the players. Horvat et al. [20] introduced a structured team efficiency index, achieving 78% predictive accuracy. Wang [2] compared several algorithms, and Deep Neural Network (DNN) and Random Forest emerged as the most efficacious, particularly when using field goal percentage (FG%) as the key performance variable. A study by Tsagris et al. [21] further demonstrated the value of halftime data in performance forecasting.

European basketball has also seen notable contributions. Lampis et al. [22] applied ML algorithms in four leagues, improving accuracy by 3–5% using advanced metrics. Plakias et al. [23] compared key performance indicators (KPIs) in the EuroLeague and national leagues, identifying context-dependent influences on outcomes, and confirmed the external validity of tree-based feature rankings. They highlighted offensive rating, defensive rebounds, and turnover ratio as key KPIs. Giasemidis [24] showed that gradient boosting with possession-efficiency features achieved 72% accuracy in predicting regular-season winners, outperforming logistic regression. More recently, Foteinakis et al. [25] investigated decision-making under pressure in EuroLeague games, identifying shot range, defensive pressure, offensive possession time, and current game status as key determinants of clutch shot success, which affects scoring efficiency and game outcomes.

Methodological advancements have further enriched this field. Bunker and Thabtah [12] reviewed Artificial Neural Network-based models and proposed a foundational ML framework. Papageorgiou et al. [26] compared 14 ML models, with tree-based models demonstrating superior predictive capabilities. Additional studies by Li [27] and Horvat et al. [28] emphasized model robustness and feature engineering using the Shapley Additive Explanations (SHAP) method, while Ou-Yang et al. [29] focused on the Chinese Basketball Association (CBA), identifying differences in KPI importance, such as the dominance of two-point efficiency and offensive metrics during playoff phases. Collectively, these studies highlight the effectiveness of ML in basketball analytics and illustrate the increasing sophistication of predictive models and feature engineering across leagues and contexts.

Despite these contributions, several gaps remain. First, few studies have conducted comparative analyses of different ML algorithms using only game-related performance statistics. Second, most predictive modeling efforts are focused on the NBA, national, and collegiate levels, while the EuroLeague, known for its tactical complexity, distinct rules, and defensive style, remains largely underrepresented in ML research. Third, many existing studies prioritize predictive accuracy over model interpretability, often employing complex ML algorithms without sufficiently clarifying how game-related features impact outcomes [29]. This lack of explainability limits the practical use of such models by coaches and teams, who rely on clear, actionable insights to inform their decisions. As such, interpretable ML techniques are essential for translating analytical outputs into meaningful operational strategies [30].

To address these gaps, the present study evaluated and compared four supervised ML algorithms, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Naïve Bayes (NB), in predicting the outcomes of EuroLeague basketball games from game-related statistics for the 2024–25 season, with an emphasis on interpretable predictions and actionable insights. The study seeks to answer two questions: (a) Which model delivers the highest predictive accuracy? and (b) Which game-related variables have the greatest impact on predicting EuroLeague basketball outcomes among the four ML models analyzed?

2. Materials and Methods

2.1. Data Acquisition and Preparation

We used a dataset of game-related statistics from the publicly available website Hack-a-Stat (https://hackastat.eu/, accessed on 2 June 2025), a repository of data from major European basketball tournaments, including the EuroLeague, where statistics are calculated from box scores available on the official EuroLeague website (https://www.euroleaguebasketball.net/euroleague/, accessed on 2 August 2025). This repository has been utilized in previous studies [31,32]. The dataset included a CSV file reporting each EuroLeague game with two observations per game—one for each team—totaling 660 observations. This file provided sufficient data to explore classification ML algorithms and validate their results. The dataset comprised all 330 games from the 2024–25 EuroLeague season, covering the complete 306-game regular season and 24 playoff, play-in, and final four games completed as of the data access date. The CSV file contained standard box-score team statistics, such as three-point shots made and three-point percentage, as well as advanced composite metrics like Offensive Rating (ORTG), eFG%, and True Shooting Percentage (TS%), along with spatially derived statistics, including shot zone and shooting distribution. It also included game-specific details, such as the game result (win = 1 or loss = 0), which was used as the response (target) variable in this study.

Initially, the database comprised 127 raw statistical variables per team. Data preprocessing involved systematic cleaning and quality control: non-numeric entries and placeholders were standardized or converted to missing values, incomplete or inconsistent records were inspected, and redundant or highly correlated variables were reviewed and removed. To avoid information leakage (the inadvertent use of outcome-dependent variables during model training), two independent experts in basketball analytics (P.F. and A.C.) examined all candidate variables and excluded those directly reflecting or derived from the final game outcome (e.g., point differential, Net Rating, and other composite efficiency metrics).

After this preprocessing and expert review, the final dataset consisted of 28 predictor variables (Table 1) describing team performance and one binary outcome variable indicating game result (win/loss). This curated feature set provided a balanced and leakage-free foundation for subsequent machine learning analysis.
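The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the column names (`TS_PCT`, `DR`, `DR_COPY`, `POINT_DIFF`, `NET_RTG`, `WIN`), the `clean_team_stats` helper, and the 0.95 correlation threshold are all assumptions made for the example, since the text does not give the CSV schema or a numeric cut-off.

```python
import numpy as np
import pandas as pd

# Hypothetical column names: the real CSV schema is not given in the text.
LEAKAGE_COLUMNS = ["POINT_DIFF", "NET_RTG"]


def clean_team_stats(df, target="WIN", corr_threshold=0.95):
    """Drop outcome-derived columns, coerce predictors to numeric,
    and flag highly correlated predictor pairs for manual review."""
    out = df.drop(columns=[c for c in LEAKAGE_COLUMNS if c in df.columns])
    predictors = [c for c in out.columns if c != target]
    # Non-numeric placeholders (e.g. "-") become NaN and are imputed later.
    out[predictors] = out[predictors].apply(pd.to_numeric, errors="coerce")
    corr = out[predictors].corr().abs()
    # Keep only the upper triangle so each pair is reported once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    flagged = [(row, col) for col in upper.columns for row in upper.index
               if upper.loc[row, col] > corr_threshold]
    return out, flagged


# Toy rows with invented values, for illustration only.
raw = pd.DataFrame({
    "TS_PCT": ["58.1", "51.3", "-", "49.0"],
    "DR": [27, 22, 25, 30],
    "DR_COPY": [27, 22, 25, 30],      # deliberately redundant column
    "POINT_DIFF": [8, -8, -3, 3],     # outcome-derived, must be excluded
    "WIN": [1, 0, 0, 1],
})
clean, flagged = clean_team_stats(raw)
```

The sketch only flags correlated pairs rather than dropping them automatically, which mirrors the expert review described in the text more closely than a purely mechanical filter would.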

2.2. Data Splitting, Validation and Reliability Procedure

To maintain the precision and reliability of the analysis, 33 (10%) randomly selected game statistics from the dataset were cross-referenced with the EuroLeague official box-score stats (https://www.euroleaguebasketball.net/euroleague/, accessed on 2 August 2025) to confirm their validity. Minor discrepancies were resolved through a systematic review of the original data. The use of validated and reliable data is foundational for analytical accuracy, ensuring that derived insights reflect true performance patterns. In sports performance analysis, reliability (consistency) and validity (accuracy) are critical for minimizing errors that can skew predictive modeling and decision-making [33].

All subsequent analyses were conducted in Python (version 3.10) using scikit-learn. To obtain an unbiased estimate of predictive performance while preventing overfitting, a hold-out plus cross-validation strategy was adopted. The dataset was first split into training–validation (80%) and independent test (20%) subsets using stratified random sampling to preserve the win/loss ratio [28]. Within the training–validation set, model development and tuning were performed through stratified 5-fold cross-validation [34]. Each pipeline consisted of: (i) median imputation of missing values (SimpleImputer), (ii) standardization of predictors (StandardScaler), (iii) wrapper-based Recursive Feature Elimination (RFE), and (iv) the classification algorithm.

Hyperparameters for both the feature selection (RFE) step and the classifier were jointly optimized using Grid Search Cross-Validation with ROC-AUC as the optimization metric. The best-performing configuration (RFE + classifier hyperparameters) was refitted on the full training–validation portion, and its predictive performance was finally evaluated on the held-out 20% test set. This procedure ensured that feature selection, preprocessing, and tuning were fully nested within the cross-validation loop and that the test data remained unseen until final evaluation.
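A minimal sketch of this hold-out plus cross-validation procedure, shown for the LR pipeline, might look like the following. It is an illustration under stated assumptions rather than the authors' implementation: the data are synthetic (generated with `make_classification` to match the 660 x 28 shape), the random seeds and the `C` grid are arbitrary, and the feature-count grid is coarsened to keep the example fast, whereas the study searched every size from 1 to 28.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 660 x 28 team-game matrix (not the real data).
X, y = make_classification(n_samples=660, n_features=28, n_informative=8,
                           random_state=0)

# Stratified 80/20 split; the 20% test portion is untouched until the end.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

# LogisticRegression uses L2 regularization by default.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Coarse illustrative grid; the subset size and C are tuned jointly.
param_grid = {
    "rfe__n_features_to_select": [6, 12, 18, 24],
    "clf__C": [0.1, 1.0, 10.0],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv, refit=True)
search.fit(X_dev, y_dev)  # imputation, scaling and RFE are refit in every fold

# Single final evaluation on the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
```

Because every preprocessing step lives inside the `Pipeline`, `GridSearchCV` refits imputation, scaling, and feature selection on the training folds only, which is the property the text relies on to keep the test data unseen.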

2.3. Feature Selection

All candidate variables (excluding the outcome) were subjected to a wrapper-based Recursive Feature Elimination (RFE) procedure designed to identify the most informative and non-redundant predictors of game outcome. RFE was executed independently for each classifier, ensuring model-specific and unbiased feature selection.

The number of retained predictors (n_features_to_select) was treated as a hyperparameter and tuned over a fine grid from 1 to 28 features (the total number of available variables) in single-feature increments. This exhaustive search allowed precise identification of the optimal subset size for each classifier and enabled direct comparison of model performance across all feasible feature dimensionalities.

RFE was implemented within each pipeline as follows: Logistic Regression (LR) used an L2-regularized LR as the wrapper estimator, and the selected features were then used to train the final LR classifier. Random Forest (RF) employed a Random Forest wrapper, ranking features by Gini importance, with the same model type serving as the final classifier. The Support Vector Machine (SVM) with a radial-basis-function (RBF) kernel used a linear SVM (LinearSVC) as the ranking estimator, after which the RBF SVM was trained on the selected subset. For the Gaussian Naïve Bayes (NB) model, which lacks intrinsic feature-importance measures, a Random Forest wrapper was employed to rank variables, and the final NB classifier was trained on the features retained by this process.

By embedding RFE directly within each cross-validation fold, the feature-selection step was re-executed during every training iteration, thereby avoiding information leakage. This approach ensured that each model’s selected features were determined solely from the training folds, producing classifier-specific, data-driven subsets that enhance both accuracy and interpretability.
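The SVM variant of this design, where the ranking estimator differs from the final classifier, can be sketched as below. This is a hedged illustration on synthetic data: the sample size, seeds, and fixed hyperparameters (`C=1.0`, `gamma="scale"`, 18 retained features) are assumptions chosen for the example, not tuned values from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in data with 28 candidate predictors.
X, y = make_classification(n_samples=300, n_features=28, n_informative=8,
                           random_state=1)

svm_pipe = Pipeline([
    ("scale", StandardScaler()),
    # A linear SVM exposes coef_, so RFE can rank and drop one feature per step.
    ("rfe", RFE(LinearSVC(C=1.0, max_iter=5000), n_features_to_select=18, step=1)),
    # The final classifier is the RBF-kernel SVM trained on the retained subset.
    ("clf", SVC(kernel="rbf", C=1.0, gamma="scale")),
])

# RFE sits inside the pipeline, so it is re-run on the training part of each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
fold_auc = cross_val_score(svm_pipe, X, y, cv=cv, scoring="roc_auc")

# Fit once on all data to inspect which column indices were retained.
svm_pipe.fit(X, y)
selected = np.flatnonzero(svm_pipe.named_steps["rfe"].support_)
```

The same pattern would apply to the NB pipeline by swapping in a Random Forest as the RFE estimator and `GaussianNB` as the final step.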

2.4. Machine Learning Algorithms

Four supervised machine learning classifiers were trained to predict game outcomes (win = 1, loss = 0): LR, RF, NB, and SVM with an RBF kernel. All models were implemented in Python using the scikit-learn library and were evaluated through the unified pipeline described above, which incorporated preprocessing, feature selection, and hyperparameter optimization within nested cross-validation to prevent information leakage.

LR is a statistical method that models the relationship between independent variables (features) and a binary dependent variable, using a logistic function to estimate the probability of a success (in our case, a win). LR has proven effective in sports analytics due to its ability to handle binary classification problems [35], such as predicting wins and losses, and provides interpretable results that are essential for understanding feature importance [36]. In this task, LR served as the linear and interpretable baseline model, offering both predictive accuracy and model transparency. An L2-regularized form of LR was adopted to mitigate overfitting, while the inverse regularization parameter C and the number of RFE-selected features were tuned during the cross-validation procedure.

NB is a probabilistic algorithm that assumes feature independence and is frequently used for classification tasks in sports events [37]. It effectively handles small or imbalanced datasets, offers strong computational efficiency, and only needs a small amount of training data to estimate classification parameters [28]. Since NB does not inherently yield feature importance rankings suitable for RFE, a Random Forest wrapper was employed to identify the most informative predictors before fitting the final NB model. This ensured methodological consistency across all algorithms, with wrapper-based RFE embedded inside the cross-validation loop for every classifier.

SVM performs well on high-dimensional data and can define both linear and non-linear decision boundaries through kernel functions. It is less susceptible to overfitting compared to other classifiers and has shown robust performance in sports outcome prediction [2,38]. Thus, the SVM classifier with an RBF kernel was employed to model potential non-linear decision boundaries between winning and losing performances. Feature selection for SVM was performed via RFE using a linear SVM (LinearSVC) as the ranking estimator, after which the final RBF kernel SVM was trained on the selected subset of features. The penalty parameter C, kernel width γ, and RFE subset size were tuned jointly within the same nested cross-validation framework to achieve the optimal balance between margin maximization and generalization.

RF is a powerful ML algorithm suitable for both classification and regression tasks, capable of managing complex datasets and providing feature importance rankings. RF is an ensemble method that combines numerous decision trees. The final prediction is derived from the combination of the predictions from all decision trees in the forest [2]. It is widely used in classification because it reduces prediction variance and prevents overfitting, thereby enhancing overall model accuracy [39]. Hence, in this task, RF was used to capture non-linear interactions and complex dependencies between game-related statistics. By averaging the predictions of multiple trees, RF produces robust, low-variance estimates that generalize well to unseen data. In this study, RF acted both as a classification model and as the wrapper estimator in the RFE procedure. The number of trees, maximum depth, and node-splitting criteria, together with the optimal feature subset size, were determined using stratified five-fold cross-validation.

These models were selected based on their proven effectiveness in sports analytics and their ability to handle binary classification problems [35]. For each classifier, model performance was assessed on the independent 20% test set using accuracy, precision, recall, F1-score, and ROC-AUC. This comprehensive set of metrics evaluated both overall classification quality and class-specific discrimination ability.

Specifically, the ROC-AUC assesses the ability to distinguish between classes and summarizes how well a model can generate relative scores to differentiate positive and negative cases. Calibration and class-specific metrics (precision, recall, and F1) complemented this evaluation. Accuracy indicates the percentage of correct predictions. Precision measures the ratio of true positives (TPs) to all positive predictions (TP + FP), reflecting the ability to correctly identify positive cases. Recall is the ratio of true positives (TPs) to the total of true positives and false negatives (FNs), showing how often a model correctly detects positive instances among all actual positives. The F1 score is the harmonic mean of precision and recall.
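The threshold-based metrics defined above follow directly from the four confusion-matrix counts. The short sketch below illustrates them; the `classification_metrics` helper and the example counts are hypothetical and do not come from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, with "win" as the positive class.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these counts, precision is 40/50 = 0.80 and recall is 40/60, about 0.667, so the F1-score sits between the two, closer to the lower value.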

2.5. Interpretation

To enhance interpretability, SHAP values were used to analyze the contribution of each feature to the model’s predictions. SHAP is a game-theoretic approach that explains the output of any ML model by assigning each feature an importance value for a specific prediction. It is based on Shapley values from cooperative game theory [40], ensuring fair and consistent attribution of each feature’s contribution to the final prediction. SHAP summary plots illustrated the relative importance and direction of influence of each predictor, providing a transparent interpretation of how model features contributed to win/loss classification. This approach improved the explainability of the predictive models and provided actionable insights into the factors most strongly associated with team success in EuroLeague competition [41].
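To make the underlying idea concrete, the sketch below computes exact Shapley values for a single prediction by enumerating all feature coalitions. It is not the SHAP library, which relies on faster approximations and on background data; the `shapley_values` helper, the `toy_score` function, and all numbers are invented for illustration, and "absent" features are simply replaced by a baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.
    Features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical win score from shooting (ts), rebounds (dr), turnovers (to)."""
    ts, dr, to = z
    return 0.05 * ts + 0.03 * dr - 0.04 * to + 0.001 * ts * dr


x = [60.0, 28.0, 10.0]         # the game being explained (invented values)
baseline = [55.0, 24.0, 13.0]  # reference point, e.g. feature averages (invented)
phi = shapley_values(toy_score, x, baseline)
```

The attributions sum to the difference between the score at `x` and the score at `baseline`, which is the additivity property that makes SHAP summary plots interpretable. Here the turnover attribution is positive because the explained game has fewer turnovers than the baseline.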

3. Results

3.1. Model Performance

We evaluated four supervised machine learning algorithms, LR, RF, SVM with an RBF kernel, and NB, to predict EuroLeague game outcomes using the selected game-related performance variables. Model discrimination was assessed using accuracy, precision, recall, F1-score, and the AUC. All models achieved predictive accuracies above the random 50% baseline, demonstrating meaningful discriminatory capacity between winning and losing teams (Table 2).

Among the evaluated models, the SVM (RBF kernel, RFE wrapper) model yielded the most consistent and robust performance across all evaluation metrics. It achieved an accuracy of 0.841, an AUC of 0.922, a precision of 0.836, a recall of 0.848, and an F1-score of 0.842. This demonstrates that SVM effectively captured non-linear relationships within the dataset and maintained excellent generalization on unseen data. The SVM model selected 18 key features through recursive feature elimination, confirming the predictive relevance of a compact and well-generalized subset of variables.

The LR (RFE wrapper) model exhibited comparably strong performance, with an accuracy of 0.818 and the highest AUC of 0.933, reflecting its strong discriminative ability in distinguishing between wins and losses. It also achieved precision = 0.828, recall = 0.803, and F1 = 0.815, indicating balanced performance across both sensitivity and positive predictive power.

The RF (RFE wrapper) model achieved an accuracy of 0.758 and AUC = 0.854, with precision = 0.736, recall = 0.803, and F1 = 0.768. While performing slightly below the linear and kernel-based models, RF still provided reliable predictive capability and offered valuable insights into feature importance, benefiting from its ensemble nature and robustness against noise.

By contrast, the NB (RFE–RF wrapper) model demonstrated the lowest performance, with an accuracy of 0.652 and AUC = 0.789. Although it attained a relatively high precision of 0.917, a recall of 0.333 and an F1 of 0.489 indicated limited sensitivity in detecting winning cases. These results are consistent with the simplifying independence assumptions of the NB algorithm, which may not fully capture the interdependencies among basketball performance variables.

Overall, the results indicate that SVM and LR offered the best generalization and discriminative performance, followed by Random Forest, while Naïve Bayes underperformed relative to the other models. The ROC curves for all classifiers, presented in Figure 1, further confirm the superior classification and separability achieved by the SVM and LR models.
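As a quick consistency check, the F1-scores reported above can be recomputed from the reported precision and recall values. The snippet below uses only figures quoted in this section; because the inputs are already rounded to three decimals, agreement within about 0.001 is the most that can be expected.

```python
# (precision, recall, reported F1) as quoted in the text for each model.
reported = {
    "SVM": (0.836, 0.848, 0.842),
    "LR": (0.828, 0.803, 0.815),
    "RF": (0.736, 0.803, 0.768),
    "NB": (0.917, 0.333, 0.489),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {name: round(f1_from(p, r), 3) for name, (p, r, _) in reported.items()}
max_gap = max(abs(recomputed[name] - f1) for name, (_, _, f1) in reported.items())
```

The NB row shows why the F1-score is informative here: a precision of 0.917 combined with a recall of 0.333 yields an F1 below 0.5, since the harmonic mean is pulled toward the weaker of the two values.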

Table 3 presents the means and standard deviations for the 18 most informative game-related performance variables differentiating winning and losing EuroLeague teams. These features were identified through the recursive feature elimination (RFE) process combined with the Random Forest-based feature selection, which together determined the most relevant predictors of game outcomes. The selected variables encompass multiple dimensions of team performance, including shooting efficiency, rebounding, ball security, and spatial shot distribution, offering a comprehensive and data-driven representation of the key performance determinants in elite basketball competition.

3.2. Model Interpretation

To quantify the relative influence of each predictor on game outcomes, SHAP values were computed for the trained models. Figure 2 illustrates the SHAP summary plot for the SVM (RBF, RFE wrapper) classifier, which achieved one of the highest AUC scores among the evaluated models. Each point in the plot represents an individual game observation, where the Y-axis lists the predictive variables ranked by their mean absolute SHAP values (indicating their overall contribution to the model’s predictions), and the X-axis shows the SHAP values themselves, quantifying the direction and magnitude of each variable’s effect on the predicted probability of a win. Positive SHAP values correspond to features that increase win likelihood, while negative values reduce it. The color gradient reflects the original feature values, with red indicating high values and blue indicating low values, thereby allowing simultaneous interpretation of both the feature’s value and its directional effect on predictions.
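The ordering used on the Y-axis of such a summary plot can be reproduced from a matrix of SHAP values by averaging absolute values per feature. The sketch below uses a small invented matrix purely to show the computation; the numbers are not results from the study.

```python
import numpy as np

feature_names = ["TS%", "DR", "TO"]

# Invented SHAP values: rows are game observations, columns are features.
shap_matrix = np.array([
    [0.20, 0.10, -0.05],
    [-0.15, 0.05, 0.10],
    [0.25, -0.12, -0.02],
    [-0.10, 0.08, 0.04],
])

# Global importance is the mean absolute SHAP value per feature.
mean_abs = np.abs(shap_matrix).mean(axis=0)

# Sort features from most to least influential.
order = np.argsort(mean_abs)[::-1]
ranking = [(feature_names[i], float(mean_abs[i])) for i in order]
```

Taking absolute values before averaging matters: a feature whose positive and negative contributions cancel out across games would otherwise appear unimportant even though it shifts individual predictions substantially.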

The SHAP analysis revealed that DR, TS%, and ST were among the most influential predictors of game outcome, with higher values in each strongly associated with an increased probability of winning. These findings emphasize the critical importance of rebounding efficiency, shooting effectiveness, and defensive pressure in determining success at the EuroLeague level. Similarly, higher values of 3PTM and total FGM contributed positively to predicted win probabilities, highlighting the role of offensive execution and shot-making consistency.

Conversely, TO and PF exhibited negative SHAP values when high, indicating that excessive ball losses or fouling significantly decreased the likelihood of winning. The H/A indicator had a modest but clear positive contribution, suggesting that home-court advantage provided a consistent, though secondary, boost to win probability. Other variables, such as OR, BLK, and POSS, also contributed meaningfully but with smaller magnitudes, reflecting their contextual influence within overall game dynamics.

Spatial shot distribution features—particularly SHOT_RANGE_MIDDLE and SHOT_RANGE_3P—showed more nuanced effects. A lower reliance on mid-range attempts tended to favor winning predictions, consistent with modern efficiency-driven shot selection strategies, whereas a balanced or increased frequency of three-point attempts was generally associated with higher win probabilities, especially when accompanied by elevated shooting accuracy.

Overall, the SHAP interpretation underscores that shooting efficiency, defensive rebounding, possession control (via steals and turnovers), and shot selection patterns are the principal drivers of game outcome prediction in this EuroLeague model. These results align with established performance analytics in elite basketball, reaffirming that teams maximizing efficient scoring opportunities and minimizing possession errors have the highest likelihood of success.

4. Discussion

Predicting basketball game outcomes is inherently complex, given the multitude of interacting factors influencing team performance—such as tactical choices, player form, contextual circumstances, and psychological and physical readiness. This study focused specifically on team-level game-related statistics, thereby isolating the predictive contribution of in-game performance variables independent of contextual or player-specific influences. Using a supervised machine learning framework with four classifiers, LR, SVM, RF, and NB, we systematically compared the predictive performance, interpretability, and explanatory consistency of these models in forecasting EuroLeague basketball game outcomes.

Among the four classifiers, LR and SVM (RBF kernel) achieved the highest predictive performance, confirming their suitability for structured sports data. SVM achieved an accuracy of 0.841 and an AUC of 0.922, while LR followed closely with an accuracy of 0.818 and an AUC of 0.933. Both models demonstrated high precision and recall, indicating reliable discrimination between winning and losing teams. The robust performance of LR reflects its compatibility with the structured and moderately correlated nature of box-score data, where linear separability often holds. Recent studies [20,41] indicate that when predictors are moderately correlated and the data distribution is balanced, LR can perform similarly to, or even better than, more complex non-linear models, especially when overfitting is a concern. The performance of LR here is comparable to that reported in recent studies [2,21,23,29], which explored the KPIs influencing game outcomes across different contexts, including the European, NBA, and Chinese basketball leagues.

SVM’s success, on the other hand, demonstrates its ability to capture non-linear relationships and subtle interaction effects within multivariate performance data [42]. RF also performed well (accuracy = 0.758, AUC = 0.854) but showed slightly lower stability, likely due to redundancy and collinearity among features that may dilute ensemble-based feature importance. NB produced the weakest performance (accuracy = 0.652, AUC = 0.789), consistent with the algorithm’s independence assumption, which is rarely satisfied in interdependent team performance statistics.

The use of RFE embedded within stratified cross-validation ensured that each classifier was trained on the most informative and non-redundant predictors. The final 18 selected features represented key dimensions of game performance, including shooting efficiency (2PT%, TS%, 3PTM, FGM), rebounding (DR, OR), possession control (TO, ST), and spatial shot distribution (SHOT_RANGE_MIDDLE, SHOT_RANGE_3P). This comprehensive set of variables aligns with previous literature emphasizing the importance of efficient scoring, turnover minimization, and defensive stability as determinants of success in professional basketball [2,3,43,44].
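One way to embed RFE within stratified cross-validation is to make it a pipeline step, so that the feature ranking is recomputed on each training fold. The sketch below is an assumption-laden illustration on synthetic data: because an RBF-kernel SVM exposes no coefficients for RFE to rank, a logistic regression is used here as the ranking estimator, which may differ from the procedure used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 28 candidate features (NOT the study's data).
X, y = make_classification(
    n_samples=600, n_features=28, n_informative=8, n_redundant=10,
    random_state=0,
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Assumption: LR ranks the features, since RFE needs coef_ or
    # feature_importances_ and an RBF-kernel SVC provides neither.
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=18)),
    ("clf", SVC(kernel="rbf")),
])

# Selection happens inside each fold, so held-out games never
# influence which features are kept.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")

# Refit on all data to inspect which columns survive elimination.
pipe.fit(X, y)
selected = pipe.named_steps["rfe"].support_
print("mean AUC:", round(float(auc.mean()), 3))
print("features kept:", int(selected.sum()))
```

Running selection once on the full dataset and cross-validating afterwards would give optimistic scores, which is the reason for nesting the step inside the pipeline.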

Model interpretability was enhanced using SHAP, which quantified the contribution of each feature to the predicted probability of winning. The SHAP summary plot indicated that DR, TS%, and ST were the most influential predictors, with higher values in each corresponding to increased win probability. These findings emphasize that successful teams typically combine shooting efficiency, defensive resilience, and turnover creation to control game momentum.
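To make the idea of a per-feature contribution concrete, the sketch below works through the special case of a linear model with independent features, where the SHAP value of feature i reduces to w_i (x_i − E[x_i]) on the log-odds scale. All coefficients and inputs are invented for illustration and are not the study's fitted values; the study's SHAP values may have been computed with a different explainer and model.

```python
import math

# Hypothetical coefficients of a logistic model on standardized features
# (log-odds scale). Invented numbers, NOT the study's fitted values.
coef = {"DR": 0.9, "TS%": 1.1, "ST": 0.5, "TO": -0.7}
intercept = 0.0

# Hypothetical background means and one team-game, both in z-score units.
background_mean = {"DR": 0.0, "TS%": 0.0, "ST": 0.0, "TO": 0.0}
game = {"DR": 1.0, "TS%": 0.5, "ST": -0.2, "TO": 1.5}


def linear_shap(coef, background_mean, x):
    """SHAP values of a linear model under feature independence:
    phi_i = w_i * (x_i - E[x_i])."""
    return {k: coef[k] * (x[k] - background_mean[k]) for k in coef}


def log_odds(coef, intercept, x):
    """Model output before the logistic link."""
    return intercept + sum(coef[k] * x[k] for k in coef)


phi = linear_shap(coef, background_mean, game)
base_value = log_odds(coef, intercept, background_mean)
prediction = log_odds(coef, intercept, game)
win_probability = 1.0 / (1.0 + math.exp(-prediction))

for feature, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature:>4}: {value:+.2f}")
print("base + sum(phi) =", round(base_value + sum(phi.values()), 6))
print("model log-odds  =", round(prediction, 6))
```

The last two printed lines agree, which is the additivity property that lets a summary plot be read as a decomposition of each prediction. In this toy game, above-average turnovers pull the prediction down while defensive rebounds and shooting efficiency push it up.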

Shooting efficiency was a crucial part of winning strategies in several studies [2,21,43,45,46]. Despite the EuroLeague’s notable rise in three-point shot attempts over recent seasons [47,48], 3PTM’s predictive power remained strong in this study, consistent with Hu [49], who found that each 1% increase in 3P% correlates with nearly a 5-percentage-point increase in a team’s winning percentage on average. This significance highlights not only the direct effect of three-pointers on game results but also a broader strategic trend in which the EuroLeague relies more on perimeter shooting without losing shooting efficiency [50].

Spatial shot selection metrics provided further insight into offensive strategy. A lower reliance on mid-range attempts (SHOT_RANGE_MIDDLE) and a balanced but efficient three-point frequency (SHOT_RANGE_3P) were associated with higher win probabilities. This aligns with modern offensive philosophies that prioritize high-value scoring zones, either near the basket or beyond the arc, while reducing inefficient mid-range shots [45,46,47,48]. Mikołajec et al. [7] emphasized that in closely contested games, high efficiency in short-distance shots is positively correlated with winning outcomes. Paint-area shots, such as layups, dunks, and putbacks, are typically high-percentage opportunities. Therefore, teams that prioritize attacking the paint are more likely to benefit from efficient scoring. This observation aligns with core EuroLeague basketball strategies that emphasize inside scoring via cuts, second-chance points, and fast breaks, tactics shown to yield high-efficiency outcomes, particularly during critical game phases [51]. Meanwhile, the frequency of mid-range shot attempts has declined across the EuroLeague in recent years [51]. In addition, the probability of winning is minimally affected by the volume of mid-range attempts, reinforcing the notion that such shots offer limited strategic value [52].

DR, ST, and OR have been consistently identified as critical predictors of outcomes in basketball research [3,4,23,45,46,53,54,55]. Their prominent SHAP rankings highlight their dual impact: increasing possession opportunities while limiting opponents’ scoring chances. Winning teams typically exhibit strong defensive capabilities, underscoring the importance of developing effective rebounding strategies that prioritize box-out fundamentals and defensive schemes that create transition opportunities, thereby increasing the likelihood of success. In particular, steals disrupt opponents’ offensive flow and create additional scoring opportunities [3].

Conversely, TO and PF exhibited strong negative effects, underscoring the detrimental impact of possession loss and defensive infractions on team success. Winning teams were found to commit fewer TO, suggesting that reducing TO through improved decision-making enhances offensive efficiency, an essential factor in achieving success [9,10]. Recent literature reinforces the view that effective turnover management is a key predictor of team performance, as it helps maintain offensive continuity and defensive positioning [3,29]. To mitigate the significant negative impact of TO on winning probability, coaches should implement targeted decision-making drills and prioritize ball security in high-pressure scenarios. Lineup compositions should balance high-usage playmakers with low-error role players to stabilize possession. Emphasizing structured offensive sets and situational awareness, and recruiting players who can collaborate effectively and perform as a cohesive unit, can further reduce TO and enhance overall efficiency [56].

These findings also align with tactical analyses indicating that EuroLeague teams increasingly prioritize spacing, pace, and perimeter efficiency as determinants of competitive success [50]. Beyond forcing defensive extensions, effective three-point play enhances ball movement and opens driving lanes, a trend particularly prominent in European leagues and aligned with the global shift toward faster, perimeter-oriented play [47]. Although the three-point shot is high-reward, its impact depends on efficiency. The optimal strategies are therefore those that combine open two-point attempts via penetration with quality three-point shots from skilled shooters, which explains why 3PTM and TS% emerged among the most impactful features in this study.

The H/A indicator showed a positive SHAP contribution, confirming a measurable home-court advantage, consistent with prior research on home advantage in professional basketball [57,58,59,60]. This advantage mainly reflects improved performance in game-related statistics, driven by factors such as court familiarity, the ability to establish control early and set the tone for the rest of the game, the influence of home supporters (especially in European competitions), and the physical and psychological effects of travel on visiting teams.

Collectively, these results suggest that winning teams in the EuroLeague combine accurate shooting with disciplined defensive play and possession management. The strong performance of interpretable models such as LR and SVM reinforces that, even in complex team sports, a small number of strategically coherent indicators—especially those linked to efficiency and control—can yield reliable predictive insights. This balance between predictive power and interpretability enhances the practical utility of ML models for coaches and analysts, providing actionable evidence to guide tactical decision-making and performance optimization.

4.1. Practical Applications

The results of this study offer a clear, data-driven framework for improving competitive performance in elite basketball. By integrating SVM feature importance with SHAP interpretability, the findings highlight that shooting efficiency, rebounding dominance, defensive disruption, and turnover control are the most decisive factors in determining outcomes. Coaches should therefore emphasize efficient shot generation—favoring attempts in the paint and from beyond the three-point arc—while minimizing low-efficiency mid-range shots.

From a defensive perspective, success is linked to rebounding control and defensive anticipation. Teams that consistently secure defensive rebounds and generate steals can limit opponents’ possessions and initiate high-value transition opportunities. Training programs should thus focus on rebounding fundamentals, coordinated defensive rotations, and situational drills designed to increase possession recovery. Offensively, reducing turnovers remains critical: implementing decision-making training, structured offensive sets, and a balanced role between creators and low-turnover support players can significantly improve efficiency.

The findings also reaffirm the value of home-court advantage, suggesting that beyond environmental factors such as crowd support, familiar playing conditions enable teams to control tempo and maintain higher shooting confidence. Collectively, these insights provide a practical roadmap for performance analysts and coaching staff, translating complex statistical modeling into actionable strategies that directly inform game preparation and player development.

4.2. Limitations and Future Research

Despite its robust methodology, this study has several limitations that open avenues for future research. First, the analysis relied exclusively on game-related statistical variables, without incorporating contextual, tactical, and player-level information such as lineup structure, injury status, player workload, or in-game strategic adjustments. These elements may substantially influence outcomes in elite basketball and should be integrated into future models to enhance predictive depth and ecological validity. Additionally, the present work is based on a single EuroLeague season (2024–25); extending the dataset across multiple seasons would enable longitudinal evaluation and improve generalizability.

Another limitation concerns the modeling framework itself. The study intentionally employed well-established machine learning algorithms (LR, RF, SVM, NB) rather than proposing new methodological innovations. While these classical models offer stability, interpretability, and transparency—key strengths in applied basketball analytics—they may not fully capture high-order temporal, spatial, or interaction effects present in elite competition. Recent advances in attention-based architectures and transformer models have demonstrated superior performance in complex sequence-learning tasks across domains, including engineered systems and fault detection (e.g., attention-based automated fault diagnosis [61] and cross-building transferability studies in air-handling units [62]). These works illustrate how attention mechanisms can model long-range dependencies and heterogeneous input structures more effectively than traditional models. Applying similar architectures to basketball—such as transformer-based possession modeling, temporal rhythm analysis, or player-interaction graphs—represents a promising direction for future research, provided the challenge of maintaining interpretability is adequately addressed.

Furthermore, while SHAP analysis improved transparency, it provides correlational rather than causal explanations. Integrating causal inference frameworks or hybrid models that combine explainable ML with domain-informed performance analytics may yield deeper insights into the true mechanisms underlying winning strategies.

By addressing these limitations—incorporating richer contextual features, expanding multi-season datasets, and exploring advanced modeling paradigms such as transformers, attention mechanisms, and graph learning—future work can further strengthen predictive accuracy and enhance the practical translational value of ML-based analytics for coaches, scouts, and performance analysts in elite basketball.

5. Conclusions

This study applied a comprehensive, explainable machine learning framework to predict EuroLeague basketball game outcomes using team-level game-related statistics. Among the four classifiers tested, SVM with an RBF kernel achieved the highest accuracy (0.841, with AUC = 0.922), closely followed by the LR model (accuracy = 0.818), which achieved the highest AUC (0.933). Both models balanced strong discriminative ability with interpretability, making them well suited to analytical and tactical applications in sports settings. Feature selection through RFE and subsequent SHAP analysis identified the most influential predictors of success, including TS%, DR, ST, and TO. Teams with higher efficiency and defensive control consistently exhibited higher probabilities of winning, while excessive turnovers and fouls reduced the likelihood of success. Spatial shot distribution analyses further indicated that efficient three-point shooting and limited mid-range reliance were strategic markers of winning performance. From a practical standpoint, these findings highlight that machine learning models combining interpretability and predictive precision can serve as powerful decision-support tools in professional basketball. Coaches and performance analysts can leverage such models to identify key performance indicators, optimize offensive and defensive strategies, and refine training priorities based on empirical evidence. In conclusion, the integration of explainable machine learning with traditional performance analysis provides a reproducible, transparent, and data-driven framework for understanding and enhancing competitive success in elite basketball.

Author Contributions

Conceptualization, P.F.F., S.P., K.L. and A.C.; methodology, P.F.F., C.K., J.L. and A.C.; software, C.K. and N.Z.; validation, C.K., D.P., T.S. and D.B.; formal analysis, P.F.F., C.K. and A.A.; investigation, G.K., A.A. and S.P.; data curation, C.K., P.F.F., T.S. and P.A.; writing—original draft preparation, P.F.F., G.K., C.K. and S.P.; writing—review and editing, P.F.F., C.K., G.K., A.A., N.Z., J.L., K.L. and A.C.; visualization, P.A., D.P. and D.B.; supervision, A.C., K.L. and P.F.F.; project administration, A.C., K.L. and P.F.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Prasetio, D. Predicting Football Match Results with Logistic Regression. In Proceedings of the 2016 International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), Penang, Malaysia, 16–19 August 2016; pp. 1–5.
2. Wang, J. Predictive Analysis of NBA Game Outcomes through Machine Learning. In Proceedings of the 6th International Conference on Machine Learning and Machine Intelligence, Chongqing, China, 27–29 October 2023; pp. 46–55.
3. Foteinakis, P.F.; Pavlidou, S.P. Game-Related Performance Metrics Differentiating Winning and Losing Teams in the Basketball Champions League. J. Phys. Educ. 2025, 36, e3647.
4. Karipidis, A.; Fotinakis, P.; Taxildaris, K.; Fatouros, J. Factors Characterizing a Successful Performance in Basketball. J. Hum. Mov. Stud. 2001, 41, 385–398.
5. Ibáñez, S.J.; Sampaio, J.; Feu, S.; Lorenzo, A.; Gómez, M.A.; Ortega, E. Basketball Game-Related Statistics That Discriminate between Teams’ Season-Long Success. Eur. J. Sport Sci. 2008, 8, 369–372.
6. Mateus, N.; Gonçalves, B.; Abade, E.; Leite, N.; Gómez, M.A.; Sampaio, J. Exploring Game Performance in NBA Playoffs. Kinesiology 2018, 50, 89–96.
7. Mikołajec, K.; Maszczyk, A.; Zając, T. Game Indicators Determining Sports Performance in the NBA. J. Hum. Kinet. 2013, 37, 145–151.
8. Ektirici, A. Game-Related Statistics Discriminating Winners and Losers in Turkish Basketball Super League: Effect of Home-Away Games. J. Sports Sci. Res. 2023, 8, 148–156.
9. Mikołajec, K.; Banyś, D.; Żurowska-Cegielska, J.; Zawartka, M.; Gryko, K. How to Win the Basketball EuroLeague? Game Performance Determining Sports Results during 2003–2016 Matches. J. Hum. Kinet. 2021, 77, 287–296.
10. Özmen, M.U. Marginal Contribution of Game Statistics to Probability of Winning at Different Levels of Competition in Basketball: Evidence from the Euroleague. Int. J. Sports Sci. Coach. 2016, 11, 98–107.
11. Wong, S.; Ma, X.; Liu, J. Machine Learning in Sports Analytics: Emerging Applications and Future Directions. J. Sports Sci. Technol. 2024, 12, 12–29.
12. Bunker, R.P.; Thabtah, F. A Machine Learning Framework for Sport Result Prediction. Appl. Comput. Inf. 2019, 15, 27–33.
13. Štrumbelj, E.; Vračar, P. Simulating a Basketball Match with a Homogeneous Markov Model and Forecasting the Outcome. Int. J. Forecast. 2016, 32, 538–547.
14. Perin, C.; Vuillemot, R.; Fekete, J.D. Assessing the Interpretability of Tree-Based Ensemble Methods in Sports Analytics. Data Min. Knowl. Discov. 2021, 35, 73–110.
15. Pai, P.F.; ChangLiao, L.H.; Lin, K.P. Analyzing Basketball Games by a Support Vector Machines with Decision Tree Model. Neural Comput. Appl. 2017, 28, 4159–4167.
16. García, J.; Ibáñez, S.J.; Gómez, M.A.; Sampaio, J. Basketball Game-Related Statistics Discriminating ACB League Teams According to Game Outcome and Final Score Differences. Int. J. Perform. Anal. Sport 2020, 20, 289–302.
17. Cao, C. Sports Data Mining Technology Used in Basketball Outcome Prediction. Int. J. Comput. Sci. Sport 2012, 11, 20–29.
18. Petway, A.J.; Freitas, T.T.; Calleja-González, J.; Alcaraz, P.E. Machine Learning Approaches to Analyze and Predict Basketball Performance: A Systematic Review. Appl. Sci. 2021, 11, 647.
19. Zadravec, H. Machine Learning Approaches to Forecasting the Winner of the 2024 NBA Championship. In Proceedings of the 10th Student Computing Research Symposium (SCORES’24), Maribor, Slovenia, 3 October 2024; Lukač, N., Fister, I., Kohek, Š., Eds.; University Press of the University of Maribor: Maribor, Slovenia, 2024; pp. 53–56.
20. Horvat, T.; Job, J.; Logozar, R.; Livada, Č. A Data-Driven Machine Learning Algorithm for Predicting the Outcomes of NBA Games. Symmetry 2023, 15, 798.
21. Tsagris, M.; Adam, C.; Pantatosakis, P. On Predicting an NBA Game Outcome from Half-Time Statistics. Discov. Artif. Intell. 2024, 4, 111.
22. Lampis, T.; Ioannis, N.; Vasilios, V.; Stavrianna, D. Predictions of European Basketball Match Results with Machine Learning Algorithms. J. Sports Anal. 2023, 9, 171–190.
23. Plakias, S.; Kokkotis, C.; Pantazis, D.; Tsatalas, T. Comparative Analysis of Key Performance Indicators in Euroleague and National Basketball Leagues. J. Phys. Educ. Sport 2024, 24, 1360–1372.
24. Giasemidis, G. Descriptive and Predictive Analysis of Euroleague Basketball Games and the Wisdom of Basketball Crowds. arXiv 2020, arXiv:2002.08465. Available online: https://arxiv.org/abs/2002.08465 (accessed on 5 May 2025).
25. Foteinakis, P.F.; Pavlidou, S.P.; Chatzinikolaou, A.; Stavropoulos, N.; Michalopoulou, M. Optimizing Clutch Performance in EuroLeague Basketball: A Data-Driven Analysis of Late-Game Decision-Making. Balt. J. Health Phys. Act. 2025; forthcoming.
26. Papageorgiou, G.; Sarlis, V.; Tjortjis, C. Evaluating the Effectiveness of Machine Learning Models for Performance Forecasting in Basketball: A Comparative Study. Knowl. Inf. Syst. 2024, 66, 4333–4375.
27. Li, R. Comparing Machine Learning Methods for NBA Game Outcome Prediction. In Proceedings of the 10th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 24–26 April 2025; IEEE: Chengdu, China, 2025; pp. 52–61.
28. Horvat, T.; Havaš, L.; Srpak, D. The Impact of Selecting a Validation Method in Machine Learning on Predicting Basketball Game Outcomes. Symmetry 2020, 12, 431.
29. Ou-Yang, Y.; Hong, W.; Peng, L.; Mao, C.X.; Zhou, W.J.; Zheng, W.T.; Wang, Q.; Qi, F.; Li, X.W.; Chen, S.H.; et al. Explaining Basketball Game Performance with SHAP: Insights from Chinese Basketball Association. Sci. Rep. 2025, 15, 13793.
30. Molnar, C.; Casalicchio, G.; Bischl, B. Interpretable Machine Learning—A Brief History, State-of-the-Art and Challenges. In ECML PKDD 2020 Workshops, Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Ghent, Belgium, 14–18 September 2020; Koprinska, I., Kamp, M., Appice, A., Loglisci, C., Antonie, L., Zimmermann, A., Guidotti, R., Özgöbek, Ö., Ribeiro, R.P., Gavaldà, R., et al., Eds.; Communications in Computer and Information Science; Springer: Cham, Switzerland, 2020; Volume 1323, pp. 417–431.
31. Mountantonakis, M. Efficient Statistical Computation for K-Player Basketball Lineups Using Semilattice Structures. Electronics 2025, 14, 2104.
32. Ambrutis, A.; Povilaitis, M. Composite Rating Method: Application to European Basketball Leagues. J. Sports Sci. 2024, 42, 201–214.
33. Hopkins, W.G. Measures of Reliability in Sports Medicine and Science. Sports Med. 2000, 30, 1–15.
34. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R, 2nd ed.; Springer: New York, NY, USA, 2021.
35. Robertson, P.S. Man & Machine: Adaptive Tools for the Contemporary Performance Analyst. J. Sports Sci. 2020, 38, 2118–2126.
36. Hosmer, D.W.; Lemeshow, S.; May, S. Applied Survival Analysis: Regression Modeling of Time to Event Data, 2nd ed.; Wiley-Blackwell: Hoboken, NJ, USA, 2011.
37. Miljković, D.; Gajić, L.; Kovačević, A.; Konjović, Z. The Use of Data Mining for Basketball Matches Outcomes Prediction. In Proceedings of the 2010 8th International Symposium on Intelligent Systems and Informatics, Subotica, Serbia, 10–11 September 2010; IEEE: Subotica, Serbia, 2010; pp. 309–312.
38. Pramanik, M.A.; Suzan, M.M.H.; Biswas, A.A.; Rahman, M.Z.; Kalaiarasi, A. Performance Analysis of Classification Algorithms for Outcome Prediction of T20 Cricket Tournament Matches. In Proceedings of the 2022 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 25–27 January 2022; IEEE: Coimbatore, India, 2022; pp. 1–7.
39. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
40. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Proceedings of the Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Long Beach, CA, USA, 2017; pp. 4765–4774.
41. Moustakidis, S.; Plakias, S.; Kokkotis, C.; Tsatalas, T.; Tsaopoulos, D. Predicting Football Team Performance with Explainable AI: Leveraging SHAP to Identify Key Team-Level Performance Metrics. Future Internet 2023, 15, 174.
42. Li, C.; Zhang, H.; Zhang, Y.; Shen, J.; An, R. The application of artificial intelligence techniques in predicting game outcomes of professional basketball league: A systematic review. PLoS ONE 2025, 20, e0326326.
43. Puente, C.; Coso, J.D.; Salinero, J.J.; Abián-Vicén, J. Basketball Performance Indicators During the ACB Regular Season from 2003 to 2013. Int. J. Perform. Anal. Sport 2015, 15, 935–948.
44. Gómez, M.A.; Ibáñez, S.J.; Parejo, I.; Furley, P. The Use of Classification and Regression Tree When Classifying Winning and Losing Basketball Teams. Kinesiology 2017, 49, 47–56.
45. Chen, W.J.; Jhou, M.J.; Lee, T.S.; Lu, C.J. Hybrid Basketball Game Outcome Prediction Model by Integrating Data Mining Methods for the National Basketball Association. Entropy 2021, 23, 477.
46. Ou-Yang, Y.; Li, X.; Zhou, W.; Hong, W.; Zheng, W.; Qi, F.; Peng, L. Integration of Machine Learning XGBoost and SHAP Models for NBA Game Outcome Prediction and Quantitative Analysis Methodology. PLoS ONE 2024, 19, e0307478.
47. Foteinakis, P.F.; Pavlidou, S.P. Evolution of Three-Point Field Goals Shooting Trends in EuroLeague Basketball. TRENDS Sport Sci. 2024, 31, 207–213.
48. Durmuş, T.; Erdeveciler, Ö. Shot Selection Trends in Euroleague Basketball from 2013 to 2023. Perform. Anal. Sport Exerc. 2023, 2, 18–24.
49. Hu, Q. The Three-Point Revolution: A Profound Impact on NBA Game Strategy. Sci. Technol. Eng. Chem. Environ. Prot. 2024, 1, 1–8.
50. Foteinakis, P.F.; Pavlidou, S.P. A Decade of Evolution: Comparative Analysis of Shooting Trends and Offensive Efficiency in the NBA and EuroLeague. Monten. J. Sports Sci. Med. 2025, 14, 13–19.
51. Foteinakis, P.; Pavlidou, S.; Stavropoulos, N. Analysis of the Effectiveness of Different Play Types in the End of Game Possessions of Close EuroLeague Matches. J. Hum. Sport Exerc. 2024, 19, 617–630.
52. Giasemidis, G. Mid-Range Analysis of Euroleague—Part II—And Its Impact on Teams’ Performance, Player Stats and a Bonus. Available online: https://giasemidis.github.io/2024/04/13/euroleague-midrange-analysis.html (accessed on 7 August 2025).
53. Buyukcelebi, H.; Sahin, F.N.; Acak, M.; Uysal, H.Ş.; Sari, C.; Erkan, D.; Yatak, S.; Karayigit, R. Changes in Defensive Variables Determining Success in the NBA over the Last 10 Years. Appl. Sci. 2024, 14, 6696.
54. Leicht, A.S.; Gómez, M.A.; Woods, C.T. Explaining Match Outcome during the Men’s Basketball Tournament at the Olympic Games. J. Sports Sci. Med. 2017, 16, 468–476.
55. Zhang, S.; Gomez, M.Á.; Yi, Q.; Dong, R.; Leicht, A.; Lorenzo, A. Modelling the Relationship between Match Outcome and Match Performances During the 2019 FIBA Basketball World Cup: A Quantile Regression Analysis. Int. J. Environ. Res. Public Health 2020, 17, 5722.
56. Foteinakis, P.; Pavlidou, S. Positional Differences in the Efficacy of Critical End-of-Game Possessions in EuroLeague Basketball. Sport Mont 2024, 22, 25–31.
57. Alonso Pérez-Chao, E.; Portes, R.; Ribas, C.; Lorenzo, A.; Leicht, A.S.; Gómez, M.Á. Impact of Spectators, League and Team Ability on Home Advantage in Professional European Basketball. Percept. Mot. Skills 2023, 131, 177–191.
58. Bustamante-Sánchez, Á.; Gómez, M.A.; Jiménez-Saiz, S.L. Game Location Effect in the NBA: A Comparative Analysis of Playing at Home, Away and in a Neutral Court during the COVID-19 Season. Int. J. Perform. Anal. Sport 2022, 22, 370–381.
59. Mochales Cuesta, I.; Jiménez-Sáiz, S.L.; Kelly, A.L.; Bustamante-Sánchez, Á. The Influence of Home-Court Advantage in Elite Basketball: A Systematic Review. J. Funct. Morphol. Kinesiol. 2024, 9, 192.
60. López-García, A.; Alonso-Pérez-Chao, E.; Navarro Barragán, R.M.; Jiménez-Sáiz, S.L. Home-Court Advantage and Home Win Percentage in the NBA: An In-Depth Investigation by Conference and Team Ability. Appl. Sci. 2024, 14, 9989.
61. Wang, S. Evaluating Cross-Building Transferability of Attention-Based Automated Fault Detection and Diagnosis for Air Handling Units: Auditorium and Hospital Case Study. Build. Environ. 2025, 287, 113889.
62. Park, S.; Kim, J.; Kim, J.; Wang, S. Fault Diagnosis of Air Handling Units in an Auditorium Using Real Operational Labeled Data across Different Operation Modes. J. Comput. Civ. Eng. 2025, 39, 04025065.

Figure 1. Receiver operating characteristic (ROC) curves for the (a) LR, (b) RF, (c) SVM and (d) NB models.

Figure 2. SHAP Summary Plot: Feature contributions to win probability in EuroLeague games.

Table 1. The game-related variables after the expert review process.

Features	Description
2PTM	Two Points Made
2PTA	Two Points Attempted
2PT%	Two-Point Percentage
3PTM	Three Points Made
3PTA	Three Points Attempted
3PT%	Three-Point Percentage
FGM	Field Goal Made
FGA	Field Goal Attempted
FG%	Field Goal Percentage
FTM	Free Throw Made
FTA	Free Throw Attempted
FT%	Free Throw Percentage
OR	Offensive Rebounds
DR	Defensive Rebounds
AST	Assists
TO	Turnovers
ST	Steals
BLK	Blocks
PF	Personal Fouls
POSS	Possessions
OFF RTG	Offensive Rating
DEF RTG	Defensive Rating
eFG%	Effective Field Goal Percentage
TS%	True Shooting Percentage
SHOT_RANGE_PAINT	Paint Shots Frequency
SHOT_RANGE_MIDDLE	Mid-Range Shots Frequency
SHOT_RANGE_3P	Three-Point Shots Attempts Frequency
H/A	Home/Away Indicator
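For readers unfamiliar with the two composite shooting metrics in Table 1, the sketch below computes them from a single box-score line using their commonly cited definitions. The article does not state which formula it used, so the 0.44 free-throw weight in TS% is an assumption based on the usual convention, and the box-score numbers are hypothetical.

```python
def effective_fg_pct(fgm, three_pm, fga):
    """eFG% = (FGM + 0.5 * 3PTM) / FGA: a made three counts as 1.5 makes."""
    return (fgm + 0.5 * three_pm) / fga


def true_shooting_pct(points, fga, fta, ft_weight=0.44):
    """TS% = PTS / (2 * (FGA + ft_weight * FTA)).

    ft_weight=0.44 is the conventional estimate of the share of free-throw
    attempts that end a possession (assumed here, not taken from the article).
    """
    return points / (2 * (fga + ft_weight * fta))


# Hypothetical team-game line (NOT taken from the study's dataset).
two_pm, three_pm, ftm = 22, 10, 14
fga, fta = 63, 18

fgm = two_pm + three_pm
points = 2 * two_pm + 3 * three_pm + ftm

efg = effective_fg_pct(fgm, three_pm, fga)
ts = true_shooting_pct(points, fga, fta)
print(f"PTS={points}, eFG%={efg:.3f}, TS%={ts:.3f}")
```

Because TS% folds in free throws and the extra value of three-pointers, it summarizes overall scoring efficiency in a single number, which is consistent with its prominence among the selected features.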

Table 2. Comparative results of classification performance metrics in EuroLeague game outcome prediction across the models.

Model	Accuracy	AUC	Recall	Precision	F1	Features
LR	0.818	0.933	0.803	0.828	0.815	16
RF	0.758	0.854	0.803	0.736	0.768	26
SVM	0.841	0.922	0.848	0.836	0.842	18
NB	0.652	0.789	0.333	0.917	0.489	26

Note: LR: Logistic Regression, RF: Random Forest, SVM: Support Vector Machine, NB: Naïve Bayes.
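As a quick consistency check on Table 2, the F1 column can be recomputed from the reported recall and precision using F1 = 2PR / (P + R). The sketch below only copies the values from the table; small differences in the third decimal are expected because the inputs are already rounded.

```python
# (recall, precision, reported F1) copied from Table 2.
table2 = {
    "LR": (0.803, 0.828, 0.815),
    "RF": (0.803, 0.736, 0.768),
    "SVM": (0.848, 0.836, 0.842),
    "NB": (0.333, 0.917, 0.489),
}


def f1_score(recall, precision):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {m: f1_score(r, p) for m, (r, p, _) in table2.items()}

for model, (recall, precision, reported) in table2.items():
    print(f"{model}: reported {reported:.3f}, recomputed {recomputed[model]:.3f}")
```

The NB row shows why F1 is worth reporting alongside accuracy: its precision is the highest of the four models, but a recall of 0.333 means it identifies only about one third of the actual wins, and the harmonic mean penalizes that imbalance.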

Table 3. Descriptive Statistics of the Selected Game-Related Performance Variables.

Game-Related Statistic	Winning Teams (M ± SD)	Losing Teams (M ± SD)
2PTM (Two-Point Field Goals Made)	22.84 ± 4.72	20.13 ± 4.66
2PT% (Two-Point Field Goal Percentage)	57.7% ± 8.88%	53.81% ± 7.72%
3PTM (Three-Point Field Goals Made)	9.97 ± 3.21	8.02 ± 3.26
3PTA (Three-Point Field Goals Attempted)	26.42 ± 6.31	24.35 ± 6.04
FGM (Total Field Goals Made)	32.76 ± 5.14	28.48 ± 5.22
FGA (Total Field Goals Attempted)	63.17 ± 6.98	61.42 ± 7.04
FTM (Free Throws Made)	14.53 ± 5.62	12.76 ± 5.21
OR (Offensive Rebounds)	10.76 ± 3.71	10.84 ± 3.76
DR (Defensive Rebounds)	24.46 ± 4.19	21.84 ± 3.78
TO (Turnovers)	11.64 ± 3.34	13.01 ± 3.38
ST (Steals)	7.04 ± 2.64	5.86 ± 2.42
BLK (Blocks)	2.73 ± 1.86	2.04 ± 1.54
PF (Personal Fouls)	19.97 ± 3.15	20.95 ± 3.56
POSS (Possessions)	72.17 ± 4.06	72.33 ± 4.09
TS% (True Shooting Percentage)	61.3% ± 6.72%	55.9% ± 7.14%
SHOT_RANGE_MIDDLE (Mid-Range Shot Frequency)	7.08 ± 3.38	7.09 ± 3.40
SHOT_RANGE_3P (Three-Point Shot Frequency)	26.22 ± 5.01	25.24 ± 5.35
H/A (Home/Away Indicator)	61.8% ± 48.7%	38.2% ± 48.7%

Note: M ± SD = Mean ± Standard Deviation. Variables represent two-point field goals made (2PTM), two-point percentage (2PT%), three-point field goals made (3PTM), three-point attempts (3PTA), total field goals made (FGM), total field goals attempted (FGA), free throws made (FTM), offensive rebounds (OR), defensive rebounds (DR), turnovers (TO), steals (ST), blocks (BLK), personal fouls (PF), possessions (POSS), true shooting percentage (TS%), mid-range shot frequency (SHOT_RANGE_MIDDLE), three-point shot frequency (SHOT_RANGE_3P), and home/away indicator (H/A).
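The raw differences in Table 3 are easier to compare across variables once they are standardized. The sketch below computes Cohen's d for a subset of rows, using values copied from the table. It is not an analysis reported in the article, and it assumes equal group sizes (each game contributes one winning and one losing team), so the pooled SD is the root mean of the two variances.

```python
import math

# (winning mean, winning SD, losing mean, losing SD) copied from Table 3.
table3 = {
    "TS%": (61.3, 6.72, 55.9, 7.14),
    "DR": (24.46, 4.19, 21.84, 3.78),
    "3PTM": (9.97, 3.21, 8.02, 3.26),
    "ST": (7.04, 2.64, 5.86, 2.42),
    "TO": (11.64, 3.34, 13.01, 3.38),
    "PF": (19.97, 3.15, 20.95, 3.56),
    "OR": (10.76, 3.71, 10.84, 3.76),
    "SHOT_RANGE_MIDDLE": (7.08, 3.38, 7.09, 3.40),
}


def cohens_d(m_win, sd_win, m_lose, sd_lose):
    """Standardized mean difference with an equal-n pooled SD (assumption)."""
    pooled_sd = math.sqrt((sd_win ** 2 + sd_lose ** 2) / 2)
    return (m_win - m_lose) / pooled_sd


effect = {name: cohens_d(*row) for name, row in table3.items()}

for name, d in sorted(effect.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>18}: d = {d:+.2f}")
```

On these descriptive values, TS% and DR show the largest standardized gaps between winners and losers, TO and PF are negative, and OR and mid-range frequency are close to zero. Univariate gaps of this kind are not the same as SHAP contributions, which reflect each feature's role within the fitted multivariate model.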

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

