Article

Enhancing Basketball Team Strategies Through Predictive Analytics of Player Performance

Roshan Chandru, Abhishek Kaushik and Pranay Jaiswal
1 Department of Computing Science and Mathematics, Dundalk Institute of Technology, A91 K584 Dundalk, Ireland
2 Regulated Software Research Center, Dundalk Institute of Technology, A91 K584 Dundalk, Ireland
3 Lero, the Research Ireland Centre for Software, V94 NYD3 Limerick, Ireland
* Author to whom correspondence should be addressed.
Electronics 2025, 14(11), 2177; https://doi.org/10.3390/electronics14112177
Submission received: 7 January 2025 / Revised: 24 April 2025 / Accepted: 27 April 2025 / Published: 27 May 2025

Abstract

This study explores the application of predictive analytics in evaluating player performance in the National Basketball Association (NBA), focusing on rebounds per game (REB), an essential component for better performance and results in basketball. The research employs a comparative analysis of machine learning (ML) models by leveraging a detailed NBA dataset. A key novelty lies in integrating advanced hyperparameter tuning and feature selection, enabling these models to capture complex relationships within the dataset. The Gradient Boosting Regressor demonstrated superior predictive performance, achieving an R² score of 0.8749 after tuning, with Linear Regression following closely at 0.8668. This study also highlights the importance of model interpretability and scalability, emphasizing the balance between predictive accuracy and usability for real-world decision-making. By offering actionable insights for optimizing player strategies and team performance, this research contributes to the growing body of knowledge in data-driven sports analytics and paves the way for more advanced applications in professional basketball management.

1. Introduction

In professional sports, accurately predicting player performance is essential to effective decision-making and maximizing game outcomes. Experts can make better team and game-plan choices by applying the mathematical theory of evidence, which allows them to express and integrate subjective beliefs to refine decisions [1]. Player performance analysis has been transformed by the shift to complex statistical models, such as machine learning (ML) algorithms, which offer a dynamic and accurate framework for athlete assessment. Research on athletes' predictive saccades has revealed how players adapt to changes during a game, which can significantly impact overall performance [2]. The significance of individual actions in determining collective outcomes, such as committee decisions made jointly by a panel or team, is often captured by the probabilistic prediction value of individual contributions, which indicates the importance of personal contributions to team performance and strategy [3].
Traditional fitness–fatigue models are mathematical frameworks used in sports science to predict and explain how training affects performance over time; they can be improved with ML [4] by integrating physiological data and multivariate algorithms, which sharpens performance predictions and provides a deeper understanding of athlete potential [5].
Professional sports performance and strategic planning have been greatly improved by incorporating ML techniques into sports analytics, allowing more accurate forecasts and data-driven decision-making. ML is increasingly used to improve team training, prevent injuries, and monitor performance [6,7].
ML can also support video analysis for classifying strength and conditioning exercises, achieving high accuracy with less engineering effort than deep learning. By predicting ski jump lengths with remarkable accuracy, ML models have opened new opportunities for enhanced sports broadcasting and real-time performance analysis [8]. Variability in basketball player performance measurements produces heterogeneous performance curves, which creates prediction challenges and requires sophisticated methods to balance curve smoothness with covariate integration [9]. Bayesian networks are crucial for modeling the joint distribution of performance indicators, providing deeper insights into the factors influencing game outcomes and performance dynamics [10]. Evaluating defensive performance through technologies such as optical player-tracking systems marks a significant evolution from traditional metrics, offering a more comprehensive assessment of player actions [11,12].
A more holistic approach to player evaluation is achieved by understanding the physiological factors that affect performance, such as body composition and nutrient intake [13]. Predictive models based on NBA tracking data provide detailed insights into players' abilities and playing styles, informing coaching and player utilization strategies [14,15]. Discovering latent heterogeneity in shot selection through Bayesian nonparametric clustering helps tailor game plans to player profiles [16]. Beyond sports, cognitive constraint modeling enables the optimization of digital and physical environments for improved human performance. These approaches to player evaluation and prediction highlight the increasing sophistication and influence of sports analytics in professional sports operations and strategy [17].
In professional basketball, evaluating player performance is pivotal for optimizing team strategies and decision-making. Predictive analytics has emerged as a transformative tool in sports, enabling data-driven insights into key performance metrics such as rebounds per game (REB). Rebounding is one of the most crucial aspects of the game: a rebound is credited to the player who gains possession of the ball after a missed shot. The more rebounds a team secures, the more possessions it gains, and each possession benefits the team's defense and offense, ultimately contributing to victory.
Despite its potential, traditional parametric models like Linear Regression are limited by their inability to account for nonlinear relationships and complex interactions inherent in player performance data [1]. On the other hand, advanced ML methods, such as Gradient Boosting Regressor, offer greater flexibility in capturing these dynamics while maintaining high predictive accuracy [8]. This research focuses on addressing the trade-offs between model simplicity and accuracy by exploring the effectiveness of parametric and nonparametric models in predicting REB. By leveraging techniques such as hyperparameter tuning and feature engineering, this study seeks to improve predictive accuracy and identify the factors that significantly influence performance outcomes [8]. Furthermore, it evaluates how these models can balance interpretability and scalability, ensuring their applicability in real-world decision-making scenarios [3,8]. Through robust ML methodologies, this study aims to contribute actionable insights to the growing field of sports analytics. Previous studies have demonstrated the potential of integrating ML techniques into sports analytics to optimize player utilization and enhance decision-making strategies [1,8].
The formulated hypothesis and related research questions of the study are as follows:
Hypothesis 1.
Advanced feature engineering, data preprocessing, and hyperparameter tuning significantly increase the ML models’ ability to predict future average rebounds per game (REB) for NBA players.
The research questions to investigate the hypothesis are:
  • What is the prediction accuracy level of parametric and non-parametric ML models when predicting REB for NBA players?
  • This question aims to evaluate and compare how effectively different types of ML models—both parametric and non-parametric—can predict average REB. By analyzing model performance using evaluation metrics such as R², MSE, and RMSE, we identify which models best capture the patterns in basketball performance data.
  • What type of model, parametric or nonparametric, benefits most from optimization by tuning hyperparameters to predict REB for NBA players?
  • This question investigates the impact of hyperparameter tuning on model performance. Specifically, it examines whether parametric or non-parametric models show greater improvements in prediction accuracy when optimized, providing insights into the models’ adaptability and potential for enhancement in real-world applications.
The major contributions of this paper are as follows:
  • We conducted a comparative analysis of various parametric and non-parametric machine learning models to predict NBA player performance, with a focus on rebounds (REB).
  • We applied hyperparameter tuning techniques to optimize model performance, demonstrating that proper tuning significantly improves prediction accuracy in sports analytics.
The rest of the paper is structured as follows: Section 2 discusses the relevant literature in the analysis of predictive modeling techniques and optimization of predictive accuracy. Section 3 discusses the study’s methodology, including dataset details, data preparation, exploratory data analysis, feature selection, and ML models. In Section 4, the results of the model training and evaluation are presented. Section 5 discusses our research hypothesis and research questions. Section 6 concludes the study with future guidance.

2. Literature Review

2.1. Analysis of Predictive Modeling Techniques: Applications and Evaluations in NBA Player Performance

In recent years, predictive modeling has become a key component of sports analytics, offering data-driven insights into player performance. Accurately predicting metrics requires selecting appropriate models that can handle the complexity and variability of basketball data. This section introduces and evaluates a range of ML models commonly used in this domain, assessing their suitability and effectiveness for predicting NBA player performance.
Linear regression is a straightforward, practical way to model relationships between variables, making it suitable for predicting basketball player performance. Its simplicity allows even those without statistical training to grasp and use it efficiently. The roles of explanatory variables, key concepts such as r-squared, and the statistical significance of regression coefficients are straightforward and intuitive, so linear regression is a standard tool in educational settings for introducing statistical principles through basketball statistics [18]. Owing to its low computational demands and efficiency, linear regression is highly valued across various sectors and finds applications beyond sports, such as predicting food content; this versatility demonstrates its effectiveness in predictive modeling across fields [19].
Although linear regression provides a solid foundation, non-linear models often yield higher accuracy in more complex systems. For example, non-linear regression is more precise at capturing the intricate, non-linear relationships between anthropometric predictors and basketball performance [20]. The complexity of player performance assessment is further illustrated by models that integrate personality traits: studies using automated language-based analyses found that traits such as agreeableness and conscientiousness are significant predictors of performance, adding another layer of complexity to the analysis [21].
The Random Forest (RF) model is quite effective for analyzing complex datasets, such as basketball analytics data comprising player statistics, game events, and biometric data. The model is particularly effective at selecting essential features, which is achieved by breaking the data into smaller groups using RF-based multi-round screening (RFMS) [22]. This can be further improved using techniques such as mutual forest impact (MFI) and mutual impurity reduction (MIR), which help to reveal how different features interact and thereby improve model performance [23]. RF can also handle various data types, such as categorical, time-series, and numerical data; methods such as random similarity forests ensure the model works well across these types [24]. Techniques such as correlation-based feature selection (CFS) increase the model's accuracy and efficiency on extensive data.
Gradient Boosting combines multiple weak learners, typically decision trees, to produce a strong predictive model. Its robustness comes from its iterative methodology, which learns from the residuals of previous models to minimize prediction errors. It can handle complex, non-linear relationships in sports analytics, improving accuracy with each iteration. Large datasets benefit from a key advancement in this field, the gradient-boosted binary histogram ensemble (GBBHE), which increases convergence rates and computational efficiency [25].
In sports analytics, the K-Nearest Neighbors (KNN) algorithm is very effective for making accurate predictions by utilizing the inherent characteristics of local data points. Performance metrics in sports data can vary significantly between contexts and conditions, so the algorithm's high adaptability to change is essential [26]. KNN can improve predictive performance through effective forward selection of predictor variables, which is useful in sports analytics because choosing the most pertinent features, such as player statistics and game conditions, can yield more accurate predictions [27]. Additionally, KNN integrated with other techniques has demonstrated high accuracy rates; for example, protein sequence coding in biological contexts has improved accuracy, indicating that sports analytics could benefit from similar hybrid approaches [28].
Many studies have demonstrated the effectiveness of the Multi-Layer Perceptron (MLP) and other deep learning models in identifying patterns in basketball player performance. Studies of the MLP model, specifically a neural network architecture with a 21-7-3 design, have reported high accuracy in NBA player classification; this architecture handles complex data well, identifying patterns and classifying clusters [29]. Furthermore, the MP-LSTM algorithm has achieved 94% recognition accuracy.
After reviewing the strengths and limitations of various predictive modeling techniques, we recognized that different models handle complexity and data structure in distinct ways. This led us to formulate our first research question, which aims to evaluate which modeling approach better captures the variation in player performance, particularly in rebound prediction.

2.2. Optimizing Predictive Accuracy: The Critical Role of Hyperparameter Tuning in Sports Analytics

A model's efficacy and precision can be increased by identifying the best combination of hyperparameters. Neural network-based auto-tuners have yielded significant advances in convergence rates and recovery performance, which are important for precise and timely sports predictions [30]. Evidence suggests that algorithms such as Feedforward Neural Networks (FFNN), RF, and Extreme Gradient Boosting (XGB) perform differently depending on their hyperparameters, so adjusting these hyperparameters is essential to obtaining the best outcomes [31].
One study demonstrated that a model's capacity to correctly identify malware on Internet of Things devices can be significantly affected by modifying hyperparameters such as the learning rate, neighborhood function, and number of neurons in Self-Organizing Maps (SOM) [32]; the same kind of tuning can be applied to predictive analysis in sports.
Similarly, obtaining the best results from models such as RF and convolutional neural networks (CNN) depends strongly on particular hyperparameters: for CNNs, the learning rate, batch size, and number of layers; for RF, the number of trees [33]. Overfitting or underfitting can be prevented, and the data effectively learned, by carefully adjusting hyperparameters such as the learning rate, maximum depth, and number of estimators for Extreme Gradient Boosting (XGB), and the number of hidden layers and neurons for Feedforward Neural Networks (FFNN), which in turn improves the accuracy and efficiency of the model [31].
Several hyperparameter optimization techniques can be used to obtain the best hyperparameter settings; for instance, Bayesian Optimization, Genetic Algorithms, and other metaheuristic optimization techniques perform better than manual tuning. These techniques offer reduced computational overhead and increased performance [34,35].
Through this review, we observed that model performance in sports analytics greatly depends on the careful tuning of hyperparameters, which can significantly impact accuracy and prevent issues like overfitting. This realization led us to our second research question. This question aims to explore which modeling approach gains more from hyperparameter tuning when applied to the task of rebound prediction.

3. Methodology

This section explains the methodology, covering the dataset used, data preparation, feature engineering, and exploratory data analysis to gain insights and prepare the data for modeling. The step-by-step procedure is shown in Figure 1.

3.1. Dataset

The National Basketball Association (NBA) dataset is an open-access dataset available on Kaggle [36]. It contains 1340 entries and 22 features associated with player performance, each row representing an individual NBA player. Each feature is described in Table 1. The data was collected over five years. The target feature is REB, the average number of rebounds per game.
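For readers who wish to reproduce the setup, the sketch below shows a minimal loading step with pandas; the CSV file name is an assumption, so substitute whatever name the Kaggle download provides.

```python
# Minimal loading sketch (the file name "nba-players.csv" is assumed).
import pandas as pd

df = pd.read_csv("nba-players.csv")
print(df.shape)              # expected: (1340, 22) per the dataset description
print(df["REB"].describe())  # target feature: average rebounds per game
```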

3.2. Exploratory Data Analysis

First, we conducted exploratory data analysis on the raw data to uncover patterns and relationships in the dataset. Table 2 provides descriptive statistics for rebounds by type, and Figure 2 presents the distribution of rebounds per game (REB), offering a deeper understanding of its spread. Three key metrics are involved: ‘OREB’, the average number of offensive rebounds per game; ‘DREB’, the average number of defensive rebounds per game; and ‘REB’, the total average number of rebounds per game. The REB distribution is right-skewed, indicating that most players had lower rebound averages, with a few exhibiting exceptionally high values.
Figure 3 is a scatter plot showing a clear trend: players with more minutes per game tend to record higher REB. This pattern holds for both groups, those who did and did not meet the target performance. The trend is more pronounced among players who achieved the target (green dots), showing that increased playing time often correlates with better rebounding performance.
Figure 4 is a box plot comparing the distribution of REB between players who did and did not meet the five-year target. Players who met the target (the orange box, or 1 on the x-axis) show a higher median REB and a wider range of values than their counterparts, underscoring the importance of rebounding for long-term player success. Together, these plots illustrate the influence of minutes per game and efficiency on rebounding performance and point to essential indicators for evaluating basketball players' success and potential.

3.3. Data Preprocessing

Several data preprocessing techniques were applied to clean the data. First, we checked for missing values in the dataset and found none. Next, we identified rows that were exact duplicates, which yielded 12 duplicate rows, and removed them. The next step was checking for outliers. Figure 2 shows that the data is right-skewed, so we used the Interquartile Range (IQR) method to handle outliers [37]. We first determined outlier thresholds by calculating the first and third quartiles and defining limits at 1.5 times the IQR beyond them. We then flagged data points falling outside these thresholds as outliers and replaced them with the respective threshold values. Figure 5 shows an example of detected outliers in the ‘REB’ feature.
By using this technique, the impact of extreme values, which could skew the analysis and the model’s predictions, can be eliminated. This ensures that data remains robust by reducing the influence of extremes while retaining as much data as possible.
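A minimal sketch of this capping step, assuming a pandas DataFrame `df` as loaded above, might look as follows; `cap_outliers_iqr` is an illustrative helper, not code from the paper.

```python
import pandas as pd

def cap_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the threshold values."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    df[column] = df[column].clip(lower=lower, upper=upper)  # cap, don't drop rows
    return df

df = cap_outliers_iqr(df, "REB")  # example: cap extreme values in the target feature
```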

Feature Selection

Feature selection was conducted to identify the most relevant features for predicting REB. We employed Recursive Feature Elimination (RFE) [38] with a Linear Regression estimator to select the most relevant features from our training dataset. RFE iteratively removed the least essential predictors until the top 10 features remained: PTS, FGM, 3p_made, FTM, FTA, OREB, DREB, STL, BLK, and TOV. The selected features were then used to construct reduced training and testing sets, ensuring the models leveraged only the most informative predictors. Finally, all ML models were trained on these selected features and evaluated using the MSE and R² score.
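A sketch of this selection step with scikit-learn is shown below; it assumes a feature matrix `X_train`/`X_test` and target `y_train` from the 80/20 split described in Section 4, and it mirrors rather than reproduces the authors' exact code.

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Recursively eliminate predictors until the 10 most relevant remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=10)
rfe.fit(X_train, y_train)

selected = X_train.columns[rfe.support_]  # boolean mask of retained features
X_train_sel, X_test_sel = X_train[selected], X_test[selected]
```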

3.4. Machine Learning Models

In sports predictive analysis, various regression models are utilized to forecast outcomes such as player performance, game results, and injury probabilities. In our study, we applied multiple regression models—Linear Regression, RF Regressor, Gradient Boosting Regressor, K-Neighbors Regressor, and Multi-Layer Perceptron (MLP) Regressor—to the dataset to evaluate their predictive performance for enhancing basketball team strategies.

3.4.1. Linear Regression

Linear Regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The hyperparameter ‘fit_intercept = True’ ensures the model includes an intercept term in the equation, allowing it to fit the data better when the target variable does not naturally pass through the origin [39].

3.4.2. Random Forest Regressor

RF is an ensemble learning method that constructs multiple decision trees and outputs the average prediction. It has been used in sports to predict athlete performance metrics [40]. The hyperparameters n_estimators = 100, max_depth = 10, min_samples_split = 4, and min_samples_leaf = 2 help balance model complexity and overfitting. It performs well with high-dimensional data and captures intricate feature interactions, but its main drawback is its computational expense and reduced interpretability compared to simpler models [41].

3.4.3. Gradient Boosting Regressor

The Gradient Boosting Regressor (GBR) builds models sequentially, with each new model correcting the errors made by previous ones. With hyperparameters n_estimators = 100, learning_rate = 0.01, and max_depth = 3, GBR balances accuracy and overfitting, making it effective for match-outcome prediction and player valuation. It excels at handling complex, non-linear relationships but requires careful hyperparameter tuning to prevent excessive computational costs [42].

3.4.4. K-Neighbors Regressor

The KNN predicts the value of a target variable based on the average of the ‘k’ nearest neighbors in the feature space. Its simplicity allows for easy implementation in sports analytics to predict outcomes based on similar historical instances. However, KNN can be sensitive to the choice of ‘k’ and the distance metric used, and it may struggle with high-dimensional data common in sports analytics [43].

3.4.5. Multi-Layer Perceptron Regressor

An MLP Regressor is an artificial neural network consisting of multiple layers of nodes, including input, hidden, and output layers. Each node uses a nonlinear activation function, enabling the network to capture complex, nonlinear relationships in data [39,44,45].
Table 3 summarizes the ML models and their associated hyperparameters. This comparison highlights the specific hyperparameters considered for each model, showcasing the diversity in their configurations and optimization potential.
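As a sketch, the configurations in Table 3 map onto scikit-learn estimators roughly as follows; the translation of the MLP's learning_rate to learning_rate_init is our assumption about the intended parameter.

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

models = {
    "Linear Regression": LinearRegression(fit_intercept=True),
    "RF Regressor": RandomForestRegressor(
        n_estimators=100, max_depth=10, min_samples_split=4, min_samples_leaf=2
    ),
    "Gradient Boosting Regressor": GradientBoostingRegressor(
        n_estimators=100, learning_rate=0.01, max_depth=3
    ),
    "KNN Regressor": KNeighborsRegressor(n_neighbors=10, weights="uniform"),
    # Table 3 lists learning_rate = 0.001; in scikit-learn this corresponds
    # to learning_rate_init (the initial step size for the 'adam' solver).
    "MLP Regressor": MLPRegressor(
        hidden_layer_sizes=(50, 50), activation="relu", solver="adam",
        learning_rate_init=0.001, max_iter=200
    ),
}
```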

4. Results

The results obtained after training and evaluating the ML models are discussed in this section. The dataset was split into training and testing sets using an 80/20 ratio. The selected models were trained on the training set, and their predictive accuracy was evaluated on the test set.
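A minimal sketch of the split, assuming the selected features `X` and target `y` are already prepared; the random seed is our assumption, added only for reproducibility.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # 80/20 train/test split
)
```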

4.1. Performance Metrics

Model evaluation is carried out using several performance metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the R-squared (R²) score. Together, these metrics offer a thorough evaluation of each model's performance. MSE measures the average squared difference between predicted and actual values, and RMSE is its square root, expressed in the units of the target. MAE gives the average absolute error, indicating the typical magnitude of prediction errors. The R² score indicates the proportion of variance in the target variable explained by the model.
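For reference, these are the standard definitions of the four metrics, with $y_i$ the actual value, $\hat{y}_i$ the prediction, and $\bar{y}$ the mean of the actual values:

```latex
\mathrm{MSE}  = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
\mathrm{MAE}  = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```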

4.2. Model Training and Evaluation

The trained models were evaluated on the test dataset, and Table 4 summarizes the results for predicting REB. Notably, Linear Regression (MSE = 0.6133, RMSE = 0.7831, MAE = 0.5834, R² = 0.8668) and the Gradient Boosting Regressor (MSE = 0.6282, RMSE = 0.7926, MAE = 0.5896, R² = 0.8636) are the best-performing models, exhibiting lower error metrics and higher R² scores. While the RF Regressor shows competitive performance (R² = 0.8564), the K-Neighbors Regressor (R² = 0.5274) and MLP Regressor (R² = 0.8244) yield larger errors, suggesting that straightforward linear approaches or boosting methods may be more appropriate for this prediction task.
Initial evaluations showed that Linear Regression and Gradient Boosting Regressor performed well compared to other models, with high R² scores and low error metrics.

4.3. Hyperparameter Tuning with Grid Search

This section details hyperparameter tuning with a grid search. Hyperparameter tuning optimizes the ML models and enhances their performance; it was applied to the two best-performing models, Linear Regression and the Gradient Boosting Regressor. For the tuning, 5-fold cross-validation was performed via GridSearchCV: for every combination of hyperparameters, models were trained and their performance assessed using the validation R² score, giving a robust picture of how different hyperparameters influence model performance. The hyperparameters that minimized MSE, RMSE, and MAE and maximized the R² score were chosen, as shown in Table 5.
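A sketch of this search for the Gradient Boosting Regressor, using the grid from Table 5, might look as follows; the `scoring` choice and variable names (`X_train_sel`, `y_train`) are assumptions consistent with the description above.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Parameter grid for the Gradient Boosting Regressor (Table 5).
param_grid = {
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [2, 3, 4],
}

search = GridSearchCV(
    GradientBoostingRegressor(),
    param_grid,
    cv=5,          # 5-fold cross-validation
    scoring="r2",  # select hyperparameters by validation R² score
)
search.fit(X_train_sel, y_train)
print(search.best_params_)  # reported best: learning_rate = 0.1, max_depth = 3, n_estimators = 300
```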
The grid search for optimal hyperparameters for Linear Regression (‘fit_intercept = False’) suggests that the dataset is well-preprocessed, eliminating the need for an intercept term, likely due to feature scaling or centering. For the XGB Regressor, the selected parameters (‘learning_rate = 0.1’, ‘max_depth = 3’, ‘n_estimators = 300’) indicate a balance between learning complexity and generalization. A moderate learning rate prevents overfitting while ensuring convergence, a tree depth of 3 captures essential feature interactions without excessive complexity, and 300 estimators provide sufficient iterations for robust predictions.
Table 6 compares the regression models after hyperparameter tuning. The best configuration for Linear Regression is the optimal setting of fit_intercept, and for the Gradient Boosting Regressor, the optimal settings of n_estimators, learning_rate, and max_depth. These tuned models were then evaluated on the test set, where performance improved relative to the untuned models. Careful hyperparameter adjustment made the final models accurate and well generalized, meaning these approaches can be applied effectively in real-world scenarios.

5. Discussion

Our study investigated how well different parametric and nonparametric ML models predict REB for NBA players. The prior studies discussed in the literature review align with our findings, supporting the effectiveness of nonparametric models, after hyperparameter tuning, in capturing the complex patterns that influence REB statistics [20].
Our findings build on these concepts by measuring predictive accuracy through several metrics: R², MSE, RMSE, and MAE. R² measures the proportion of variance explained by the model, while MSE and RMSE quantify the average squared difference between predicted and actual values (RMSE on the scale of the target) and MAE the average absolute difference, offering insight into the magnitude of prediction errors. Combining these metrics ensures a balanced evaluation of each model's ability to capture patterns in the data while minimizing error.
RQ1. What is the prediction accuracy level of parametric and non-parametric ML models when it comes to the prediction of REB for NBA players?
It was found that, among the nonparametric models, the Gradient Boosting Regressor and the RF Regressor are best for predicting REB: the Gradient Boosting Regressor achieved an R² score of 0.8636, and the RF Regressor an R² score of 0.8564. Interestingly, a parametric model, Linear Regression, achieved a slightly higher R² of 0.8668, showing that despite its simplicity it can effectively capture the variance in REB. These results underscore that, while nonparametric models are typically more flexible in handling nonlinear relationships, parametric models can still perform well, depending on the nature of the data and the features used.
RQ2. What type of model, parametric or nonparametric, benefits most from optimizing hyperparameters to predict REB for NBA players?
Hyperparameter tuning can enhance the prediction accuracy of ML models. Among all the models, the Gradient Boosting Regressor benefited most: its R² value increased from 0.8636 to 0.8749 after optimization. This underscores the importance of model-specific fine-tuning, which was especially beneficial for the non-parametric models, and shows how parameter adjustment can mitigate overfitting and underfitting, improving predictive accuracy [31].
Our study employs GridSearchCV to perform a systematic and exhaustive search over predefined hyperparameter values, optimizing key hyperparameters such as ‘n_estimators’ and ‘learning_rate’ for the Gradient Boosting Regressor to enhance predictive performance. The parameter grid was carefully tailored to balance computational efficiency with model accuracy, allowing our models to predict outcomes with higher accuracy and to be used effectively in real-world scenarios. In our study, we compared the effectiveness of both parametric and nonparametric models in predicting NBA player performance metrics such as REB.
The parametric Linear Regression model demonstrated a strong ability to predict outcomes, with an R² score of 0.8668, suggesting that it can explain approximately 86.68% of the variance in REB from the variables used. The non-parametric XGB Regressor showed a lower initial R² of 0.8636 but benefited from hyperparameter tuning, which improved its R² to 0.8749.
Our study also speaks to model applicability in real-time sports settings. The most precise of our predictive models, the Gradient Boosting Regressor with an R² of 0.8749, can help teams make well-informed strategic decisions, such as optimizing player rotations and game tactics, and can help minimize injury risks by more accurately predicting player fatigue levels [5]. In addition, predictive models can help create personalized training programs based on players' performance metrics, maximizing player performance and overall team efficiency [6].
Together, the findings from RQ1 and RQ2 support our hypothesis: Advanced feature engineering, data preprocessing, and hyperparameter tuning significantly increase the ML models’ ability to predict future average rebounds per game (REB) for NBA players. Our work demonstrates that while parametric models can perform well with proper data design, non-parametric models—especially when tuned—offer superior flexibility and predictive power in complex sports data environments.

6. Conclusions and Future Work

In this research, we explored the effectiveness of predictive analytics, applying various regression models to predict player performance metrics such as REB. Our findings indicate that predictive analytics can enhance basketball team strategies by providing data-driven insights. Two models proved most effective after hyperparameter tuning, Gradient Boosting and Linear Regression; of the two, the non-parametric Gradient Boosting model performed better.
The major limitation observed is that, even after fine-tuning the hyperparameters, model performance did not improve substantially. This highlights the need for a larger dataset to achieve better performance and generalization in predictive analysis.
Because only a limited dataset was available, this study relied on ML models. Deep learning (DL) techniques could be explored on larger datasets in future work to help identify more complex patterns, and a comparative study between DL and ML models could clarify their relative strengths. Our analysis focused on predicting REB; other targets, such as the number of years played, could also be explored. Our study underscores the ability of ML to transform the sports field by offering a dynamic approach to player performance analysis. By accurately predicting REB, teams can optimize training, develop game strategies, and utilize players effectively, enhancing team performance. As these approaches become more advanced and prevalent, a new age of data-driven sports management could emerge, reshaping the norms of operations and competition in the NBA and other sports leagues.

Author Contributions

Conceptualization, A.K.; Investigation, R.C.; Writing—original draft, R.C.; Writing—review & editing, R.C. and P.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially funded with the financial support of Taighde Éireann–Research Ireland under Grant number 13/RC/2094.

Data Availability Statement

The original data presented in this study are openly available on Kaggle [36].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RF: Random Forest
CNN: Convolutional Neural Networks
ML: Machine Learning
DL: Deep Learning
MLP: Multi-Layer Perceptron
MSE: Mean Squared Error
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error
KNN: K-Neighbors
NBA: National Basketball Association
FFNN: Feedforward Neural Networks
SOM: Self-Organizing Maps
XGB: Gradient Boosting

References

  1. Büchner, A.G.; Dubitzky, W.; Schuster, A.; Lopes, P.; O’Donoghue, P.G.; Hughes, J.; Bell, D.A.; Adamson, K.; White, J.A.; Anderson, J.M.C.C.; et al. Corporate evidential decision making in performance prediction domains. arXiv 2013, arXiv:1302.1523. [Google Scholar]
  2. Vater, C.; Mann, D.L. Are predictive saccades linked to the processing of peripheral information? Psychol. Res. Psychol. Forsch. 2022, 87, 1501–1519. [Google Scholar] [CrossRef] [PubMed]
  3. Koster, M.; Kurz, S.; Lindner, I.; Napel, S. The Prediction Value. Soc. Choice Welf. 2017, 48, 433–460. [Google Scholar] [CrossRef]
  4. Wang, H.; Gao, S.; Wang, B.; Ma, Y.; Guo, Z.; Zhang, K.; Yang, Y.; Yue, X.; Hou, J.; Huang, H.; et al. Recent advances in machine learning-assisted fatigue life prediction of additive manufactured metallic materials: A review. J. Mater. Sci. Technol. 2024, 198, 111–136. [Google Scholar] [CrossRef]
  5. Imbach, F.; Sutton-Charani, N.; Montmain, J.; Candau, R.; Perrey, S. The Use of Fitness-Fatigue Models for Sport Performance Modelling: Conceptual Issues and Contributions from Machine-Learning. Sport Med. Open 2022, 8, 29. [Google Scholar] [CrossRef]
  6. Singh, A.; Bevilacqua, A.; Nguyen, T.L.; Hu, F.; McGuinness, K.; OReilly, M.; Whelan, D.; Caulfield, B.; Ifrim, G. Fast and robust video-based exercise classification via body pose tracking and scalable multivariate time series classifiers. Data Min. Knowl. Discov. 2022, 37, 873–912. [Google Scholar] [CrossRef]
  7. Jaiswal, P.; Kaushik, A.; Lawless, F.; Malaquias, T.; McCaffery, F. Preliminary Investigation on Machine Learning and Deep Learning Models for Change of Direction Classification in Running. In International Conference on Intelligent Data Engineering and Automated Learning; Springer: Cham, Switzerland, 2024; pp. 180–191. [Google Scholar]
  8. Rossi, A.; Perri, E.; Pappalardo, L.; Cintia, P.; Alberti, G.; Norman, D.; Iaia, F.M. Wellness Forecasting by External and Internal Workloads in Elite Soccer Players: A Machine Learning Approach. Front. Physiol. 2022, 13, 896928. [Google Scholar] [CrossRef]
  9. Guo, J.; Czarnecki, K.; Apel, S.; Siegmund, N.; Wasowski, A. Variability-aware performance prediction: A statistical learning approach. In Proceedings of the 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), Silicon Valley, CA, USA, 11–15 November 2013; pp. 301–311. [Google Scholar] [CrossRef]
  10. Transtrum, M.K.; Machta, B.B.; Sethna, J.P. Why are nonlinear fits to data so challenging? Phys. Rev. Lett. 2010, 104, 060201. [Google Scholar] [CrossRef]
  11. Sodhi, H.S. Kinanthropometry and performance of top ranking Indian basketball players. Br. J. Sport Med. 1980, 14, 139–144. [Google Scholar] [CrossRef]
  12. Sampaio, J.; McGarry, T.; Calleja-González, J.; Sáiz, S.L.J.; i del Alcázar, X.S.; Balciunas, M. Exploring Game Performance in the National Basketball Association Using Player Tracking Data. PLoS ONE 2015, 10, e0132894. [Google Scholar] [CrossRef]
  13. Nishisaka, M.M.; Zorn, S.; Kristo, A.S.; Sikalidis, A.K.; Reaves, S.K. Assessing Dietary Nutrient Adequacy and the Effect of Season—Long Training on Body Composition and Metabolic Rate in Collegiate Male Basketball Players. Sports 2022, 10, 127. [Google Scholar] [CrossRef] [PubMed]
  14. Hauri, S.; Djuric, N.; Radosavljevic, V.; Vucetic, S. Multi-Modal Trajectory Prediction of NBA Players. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 1639–1648. [Google Scholar] [CrossRef]
  15. Chang, J.C. Predictive Bayesian selection of multistep Markov chains, applied to the detection of the hot hand and other statistical dependencies in free throws. R. Soc. Open Sci. 2019, 6. [Google Scholar] [CrossRef] [PubMed]
  16. Vaci, N.; Cocić, D.; Gula, B.; Bilalić, M. Large data and Bayesian modeling-aging curves of NBA players. Behav. Res. Methods 2019, 51, 1544–1564. [Google Scholar] [CrossRef]
  17. Shen, W. Analysis of Professional Basketball Field Goal Attempts via a Bayesian Matrix Clustering Approach. J. Comput. Graph. Stat. 2022, 32, 49–60. [Google Scholar] [CrossRef]
  18. Arnold, T.; Godbey, J.M. Introducing Linear Regression: An Example Using Basketball Statistics. Soc. Sci. Res. Netw. 2012. [Google Scholar] [CrossRef]
  19. Xie, X. Analysis on the Application of Linear Regression in Various Fields. 2020. Available online: https://www.clausiuspress.com/conferences/LNEMSS/ICEMGD%202020/368.pdf (accessed on 26 April 2025).
  20. Siemon, D.; Ahmad, R.; Huttner, J.P.; Robra-Bissantz, S. Predicting the Performance of Basketball Players Using Automated Personality Mining. In Proceedings of the Twenty-fourth Americas Conference on Information Systems, New Orleans, LA, USA, 16–18 August 2018. [Google Scholar]
  21. Bavencoff, F.; Vanpeperstraete, J.M.; Cadre, J.P. Performance analysis of optimal data association within a linear regression framework. In Proceedings of the 2005 7th International Conference on Information Fusion, Philadelphia, PA, USA, 25–28 July 2005. [Google Scholar] [CrossRef]
  22. Hanczár, G.; Stippinger, M.; Hanák, D.; Kurbucz, M.T.; Törteli, O.M.; Chripkó, Á.; Somogyvári, Z. Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS). arXiv 2023. [Google Scholar] [CrossRef]
  23. Voges, L.F.; Jarren, L.C.; Seifert, S. Opening the random forest black box by the analysis of the mutual impact of features. arXiv 2023. [Google Scholar] [CrossRef]
  24. Piernik, M. Random Similarity Forests. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023. [Google Scholar] [CrossRef]
  25. Ustimenko, A.; Prokhorenkova, L.O.; Malinin, A. Uncertainty in Gradient Boosting via Ensembles. arXiv 2020, arXiv:2006.10562. [Google Scholar]
  26. Rittler, N.; Chaudhuri, K. A Two-Stage Active Learning Algorithm for k-Nearest Neighbors. arXiv 2022. [Google Scholar] [CrossRef]
  27. Pei, E.; Fokoué, E. Improving the Predictive Performances of k Nearest Neighbors Learning by Efficient Variable Selection. arXiv 2022. [Google Scholar] [CrossRef]
  28. Gui, Y.; Wang, X. Application of K-nearest neighbors in protein-protein interaction prediction. In Highlights in Science, Engineering and Technology; Darcy & Roy Press: Hillsboro, OR, USA, 2022. [Google Scholar] [CrossRef]
  29. Chi, Y.N.; Chi, J. A Mixed Model for Performance-Based Classification of NBA Players: Performance-Based Classification of NBA Players. Int. J. Data Sci. Adv. Anal. 2021, 3, 36–46. [Google Scholar] [CrossRef]
  30. Gao, D.; Guo, Q.; Jin, M.; Liao, G.; Eldar, Y.C. Hyper-Parameter Auto-Tuning for Sparse Bayesian Learning. arXiv 2022. [Google Scholar] [CrossRef]
  31. Bhattacharyya, A.; Vaughan, J.; Nair, V.N. Behavior of Hyper-Parameters for Selected Machine Learning Algorithms: An Empirical Investigation. arXiv 2022. [Google Scholar] [CrossRef]
  32. Nguyen, H.N. Tuning hyperparameters of self-organizing maps in combination with k-nearest neighbors for iot malware detection. J. Sci. Technol. 2023, 12. [Google Scholar] [CrossRef]
  33. Raji, I.D.; Bello-Salau, H.; Umoh, I.J.; Onumanyi, A.J.; Adegboye, M.A.; Salawudeen, A.T. Simple Deterministic Selection-Based Genetic Algorithm for Hyperparameter Tuning of Machine Learning Models. Appl. Sci. 2022, 12, 1186. [Google Scholar] [CrossRef]
  34. Eimer, T.; Lindauer, M.; Raileanu, R. Hyperparameters in Reinforcement Learning and How To Tune Them. arXiv 2023. [Google Scholar] [CrossRef]
  35. Roy, S.; Mehera, R.; Pal, R.K.; Bandyopadhyay, S.K. Hyperparameter Optimization for Deep NeuralNetwork Models: A Comprehensive Study onMethods and Techniques. Res. Sq. 2023, preprint. [Google Scholar] [CrossRef]
  36. Yakhyojon. National Basketball Association (NBA) Dataset. Kaggle dataset. 2024. Available online: https://www.kaggle.com/datasets/yakhyojon/national-basketball-association-nba (accessed on 26 April 2025).
  37. Dash, C.S.K.; Behera, A.K.; Dehuri, S.; Ghosh, A. An outliers detection and elimination framework in classification task of data mining. Decis. Anal. J. 2023, 6, 100164. [Google Scholar] [CrossRef]
  38. Chen, X.-W.; Jeong, J.C. Enhanced recursive feature elimination. In Proceedings of the Sixth International Conference on Machine Learning and Applications (ICMLA 2007), Cincinnati, OH, USA, 13–15 December 2007; pp. 429–435. [Google Scholar]
  39. Baboota, R.; Kaur, H. Predictive analysis and modelling football results using machine learning approach for English Premier League. Int. J. Forecast. 2019, 35, 741–755. [Google Scholar] [CrossRef]
  40. Sahin, E.K. Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using XGBoost, gradient boosting machine, and random forest. SN Appl. Sci. 2020, 2, 1308. [Google Scholar] [CrossRef]
  41. Salman, H.A.; Kalakech, A.; Steiti, A. Random forest algorithm overview. Babylon. J. Mach. Learn. 2024, 2024, 69–79. [Google Scholar] [CrossRef] [PubMed]
  42. Zhang, Z.; Zhao, Y.; Canes, A.; Steinberg, D.; Lyashevska, O. Predictive analytics with gradient boosting in clinical medicine. Ann. Transl. Med. 2019, 7, 152. [Google Scholar] [CrossRef] [PubMed]
  43. Halder, R.K.; Uddin, M.N.; Uddin, M.A.; Aryal, S.; Khraisat, A. Enhancing K-nearest neighbor algorithm: A comprehensive review and performance analysis of modifications. J. Big Data 2024, 11, 113. [Google Scholar] [CrossRef]
  44. Sayed, E.H.; Alabrah, A.; Rahouma, K.H.; Zohaib, M.; Badry, R.M. Machine Learning and Deep Learning for Loan Prediction in Banking: Exploring Ensemble Methods and Data Balancing. IEEE Access 2024, 12, 193997–194019. [Google Scholar] [CrossRef]
  45. Maszczyk, A.; Gołaś, A.; Pietraszewski, P.; Roczniok, R.; Zając, A.; Stanula, A. Application of neural and regression models in sports results prediction. Procedia-Soc. Behav. Sci. 2014, 117, 482–487. [Google Scholar] [CrossRef]
Figure 1. Procedure followed during this study.
Figure 2. Distribution plot of REB.
Figure 3. REB in relation to minutes per game.
Figure 4. REB made by players who have played for 5 years or not.
Figure 5. Detected outliers in the ‘REB’ feature.
Table 1. Description of NBA Player Statistics.
Column Name | Column Description
name | NBA player’s name
GP | Total games played
MIN | Average minutes played per game
PTS | Mean points scored per game
FGM | Average field goals successfully made per game
FGA | Mean field goal attempts per game
FG | Percentage of successful field goals per game
3p_made | Average three-point shots successfully made per game
3PA | Mean attempts for three-point shots per game
3P | Success rate of three-point shots per game
FTM | Average number of free throws made per game
FTA | Mean free throw attempts per game
FT | Free throw success percentage
OREB | Mean offensive rebounds per game
DREB | Average defensive rebounds per game
AST | Mean assists per game
STL | Average number of steals per game
BLK | Mean blocks recorded per game
TOV | Average turnovers per game
target_5yrs | 1 if the player’s career lasts at least 5 years, otherwise 0
REB | Total average rebounds per game
Table 2. Descriptive Statistics for Rebounds.
Statistic | OREB | DREB | REB
count | 1328.00 | 1328.00 | 1328.00
mean | 1.01 | 2.03 | 3.04
std | 0.78 | 1.36 | 2.06
min | 0.00 | 0.20 | 0.30
25% | 0.40 | 1.00 | 1.50
50% | 0.80 | 1.70 | 2.50
75% | 1.40 | 2.60 | 4.00
max | 5.30 | 9.60 | 13.90
Table 3. Comparison of Models Used and Their Hyperparameters.
Model | Hyperparameters Used
Linear Regression | fit_intercept = True
XGB Regressor | n_estimators = 100, learning_rate = 0.01, max_depth = 3
RF Regressor | n_estimators = 100, max_depth = 10, min_samples_split = 4, min_samples_leaf = 2
KNN | n_neighbors = 10, weights = uniform
MLP Regressor | hidden_layer_sizes = (50, 50), activation = relu, solver = adam, learning_rate = 0.001, max_iter = 200
Table 4. Comparison of Regression Models.
Model | MSE | RMSE | MAE | R² Score
Linear Regression | 0.6133 | 0.7831 | 0.5834 | 0.8668
RF Regressor | 0.6615 | 0.8133 | 0.5804 | 0.8564
XGB Regressor | 0.6282 | 0.7926 | 0.5896 | 0.8636
KNN Regressor | 2.1765 | 1.4753 | 1.1240 | 0.5274
MLP Regressor | 0.8089 | 0.8994 | 0.6674 | 0.8244
Table 5. Hyperparameter Tuning Grids for Regression Models.
Model | Parameter | Values
Linear Regression | fit_intercept | True, False
XGB Regressor | n_estimators | 100, 200, 300
XGB Regressor | learning_rate | 0.001, 0.01, 0.1
XGB Regressor | max_depth | 2, 3, 4
Table 6. Comparison of Regression Models after Hyperparameter Tuning.
Model | MSE | RMSE | MAE | R² Score
Best Linear Regression | 0.6133 | 0.7831 | 0.5834 | 0.8668
Best Gradient Boosting Regressor | 0.5761 | 0.7590 | 0.5711 | 0.8749
