1. Introduction
In professional sports, accurately predicting player performance is essential for effective decision-making and for maximizing game outcomes. Experts can make better team and game-plan choices by applying the mathematical theory of evidence, which allows them to express and combine subjective beliefs to support more precise decisions [1]. Player performance analysis has been transformed by the shift to sophisticated statistical models, such as machine learning (ML) algorithms, which offer a dynamic and accurate framework for athlete assessment. Research on athletes' predictive saccades has revealed how well players adapt to changes during a game, which can significantly affect overall performance [2]. The value of collective outcomes, such as committee decisions made jointly by a panel or team, is often shaped by the probabilistic predictive value of individual contributions, underscoring the importance of personal contributions to team performance and strategy [3].
Traditional fitness–fatigue models are mathematical frameworks used in sports science to predict and explain how training affects performance over time, and they can be improved with ML [4]. This is achieved by integrating physiological data and multivariate algorithms, which sharpen performance predictions and provide a deeper understanding of athlete potential [5].
Professional sports performance and strategic planning have been greatly improved by incorporating ML techniques into sports analytics, allowing for more accurate forecasts and data-driven decision-making. ML is increasingly used to improve team training, prevent injuries, and monitor performance [6,7].
ML can be applied to video analysis for classifying strength and conditioning exercises, achieving high accuracy with less engineering effort than deep learning. By predicting ski jump lengths with remarkable accuracy, ML models have opened new opportunities for enhanced sports broadcasting and real-time performance analysis [8]. Variability in basketball player performance measurements produces heterogeneous performance curves, creating prediction challenges that require sophisticated methods to balance curve smoothness with covariate integration [9]. Bayesian networks are crucial for modeling the joint distribution of performance indicators, providing deeper insights into the factors influencing game outcomes and performance dynamics [10]. Evaluating defensive performance with technologies such as optical player-tracking systems marks a significant evolution beyond traditional metrics, offering a more comprehensive assessment of player actions [11,12].
A more holistic approach to player evaluation is achieved by understanding the physiological factors that affect performance, such as body composition and nutrient intake [13]. Predictive models built on NBA tracking data provide detailed insights into players' abilities and playing styles, informing coaching and player utilization strategies [14,15]. Discovering latent heterogeneity in shot selection through Bayesian nonparametric clustering helps tailor game plans to player profiles [16]. Beyond sports, cognitive constraint modeling enables the optimization of digital and physical environments for improved human performance. These approaches to player evaluation and prediction highlight the increasing sophistication and influence of sports analytics in professional sports operations and strategy [17].
In professional basketball, evaluating player performance is pivotal for optimizing team strategies and decision-making. Predictive analytics has emerged as a transformative tool in sports, enabling data-driven insights into key performance metrics such as rebounds per game (REB). Rebounding is one of the most crucial aspects of basketball: a team gains possession when a rebound occurs, and the rebound is credited to the player who secures the ball after a missed shot. More rebounds mean more possessions, and each possession benefits the team's offense and defense, ultimately contributing to victory.
Despite this potential, traditional parametric models such as Linear Regression are limited by their inability to account for nonlinear relationships and complex interactions inherent in player performance data [1]. On the other hand, advanced ML methods, such as the Gradient Boosting Regressor, offer greater flexibility in capturing these dynamics while maintaining high predictive accuracy [8]. This research addresses the trade-offs between model simplicity and accuracy by exploring the effectiveness of parametric and nonparametric models in predicting REB. By leveraging techniques such as hyperparameter tuning and feature engineering, this study seeks to improve predictive accuracy and identify the factors that significantly influence performance outcomes [8]. Furthermore, it evaluates how these models can balance interpretability and scalability, ensuring their applicability in real-world decision-making scenarios [3,8]. Through robust ML methodologies, this study aims to contribute actionable insights to the growing field of sports analytics. Previous studies have demonstrated the potential of integrating ML techniques into sports analytics to optimize player utilization and enhance decision-making strategies [1,8].
The formulated hypothesis and related research questions of the study are as follows:
Hypothesis 1. Advanced feature engineering, data preprocessing, and hyperparameter tuning significantly increase the ML models’ ability to predict future average rebounds per game (REB) for NBA players.
The research questions to investigate the hypothesis are:
What is the prediction accuracy level of parametric and non-parametric ML models when predicting REB for NBA players?
This question aims to evaluate and compare how effectively different types of ML models—both parametric and non-parametric—can predict average REB. By analyzing model performance using evaluation metrics such as R², MSE, and RMSE, we identify which models best capture the patterns in basketball performance data.
What type of model, parametric or nonparametric, benefits most from optimization by tuning hyperparameters to predict REB for NBA players?
This question investigates the impact of hyperparameter tuning on model performance. Specifically, it examines whether parametric or non-parametric models show greater improvements in prediction accuracy when optimized, providing insights into the models’ adaptability and potential for enhancement in real-world applications.
The major contributions of this paper are as follows:
We conducted a comparative analysis of various parametric and non-parametric machine learning models to predict NBA player performance, with a focus on rebounds (REB).
We applied hyperparameter tuning techniques to optimize model performance, demonstrating that proper tuning significantly improves prediction accuracy in sports analytics.
The rest of the paper is structured as follows:
Section 2 reviews the relevant literature on predictive modeling techniques and the optimization of predictive accuracy.
Section 3 discusses the study’s methodology, including dataset details, data preparation, exploratory data analysis, feature selection, and ML models. In
Section 4, the results of the model training and evaluation are presented.
Section 5 discusses our research hypothesis and research questions.
Section 6 concludes the study with future guidance.
4. Results
The results obtained after training and evaluating the ML models are discussed in this section. The dataset was split into training and testing sets using an 80/20 ratio. The selected models were trained on the training set, and their predictive accuracy was evaluated on the test set.
4.1. Performance Metrics
The models were evaluated using several performance metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the R-squared (R²) score. Together, these metrics offer a thorough evaluation of each model's performance. MSE measures the average squared difference between predicted and actual values, and RMSE is its square root, expressed in the same units as the target. MAE provides the average absolute error, indicating the typical magnitude of prediction errors. The R² score indicates the proportion of variance in the target variable explained by the model.
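For reference, with $y_i$ denoting the actual REB value for test sample $i$, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the actual values, and $n$ the number of test samples, these metrics take their standard forms:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}},$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}.$$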
4.2. Model Training and Evaluation
The trained models were evaluated on the test dataset.
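A minimal sketch of this split-train-evaluate workflow with scikit-learn is shown below; the file name nba_players.csv, the target column name REB, and the default model settings are illustrative assumptions rather than the study's exact pipeline.

```python
# Hedged sketch of the 80/20 split, training, and evaluation workflow.
# The file name, column layout, and model settings are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

df = pd.read_csv("nba_players.csv")   # hypothetical file of per-player statistics
X = df.drop(columns=["REB"])          # assumed numeric feature columns
y = df["REB"]                         # target: average rebounds per game

# 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
    "K-Neighbors": KNeighborsRegressor(),
    "MLP": MLPRegressor(random_state=42, max_iter=1000),
}

# Train each model on the training set and report test-set error metrics.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    print(
        f"{name}: MSE={mse:.4f}, RMSE={np.sqrt(mse):.4f}, "
        f"MAE={mean_absolute_error(y_test, pred):.4f}, "
        f"R2={r2_score(y_test, pred):.4f}"
    )
```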
Table 4 provides a comparison of the regression models and summarizes our experimental results for predicting REB. Notably, Linear Regression (MSE = 0.6133, RMSE = 0.7831, MAE = 0.5834, R² = 0.8668) and the Gradient Boosting Regressor (MSE = 0.6282, RMSE = 0.7926, MAE = 0.5896, R² = 0.8636) are the best-performing models, exhibiting lower error metrics and higher R² scores. In comparison, while the RF Regressor shows competitive performance (R² = 0.8564), the K-Neighbors Regressor (R² = 0.5274) and MLP Regressor (R² = 0.8244) yield larger errors, suggesting that straightforward linear approaches or boosting methods may be more appropriate for this prediction task. These initial evaluations showed that Linear Regression and the Gradient Boosting Regressor performed well compared with the other models, with high R² scores and low error metrics.
4.3. Hyperparameter Tuning with Grid Search
This section details hyperparameter tuning with grid search. Hyperparameter tuning optimizes the ML models and enhances their performance; it was carried out for the two best-performing models, Linear Regression and the Gradient Boosting Regressor. Tuning used 5-fold cross-validation via GridSearchCV: for every combination of hyperparameters, models were trained and their performance was assessed using the validation folds' R-squared (R²) score, providing a robust picture of how different hyperparameters influence model performance. The hyperparameters that minimized MSE, RMSE, and MAE and maximized the R² score were chosen, as shown in Table 5.
The optimal hyperparameters found by the grid search for Linear Regression (‘fit_intercept = False’) suggest that the dataset is well preprocessed, eliminating the need for an intercept term, likely due to feature scaling or centering. For the Gradient Boosting (XGB) Regressor, the selected parameters (‘learning_rate = 0.1’, ‘max_depth = 3’, ‘n_estimators = 300’) indicate a balance between learning complexity and generalization: a moderate learning rate prevents overfitting while ensuring convergence, a tree depth of 3 captures essential feature interactions without excessive complexity, and 300 estimators provide sufficient iterations for robust predictions.
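As an illustration, the following is a hedged sketch of how such a grid search could be set up with scikit-learn's GridSearchCV; the candidate values in the grids (beyond the reported best settings) and the reuse of the earlier training split are assumptions, not the study's exact configuration.

```python
# Hedged sketch of 5-fold GridSearchCV over the two best-performing models.
# Candidate grids are illustrative; only the reported best values come from the text.
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

lr_grid = {"fit_intercept": [True, False]}
gbr_grid = {
    "learning_rate": [0.01, 0.1, 0.2],
    "max_depth": [2, 3, 5],
    "n_estimators": [100, 300, 500],
}

lr_search = GridSearchCV(LinearRegression(), lr_grid, cv=5, scoring="r2")
gbr_search = GridSearchCV(
    GradientBoostingRegressor(random_state=42), gbr_grid, cv=5, scoring="r2"
)

# X_train and y_train refer to the training split from the earlier sketch.
lr_search.fit(X_train, y_train)
gbr_search.fit(X_train, y_train)

print("Best LR params:", lr_search.best_params_)    # e.g., {'fit_intercept': False}
print("Best GBR params:", gbr_search.best_params_)  # e.g., learning_rate=0.1, max_depth=3, n_estimators=300
```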
Table 6 provides a comparison of the regression models after hyperparameter tuning. The best configuration for Linear Regression involves the optimal settings of fit_intercept and normalize, and for the Gradient Boosting Regressor, the optimal settings of n_estimators, learning_rate, and max_depth. These tuned models were then evaluated on the test set, where their performance improved relative to the untuned models. Carefully adjusting the hyperparameters made the final models both accurate and well generalized, meaning these approaches can be used effectively in real-world scenarios.
5. Discussion
Our study investigated how well different parametric and nonparametric ML models predict REB for NBA players. The prior studies discussed in the literature review align with our research, supporting the effectiveness of nonparametric models, after hyperparameter tuning, in capturing the complex patterns that influence REB statistics [20].
Our findings build on these concepts by measuring predictive accuracy through several metrics, including R², MSE, RMSE, and MAE. R² measures the proportion of variance explained by the model, while MSE and RMSE quantify the average squared difference (and its square root) between predicted and actual values, and MAE captures the average absolute difference, offering insight into the magnitude of prediction errors. Combining these metrics ensures a balanced evaluation of each model's ability to capture patterns in the data while minimizing error.
RQ1. What is the prediction accuracy level of parametric and non-parametric ML models when predicting REB for NBA players?
Among the nonparametric models, the Gradient Boosting Regressor and the RF Regressor were found to be the best at predicting REB, achieving R² scores of 0.8636 and 0.8564, respectively. Interestingly, a parametric model, Linear Regression, achieved a slightly higher R² of 0.8668, capturing REB variance effectively despite its simplicity. These results underscore that nonparametric models are typically more flexible in handling nonlinear relationships, yet parametric models can still perform well depending on the nature of the data and the features used.
RQ2. What type of model, parametric or nonparametric, benefits most from optimizing hyperparameters to predict REB for NBA players?
Hyperparameter tuning enhanced the prediction accuracy of the ML models. Among all models, the Gradient Boosting Regressor benefited the most, with its R² value increasing from 0.8636 to 0.8749 after optimization. This underscores the importance of model-specific fine-tuning, which proved especially beneficial for non-parametric ML models, and shows how parameter adjustment can mitigate overfitting and underfitting to improve predictive accuracy [31].
Our study employs GridSearchCV to perform a systematic and exhaustive search over predefined hyperparameter values, optimizing key hyperparameters such as ‘n_estimators’ and ‘learning_rate’ for the Gradient Boosting Regressor to enhance predictive performance. The parameter grid was carefully tailored to balance computational efficiency with model accuracy. This methodological choice allows our models to predict outcomes with higher accuracy, making them effective for real-world use. In our study, we compared the effectiveness of parametric and nonparametric models in predicting NBA player performance metrics such as REB.
The parametric Linear Regression model demonstrated a strong ability to predict outcomes, with an R² score of 0.8668, suggesting that it explains approximately 86.68% of the variance in REB from the variables used. The non-parametric Gradient Boosting (XGB) Regressor showed a lower initial R² of 0.8636 but benefited from hyperparameter tuning, which improved its R² to 0.8749.
Our study also speaks to model applicability in real-time sports settings. The most precise of our predictive models was the Gradient Boosting Regressor, with an R² value of 0.8749, which can help teams make well-informed strategic decisions, such as optimizing player rotations and game tactics, and minimize injury risks by more accurately anticipating player fatigue levels [5]. In addition, predictive models can help create personalized training programs based on players' performance metrics, maximizing player performance and overall team efficiency [6].
Together, the findings from RQ1 and RQ2 support our hypothesis: Advanced feature engineering, data preprocessing, and hyperparameter tuning significantly increase the ML models’ ability to predict future average rebounds per game (REB) for NBA players. Our work demonstrates that while parametric models can perform well with proper data design, non-parametric models—especially when tuned—offer superior flexibility and predictive power in complex sports data environments.
6. Conclusions and Future Work
In this research, we explored the effectiveness of predictive analytics by applying various regression models to predict player performance metrics such as REB. Our findings indicate that predictive analytics can enhance basketball team strategies by providing data-driven insights. Two models proved most effective after hyperparameter tuning: the Gradient Boosting and Linear Regression models, with the non-parametric Gradient Boosting model performing best.
The major limitation observed is that, even after fine-tuning the hyperparameters, the performance of the models did not increase substantially. This highlights the need for a larger dataset to improve model performance and generalization in predictive analysis.
Because of the limited size of the available dataset, this study relied on classical ML models. Nevertheless, deep learning (DL) techniques could be explored on larger datasets in future work to capture more complex patterns, and a comparative study between DL and ML models could further clarify their relative strengths. Our analysis focused mainly on predicting REB; other targets, such as the number of years played, could also be explored. Our study underscores the ability of ML to transform the sports field by offering a dynamic approach to player performance analysis. By accurately predicting REB, teams can optimize training, develop game strategies, and utilize players more effectively, enhancing overall team performance. As these approaches become more advanced and prevalent, a new age of data-driven sports management could emerge, reshaping the norms of operations and competition in the NBA and other sports leagues.