Improving Sports Outcome Prediction Process Using Integrating Adaptive Weighted Features and Machine Learning Techniques

Abstract: Developing an effective sports performance analysis process is an attractive issue in sports team management. This study proposed an improved sports outcome prediction process by integrating adaptive weighted features and machine learning algorithms for basketball game score prediction. The feature engineering method is used to construct designed features based on game-lag information and adaptive weighting of variables in the proposed prediction process. These designed features are then applied to five machine learning methods, including classification and regression trees (CART), random forest (RF), stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and extreme learning machine (ELM), for constructing effective prediction models. The empirical results from National Basketball Association (NBA) data revealed that the proposed sports outcome prediction process could generate promising prediction results compared to the competing models without adaptive weighting features. Our results also showed that the machine learning models with four game-lags' information and an adaptive weighting power of one could generate better prediction performance.


Introduction
The sports market has proliferated since the beginning of the 21st century with new approaches and techniques such as streaming broadcasting, social media, and the global supply chain. With the growth of sports market value, sports team management has attracted considerable attention and become a popular topic among researchers. For example, some studies established decision support methods to assist teams' rookie draft decisions [1,2]. Some research aimed to analyze teams' player selection and scouting policies [3,4]. Moreover, developing an effective sports performance analysis process, such as determining the factors that affect the results of sports games [2,5] and predicting game or player performance [6,7], is one of the most attractive issues in sports team management.
Sports outcome prediction is a significant part of the sports performance analysis process. It influences many sports markets, such as sports team management, betting, and customer service, since the accurate prediction of sports game outcomes provides detailed information for managing strategy reference, bookmakers, and attraction for viewers. For example, a team supervisor, such as a general manager (GM), faces numerous vital decisions in the daily management of the team. The accurate prediction of future games provides a reference for marketing strategy to design specific fan service activities. Moreover, detailed and accurate prediction provides the GM and head coach with vital information.

Five machine learning methods are employed in this study to build the prediction models: CART, RF, SGB, XGBoost, and ELM. CART is a statistical procedure often used as a regression tool for analyzing continuous data [42–49]. RF combines several randomized decision trees and aggregates their predictions by averaging, and has shown excellent performance on datasets with a large number of variables and observations [50–60]. SGB is a hybrid method that merges boosting and bagging techniques, and the input data are selected by random sampling at each stage of the steepest-gradient-algorithm-based boosting procedure [24,45,52,61–67]. XGBoost is a supervised machine learning algorithm developed from a scalable end-to-end gradient tree boosting principle [24,68–73]. ELM is a single-hidden-layer feedforward neural network that randomly selects the input weights and systematically calculates the output weights of the network [24,73–77].
In the proposed NBA game score prediction process, the most acceptable and accessible statistics of an NBA game are collected and used as initial variables. Then, using these variables as input, the feature engineering method is used to construct the designed features based on the interaction of game-lag information and adaptive weighting of variables. These designed features are then applied to the five machine learning methods, i.e., CART, RF, SGB, XGBoost, and ELM, to build prediction models. Finally, by evaluating the performance of the prediction models, the sufficient selection of game-lag information and the proper adaptive weighting of variables for the feature combination were identified.
This research is organized as follows: Section 1 presents the background and the concept of this study. Section 2 gives a brief introduction to the machine learning techniques used in this research. Section 3 demonstrates the details of the proposed sports outcome prediction process. Section 4 gives the empirical results of the proposed sports outcome prediction process, followed by the Conclusions section.

Methodology
Five machine learning techniques, namely CART, RF, SGB, XGBoost, and ELM, are utilized in this study.
CART is a statistical procedure often used as a regression tool to analyze continuous data. The CART methodology can be summarized in three stages. The first stage develops the tree using a recursive partitioning technique to filter variables and split data points according to a splitting criterion. After a large tree is produced, the second stage applies a pruning procedure based on a minimal cost-complexity measure. The last stage identifies the optimal tree as the tree yielding the lowest cross-validated or testing-set error rate [42–49]. For modeling the CART model, the "rpart" R package, version 4.1-15 [78], with a pruning strategy was used.
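As a rough illustration of this step, the sketch below grows a regression tree with rpart and prunes it back to the subtree with the lowest cross-validated error. The data frame train_df, the response column Score, and the cp value are placeholder assumptions for illustration, not settings taken from the study.

```r
# Minimal CART sketch with rpart: grow a large regression tree, then prune it
# back to the complexity parameter (cp) with the lowest cross-validated error.
library(rpart)

cart_fit <- rpart(Score ~ ., data = train_df, method = "anova",
                  control = rpart.control(cp = 0.001, xval = 10))

best_cp   <- cart_fit$cptable[which.min(cart_fit$cptable[, "xerror"]), "CP"]
cart_tree <- prune(cart_fit, cp = best_cp)

cart_pred <- predict(cart_tree, newdata = test_df)
```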
RF has provided promising performance as a general-purpose regression method. By combining several randomized decision trees and aggregating their predictions by averaging, RF has shown excellent performance on datasets with a large number of variables and observations. It is also flexible enough to be applied to large-scale tasks, is conveniently adapted to various ad hoc learning problems, and returns measures of variable importance [50–60]. The "randomForest" R package, version 4.6-14 [79], was used to construct the RF model. SGB is a hybrid algorithm that merges boosting and bagging techniques. Data are filtered by random sampling at each phase of the steepest-gradient-method-based boosting process. SGB develops smaller trees at each stage of the boosting procedure instead of developing a full regression tree. Optimal data fractions are computed through a sequential process, and the residual of every fraction is obtained; the results are combined to lower the sensitivity of the method to the target data [24,45,52,61–67]. The SGB model was constructed with the "gbm" R package, version 2.1.8 [80].
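A minimal sketch of the RF and SGB fits with the randomForest and gbm packages follows; the hyper-parameter values are illustrative assumptions rather than the tuned values reported in the study, and train_df/test_df are hypothetical data frames of the designed features plus the Score column.

```r
# RF: ensemble of randomized regression trees, predictions averaged over the trees.
library(randomForest)
rf_fit <- randomForest(Score ~ ., data = train_df, ntree = 500, importance = TRUE)

# SGB: gradient boosting with bag.fraction < 1, i.e., random subsampling at each stage.
library(gbm)
sgb_fit <- gbm(Score ~ ., data = train_df, distribution = "gaussian",
               n.trees = 1000, interaction.depth = 3,
               shrinkage = 0.01, bag.fraction = 0.5)

rf_pred  <- predict(rf_fit,  newdata = test_df)
sgb_pred <- predict(sgb_fit, newdata = test_df, n.trees = 1000)
```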
XGBoost is a tree-based supervised machine learning technique developed from a scalable end-to-end gradient tree boosting principle. Each weak learner is constructed to be strongly correlated with the negative gradient of the loss function of the entire model. XGBoost is a generalized gradient boosting decision tree implemented with a new tree scanning method that reduces tree building time, and it moderates overfitting through a regularization term while supporting arbitrary adaptable loss functions [24,68–73]. The "xgboost" R package, version 1.3.2.1 [81], was used to construct the XGBoost model. ELM is a single-hidden-layer feedforward neural network that randomly assigns the input weights and systematically calculates the output weights of the network. ELM has a quicker model building time compared to traditional feedforward neural network machine learning methods and reduces common disadvantages found in gradient-based techniques [24,73–77]. The ELM model was constructed with the "elmNN" package, version 1.0 [82]. The default activation function in this package is the radial basis function.
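The XGBoost step can be sketched as below; the learning rate, tree depth, regularization weight, and number of boosting rounds are illustrative assumptions, and train_x/train_y/test_x are hypothetical objects holding the designed features and the corresponding scores.

```r
# Minimal XGBoost sketch: regularized gradient tree boosting for score regression.
library(xgboost)

dtrain  <- xgb.DMatrix(data = as.matrix(train_x), label = train_y)
xgb_fit <- xgb.train(params = list(objective = "reg:squarederror",
                                   eta = 0.1,        # learning rate
                                   max_depth = 4,    # tree depth
                                   lambda = 1),      # L2 regularization term
                     data = dtrain, nrounds = 200)

xgb_pred <- predict(xgb_fit, as.matrix(test_x))
```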
All R packages were run in R software, version 3.6.2 [83]. The default modeling settings of each package were used. To find the best hyper-parameter set for building an effective prediction model, the "caret" R package, version 6.0-84 [84], was implemented to tune the important hyper-parameters of each machine learning algorithm.
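A generic caret tuning pattern consistent with this setup is sketched below for the SGB model; the resampling scheme, tuneLength, and data names are illustrative assumptions, since the exact tuning grids are not listed in the text. The same train() call pattern applies to the other four models (e.g., method = "rpart" or method = "rf").

```r
# Hyper-parameter tuning sketch with caret, shown here for the SGB ("gbm") model.
library(caret)

ctrl <- trainControl(method = "cv", number = 10)

set.seed(2021)
sgb_tuned <- train(Score ~ ., data = train_df,
                   method = "gbm", metric = "RMSE",
                   trControl = ctrl, tuneLength = 5, verbose = FALSE)

sgb_tuned$bestTune   # the selected hyper-parameter combination
```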

Proposed Sports Outcome Prediction Process
The adaptive weighting information was integrated into the feature design process of the proposed sports outcome prediction process. Then, the five machine learning algorithms are implemented to predict the final score of an NBA game using the designed features. The flowchart of the proposed process is shown in Figure 1.

Step 1: Data Collection
The first step is data collection. We acquired data from the basketball-reference website (https://www.basketball-reference.com, accessed on 15 March 2021) for every NBA 2018-2019 regular-season game. This NBA season consists of 1230 regular-season games, and each game has two teams (home team and away team) on the court. Each game presents two data points, one from the home team and another from the away team. Therefore, 2460 data points were collected and used in this study.
A total of 15 variables are collected and utilized in this research. One is the team's final game score, and the remaining 14 are the most widely used statistics of a basketball game, covering the team's defensive and offensive performance. Table 1 presents the variable descriptions.

Table 1. Variable Description.

| Variables | Abbreviation | Description |
|---|---|---|
| X_{4,t} | 3P% | 3-Point Field Goal Percentage of a team in the t-th game |
| X_{5,t} | FTA | Free Throw Attempts of a team in the t-th game |
| X_{6,t} | FT% | Free Throw Percentage of a team in the t-th game |
| X_{7,t} | ORB | Offensive Rebounds of a team in the t-th game |
| X_{8,t} | DRB | Defensive Rebounds of a team in the t-th game |
| X_{9,t} | AST | Assists of a team in the t-th game |
| X_{10,t} | STL | Steals of a team in the t-th game |
| X_{11,t} | BLK | Blocks of a team in the t-th game |
| X_{12,t} | TOV | Turnovers of a team in the t-th game |
| X_{13,t} | PF | Personal Fouls of a team in the t-th game |
| X_{14,t} | H/A | Home or Away game of a team in the t-th game |
| Y_t | Score | Team Score of a team in the t-th game |

Step 2: Data Normalization
Data normalization is implemented before feature construction since each variable has a different scale. The min-max normalization technique is used to transform a value v of variable X_i to v' in the range [0, 1] as follows:

v' = \frac{v - \min X_i}{\max X_i - \min X_i}    (1)

where \max X_i and \min X_i are the maximum and minimum values of the attribute X_i. Data normalization ensures that input variables with larger values do not dominate input variables with smaller values, which reduces prediction errors.
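As a small illustration of Equation (1), the snippet below rescales each of the 14 input statistics to [0, 1]; raw_df and stat_cols are placeholder names for the collected box-score data frame and its variable columns.

```r
# Min-max normalization of each input variable to the [0, 1] range (Equation (1)).
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

norm_df <- raw_df
norm_df[stat_cols] <- lapply(raw_df[stat_cols], min_max)   # stat_cols: the 14 input variables
```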
Step 3: Feature Design Based on Adaptive Weighting
This step aims to integrate the adaptive weighting technique into feature construction. We design our features for the prediction models based on the normalized variables shown in Table 1. The feature construction proceeds based on the interaction of two dimensions: game-lag information and the exponential power of adaptive weighting.
First, we define the game-lag information as "the n-th game before game t". In recent related research, single game-lag information of up to six games was utilized for model construction [14–21]. This research considers three to six game-lags instead of only single game-lag information. Moreover, this research considers the moving average of the past statistics of a basketball game in order to evaluate a team's performance sufficiently. We calculate the mean value of a normalized variable within l game-lags to evaluate a team's performance during a specific duration.
Second, it is crucial to understand that the nearest data point theoretically has more influence on the prediction of the unknown target. Therefore, this research designed the adaptive weighting method based on inverse-distance weighting (IDW) by integrating an exponential power d as a weighting control parameter. The designed feature X_{i,t}^{l,d} can be described as follows:

X_{i,t}^{l,d} = \sum_{n=1}^{l} AW_{n,d}^{l} \, X_{i,t-n}    (2)

and the adaptive weighting, AW_{n,d}^{l}, is calculated as follows:

AW_{n,d}^{l} = \frac{(l - n + 1)^d}{\sum_{m=1}^{l} (l - m + 1)^d}    (3)

where X_{i,t}^{l,d} is the designed i-th predictor variable at the t-th game with l game-lags based on the d exponential power of adaptive weighting, and X_{i,t-n} is the normalized value of the i-th variable in the n-th game before game t.
For instance, for the first normalized variable (i = 1), if we wish to design a feature considering three game-lags' information (l = 3) with weighting control parameter d = 1 for game No. 25 (i.e., the 25th game, t = 25) of a team, the designed feature is the weighted average of the values of the first variable in the previous three games, that is,

X_{1,25}^{3,1} = \frac{3}{6} X_{1,24} + \frac{2}{6} X_{1,23} + \frac{1}{6} X_{1,22}.

This instance can be found in Figure 2 as the line of d = 1. It can be observed in Figure 2 that the weighting distribution on the line of d = 0 is equal for all the data points, which represents the simple average method. Figures 2–5 demonstrate the weighting distributions of the four feature sets (d = 0 to 3) under l = 3, 4, 5, and 6, respectively. Note that with a higher value of selected game-lag information, the weighting of the nearest data point (t-1) does not necessarily grow with it.

Most recent research using previous basketball games' statistics to predict the outcome of basketball games constructs the features by the simple average method [14–21], which is equal to setting the weighting control parameter d = 0 in Equations (2) and (3). The adaptive weighting method (d = 1, d = 2, and d = 3) allocates the most weighting to the nearest data point, i.e., the data point at t-1. The weighting of t-1 grows with a higher weighting control parameter. On the other hand, the farthest data point is allocated the lowest weighting. The margin between the highest and lowest weighting increases with a higher weighting control parameter.
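The adaptive weighting and feature construction of this step can be sketched as follows; the weight formula mirrors the form reconstructed in Equations (2) and (3) from the worked example, so it should be read as an assumption rather than the authors' exact implementation.

```r
# Adaptive weighting AW for l game-lags and weighting control parameter d:
# the nearest game (n = 1) gets the largest weight, and d = 0 reduces to a simple average.
adaptive_weights <- function(l, d) {
  w <- (l - seq_len(l) + 1)^d
  w / sum(w)
}

# Designed feature for variable x (a vector of normalized values indexed by game number):
# a weighted average of the l games played before game t.
designed_feature <- function(x, t, l, d) {
  sum(adaptive_weights(l, d) * x[(t - 1):(t - l)])
}

adaptive_weights(3, 1)   # 0.500 0.333 0.167, the l = 3, d = 1 example above
adaptive_weights(3, 0)   # 0.333 0.333 0.333, the simple average (d = 0)
```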
Step 4: Prediction Model Construction
In this phase, we construct prediction models for predicting the final scores of the NBA games by using the 14 designed features with five machine learning methods, namely CART, RF, SGB, XGBoost, and ELM. According to Step 3, each variable can be extended to 16 features, which are the combinations of four game-lags and four weighting control parameters. For modeling a machine learning prediction model, the 14 designed variables with a specific game-lag and weighting control parameter are used as predictor variables. That is, for each machine learning method, 16 prediction models can be generated to evaluate the effects of different game-lags and different weighting control parameters based on prediction performance.
That is, using the designed features with a specific game-lag and weighting control parameter (X_{i,t}^{l,d}) to predict the final score of a game (Y_t) can be expressed as the following equation:

Y_t = f(X_{1,t}^{l,d}, X_{2,t}^{l,d}, \ldots, X_{14,t}^{l,d})

Note that all 14 designed features (1 ≤ i ≤ 14) were used with three to six game-lags' information (3 ≤ l ≤ 6) and with zero to three as the weighting control parameter (0 ≤ d ≤ 3) for each Y_t. As aforementioned, this research compares the prediction performance of different weighting control parameters under different selections of game-lag information. Since we use up to six games' information as our game-lag information, the season's first six games are skipped (7 ≤ t ≤ 82).
Step 5: Performance Evaluation
This research aims to discover the influence of patterns from previous games on the target game and is a cross-sectional analysis. Therefore, this study utilized the cross-validation method, which is commonly used in sports outcome prediction [85–87], to estimate the performance of the proposed prediction process. Moreover, according to [13], cross-validation can be a better validation method in sports outcome prediction. We replicated 10-fold cross-validation 100 times. This study used the root mean square error (RMSE) as the indicator to evaluate the prediction performance of each method, as it is one of the most commonly used prediction performance indicators [88,89] and has been used as a standard statistical metric to measure model performance in meteorology, air quality, and climate research [90–92]. Much sports outcome-prediction research used only RMSE as the prediction performance indicator [93–95]. Therefore, this research uses RMSE as the prediction performance indicator. It is calculated as Equation (6):

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}    (6)

where n is the sample size and e_i represents the error of the i-th prediction.
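A compact sketch of this evaluation step is given below; the seed, data frame name, and model choice are illustrative assumptions, while the repeated 10-fold scheme follows the description above.

```r
# RMSE (Equation (6)) and the repeated 10-fold cross-validation used for evaluation.
rmse <- function(e) sqrt(mean(e^2))          # e: vector of prediction errors

library(caret)
set.seed(2021)
ctrl_eval <- trainControl(method = "repeatedcv", number = 10, repeats = 100)
rf_eval   <- train(Score ~ ., data = model_df, method = "rf",
                   metric = "RMSE", trControl = ctrl_eval)
rf_eval$results                               # mean and SD of RMSE over the repetitions
```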
Step 6: Final Results
In the final phase, after the performance evaluation of the prediction models, the best prediction process can be obtained. Based on the best prediction process, we shall determine the sufficient selection of game-lag information and the practical preference of exponential power for adaptive weighting.

Empirical Results
In this research, each NBA game's statistics for both competing teams in the 2018-2019 regular season were collected to evaluate the performance of the proposed basketball game score prediction process.

Figure 6 shows the prediction performance of each prediction model with different weighting control parameters (d) under game-lag (l) = 3. It can be observed that the prediction performance varies with different weighting control parameters, and each prediction model reaches its best prediction performance at a specific weighting control parameter (d).

Figure 8 presents the prediction performance of the prediction models with different weighting control parameters (d) under game-lag (l) = 5. Every prediction model reaches its best prediction performance with weighting control parameter (d) = 1, such as CART (RMSE = 12.4316), RF (RMSE = 12.1525), SGB (RMSE = 12.2448), XGBoost (RMSE = 12.3145), and ELM (RMSE = 12.6748). Note that the prediction performance became worse as the weighting control parameter increased from 1 to 3.

Table 2 summarizes the mean and standard deviation (SD) of the RMSE of each machine learning method with different weighting control parameters (d) under different scenarios of game-lag information (l). It can be observed that every model reaches its best prediction performance with a weighting control parameter value of level one (d = 1) under the scenario of selecting four as the game-lag information (l = 4), such as CART (RMSE = 11.7564), RF (RMSE = 11.6303), SGB (RMSE = 11.5586), XGBoost (RMSE = 11.6941), and ELM (RMSE = 11.8020). Compared to the methods with a simple average weighting feature (d = 0), the models with a weighting control parameter value of level one (d = 1) provide promising improvements in prediction performance. Note that a higher value of the weighting control parameter does not necessarily improve the prediction performance.

However, it can also be observed in Table 2 that the differences between the models' results are relatively small. The confidence interval (CI) was calculated in order to determine whether the differences between models and feature combinations are significant. Table 3 shows the confidence intervals of each machine learning method with different weighting control parameters and different game-lag information. It reveals that the prediction performance with a weighting control parameter of one and a game-lag of four significantly outperformed the other feature combinations used as input predictors for the five machine learning algorithms.
It also can be observed from Tables 2 and 3 that although the SGB with d = 1 and l = 4 slightly outperforms the competing methods in this feature combination, the difference is not statistically significant. Therefore, these five machine learning algorithms provide a promising prediction performance by using designed features with a weighting control parameter of one and game-lag of four as predictor variables.
Moreover, from Tables 2 and 3, it can be seen that the models' prediction performance with weighting control parameters of two and three is not as good as that with a weighting control parameter of one. A potential cause of this circumstance can be seen in Figures 2–5: the weighting of each data point declines linearly with a weighting control parameter of one, while it declines nonlinearly with weighting control parameters of two and three. A weighting distribution with a linear decline represents a team whose performance over the last few games is stable and thus referable to the target game t. Contrarily, under a weighting distribution with a nonlinear decline, a team's performance over the last few games is treated as unstable and relatively less referable. That is, the influence of the performance in nearer games is enhanced and subsequently becomes too high, while the influence of the performance in farther games declines too fast. However, since the NBA is a professional team sport, coaches and players are well trained to adapt by substituting cooling players with hot-hand players, using time-outs to adjust their condition, or providing and receiving assistance from teammates to cover unstable performance. Therefore, the weighting distribution with a linear decline toward farther data points, which represents stable team performance, is the more appropriate distribution for predicting the outcome of team sports.
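To make the linear-versus-nonlinear decline concrete, the snippet below prints the weight distributions for l = 4 under d = 0 to 3, reusing the adaptive_weights() helper sketched in Step 3 (itself based on the reconstructed weighting form, so the exact values are illustrative).

```r
# Weight given to games t-1 ... t-4 (rows) for d = 0, 1, 2, 3 (columns).
round(sapply(0:3, function(d) adaptive_weights(4, d)), 3)
#       [,1] [,2]  [,3] [,4]
# [1,]  0.25  0.4 0.533 0.64    # nearest game t-1
# [2,]  0.25  0.3 0.300 0.27
# [3,]  0.25  0.2 0.133 0.08
# [4,]  0.25  0.1 0.033 0.01    # farthest game t-4
```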

Conclusions
This research integrated designed features using the adaptive weighted method with the CART, RF, SGB, XGBoost, and ELM machine learning methods for constructing an effective sports outcome prediction process. The designed features were generated based on the interaction of three to six pieces of game-lag information and four different levels of adaptive weighting of the variables. This study collected data from all the regular-season games of the NBA 2018-2019 season as illustrative examples. Empirical results showed that the proposed sports outcome prediction process could generate promising prediction results compared to the competing models without adaptive weighting features. All five machine learning methods reached their best prediction performance with a weighting control parameter value of level one and four pieces of game-lag information. Although SGB slightly outperforms the competing methods in this feature combination, the difference is not statistically significant. Therefore, these five machine learning algorithms provide promising prediction performance by using the feature combination with a weighting control parameter of one and a game-lag of four to generate the input features.
Integrating the machine learning methods with adaptive weighting based on feature selection and combination strategies to generate an improved version of the proposed scheme is worth further investigation, since the exploration of feature selection results is an important task in the implementation of machine learning algorithms in sports outcome prediction [9]. Thus, modifying the proposed scheme to adapt to feature selection techniques could be one of the future research directions. Moreover, exploring the performance of the proposed scheme with more NBA seasons' data could also be a future research direction.