Silver Price Forecasting Using Extreme Gradient Boosting (XGBoost) Method

Abstract: This article presents a study on forecasting silver prices using the extreme gradient boosting (XGBoost) machine learning method with hyperparameter tuning. Silver, a valuable precious metal used in various industries and medicine, experiences significant price fluctuations. XGBoost, known for its computational efficiency and parallel processing capabilities, proves suitable for predicting silver prices. The research focuses on identifying optimal hyperparameter combinations to improve model performance. The study forecasts silver prices for the next six days, evaluating models based on mean absolute percentage error (MAPE) and root mean square error (RMSE). Model A (the best model based on MAPE value) suggests silver prices decline on the first and second days, rise on the third, decline again on the fourth, and stabilize with an increase on the fifth and sixth days. Model A achieves a MAPE of 5.98% and an RMSE of 1.6998, utilizing specific hyperparameters. Conversely, model B (the best model based on RMSE value) indicates a price decrease until the third day, followed by an upward trend until the sixth day. Model B achieves a MAPE of 6.06% and an RMSE of 1.6967, employing distinct hyperparameters. The study also compared the proposed models with two other ensemble models (CatBoost and random forest). The model comparison was carried out by incorporating two additional metrics (MAE and SI), and the proposed models exhibited the best performance. These findings provide valuable insights for forecasting silver prices using XGBoost.


Introduction
Silver, denoted by the symbol Ag and originating from the Latin term 'argentum', stands as a metallic element with an atomic number of 47. Its distinct properties and traits render it a sought-after resource across diverse industries. Renowned for its remarkable electrical and thermal conductivity, silver frequently assumes a crucial role in the production of electronic devices within the manufacturing sector [1]. Furthermore, within the realm of medicine, the utilization of silver nanoparticles has wielded a substantial influence on the progression of treatments in the past few decades [2]. Silver's antimicrobial properties empower the application of silver nanoparticles as coatings for medical instruments and treatments. Beyond this, silver assumes a pivotal function in the realm of solar energy capture, with a standard solar panel necessitating around 20 g of silver for its production [3].
Year by year, the manufacturing of solar panels demonstrates a consistent rise, driving an escalating need for silver. Additionally, classified as a precious metal, silver maintains a relatively high value and demand in the market [4].
Precious metals such as gold, silver, and platinum have correlations that influence their respective prices [5]. The price correlation among these precious metals can be influenced by various factors, including politics, the value of the US dollar, market demand, and others [6].
Based on the background presented, this study aims to forecast silver prices using the XGBoost method, similar to the approach employed by Jabeur et al. [7], but with the addition of hyperparameter tuning using grid search. The novelty of this research lies in the tuning step performed before grid search: whereas grid search candidates are usually chosen at random for each hyperparameter, this research first tunes each hyperparameter individually, plotting the MAPE and RMSE evaluation values to determine which values to select. The study incorporates gold and platinum prices, as well as the euro-to-dollar exchange rate, as additional variables. To attain the best XGBoost model, the performance of the model was evaluated using MAPE and RMSE. In addition, this research compares models using two further evaluation metrics, MAE and SI, to obtain a more comprehensive conclusion.

Data Collection
This research analyzes a time series dataset of silver prices, along with the supplementary variables of gold prices, platinum prices, and the dollar-to-euro exchange rate. This analysis serves as the foundation for forecasting, using a comprehensive dataset comprising 2566 data points. The data are daily, obtained from investing.com [10], and cover the period from 20 February 2013 to 20 February 2023, a full decade. The prices of silver, gold, and platinum are denominated in USD per troy ounce.
This dataset is partitioned into training and testing subsets of 80% and 20%, respectively. The training data cover the period from 20 February 2013 to 11 February 2021, while the testing data span 12 February 2021 to 20 February 2023. Using the Python programming language, the data are divided chronologically, resulting in 2052 data points for training and 514 for testing. Table 1 presents the descriptive statistics of the time series data used. Figure 1 displays the data in graphical form. Figure 2 presents the correlation matrix between variables.

Random Forest
Random forest (RF) is an ensemble learning algorithm commonly used for classification or regression. RF models are often used as base models to assess the performance of more complex models and are known for performing well on a variety of tasks. The RF method offers better generalization and valid estimates because it combines random sampling with the improved properties of ensemble techniques [21]. The predicted value, Ŷ_t, in the RF algorithm can be expressed as follows:

\hat{Y}_t = \frac{1}{T} \sum_{k=1}^{T} l_k(x)

where l_k(x) is the prediction of the k-th random tree learner and T is the number of trees.
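As a minimal illustration of the averaging rule above, the RF prediction is simply the mean of the individual tree predictions. The "trees" in this sketch are hypothetical stub functions standing in for trained regression trees, not the paper's actual model:

```python
# Sketch of the RF prediction rule: Y_hat = (1/T) * sum_k l_k(x).
# Each "tree" here is a stub function; a real model would fit
# regression trees on bootstrap samples of the training data.

def rf_predict(trees, x):
    """Average the predictions of all tree learners for input x."""
    return sum(tree(x) for tree in trees) / len(trees)

# Three stub "trees" that each give a slightly different estimate.
trees = [lambda x: x + 1.0, lambda x: x - 0.5, lambda x: x + 0.1]

print(rf_predict(trees, 10.0))  # mean of 11.0, 9.5, and 10.1
```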

CatBoost
CatBoost, or "categorical boosting", is a machine learning algorithm developed from gradient boosting. CatBoost modifies the standard gradient boosting algorithm with the ordering principle, which avoids target leakage, and with a new algorithm for processing categorical features [22]. CatBoost is commonly applied to datasets with a mix of categorical and numerical features, which frequently occur in real-world applications. The decision tree function h can be written as:

h = \arg\min_{h} \frac{1}{n} \sum_{k=1}^{n} \left( f(X_k, Y_k) - h(X_k) \right)^2

where X_k is the random vector of N input variables, Y_k is the outcome, and the function f is a least-squares approximation obtained by Newton's method.

Extreme Gradient Boosting
Extreme gradient boosting, commonly known as XGBoost, is a method that further enhances and optimizes the gradient boosting technique. XGBoost is a robust and widely used machine learning technique that has swept the data science world [23]. In the boosting method, models are trained sequentially, and the results of each weak learner's training influence the next model to be trained [24]. The method was developed by Chen and Guestrin [15], who proposed an algorithm with sparsity awareness (identifying data that have little impact on calculations) for tree learning predictions. XGBoost combines the output values of each constructed tree to obtain the final output value, as shown in the following equation:

\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i), \quad f_k \in F

where \hat{y}_i^{(t-1)} is the predicted value at the previous iteration, x_i is the input vector, t is the number of regression trees, F represents the set of all regression trees, f_k is the output of the k-th tree, and f_t is the t-th regression tree. The objective in XGBoost modeling is to minimize the value of the loss function using the following equation:

L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)

where l(y_i, \hat{y}_i) is the loss function, \Omega(f_t) is the regularization term, and t indicates the number of iterations. Applying the second-order Taylor series [25] and eliminating the constant term l\left(y_i, \hat{y}_i^{(t-1)}\right), L^{(t)} can be rewritten as:

\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

where g_i = \partial_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right) and h_i = \partial^2_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right) are the first- and second-order gradients of the loss function.
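The additive update ŷ⁽ᵗ⁾ = ŷ⁽ᵗ⁻¹⁾ + f_t(x) can be illustrated with a deliberately simplified boosting loop. This is a pure-Python sketch, not the xgboost library: each "tree" is replaced by a hypothetical constant learner fitted to the mean residual, and the regularization term is omitted:

```python
# Minimal boosting sketch: each round fits a weak learner to the
# residuals of the current prediction and adds it to the ensemble.
# The weak learner here is just the mean residual (a constant)
# scaled by a learning rate -- far simpler than a regression tree.

def boost(y, n_rounds=50, learning_rate=0.3):
    pred = [0.0] * len(y)                       # y_hat^(0)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        f_t = sum(residuals) / len(residuals)   # "tree" output f_t(x)
        # y_hat^(t) = y_hat^(t-1) + f_t(x)
        pred = [pi + learning_rate * f_t for pi in pred]
    return pred

y = [22.1, 23.4, 21.8, 24.0]   # toy "silver prices", not real data
pred = boost(y)
print(pred)  # every prediction converges toward the mean of y
```

With a constant learner, each round shrinks the mean residual by a factor of (1 − learning_rate), so the predictions converge geometrically to the target mean; real boosted trees refine this by splitting the input space.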

Hyperparameter Tuning
Each machine learning method typically has more than one hyperparameter. The values of these hyperparameters must be initialized by the model builder before building the model, and they are independent of the data or model used [26]. Hyperparameter tuning is an optimization process performed by the model builder to improve the constructed model by modifying the parameter values that influence the model's training process [27]. This research uses the grid search method [26] for the hyperparameter tuning process. Grid search is an approach in machine learning for systematically exploring a predetermined set of hyperparameter values to find the combination that yields the optimal performance for a model [28]. Several hyperparameters of the XGBoost method were tuned in this research using the Python programming language; more detailed information about these hyperparameters can be found in Table 2.

Mean Absolute Percentage Error
MAPE (mean absolute percentage error) is a metric used as an indicator of model accuracy. Suppose there are n samples with forecasted values ŷ_i (forecasted data for the i-th sample) and actual values y_i (actual data for the i-th sample). The formula for MAPE is as follows [29]:

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|

Root Mean Square Error
RMSE (root mean square error) is commonly used to measure the difference (error) between actual and forecasted data. It is the square root of the average squared difference between the actual and forecasted values [30]:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}

Mean Absolute Error
MAE (mean absolute error) is a commonly used metric to evaluate regression or classification models. It quantifies the average magnitude of the errors between predicted and actual (observed) values [31]:

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|

Scatter Index
SI (scatter index) is a normalized form of the RMSE, obtained by dividing the RMSE by the mean of the actual values [32]:

\mathrm{SI} = \frac{\mathrm{RMSE}}{\bar{y}}

The range of the SI for the classification of models is "excellent" if SI < 0.1, "good" if 0.1 < SI < 0.2, "fair" if 0.2 < SI < 0.3, and "poor" if SI > 0.3 [32].

K-Fold Cross Validation
K-fold cross-validation is a commonly used evaluation technique in machine learning [33]. This technique divides the data into k segments without repetition to calculate the average metric across trainings. In each training process, the model is trained on k − 1 segments and then validated on the remaining segment [34]. This process is repeated k times until each segment has been used exactly once as validation data.
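The four evaluation metrics above (MAPE, RMSE, MAE, and SI) can be written as short functions. This is a stdlib-only sketch with illustrative data; the function and variable names are ours:

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))

def rmse(actual, forecast):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mae(actual, forecast):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n

def scatter_index(actual, forecast):
    """SI: RMSE normalized by the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    return rmse(actual, forecast) / mean_actual

# Illustrative values, not the paper's data.
actual = [20.0, 22.0, 25.0, 24.0]
forecast = [21.0, 21.5, 24.0, 25.0]
print(round(mape(actual, forecast), 2))
print(round(rmse(actual, forecast), 4))
print(round(mae(actual, forecast), 4))
print(round(scatter_index(actual, forecast), 4))  # < 0.1, i.e. "excellent"
```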

Methodology
This research uses the XGBoost method to forecast the price of silver, with platinum prices, gold prices, and the euro-to-dollar exchange rate as additional variables, implemented in the Python programming language. The research process includes data input, model building, and hyperparameter tuning, shown in full in Figure 3. In the hyperparameter tuning process, candidate values for each hyperparameter are selected first. For example, to select the gamma value, several XGBoost models are built with different gamma values (default values are used for all hyperparameters other than gamma). Each model is then evaluated, and the MAPE and RMSE values are plotted on a graph, from which the hyperparameter values with the best performance are selected. The same process is carried out for the other hyperparameters. After candidate values have been selected for each hyperparameter, grid search is performed to build XGBoost models for all possible hyperparameter combinations. Finally, the model with the best MAPE and RMSE values is selected as the final model.
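The two-step procedure above — a one-at-a-time sweep of each hyperparameter to shortlist candidate values, followed by an exhaustive grid over the shortlisted values — can be sketched as follows. This is a stdlib-only sketch in which a hypothetical `score` function stands in for training an XGBoost model and returning its test MAPE:

```python
import itertools

def score(params):
    """Hypothetical stand-in for: train XGBoost with `params`,
    forecast the test set, and return the MAPE. Lower is better."""
    return (abs(params["max_depth"] - 3) * 0.5
            + abs(params["gamma"] - 45) * 0.01
            + abs(params["learning_rate"] - 0.15) * 10)

DEFAULTS = {"max_depth": 6, "gamma": 0, "learning_rate": 0.3}

# Step 1: sweep each hyperparameter alone (others at defaults),
# then keep the few best-scoring values as grid candidates.
sweeps = {"max_depth": [2, 3, 4, 5, 6, 8, 10],
          "gamma": [0, 45, 70, 100, 200],
          "learning_rate": [0.001, 0.01, 0.1, 0.15, 0.3]}
candidates = {}
for name, values in sweeps.items():
    scored = sorted(values, key=lambda v: score({**DEFAULTS, name: v}))
    candidates[name] = scored[:3]        # shortlist the top three values

# Step 2: grid search over every combination of shortlisted values.
best = min((dict(zip(candidates, combo))
            for combo in itertools.product(*candidates.values())),
           key=score)
print(best)
```

The shortlist step is what distinguishes the paper's procedure from plain grid search: it keeps the final grid small (here 3 × 3 × 3 = 27 fits instead of 175) while still covering the promising regions of each hyperparameter.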

Initial Model
First, an initial baseline model was constructed before performing hyperparameter tuning, using the Python programming language. The initial model used the default values for its hyperparameters (Table 2), which are pre-defined in the xgboost package, leading to the creation of 100 trees. The initial model was built using the training data, and forecasting was then conducted on the testing data, followed by evaluation. The comparison between the forecasted results of the initial model and the actual data can be seen in Figure 4. The evaluation of the initial model yielded a MAPE value of 7.77% (highly accurate) and an RMSE value of 2.16. Subsequently, the hyperparameter tuning process was conducted with the aim of finding hyperparameter values that optimize the model to achieve a MAPE value smaller than that of the initial model.

Hyperparameter Tuning
In order to optimize the model's performance, a hyperparameter tuning process was conducted to identify the optimal combination of hyperparameters for silver price forecasting. The hyperparameters considered for tuning were max_depth, gamma, learning_rate, and n_estimators. Initially, different values were tested for each hyperparameter using Python programming, aiming to determine suitable value ranges. During each experiment, default values were used for the remaining hyperparameters. MAPE and RMSE were employed as evaluation metrics to assess the performance of each experiment.

Max_Depth
Several values of the hyperparameter max_depth were tested: 2, 3, 4, 5, 6, 7, 8, 9, and 10. The evaluation results for the RMSE and MAPE values of each model can be observed in Figure 5. From Figure 5, it can be observed that the MAPE and RMSE values tend to increase as the max_depth value increases. Therefore, the values 2, 3, 4, and 6 were selected for the max_depth hyperparameter.

Gamma
Next, an estimation of the optimal gamma value was attempted. Several values were tested, including 10, 15, 20, 25, 30, 35, 40, 45, …, 250, 255. The RMSE and MAPE values for each gamma experiment can be observed in graphical form in Figure 6.
In Figure 6, it can be observed that the MAPE and RMSE values exhibit fluctuations and display an upward trend as the gamma value increases. Consequently, the gamma values of 0, 45, and 70 were chosen based on these observations.

Learning_Rate
Next, various values were tested for the learning_rate hyperparameter: 0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, and 0.5. The RMSE and MAPE values for each learning_rate experiment can be observed in graphical form in Figure 7.


Best Hyperparameter Combination
Once the values for each hyperparameter were selected, experiments were conducted for every possible combination of hyperparameters using the grid search method, giving a total of 108 possible combinations. The hyperparameter tuning process was stopped at this point, as there were already models with better MAPE values than the initial model. Across the 108 tuned models, the average MAPE value is 13.99% and the average RMSE value is 3.5173. Subsequently, the top three models were selected based on the smallest MAPE and RMSE values; in this study, the top three models based on MAPE and RMSE can be seen in Tables 3 and 4, respectively. From Table 3, the combination of learning_rate = 0.15, max_depth = 2, n_estimators = 130, and gamma = 0 is the best hyperparameter combination based on the MAPE value (model A). Meanwhile, from Table 4, the combination of learning_rate = 0.1, max_depth = 3, n_estimators = 130, and gamma = 0 is the best hyperparameter combination based on the RMSE value (model B).
To assess the significance of the results of the models with tuned hyperparameters compared to the initial model, we employed Welch's t-test (one-tailed). The hypothesis test on the MAPE value is outlined as follows: H0: the average MAPE of the hyperparameter-tuned models equals the MAPE of the initial model, indicating that the performance of the models achieved through hyperparameter tuning is equivalent to that of the initial model. At a significance level of α = 5%, the analysis reveals a p-value of 2.1764 × 10⁻⁸ < 0.05, leading to the rejection of H0. For the analogous test on model performance based on RMSE, the p-value is 7.1873 × 10⁻²⁷ < 0.05. This underscores that the performance of the models resulting from hyperparameter tuning significantly surpasses that of the initial model.
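Welch's t-statistic and its Welch–Satterthwaite degrees of freedom can be computed directly; the one-tailed p-value is then read from a t-distribution with those degrees of freedom (e.g. via scipy.stats). This stdlib-only sketch uses made-up MAPE samples, not the paper's actual numbers:

```python
import math

def welch(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of
    freedom for two independent samples with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Made-up MAPE samples (percent): tuned models vs. the baseline.
tuned = [6.0, 6.1, 5.9, 6.2, 6.0]
initial = [7.8, 7.7, 7.9, 7.6, 7.8]
t, df = welch(tuned, initial)
print(t, df)  # a large negative t: tuned MAPE is clearly lower
```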

Forecasting Result
Next, we forecast the silver price using model A and model B. The forecasted silver prices for the next 6 days can be seen in Table 5.
According to the data in Table 5, the forecasted silver prices show a mix of upward and downward trends over the forecast period in model A. However, in model B, the silver prices initially decrease until the third day and then steadily increase until the sixth day.

K-Fold Cross Validation
Next, an evaluation was conducted using 5-fold cross-validation with the model A and model B that had been obtained. The data partitioning was performed not randomly but sequentially, based on the data index. The MAPE and RMSE values from each iteration can be found in Table 6, and the ranking of the two best models can be seen in Table 7.
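The sequential (non-shuffled) 5-fold split described above can be sketched as follows. This is a stdlib-only sketch in which indices stand in for the time-ordered observations:

```python
def sequential_kfold(n, k):
    """Yield (train_idx, val_idx) pairs: the data are cut into k
    contiguous segments by index; each segment serves as the
    validation set exactly once, and the remaining k-1 segments
    form the training set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

for i, (train, val) in enumerate(sequential_kfold(10, 5), 1):
    print(f"fold {i}: val={val}")
```

Note that a contiguous split preserves index order within each fold; for strictly causal time series validation, an expanding-window scheme (training only on indices before the validation segment) is a common alternative.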

Comparison with Other Models
The final models (A and B) obtained from the hyperparameter tuning were then compared with other machine learning models. CatBoost and random forest models were built; both belong to the ensemble learning family and share the same basic concept as XGBoost, namely building regression trees. For a more comprehensive comparison, two supplementary evaluation metrics, MAE and SI, were incorporated. The results of the model comparison, with the values of all evaluation metrics, can be seen in Table 8. These results further confirm that XGBoost models A and B, obtained through the tuning process, outperform random forest and CatBoost on all four metrics.

Conclusions
This study builds and optimizes an XGBoost model by conducting hyperparameter tuning to forecast silver prices. The research explores 108 hyperparameter combinations and identifies the top two models based on the evaluation metrics of MAPE and RMSE. The best model based on MAPE has a MAPE value of 5.98% and an RMSE of 1.6998, with a hyperparameter combination of learning_rate = 0.15, max_depth = 2, n_estimators = 130, and gamma = 0. The forecasted silver prices for the next six days show a decline on the first and second days, followed by an increase on the third day, another decline on the fourth day, and then a subsequent rise on the fifth and sixth days. The best model based on RMSE has a MAPE value of 6.06% and an RMSE of 1.6967, with a hyperparameter combination of learning_rate = 0.1, max_depth = 3, n_estimators = 130, and gamma = 0. The forecasted silver prices for the next six days indicate a decrease until the third day, followed by a continuous increase until the sixth day. Based on the model evaluation results in Table 8, the MAPE, RMSE, MAE, and SI metrics of the proposed models are better than those of the other ensemble models (CatBoost and random forest).

Recommendations
This study aims to optimize the XGBoost model through hyperparameter tuning, specifically focusing on four hyperparameters. It is suggested for future research to explore additional hyperparameters, including eta, lambda, alpha, and min_child_weight, and to increase the number of hyperparameter variations based on research capacity. Furthermore, incorporating data preprocessing techniques to address missing or messy data can improve the forecasting performance [35]. Additionally, future studies can consider incorporating additional variables such as inflation data, oil prices, or other relevant factors to enhance the accuracy of silver price forecasting.

Data Availability Statement:
The data in this paper are accessible at the following link: https://github.com/DylanNorbert/SilverPriceForecast-Dylan (accessed on 7 July 2023).

Conflicts of Interest:
The authors declare no conflict of interest.