Setting the Initial Value for Single Exponential Smoothing and the Value of the Smoothing Constant for Forecasting Using Solver in Microsoft Excel

: Although single exponential smoothing is a popular forecasting method for a wide range of applications involving stationary time series data, consistent rules about choosing the initial value and determining the value for the smoothing constant ( α ) are still required, because they directly impact the forecast accuracy. The purpose of this study is to mitigate these shortcomings. First, a new method for setting the initial value by weighting is derived, and its performance is compared with two other traditional methods. Second, the optimal ( α ) was automatically solved using Solver in Microsoft Excel, after which α was determined by minimizing the mean squared error (MSE). This was accomplished by comparing the α from Solver with step search by setting the smoothing constant by varying its value from 0.001 to 1 in increments of 0.001 and then choosing the optimal α value from this range that has the lowest MSE. The experimental results show that α from Solver and the optimal α with step search are not different, and the initial value set by the proposed method outperformed the existing ones regarding the MSE.


Introduction
Exponential smoothing is one of the most widely used techniques in forecasting due to its simplicity, robustness, and accuracy in an automatic forecasting procedure.Hence, it has been widely used for forecasting future values from time series data [1][2][3][4][5].One of the basic ideas behind using weighted averages of past observations is that more recent observations carry more weight when determining forecasts than observations in the distant past.Exponential smoothing is divided into three types: single, double, and triple.Single exponential smoothing (SES) is used on data with a stable fluctuating pattern; double exponential smoothing is used on data with a trend pattern; and triple exponential smoothing is used on data with both trend and seasonal patterns [6].
However, there are two problems with exponential smoothing forecasting methods.The first is choosing a suitable value for the smoothing constant (α) and the second is setting a suitable initial value.The forecaster must determine one or more parameters in exponential smoothing for assigning exponentially decreasing weights as the observations become older, because future events usually depend more on the most recent data [7].Hence, the value of α is important for successful forecasting via exponential smoothing.Nevertheless, there need to be consistent guidelines on how they should be selected.In general, the value of α is selected by applying a suitable calculation process, many of which have been tried out in the past.Many statisticians have recommended that the value of α should be kept small (in the 0.1 to 0.3 range) to minimize the forecasting error function [8,9].Moreover, Paul [10] recommended selecting the value for α using a nonlinear optimizer [11,12].However, it is often the case that the value is outside the recommended range.Solver in Microsoft Excel has become increasingly popular as a nonlinear optimizer [13,14].This optimization can be performed using the Solver in Microsoft Excel, and many textbooks, such as Chopra and Meindl [15] and Balakrishnan et al. (2013) [16], mentioned and illustrate this approach.
As is well known, SES models are recursive, and thus, an initial value is required to feed the model to obtain the most accurate prediction [6,17].Therefore, the choice of this value has an important impact on the forecasting performance.Many researchers have proposed methods to determine a suitable initial value for the smoothing parameter in SES models.Brown's [18] original suggestion was simply to use the mean of the data for the initial value while others have suggested using the first observation or the average of the first three observations as the initial value.Ledolter and Abraham [17] recommended backcasting to obtain the initial value.In this method, the smoothing algorithm estimated the initial value by going backward in the series.Another approach with a limited number of data points is to use the Bayesian method to combine the prior estimate of the level with the average of the available data [19][20][21].In most cases, the first actual value is considered to be the initial value for the smoothing parameter when using SES [22].Furthermore, the average of the first five or six observations can be used as the initial value [23], which is used in many statistical packages, such as Minitab, and has been used to set the initial value for SES in many forecasting approaches [24][25][26][27].
The accuracy of SES may vary depending on the chosen value of α.Even though much research has been conducted on this subject, forecasters have not been able to reach a consensus on how to select the value for α or the initial value.Therefore, in this study, the effects of α and the initial value are clarified, after which, various methods for setting the initial value are investigated.Their performances and searching for the optimal value for α were then investigated based on the mean squared error (MSE) values, a popular metric that is commonly used for comparing forecasting techniques.
The remaining parts of this paper are as follows: The theoretical framework is covered in Section 2. The proposed methods for setting the initial value are presented in Section 3. Experimentation to show the efficacy of the proposed methods is reported in Section 4. The results and a discussion are provided in Section 5. Finally, the conclusions and remarks are presented in Section 6.

SES
This is a suitable forecasting method based on data without a trend or a seasonal component [18] (Brown, 1959).In SES, α ranging from 0 to 1 is used as a weight to indicate how similar the current and previous observations are: a weight value close to 1 emphasizes the current value whereas one close to 0 emphasizes the previous one [28].The formula for SES is defined as the following: where S t is the smoothed value of the time series at time t, α is the smoothing constant for α ∈ [0, 1], and y t is the actual value of the time series at time t.
In the case of SES, the smoothed statistic is the forecasted value, which is derived as where F t+1 and F t are the forecasted values of time series at time t + 1 and t, respectively.Expanding Equation (2) means that By substituting each of these into the following equation, we obtain Generalizing the above equation provides Equation ( 3) represents the weighted moving average of all past observations with the weights decreasing exponentially; i.e., exponential smoothing [29].It can be seen that large recent observations are assigned a larger weight, which can be interpreted as the weighted average for the most recent forecast from the most recent observations [1].
The assumption of these weights are as follows: For forecasting from one step ahead at time t + 1, all of the observations in the series y 1 , y 2 , . . ., y t are used with an exponential weighting scheme that assigns the maximum weight to the most recent observation.Moreover, the weights decline systematically as the observations that are still included become older.Weighting in an exponential smoothing technique requires a given value of α, and the forecasted values vary depending on the value of this constant, as does the forecasting error.Hence, the critical process is determining the value of α constrained by α ∈ [0, 1].Since mistakes in selecting the optimal value adversely affect the estimated results, obtaining the value of α is conducted via trial and error while minimizing the sum of the squared error.As the value of α controls the smoothing level adaptation speed, choosing the initial value must be conducted carefully, because a poor choice will require more time for the recursive smoothing formula to adapt and for its effect to dissipate.The impact is very tangible among small time series and those with a small value [30].Considering Equation (3), the initial value F 1 needs to be found via a prediction algorithm, which plays a critical role in computing all of the subsequent SES.In practice, different methods are applied to solve these problems.

Setting the Initial Value for SES
Because SES requires the previous forecasted value at each stage, it must be apparent how to start the method.Different initialization and estimation methods for setting the initial value for SES have been discussed in the literature.Still, the state-of-the-art method is to estimate the optimum alpha and the initial value together by minimizing some loss function [31].Typically, MSE is used, minimizing the squares of one step ahead in a sample forecast error.The first observation as the initial value for SES was suggested by Hyndman and Athanasopoulos [1].Additionally, many statistical packages, such as Minitab, have been used to set the initial value for SES with the average of the first six observations [23].
The following are two methods that are most commonly used to set the initial value for SES and are used to compare with the proposed method.

1.
The first observation is used as the initial value of α when using SES [22].By simple algebra, of SES, can also be expressed as where F 1 and y 1 are the initial value and the actual value, respectively.The choice is preferable when the level changes rapidly (α close to 1, or 1 − α close to 0).

2.
The average of the first six observations is used for the initial value [23].
By simple algebra of SES, can also be expressed as where F 1 and y 1 , y 2 , . . ., y 6 are the initial value and the actual values, respectively.

Performance Metrics
The fundamental challenge for a metric error measure is that it condenses a large amount of data into a single value.Evaluating the performance of a forecasting method is achieved by comparing the actual values with the predicted ones.A typical approach is to use a specific criterion to measure the error of the predicted value.Subsequently, the efficiency of the forecasting method is evaluated based on how close the predicted and actual values are.The most frequently used error index, MSE [6,32], was used in the present study, which is defined as n where y t and F t are the actual and predicted values at time t, respectively, and n is the number of data points.

The Proposed Method to Set the Initial Value for SES
In this method, the first six observations are still used for taking the weighted average.Based on the concept of the weighted average of past observations, the closer it gets to the first observation, the greater the weight and the farther (Table 1).Hence, F 1 is assigned more weight (w 1 ) for y 1 than for y 2 , . . ., y 6 .Performing SES with the proposed method is achieved as follows: 1.

2.
Use Solver in Microsoft Excel to find the optimal value for α while minimizing MSE.

3.
Provide the initial forecasting value via Calculate the MSE.
An example of using the Solver in Microsoft Excel is presented in Figure 1.The time series dataset of S1 is set with the initial value of Method 1, and the optimal smoothing value returned is α = 0.3689, whereas the MSE value is 896,212.76.
  Use Solver in Microsoft Excel to find the optimal value for α while minimizing MSE.(1 )

Provide the initial forecasting value via
An example of using the Solver in Microsoft Excel is presented in Figure 1.The time series dataset of S1 is set with the initial value of Method 1, and the optimal smoothing value returned is α = 0.3689, whereas the MSE value is 896,212.76.

Experimental Study
Fifteen real time series datasets from the M3 competition with a stationary pattern (five small, medium, and large ones) [33] were used in this study to evaluate the performance of the proposed method.For each one, the augmented Dickey-Fuller test was used to check whether the pattern was stationary.Brief details of these datasets are reported in Table 2.
α was set in increments of 0.001 using α ∈ [0, 1], so there were 1000 sets of conditions for each dataset.The experimental study was conducted via the steps using the R version 3.5.2[34] to measure the performance of the proposed method compared to the SES method.
The steps of the experimental study are as follows: 1.
Each time series dataset with a stationary pattern is performed for three initial values by applying methods 1 and 2, and the proposed method are as follows: Method 1: For each initial value setting, use the solver in Excel to find the optimal value for α while minimizing the MSE.Hence, we obtained the optimal value for α and the MSE value of each initial value setting.

3.
For each initial value setting, use a grid search with α by varying 0.001, 0.002, . . ., 1.00 and compute the MSE in each α.With these 1000 conditions, we searched the optimal value for α and obtained the lowest MSE.

4.
The performance of the proposed method compared with the other two initialization methods of the SES method, considering the lowest MSE.
Y, Q, and M are yearly, quarterly, and monthly, respectively.

Results and Discussion
Table 3 provides the initial values via the three methods and the optimal value for α using a step search and Solver in Microsoft Excel, which were very similar.As an example for a small-sized dataset (S1), the optimal values for α using a step search and Solver from Microsoft Excel were 0.369 and 0.3689, respectively, when setting the initial value using Method 1, whereas they were 0.316 and 0.3162, respectively, when setting the initial value using Method 2, and 0.323 and 0.3232, respectively, when setting the initial value using the proposed method.This trend was the same for all of the datasets.Figure 2 shows the results for each dataset when using different initialization methods; the optimal value for α differs only slightly.Eventually, Solver from Microsoft Excel and the step search methods obtained very similar optimal values for α for the same initialization method.Overall, the different sizes of small with S1-S5, medium with M1-M5, and large with L1-L5 datasets are represented by the bar chart of Figure 3, and it can be seen that both Solver in Microsoft Excel and step search obtained nearly the same optimal value for α results.Hence, the Solver from Microsoft Excel is an alternative to a powerful method for obtaining the optimal value for α. ods obtained very similar optimal values for α for the same initialization method.Overall, the different sizes of small with S1-S5, medium with M1-M5, and large with L1-L5 datasets are represented by the bar chart of Figure 3, and it can be seen that both Solver in Microsoft Excel and step search obtained nearly the same optimal value for α results.Hence, the Solver from Microsoft Excel is an alternative to a powerful method for obtaining the optimal value for α.The results of a correlation analysis of the performances of the three methods are presented in Figure 3.It can be seen that the correlation coefficient values are close to +1 and lie in a straight line, meaning that the optimal values for α obtained using the step search method or Solver from Microsoft Excel using the initial value from all three methods are in good agreement.Figure 4 illustrates the actual and predicted values when setting the initial value using Method 1, Method 2, and the proposed method.For example, in S1, when setting the initial value using Method 2 and the proposed method, similar predicted values are achieved, but when setting the initial value using Method 1, the predicted values in the early period are far from those of the other methods.Additionally, they can be interpreted in the same way in the other time series.Moreover, Table 4 reports the lowest MSE values for the three initial value setting methods used to obtain the optimal value for α using the The results of a correlation analysis of the performances of the three methods are presented in Figure 3.It can be seen that the correlation coefficient values are close to +1 and lie in a straight line, meaning that the optimal values for α obtained using the step search method or Solver from Microsoft Excel using the initial value from all three methods are in good agreement.
Figure 4 illustrates the actual and predicted values when setting the initial value using Method 1, Method 2, and the proposed method.For example, in S1, when setting the initial value using Method 2 and the proposed method, similar predicted values are achieved, but when setting the initial value using Method 1, the predicted values in the early period are far from those of the other methods.Additionally, they can be interpreted in the same way in the other time series.Moreover, Table 4 reports the lowest MSE values for the three initial value setting methods used to obtain the optimal value for α using the step search method and Solver from Microsoft Excel.For each dataset, the lowest MSE value was obtained from a set of 1000 MSE values.For example, for S1, when setting the initial value using Method 1, Method 2, and the proposed method, the lowest MSE values were 896,212.77,886,047.38,and 885,070.96,respectively.This trend was the same for the other datasets.It can be seen that although the results are similar, setting the initial value using the proposed method obtained lower MSE values than the other methods.Figure 5 visually supports these findings.

Method 1 Method 2 Proposed
Step Solver Step Solver Step Solver

Small
The initial setting methods' lowest MSE frequencies were compared using Chi-squared goodness-of-fit tests (Table 5).The null hypothesis states that the number of times that each method achieves the lowest MSE is the same.For example, for S1, the Chi-squared statistic for the lowest MSE value is 159.65 with a p-value < 0.0001, which leads to the conclusion that the number of times that each method achieves the lowest MSE is significantly different.The results for the other datasets can be interpreted in the same way.Thus, the proposed method for setting the initial value achieved the lowest MSE values, and it is evident that it quite considerably outperformed the other two methods.
step search method and Solver from Microsoft Excel.For each dataset, the lowest MSE value was obtained from a set of 1000 MSE values.For example, for S1, when setting the initial value using Method 1, Method 2, and the proposed method, the lowest MSE values were 896,212.77,886,047.38,and 885,070.96,respectively.This trend was the same for the other datasets.It can be seen that although the results are similar, setting the initial value using the proposed method obtained lower MSE values than the other methods.Figure 5 visually supports these findings.

Small Medium Large
Figure 5.The lowest MSE using the step search method and Solver from Microsoft Excel to find the optimal value for α after setting the initial value using M1 (Method 1), M2 (Method 2), and P (the proposed method) for large (L1-L5), medium (M1-M5), and small (S1-S5) time series datasets.
The initial setting methods' lowest MSE frequencies were compared using Chisquared goodness-of-fit tests (Table 5).The null hypothesis states that the number of times that each method achieves the lowest MSE is the same.For example, for S1, the Chisquared statistic for the lowest MSE value is 159.65 with a p-value < 0.0001, which leads to the conclusion that the number of times that each method achieves the lowest MSE is significantly different.The results for the other datasets can be interpreted in the same way.Thus, the proposed method for setting the initial value achieved the lowest MSE values, and it is evident that it quite considerably outperformed the other two methods.5.The lowest MSE using the step search method and Solver from Microsoft Excel to find the optimal value for α after setting the initial value using M1 (Method 1), M2 (Method 2), and P (the proposed method) for large (L1-L5), medium (M1-M5), and small (S1-S5) time series datasets.
Figure 6 shows a stacked bar chart of the frequencies of obtaining the lowest MSE value when setting the initial value using the three initial value methods: Method 1, Method 2, and the proposed method for each of the 15 datasets.Notably, Figure 6 shows the MSE values for the proposed method were lower than those for the other two methods, a trend that was the same for all of the datasets.Moreover, in Figure 7, it is worth noting that using an initial value, in Method 1, of α in the range of 0.1-0.4provided forecasted values far from the actual values, whereas when it exceeded 0.5, all three methods provided similar results.The proposed method and Method 2 usually provided similar MSE values for the same set of conditions, and their plotted lines tended to overlap each other.Meanwhile, lower MSE values for the three initial value setting methods became more evident as the settings were decreased.These results are consistent with the other findings, and it was concluded that the proposed method performed better than the others.Figure 6 shows a stacked bar chart of the frequencies of obtaining the lowest MSE value when setting the initial value using the three initial value methods: Method 1, Method 2, and the proposed method for each of the 15 datasets.Notably, Figure 6 shows the MSE values for the proposed method were lower than those for the other two methods, a trend that was the same for all of the datasets.Moreover, in Figure 7, it is worth noting that using an initial value, in Method 1, of α in the range of 0.1-0.4provided forecasted values far from the actual values, whereas when it exceeded 0.5, all three methods provided similar results.The proposed method and Method 2 usually provided similar MSE values for the same set of conditions, and their plotted lines tended to overlap each other.Meanwhile, lower MSE values for the three initial value setting methods became more evident as the settings were decreased.These results are consistent with the other findings, and it was concluded that the proposed method performed better than the others.It can be seen that when using values of α ranging from 0 to 1 (0.001, 0.002, . . ., and 1) to produce 1000 settings, different average MSE levels were provided (Table 6).For example, for S1, Method 1, Method 2, and the proposed method produced MSE values of 984,214.12,969,740.55,and 967,322.56,respectively.Similarly, they produced MSE values of 31,757.83,31,338.43,and 30,365.02,respectively, for M1, and 1,639,370.09,1,619,541.12,and 1,616,325.69,respectively, for L1.Thus, it can be concluded that, when using the same value of α, the initial value provided by the proposed method provided the lowest MSE for all sizes of datasets and quite considerably outperformed the other two methods in this endeavor.Bar charts of the average MSE values obtained using the three methods for each dataset are shown in Figure 8.The results clearly show that the proposed method performed much better than Method 1 and Method 2. It can be seen that when using values of α ranging from 0 to 1 (0.001, 0.002, ..., and 1) to produce 1000 settings, different average MSE levels were provided (Table 6).For example, for S1, Method 1, Method 2, and the proposed method produced MSE values of 984,214.12,969,740.55,and 967,322.56,respectively.Similarly, they produced MSE values of 31,757.83,31,338.43,and 30,365.02,respectively, for M1, and 1,639,370.09,1,619,541.12,and 1,616,325.69,respectively, for L1.Thus, it can be concluded that, when using the same value of , the initial value provided by the proposed method provided the lowest MSE for all sizes of datasets and quite considerably outperformed the other two methods in this endeavor.Bar charts of the average MSE values obtained using the three methods for each dataset are shown in Figure 8.The results clearly show that the proposed method performed much better than Method 1 and Method 2.   .Average accuracy values with MSE for the three initial value setting methods using values of α ranging from 0 to 1 (0.001, 0.002, ..., and 1) for the 15 datasets: small (S1-S5), medium (M1-M5), and large (L1-L5).

Conclusions and Remarks
To effectively use the SES method, the forecaster must first choose a proper value for α and then set the initial value to calculate the smoothed values and make the forecast.The MSE is often used as a criterion for selecting an appropriate value for α.For instance, by assigning the values [0,1], one then selects the value that produces the smallest MSE.Importantly, these values considerably affect the accuracy of the forecast.
In this study, the Solver in Microsoft Excel and step search methods were used to determine the value of α that optimally fitted several time series datasets.Since their performances were not different, Solver from Microsoft Excel is a powerful alternative .Average accuracy values with MSE for the three initial value setting methods using values of α ranging from 0 to 1 (0.001, 0.002, . . ., and 1) for the 15 datasets: small (S1-S5), medium (M1-M5), and large (L1-L5).

Conclusions and Remarks
To effectively use the SES method, the forecaster must first choose a proper value for α and then set the initial value to calculate the smoothed values and make the forecast.The MSE is often used as a criterion for selecting an appropriate value for α.For instance, by assigning the values [0,1], one then selects the value that produces the smallest MSE.Importantly, these values considerably affect the accuracy of the forecast.
In this study, the Solver in Microsoft Excel and step search methods were used to determine the value of α that optimally fitted several time series datasets.Since their

Figure 1 .
Figure 1.An example of the Solver results pane for the time series of S1 with initial value setting with Method 1.Figure 1.An example of the Solver results pane for the time series of S1 with initial value setting with Method 1.

Figure 1 .
Figure 1.An example of the Solver results pane for the time series of S1 with initial value setting with Method 1.Figure 1.An example of the Solver results pane for the time series of S1 with initial value setting with Method 1.

Figure 2 .
Figure 2.A bar chart of the optimal values of α (alpha) using a step search or Solver from Microsoft Excel when setting the initial value using T1 (Method 1), T2 (Method 2), or P (the proposed method) for large, medium, and small time series datasets.

Figure 2 .
Figure 2.A bar chart of the optimal values of α (alpha) using a step search or Solver from Microsoft Excel when setting the initial value using T1 (Method 1), T2 (Method 2), or P (the proposed method) for large, medium, and small time series datasets.

Figure 3 .
Figure 3. Scatter plots of the optimal value for  obtained using the step search method and Solver from Microsoft Excel when setting the initial value using (a) Method 1, (b) Method 2, or (c) the proposed method for all 15 time series datasets.

Figure 3 .
Figure 3. Scatter plots of the optimal value for α obtained using the step search method and Solver from Microsoft Excel when setting the initial value using (a) Method 1, (b) Method 2, or (c) the proposed method for all 15 time series datasets.

Figure 4 .
Figure 4. Plots of the actual and predicted values when setting the initial value using Method 1, Method 2, and the proposed method; for example, of small, medium, and large time series data with S1, S2, M1, M2, L1, and L2.

Figure 4 .
Figure 4. Plots of the actual and predicted values when setting the initial value using Method 1, Method 2, and the proposed method; for example, of small, medium, and large time series data with S1, S2, M1, M2, L1, and L2.

Figure 6 .
Figure 6.The frequencies of achieving the lowest MSE when setting the initial value using Method 1, Method 2, and the proposed method.

Figure 7 .
Figure 7. MSE of the three initial value setting methods for 1000 sets of conditions of S1, M1, and L1.

Figure 7 .
Figure 7. MSE of the three initial value setting methods for 1000 sets of conditions of S1, M1, and L1.

Table 1 .
Assigning weights to the first six observations.

Table 2 .
The M3 competition time series data used to compare the forecast methods.

Table 3 .
The optimal values of α using the initial values provided by the three methods.

Table 3 .
The optimal values of α using the initial values provided by the three methods.

Table 4 .
The lowest MSE of three initial value setting methods using the optimal value for α from step search and Solver from Microsoft Excel.

Table 4 .
The lowest MSE of three initial value setting methods using the optimal value for α from step search and Solver from Microsoft Excel.

Table 5 .
Chi-squared goodness-of-fit test results for comparing the lowest MSE value frequencies of the three initial value setting methods.

Table 5 .
Chi-squared goodness-of-fit test results for comparing the lowest MSE value frequencies of the three initial value setting methods.
* The best performance in terms of average MSE value.