A New Period-Sequential Index Forecasting Algorithm for Time Series Data

A period-sequential index algorithm combined with sigma-pi neural network technology, called the SPNN-PSI method, is proposed for the prediction of time series datasets. The SPNN-PSI method is tested on the cumulative electricity output (CEO) dataset, the Volkswagen sales (VS) dataset, and the electric motors exports (EME) dataset. The results show that, compared with the moving average (MA), exponential smoothing (ES), and autoregressive integrated moving average (ARIMA) methods, the proposed SPNN-PSI method achieves lower error and is more suitable for the prediction of these time series datasets. It is also concluded that the prediction quality of the SPNN-PSI method tends to increase with the correlation coefficient of the reference historical datasets, and that a higher correlation coefficient (>0.4) helps to improve the occurrence probability of higher forecasting accuracy and produces more accurate forecasts for big datasets.


Introduction
In the big data era, large volumes of time series data are continuously generated in network systems, such as stock prices, sales volumes, production capacity, weather data, ocean engineering, engineering control, and more generally any system of applied science and engineering that involves time-varying parameters [1][2][3]. In general, the distribution of time series data changes over time and is non-stationary [4,5], while some data show potential periodicity. Since the 1950s, time series forecasting has received much interest in prediction science.
A continuously growing number of algorithms for time series forecasting have been proposed and studied. Firstly, the exponential smoothing (ES) method [5,6] and the moving average (MA) method [7] are simple and widely used, and have performed well in forecasting competitions against more sophisticated approaches. Secondly, the autoregressive integrated moving average (ARIMA) model integrates autoregressive (AR) models and moving average (MA) models, and is widely used as a linear time series forecasting method [8,9]. The ARIMA model gives good accuracy in forecasting relatively stationary time series data, but relies on the strong assumption that future values are linearly dependent on historical values [10]. Thirdly, artificial neural networks (ANN) [11][12][13] and adaptive models [14] have also been used to forecast nonlinear time series data and to improve forecasting accuracy at different time scales. It is also possible to combine different methods into a hybrid to improve overall forecasting accuracy [15].
However, no traditional forecasting method can meet all targets [16][17][18], and heuristic methods are also worth researching [19]. Here, a period-sequential index algorithm with a sigma-pi neural network (SPNN-PSI) is proposed for the prediction of time series data. The period-sequential index (PSI) algorithm identifies structures from transformed data through four indexes that implicitly carry structure information usable for forecasting: the period index, sequential index, small period index, and super sequential index. It is combined with a sigma-pi neural network (SPNN) algorithm to improve the accuracy and robustness of the forecasting algorithm. The SPNN-PSI method is broadly applicable, and its prediction quality improves as the correlation coefficient of the reference historical datasets increases.

Period-Sequential Index (PSI) Algorithm
A period-sequential index (PSI) algorithm, which finds index values that implicitly carry structure information, is proposed to predict time series data. The index values cover the complete period, while the period index and sequential index, as well as the small period index and super sequential index, are introduced to describe the dataset structure information in the vertical and horizontal dimensions, respectively. In this way, for time series data, the following year's dataset can be predicted using only two consecutive years of reference historical datasets. Figure 1 shows the schematic diagram of the PSI algorithm. H-2 and H-1 denote the reference historical periods, i.e., the year before last and last year, and H0 represents the forecasting period. The period length of H-2, H-1, and H0 is uniform and is defined as T in this paper. At historical time t, PI(t), SI(t), pi(t), and si(t) denote the period index, sequential index, small period index, and super sequential index, respectively. We assume that the forecasting value follows the measurement Equation (1), where F(ti) is the forecasting value at time ti (i = 1, 2, …, N) during the H0 period, and N represents the number of model forecasting samples.
PI(ti − 2T), PI(ti − T), SI(ti − 2T), SI(ti − T), pi(ti − 2T), pi(ti − T), si(ti − 2T), and si(ti − T), as the eight variables in Equation (1), are described as follows:
(1) Period Index The period index indicates the relationship between the reference historical data and the reference value, which can be explained through Equations (2) and (3), where y(ti − 2T) and y(ti − T) are the reference historical data at times ti − 2T and ti − T, respectively. K-2 and K-1 are the reference functions of the period index. A standard period average is originally set as the reference function of the period index, where it is defined as a constant.
(2) Sequential Index The sequential index indicates the relationship between two adjacent reference historical data points with the defined time steps. It is calculated through Equations (4) and (5).
(3) Small Period Index The small period index indicates the relationship between the reference historical data and the reference value, which can be explained through Equations (6) and (7), where k-1 and k-2 are the reference functions of the small period index. Here, the average over a small period (such as three months, because of seasonal factors) is originally set as the reference function of the small period index.
(4) Super Sequential Index The super sequential index indicates the relationship between two reference historical data points separated by an interval of the defined steps. It is calculated through Equations (8) and (9).
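The four index definitions above can be illustrated with a minimal Python sketch. Equations (2)-(9) are not reproduced here, so every formula in the sketch is an assumption: each index is taken to be a simple ratio, the small period is three samples, and the super sequential index looks two steps back. The function name `psi_indices` and its parameters are hypothetical.

```python
import numpy as np

def psi_indices(y, small_period=3):
    """Return (PI, SI, pi, si) for one reference period y.

    Assumption: every index is a plain ratio. These forms are illustrative
    and are not taken from Equations (2)-(9).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Period index: each value relative to the whole-period average (a constant).
    PI = y / y.mean()
    # Sequential index: each value relative to the previous value.
    SI = np.ones(n)  # first entry has no predecessor; 1.0 is a neutral placeholder
    SI[1:] = y[1:] / y[:-1]
    # Small period index: each value relative to the average of its own block.
    blocks = np.arange(n) // small_period
    k = np.array([y[blocks == b].mean() for b in blocks])
    pi = y / k
    # Super sequential index: each value relative to the value two steps back.
    si = np.ones(n)  # first two entries are neutral placeholders
    si[2:] = y[2:] / y[:-2]
    return PI, SI, pi, si

# Toy reference period with six samples (synthetic values, not from the paper).
PI, SI, pi, si = psi_indices([1, 2, 3, 4, 5, 6], small_period=3)
print(PI, SI, pi, si)
```

With monthly data and T = 12, the same function would be called once for the H-2 year and once for the H-1 year, which yields the eight inputs mentioned for Equation (1).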

SPNN-PSI Method
SPNN has been proposed by Lyutikova [20]. The output of the network is the product of different linear combinations of the inputs. SPNN has a simpler structure, less variance, and faster convergence speed. With a high SPNN degree, the function that relates output to input depends on more parameters and has a more complex structure. This can contribute to better prediction results, but it can also cause overfitting and requires more computation time for the training algorithm. In this study, the architecture of SPNN-PSI with degree 4 and 8 inputs is shown in Figure 2, which can reproduce the modeling function of Equation (10). In Figure 2 and Equation (10), c, d, e, and f are the weighting factors; K0 and k0 are the correction coefficients for the period index and small period index, respectively; and γ0 is the planning factor, which is defined by the median method in Equation (17).
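The statement that the network output is a product of linear combinations of the inputs can be sketched as follows. This is a generic illustration with degree 4 and 8 inputs, not the network of Figure 2 or Equation (10): the weight matrix `W`, the bias vector `b`, and the function name `product_of_sums` are hypothetical.

```python
import numpy as np

def product_of_sums(x, W, b):
    """Multiply together `degree` linear combinations of the inputs.

    W has shape (degree, n_inputs) and b has shape (degree,).
    Assumption: one bias per linear combination; the source does not state this.
    """
    x = np.asarray(x, dtype=float)
    sums = W @ x + b          # one linear combination per row of W
    return float(np.prod(sums))

degree, n_inputs = 4, 8
W = np.full((degree, n_inputs), 0.125)   # toy weights, not fitted values
b = np.zeros(degree)
print(product_of_sums(np.ones(n_inputs), W, b))
```

Because the output is a product of four sums, scaling every input by 2 scales the output by 2^4, which is one way to see why a higher degree gives a more complex input-output function.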

The input vector is denoted zi and the weight matrix is denoted w. In order to obtain the optimized weight values of c, d, e, and f, the output vector is defined as the observation data y(ti − T) during the H-1 period, Equation (20). In this study, multiple inputs and outputs are combined into a single matrix equation, where Y and Z are the matrix of multiple outputs and the matrix of multiple inputs, respectively.
Then, based on the measured value Y and input Z, the learning control method of the initial training neural network is used to obtain the optimal weight matrix w, which is given by Equation (24).
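One common way to obtain a weight vector from measured outputs Y and inputs Z is ordinary least squares. The sketch below assumes a linear relation Z w ≈ Y with one row of Z per sample; this is an assumption for illustration and may differ from the learning rule of Equation (24). The function name `fit_weights` is hypothetical.

```python
import numpy as np

def fit_weights(Z, Y):
    """Least-squares solution of Z @ w ≈ Y (assumed linear relation)."""
    Z = np.asarray(Z, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return w

# Toy check: outputs generated from known weights are recovered.
Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_w = np.array([2.0, 3.0])
Y = Z @ true_w
print(fit_weights(Z, Y))
```

Using `np.linalg.lstsq` avoids forming an explicit matrix inverse, which is usually the more numerically stable choice when Z is poorly conditioned.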

Error Evaluation
In order to evaluate the obtained results, the forecasting accuracy was measured with three error indicators, namely the mean absolute percentage error (MAPE), the root mean squared error (RMSE), and the mean absolute error (MAE) [20,21]. The error indicators are calculated by Equations (25)-(27). In addition, the Pearson correlation coefficient (r) in Equation (28) is used to quantify the strength and direction of the linear relationship between two sets of data during the reference historical period [22].
In these equations, y(ti) is the measured value at time ti during H0, and F(ti) is the forecasting value at time ti during H0.
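The four indicators can be computed as in the following sketch, which uses the standard textbook definitions of MAPE, RMSE, MAE, and the Pearson correlation coefficient. It is assumed, not confirmed, that Equations (25)-(28) take these standard forms; the function names are hypothetical.

```python
import numpy as np

def mape(y, f):
    """Mean absolute percentage error, in percent (y must be nonzero)."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    return float(np.mean(np.abs((y - f) / y)) * 100.0)

def rmse(y, f):
    """Root mean squared error, in the units of y."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    return float(np.sqrt(np.mean((y - f) ** 2)))

def mae(y, f):
    """Mean absolute error, in the units of y."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    return float(np.mean(np.abs(y - f)))

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long series."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic values for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(mape(actual, forecast), rmse(actual, forecast), mae(actual, forecast))
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

For the reference-period correlation described in the text, `pearson_r` would be applied to the two historical years, for example the 12 monthly values of H-2 against the 12 monthly values of H-1.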

Steps of Computation
The flow chart of the optimized network model is shown in Figure 3. The specific steps are as follows:
Step 1: Initialize the reference historical datasets.
Step 2: Using the reference historical two-year datasets for training, calculate the parameters Y and Z by solving Equations (22) and (23).
Step 3: At time step ti, substitute Y and Z into Equation (24) and solve it to obtain the optimal weight matrix w.
Step 4: Update the optimal values of c, d, e, and f in Equation (10), solve Equation (10), and obtain the forecasting value F(ti).
Step 5: If the stop condition (ti+1 > tN) is satisfied, the search stops and the parameters MAPE, RMSE, MAE, and r are output; otherwise, the time step is incremented and the procedure returns to Step 2.
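The control flow of Steps 1 to 5 can be sketched as a loop over the forecast time steps. The sketch only shows the looping structure: the `fit` and `predict` callables are placeholders, and the toy `fit_ratio` and `predict_scaled` functions below are hypothetical stand-ins, not Equations (10) and (22)-(24).

```python
import numpy as np

def walk_forward(h2, h1, fit, predict):
    """Loop over time steps: refit on the reference data, then forecast."""
    h2 = np.asarray(h2, dtype=float)   # Step 1: reference period H-2
    h1 = np.asarray(h1, dtype=float)   # Step 1: reference period H-1
    n = len(h1)
    forecasts = np.empty(n)
    for i in range(n):                 # Step 5: stop once i passes the last step
        w = fit(h2, h1, i)             # Steps 2-3: build training data, solve for w
        forecasts[i] = predict(h1, w, i)  # Step 4: forecast F(t_i)
    return forecasts

def fit_ratio(h2, h1, i):
    """Toy stand-in: least-squares growth ratio from H-2 to H-1."""
    return float(h2 @ h1 / (h2 @ h2))

def predict_scaled(h1, w, i):
    """Toy stand-in: scale last year's value by the fitted ratio."""
    return w * h1[i]

print(walk_forward([1, 2, 3], [2, 4, 6], fit_ratio, predict_scaled))
```

Passing the model as two callables keeps the loop independent of the forecasting equation, so the same skeleton could wrap any per-step fit and forecast rule.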

Results and Discussion
Three groups of actual time series datasets [23][24][25] are shown in Table 1. In these three samples, the two-year datasets from January 2016 to December 2017 are used as reference historical data to forecast the data of 2018. In order to evaluate the confidence of the period-sequential index algorithm with sigma-pi neural network (SPNN-PSI), the correlation coefficients between the 2016 and 2017 datasets are computed and listed in Table 1. Then, the predicted values and the real values are compared for the SPNN-PSI, MA, ES, and ARIMA methods, so that the prediction quality can be observed directly.
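For orientation, the two simplest baselines in the comparison can be sketched in a few lines. The window length and smoothing factor used for MA and ES in this study are not stated, so `window=3` and `alpha=0.5` below are assumptions, and the function names are hypothetical.

```python
import numpy as np

def moving_average_forecast(history, window=3):
    """One-step-ahead forecast: mean of the last `window` observations."""
    h = np.asarray(history, dtype=float)
    return float(h[-window:].mean())

def exp_smoothing_forecast(history, alpha=0.5):
    """Simple exponential smoothing; the forecast is the final smoothed level."""
    level = float(history[0])
    for y in history[1:]:
        level = alpha * float(y) + (1.0 - alpha) * level
    return level

# Synthetic series for illustration only.
series = [10.0, 20.0, 30.0, 40.0]
print(moving_average_forecast(series, window=3))
print(exp_smoothing_forecast(series, alpha=0.5))
```

Both baselines lag behind a rising series, as the toy output shows, which is the kind of behavior a period-aware method is meant to avoid on strongly periodic data.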


Periodic Recognition and Prediction on Electric Motors Exports (EME) Dataset
For the EME dataset, the electric motors exports of China between January 1995 and June 2019 are used as an example [23] to show the correlation coefficient detection and prediction results. Using the reference historical two-year dataset from January 2016 to December 2017, the monthly electric motors exports from January to December 2018 are predicted by the SPNN-PSI, MA, ES, and ARIMA methods, as shown in Figure 4. It can be seen from Figure 4 that, compared with the MA, ES, and ARIMA methods, the SPNN-PSI method follows the trend and the fluctuations of the actual values more closely.
Table 2 presents the prediction errors of each model. According to Table 2, the MAPE is 5.34%, 6.79%, 8.11%, and 6.97% for the PSI, MA, ES, and ARIMA methods, respectively. All four errors are within a reasonable range, but the developed SPNN-PSI algorithm is more suitable for the prediction of the EME dataset used in this paper because of its lower MAPE, RMSE, and MAE compared with the other three prediction methods. In addition, the correlation coefficient of its historical reference data is shown in Table 1. The example shows that the proposed SPNN-PSI algorithm achieves satisfactory accuracy in time series prediction on the EME dataset, whose historical reference data has a relatively high correlation coefficient of 0.9388.
Table 2. Error indicators of each prediction model on the EME dataset.

Periodic Recognition and Prediction on Volkswagen Sales (VS) Dataset
For the VS dataset, the Volkswagen sales in China between January 2007 and June 2019 are used as another example [24] to show the correlation coefficient detection and prediction results. Figure 5 shows the prediction results for 2018 using the PSI, MA, ES, and ARIMA methods. As shown in Figure 5, between January 2018 and May 2018 the VS predicted by the PSI method is almost identical to the actual VS, in contrast to the MA, ES, and ARIMA methods, and a lower MAPE of 4.48% is achieved by Equation (18). Meanwhile, the trend of the predicted values between June 2018 and December 2018 is similar to that of the actual values when using the PSI method, but the difference between them at each time point is larger than with the MA, ES, and ARIMA methods, giving a MAPE of 19.36% for the period between June 2018 and December 2018.
To clearly show the correlation coefficient value and the prediction results, the correlation coefficient detection results from January 2016 to December 2017 are shown in Table 1, and Table 3 presents the prediction errors of each model. Compared with the EME dataset, the VS time series has a lower correlation coefficient (0.8392) for its reference historical dataset, and a higher MAPE of 9.94% is obtained for the VS dataset in 2018 (Table 3). Therefore, the example shows that the prediction accuracy and quality of the proposed SPNN-PSI algorithm can decrease as the correlation coefficient of the reference historical dataset decreases (shown in Table 1), as compared to the EME dataset. To sum up, the MAPE, RMSE, and MAE values of the SPNN-PSI method in Table 3 are all lower than those of the MA, ES, and ARIMA methods. Thus, the forecasting quality of the SPNN-PSI method is better, and the developed SPNN-PSI algorithm is still suitable for the prediction of the VS dataset used in this paper.

Periodic Recognition and Prediction on Cumulative Electricity Output (CEO) Dataset
For the CEO dataset, a time series subset of the cumulative electricity output in China [25] is used as an example to show the correlation coefficient detection and prediction results. In the large-scale reference historical time series dataset from January 2016 to December 2017, the CEO dataset shows an obvious periodicity with a length of 12 months (one year) and an upward trend from January to December. Based on the periodic pattern detected from January 2016 to December 2017, we predict the CEO for the next period (the whole year of 2018), as shown on the right side of Figure 6. The predicted values are very close to the actual values when using the PSI method, while there are large deviations when using the other methods, especially in January 2018.
Then, we evaluate the forecasting accuracy of the SPNN-PSI method by comparing the actual values and the predicted values, shown in Table 4. Because the correlation coefficient for the reference historical dataset is very high (equal to 1.0, shown in Table 1), the MAPE, RMSE, and MAE of the SPNN-PSI method are lower and equal 5.23%, 2912.23 × 10^8, and 2374.38 KWH, which indicates a smaller difference between the predicted and actual values. By contrast, the higher MAPE, RMSE, and MAE of the other forecasting methods are also given in Table 4. Their MAPE values are all more than 50%, which indicates a relatively large fluctuation of prediction error. In sum, based on the above error indicators, the proposed SPNN-PSI algorithm achieves very high accuracy and quality in time series prediction on the CEO dataset with its very high correlation coefficient of 1.0.

Accuracy Analysis of SPNN-PSI Algorithm
After obtaining the forecasting results for the above three groups of time series datasets, the forecasting accuracy (FA) was also calculated using Equation (29). If the FA is close to 100% and the MAPE is close to 0, the model is considered to have excellent forecasting accuracy. The FA values of the VS, EME, and CEO datasets (in Table 1) were further analyzed and are shown in Table 5. It can be seen that the FA increases as the correlation coefficient (r) of the reference historical data increases. Thus, there may be a positive correlation between r and FA.
To further illustrate this correlation, the method was applied to a first case: the monthly export volume of 31 specifically chosen kinds of products from January 1995 to December 2018 [23]. The scatter diagram of FA vs. r is shown in Figure 7a. In Figure 7a, dense point clouds are located in one area surrounded by red lines, while some individual points scatter in the other areas. When using the SPNN-PSI method in this paper, a higher value of r (r > 0.4) helps to improve the occurrence probability of a higher value of FA (FA > 70%) and produces more accurate forecasts for the monthly export volume datasets in the first case. The correlation is then examined empirically in a second, big-data case: a vibration signal from a hydraulic test rig sampled at 1 Hz for 132,300 s [26]. This test rig cyclically repeats constant load cycles (duration 60 s) and measures process values. Figure 7b gives the scatter diagram of FA vs. r. Figure 7a,b show a similar scatter distribution, but in Figure 7b, whether r is positive or not, the proposed PSI model achieves high values of FA (between 92% and 100%) and produces accurate forecasts of vibration. If r is considered one dominant factor for forecasting accuracy, then under the condition r > 0.4, higher values of FA (> 97%) are more readily obtained using the proposed PSI algorithm.
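The "occurrence probability" statement above can be made concrete as an empirical frequency: among the series with r above a threshold, the share whose FA exceeds a target. The sketch below is one hypothetical way to compute it; the function name `share_above` is invented here, and the input values are synthetic, not taken from Figure 7.

```python
import numpy as np

def share_above(r, fa, r_min=0.4, fa_min=70.0):
    """Fraction of series with FA > fa_min among those with r > r_min."""
    r = np.asarray(r, dtype=float)
    fa = np.asarray(fa, dtype=float)
    mask = r > r_min
    if not mask.any():
        return float("nan")   # no series passes the r threshold
    return float(np.mean(fa[mask] > fa_min))

# Synthetic (r, FA) pairs for illustration only.
r_values = [0.9, 0.5, 0.45, 0.1, -0.2]
fa_values = [95.0, 80.0, 60.0, 75.0, 40.0]
print(share_above(r_values, fa_values))
```

Comparing this share with the same quantity computed for r ≤ 0.4 would give a simple numerical counterpart to the visual reading of the scatter diagrams.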

Conclusions
• The SPNN-PSI method is proposed for predicting time series datasets. It combines a neural network with four indexes (the period index, sequential index, small period index, and super sequential index) whose values implicitly carry usable structure information.

• In contrast to the MA, ES, and ARIMA methods, the proposed SPNN-PSI method shows satisfactory forecasting quality, with lower MAPE, RMSE, and MAE, and is more suitable for the prediction of time series datasets.

• There is a trend that the higher the correlation coefficient of the reference historical datasets, the higher the prediction quality of the SPNN-PSI method; a higher correlation coefficient (>0.4) helps to improve the occurrence probability of higher forecasting accuracy and produces more accurate forecasts for big datasets.
