An Integrated Variational Mode Decomposition and ARIMA Model to Forecast Air Temperature

: Temperature forecasting is a crucial part of climate change research. It can provide a valuable reference, as well as practical signiﬁcance, for understanding the macroscopic evolutionary processes of regional temperature and for promoting sustainable development. This study presents a new integrated model, called the Variational Mode Decomposition-Autoregressive Integrated Moving Average (VMD-ARIMA) model, which reduces the required data input and improves the accuracy of predictions, based on the deﬁciencies of data dependence and the complicated mechanisms associated with current temperature forecasting. In this model, the variational mode decomposition (VMD) was used for mining the trend features and detailed features contained in a time series, as well as denoising. Moreover, the corresponding autoregressive integrated moving average (ARIMA) models were derived to reﬂect the di ﬀ erent features of the components. The ﬁnal forecasted values were then obtained using VMD reconstruction. The annual temperature time series from the Wuhan Meteorological Station were investigated using the VMD-ARIMA model, ARIMA model, and Grey Model (1, 1) based on three statistical performance metrics (mean relative error, mean absolute error, and root mean square error). The results indicate that the VMD-ARIMA model can e ﬀ ectively enhance the accuracy of temperature forecasting.


Introduction
Climate change is a global problem in the 21st century. According to the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report, the average land and ocean surface temperatures, calculated linearly, shows a 0.85 • C of warming from 1880 to 2012 [1]. Global warming has a significant impact on water supply, species distribution, glaciers, and marine ecosystems, as well as agriculture and human health [2][3][4]. It is comprised of a multi-factor of interactions and multi-scale overlap, affected by both human activities and natural factors [5]. Solar radiation [6,7], greenhouse gases [8,9], ocean currents [10,11], polar stratospheric clouds [12], and many other factors may have different impacts on climate change. Conversely, climate change yields a greater impact on both human society and the natural environment, and may have an even greater impact in the future. As one of the goals of sustainable development, urgent action should be taken to combat climate change and its impacts. Temperature changes are characterized by obvious regional features and have significant impacts on the society and environment. Regional temperature forecasting not only provides important theoretical values for understanding the macroscopic evolution of temperature, but also provides some implications for promoting sustainable development.
To date, many theoretical methods have been applied in the studies of forecasting temperature. These methods can be divided into two main categories: numerical and statistical. Numerical prediction solves the partial differential equations of atmospheric physical processes. Observed meteorological data are substituted into this equation, and by integrating the initial values for different meteorological The data available from the Wuhan Meteorological Station is relatively comprehensive, and as a result, we were able to obtain the average annual temperature data from 1956 to 2010 to use in this study. During this period, the social and natural environment in Wuhan dramatically changed and is a typical example of urban development in China. In the process of the rapid urbanization of Wuhan, its climate change needs more attention. The highest average annual temperature was 18.55 • C and the lowest value was 15.42 • C, in 2007 and 1969, respectively. The temperatures generally trended upward, and the anomalous variation in annual average temperatures is shown in Figure 1. The average annual temperature data of Wuhan from 1956 to 2005 was used to train the models, and the data from 2006 to 2010 was used to verify the models.
Sustainability 2019, 11, x FOR PEER REVIEW 3 of 11 study. During this period, the social and natural environment in Wuhan dramatically changed and is a typical example of urban development in China. In the process of the rapid urbanization of Wuhan, its climate change needs more attention. The highest average annual temperature was 18.55 °C and the lowest value was 15.42 °C, in 2007 and 1969, respectively. The temperatures generally trended upward, and the anomalous variation in annual average temperatures is shown in Figure 1. The average annual temperature data of Wuhan from 1956 to 2005 was used to train the models, and the data from 2006 to 2010 was used to verify the models.

VMD-ARIMA Model
In this study, we used a new integrated model named the VMD-ARIMA model that combines the VMD and the ARIMA models. Compared with the traditional ARIMA model, the integrated model used VMD as a data preprocessing method to mine features from the original time series on different scales. The low-frequency signal reflects the characteristic trends in the original data, while the high-frequency signal reflects the detailed features. The corresponding ARIMA models were established for the components at different scales. Finally, the forecasted data were obtained using VMD reconstruction of the results from each component model. The VMD-ARIMA model includes three main steps: (1) variational mode decomposing of the original time series data, (2) training the ARIMA models of the residual (trend component) and each intrinsic mode function (IMF) signal to forecast, and (3) reconstructing the forecasted residual and IMF signals to obtain the final forecasted results.
The empirical mode decomposition (EMD) was proposed by Norden E. Huang in 1998 for analyzing nonlinear and non-stationary data. It can decompose a complex data set into finite IMF signals. Since its introduction, EMD has been gradually applied to different fields, including mechanical fault diagnosis, atmospheric analysis, and geology [24][25][26]. Compared to EMD, the VMD can artificially set the number of IMF signals, and it is not prone to modal aliasing.
The goal of VMD is to decompose a real valued input signal f into a discrete number of subsignals (modes), , that have specific sparsity properties while reproducing the input. In terms of the data processing capabilities of the integrated model, it is mainly divided into the construction of the VMD model and its parsing process [27][28][29]. The processes of assessing the bandwidth are as follows: (1) compute the associated analytic signal by using the Hilbert transform in order to obtain a unilateral frequency spectrum, (2) shift the mode's frequency spectrum to "baseband" by mixing with an exponential tuned to the respective estimated center frequency, and (3) estimate the bandwidth using the Gaussian smoothness of the demodulated signal [30].
The resulting constrained variational problem is described using Equations (1) and (2) as follows:

VMD-ARIMA Model
In this study, we used a new integrated model named the VMD-ARIMA model that combines the VMD and the ARIMA models. Compared with the traditional ARIMA model, the integrated model used VMD as a data preprocessing method to mine features from the original time series on different scales. The low-frequency signal reflects the characteristic trends in the original data, while the high-frequency signal reflects the detailed features. The corresponding ARIMA models were established for the components at different scales. Finally, the forecasted data were obtained using VMD reconstruction of the results from each component model. The VMD-ARIMA model includes three main steps: (1) variational mode decomposing of the original time series data, (2) training the ARIMA models of the residual (trend component) and each intrinsic mode function (IMF) signal to forecast, and (3) reconstructing the forecasted residual and IMF signals to obtain the final forecasted results.
The empirical mode decomposition (EMD) was proposed by Norden E. Huang in 1998 for analyzing nonlinear and non-stationary data. It can decompose a complex data set into finite IMF signals. Since its introduction, EMD has been gradually applied to different fields, including mechanical fault diagnosis, atmospheric analysis, and geology [24][25][26]. Compared to EMD, the VMD can artificially set the number of IMF signals, and it is not prone to modal aliasing.
The goal of VMD is to decompose a real valued input signal f into a discrete number of sub-signals (modes), x k , that have specific sparsity properties while reproducing the input. In terms of the data processing capabilities of the integrated model, it is mainly divided into the construction of the VMD model and its parsing process [27][28][29]. The processes of assessing the bandwidth are as follows: (1) compute the associated analytic signal by using the Hilbert transform in order to obtain a unilateral frequency spectrum, (2) shift the mode's frequency spectrum to "baseband" by mixing with an exponential tuned to the respective estimated center frequency, and (3) estimate the bandwidth using the Gaussian smoothness of the demodulated signal [30]. The resulting constrained variational problem is described using Equations (1) and (2) as follows: where {x nk } := {x n1 , . . . , x nk } is the set of k components obtained by decomposing the n th variable; {ω nk } := {ω n1 , . . . , ω nk } is the set of center frequencies corresponding to the components obtained by decomposing the n th variable; k x nk represents the sum of all the components; δ(t) is the pulse function; j is the imaginary unit; * represents convolution; f represents the original data; t is the number of analysis signals; and e − jωk t represents the exponential harmonic term of the analytical signal obtained by the Hilbert transformation.
Subsequently, the Lagrange multiplier, λ n (t), and the quadratic penalty term, α, are introduced to transform the constrained variational problem into a unconstrained variational problem, thus guaranteeing the fidelity and reconstruction accuracy of the reconstructed signal. The alternating direction multiplier method is then used to update the unknown value, x m+1 nk , using iterations. The iterations are not stopped until the accuracy, E, satisfies E < ε. The decomposition combination, {x n1 , x n2 , . . . , x nk }, of the input data is obtained after this process.
The components obtained after the VMD are normally called IMF signals, which can also be regarded as a set of different time series. The IMF signals weaken the data non-stationarity compared to the original time series. The information contained in each component covers a different part of the original series. There are considerable dependencies or correlations in a time series, and although there is some randomness for various reasons, time series analysis and forecasting can be accomplished based on correlations.
The ARIMA method proposed by Box and Jenkins in 1976 [31] for time series analysis and forecasting has been widely used in hydrology, meteorology, energy consumption, and other fields [32][33][34][35][36]. In general, a time series model includes a deterministic trend and a random residual for the trend, where the residual is assumed to represent natural variability [37]. The ARIMA (p, d, q) performs a d-order differential to the original series data (it is often a non-stationary series) to make it stationary, and then the autoregressive (AR) model is used to fit the deterministic trend, which includes the multivariate linear correlation between the value of the series at time t, and the previous p values of the series. The moving average (MA) model is used to fit the random residual by calculating the correlation between the values of the series at the time, t, and the previous q values of the white noise. The algorithm of ARIMA is as follows: For a time series, {X t , t = 0, ±1, ±2, . . .}, with a mean, E(X t ) = µ, the series can be expressed as: where ε t is a stationary white noise with a mean value of zero, σ 2 ε is variance, ϕ i is the AR coefficient, and θ j is the MA coefficient. First, in a unit root test (Agumented Dickey-Fuller test), if where S is standard deviation, the series is described as a stationary series; otherwise, it is an unstable series, and a d-order difference (d = 1, 2, 3, ...) is required until it becomes a stationary series. Second, by calculating the autocorrelation function (ACF) and the partial autocorrelation function (PACF), the type of model is determined according to the tailing and truncation of ACF and PACF. The time series is suitable for ARMA models if the two functions are tailing. Subsequently, the optimal order of the model is fixed according to the Akaike information criterion (AIC). The formula for the AIC is as follows: where p and q are the optimal orders of the ARIMA model with minimum AIC values. After determining the order, the least squares method is used to estimate the model parameters σ 2 ε , ϕ i , and θ j . Finally, the determinacy and randomness of the original time series are expressed by a linear model of the AR and MA parts. The final forecast series of the original data could be obtained using the VMD reconstruction of the forecast residual and IMF signals using the ARIMA model.

Metrics for Comparison
The mean relative error (MRE), mean absolute error (MAE), and root mean square error (RMSE) were calculated to assess the performance of the model as follows (Equations (6)-(8)): where T f (i) is the forecast value of the sample i, and T o (i) is the observation of the sample i.

Results and Discussion
The optimal number of decomposition layers is determined by the mean value of the component instantaneous frequency. If the number of decomposition layers is too small, the information contained in the original data under different scales cannot be completely reflected. On the contrary, if the number of decomposition layers is too large, the components will be absolutely discrete, especially in the high-frequency region. This results in a lower average instantaneous frequency, even at high frequencies, and a sharp drop in the instantaneous frequency curve. By comparing the instantaneous frequency curves of different decomposition layers, this study chose the three-layer VMD. The components are shown in Figure 2. The residual shows that the original temperature data had an initial downward trend, but it gradually trended upwards thereafter. The IMF1 and IMF2 reflected the detailed characteristics in different scales of the fluctuations of the original temperature data. The negative values indicated a decrease from the previous year, and the positive values implied an increase.
Based on the tailing characteristics of the ACF and PACF, the ARIMA models were established for each component, and the model structure and parameters are shown in Table 1. The prediction results are shown in Figure 3. The residual is a non-stationary series, and thus, the ARIMA (4, 1, 6) model was established after the one order difference. The AR part had four lag orders, and the adjusted R 2 was 0.975. The IMF1 was a stationary series, and the ARIMA (4, 0, 6) model was established according to the AIC criterion. The AR part also had four lag orders, and the adjusted R 2 was 0.932. Further, the IMF2 was a stationary series, and the ARIMA (3, 0, 2) model was established according to the AIC criterion. The AR part had three lag orders, and the adjusted R 2 was 0.959. The adjusted R 2 values of all components were higher than 0.900, which indicated the models fit the sample data well. The maximum lag order of the AR part was four, indicating that an obvious correlation existed between the average annual temperature value in one year and the average annual temperature values in the previous four years.   Overall, the training accuracy was 99.575% and the forecasting accuracy was 97.349%, illustrating the fact that the model is appropriate for fitting and forecasting the average annual temperature. For the performance of the integrated model in different temperature ranges, the model had higher accuracy in years when the temperature changes were moderate. For years with higher temperature fluctuations (such as in 2010), the extent of the fluctuations depicted by the model was much lower than the observed values. Two reasons can be attributed to this phenomenon. First, it could be due to the inherent limitation of the time series model, in which the deterministic trend is expressed by the values of the previous years; therefore, the greater the difference between the current value and the values in the previous years, the greater the difference between the deterministic trend of the model and the observed trend. Moreover, the average annual temperature presented a long-term cycle and a short-term cycle. The       18.024 • C, and 17.739 • C, respectively. Overall, the training accuracy was 99.575% and the forecasting accuracy was 97.349%, illustrating the fact that the model is appropriate for fitting and forecasting the average annual temperature. For the performance of the integrated model in different temperature ranges, the model had higher accuracy in years when the temperature changes were moderate. For years with higher temperature fluctuations (such as in 2010), the extent of the fluctuations depicted by the model was much lower than the observed values. Two reasons can be attributed to this phenomenon. First, it could be due to the inherent limitation of the time series model, in which the deterministic trend is expressed by the values of the previous years; therefore, the greater the difference between the current value and the values in the previous years, the greater the difference between the deterministic trend of the model and the observed trend. Moreover, the average annual temperature presented a long-term cycle and a short-term cycle. The correlation between temperatures of different years in the same cycle was weaker than that between temperatures of different years in different cycles. Compared with the transition period of two temperature cycles, the VMD-ARIMA model was, therefore, more suitable for forecasting the average annual temperature in a temperature cycle. correlation between temperatures of different years in the same cycle was weaker than that between temperatures of different years in different cycles. Compared with the transition period of two temperature cycles, the VMD-ARIMA model was, therefore, more suitable for forecasting the average annual temperature in a temperature cycle. A data sequence without the influencing factors is usually considered as a time series or a grey system. Grey systems theory is a method for studying problems of uncertainty with few data points and poor information. GM (1, 1) is the most commonly used grey model that requires less data; therefore, in order to test the effectiveness of the integrated model, the commonly used GM (1, 1) and the traditional ARIMA model were used on the same data, in which the training samples and the forecasting samples were consistent with the VMD-ARIMA model. The results of the different models are shown in Figure 4. For comparison, the errors of the three models are shown in Table 2, and the residual distributions are shown in Figure 5.
As the ADF value is higher than the t-value with a 5% test critical value, the original data is a non-stationary time series data and the results showed that the three models reflected its overall trend. Nonetheless, for the details of the interannual fluctuations, as can be seen from the residual distribution, the performance of three models were quite different. The residual errors of GM (1, 1) and ARIMA models were higher, and their distributions were more dispersed, while the residual distribution of the VMD-ARIMA model was closer to the zero line. At the same time, the residuals of the GM (1, 1) and ARIMA models in the low and high-value ranges were much larger than the medium range, while the VMD-ARIMA model achieved a better performance in both the middle and low-value ranges. Despite the increase in the error of the VMD-ARIMA model in the high-value range, the error was less than that of the GM (1, 1) and ARIMA models. For the overall training effect, the RMSE of the VMD-ARIMA model was 96.9% lower than that of the GM (1, 1), and 94.9% lower than that of the ARIMA model, indicating that the training effect of the integrated model was much better than the other two models. For the overall forecasting accuracy, the RMSE of the VMD-ARIMA model was 35.5% lower than that of the GM (1, 1) model, and 23.8% lower than that of the ARIMA model. Therefore, the VMD-ARIMA model presented a great improvement in both training and forecasting.  A data sequence without the influencing factors is usually considered as a time series or a grey system. Grey systems theory is a method for studying problems of uncertainty with few data points and poor information. GM (1, 1) is the most commonly used grey model that requires less data; therefore, in order to test the effectiveness of the integrated model, the commonly used GM (1, 1) and the traditional ARIMA model were used on the same data, in which the training samples and the forecasting samples were consistent with the VMD-ARIMA model. The results of the different models are shown in Figure 4. For comparison, the errors of the three models are shown in Table 2, and the residual distributions are shown in Figure 5. As the ADF value is higher than the t-value with a 5% test critical value, the original data is a non-stationary time series data and the results showed that the three models reflected its overall trend. Nonetheless, for the details of the interannual fluctuations, as can be seen from the residual distribution, the performance of three models were quite different. The residual errors of GM (1, 1) and ARIMA models were higher, and their distributions were more dispersed, while the residual distribution of the VMD-ARIMA model was closer to the zero line. At the same time, the residuals of the GM (1, 1) and ARIMA models in the low and high-value ranges were much larger than the medium range, while the VMD-ARIMA model achieved a better performance in both the middle and low-value ranges. Despite the increase in the error of the VMD-ARIMA model in the high-value range, the error was less than that of the GM (1, 1) and ARIMA models. For the overall training effect, the RMSE of the VMD-ARIMA model was 96.9% lower than that of the GM (1, 1), and 94.9% lower than that of the ARIMA model, indicating that the training effect of the integrated model was much better than the other two models. For the overall forecasting accuracy, the RMSE of the VMD-ARIMA model was 35.5% lower than that of the GM (1, 1) model, and 23.8% lower than that of the ARIMA model. Therefore, the VMD-ARIMA model presented a great improvement in both training and forecasting. . As a result, the trend components of VMD might be similar. The detailed components and optimal numbers of decomposition layers are different. The modeling process is, however, universal and can be applied to different areas.

Conclusions
Improving the accuracy of temperature forecasting is an important yet difficult task in current climate change research. In this study, an attempt was made to investigate the performance of a new integrated model, named the VMD-ARIMA, for forecasting annual temperature time series. The VMD-ARIMA model was derived using the average annual temperature data from Wuhan from 1956 to 2005 as a training sample, and the data from 2006 to 2010 as a verifying sample. Three standard statistical performance evaluation measures (MRE, MAE, and RMSE) were adopted to evaluate the performances of the different models. The results indicate that, compared with GM (1, 1) and ARIMA models, the proposed VMD-ARIMA model can better reveal the characteristics of the observations and effectively improve forecasting accuracy. There are several advantages to this integrated model. First, ARIMA forecasting only requires data from the time series in question. Second, the IMF components are more suitable for ARIMA modeling than observations because of their zero mean. Besides, the integrated model can also be applied to other fields with complicated influential factors and obvious time series characteristics, such as meteorology, hydrology, and social economics.
In future studies, we will consider using wavelet analysis to diagnose time series or introduce the weakening operator to correct samples that deviate from the mean. These modifications will help improve the applicability of the model, while also achieving a higher accuracy. In addition, we will try to explore the quantitative response between regional sustainability indicators and temperature changes. Three main reasons can explain the accuracy improvements: (1) the denoising of VMD, (2) the zero-mean property of the IMF, and (3) the accurate short-term memory of the time series model. The likely cause of the uncertainties in the VMD-ARIMA model was the data preprocessing. During this process, two steps might help contribute to reducing the uncertainties: (1) determining the optimal number of IMF for different data, and (2) smoothing the non-stationary IMF to ensure the stability of each component model. According to the IPCC report [1] and other studies [38,39], global warming on decadal time scales is continuing. Although the time series of regional temperature are different, most of them show similar upward trends with fluctuations [40]. As a result, the trend components of VMD might be similar. The detailed components and optimal numbers of decomposition layers are different. The modeling process is, however, universal and can be applied to different areas.

Conclusions
Improving the accuracy of temperature forecasting is an important yet difficult task in current climate change research. In this study, an attempt was made to investigate the performance of a new integrated model, named the VMD-ARIMA, for forecasting annual temperature time series. The VMD-ARIMA model was derived using the average annual temperature data from Wuhan from 1956 to 2005 as a training sample, and the data from 2006 to 2010 as a verifying sample. Three standard statistical performance evaluation measures (MRE, MAE, and RMSE) were adopted to evaluate the performances of the different models. The results indicate that, compared with GM (1, 1) and ARIMA models, the proposed VMD-ARIMA model can better reveal the characteristics of the observations and effectively improve forecasting accuracy. There are several advantages to this integrated model. First, ARIMA forecasting only requires data from the time series in question. Second, the IMF components are more suitable for ARIMA modeling than observations because of their zero mean. Besides, the integrated model can also be applied to other fields with complicated influential factors and obvious time series characteristics, such as meteorology, hydrology, and social economics.
In future studies, we will consider using wavelet analysis to diagnose time series or introduce the weakening operator to correct samples that deviate from the mean. These modifications will help improve the applicability of the model, while also achieving a higher accuracy. In addition, we will try to explore the quantitative response between regional sustainability indicators and temperature changes.