1. Introduction
Time series data are vital across a wide range of domains, including economics, finance, environmental science, meteorology, and health care. Time series analysis is the basic means of analyzing the intrinsic evolution of such data and predicting future trends. A milestone in its history dates back to 1927, when Yule, while studying the sunspot data recorded by Wolfer, introduced the concept of autoregressive models (ARMs) in [1]. This breakthrough ushered in the era of modern time series analysis. Shortly thereafter, in 1931, Walker [2], inspired by the concept of autoregressive models, conducted a comprehensive analysis of atmospheric pressure data from Port Darwin, Australia. This work not only broadened the application boundaries of autoregressive models, but also proposed moving average models (MAMs) and autoregressive moving average models (ARMAMs), which combine the characteristics of ARMs and MAMs. These contributions laid a firm theoretical foundation for the development of time series analysis. For researchers seeking a deeper understanding of the methods and applications of time series analysis, the works of Box and Jenkins [3] and Shumway and Stoffer [4] are indispensable resources.
Traditional time series analysis generally assumes that the non-deterministic phenomena in the data originate from random fluctuations, and it usually adopts classical statistics based on probability theory to handle time series data. In real-world applications, however, the sources of non-deterministic phenomena in time series data are diverse and intricate, and frequencies are often unstable. Numerous scholars have therefore carried out empirical studies to explore whether the non-deterministic phenomena in time series data can be reduced to randomness in real situations. For example, Yang and Liu [5] used partial differential equations to characterize the size of the Chinese population under different age structures, and found that treating the non-deterministic phenomena in the population system as randomness leads to outcomes that conflict with the relevant assumptions. Similarly, Liu et al. [6] modeled the order data of an online car-hailing platform in Beijing using renewal process theory, and found that if the numbers of passengers were regarded as random variables, the collected data would deviate seriously from the corresponding 99% confidence interval. Therefore, simply reducing the non-deterministic phenomena in time series data to randomness is likely to bias both the predictions of the resulting time series model and the decisions it supports, which in turn affects the accuracy and credibility of the results. To describe and handle these non-deterministic phenomena more comprehensively in practical applications, we need another axiomatic mathematical system, uncertainty theory, which was established by Liu [7] and further refined by Liu [8]. Time series analysis based on this theory is called uncertain time series analysis. Specifically, it is a collection of methods for collecting, modeling and interpreting time series data with uncertainty theory as the technical tool. The pioneering work in this field is the study by Yang and Liu [9], which defined the uncertain autoregressive model based on uncertainty theory.
Although some scholars have studied the least squares estimation of the unknown parameters and the least squares estimation of the disturbance terms in the uncertain moving average model, previous studies ignored the disturbance terms when estimating the unknown parameters, did not comprehensively describe the influence of the disturbance terms on the observed data, and separated the estimation of the unknown parameters from the estimation of the disturbance terms. In this paper, a symmetric statistical invariant containing the unknown parameters of the uncertain moving average model is constructed by deforming the model and combining the result with the uncertainty distribution of the disturbance terms. The statistical inference problem of the uncertain moving average model is then transformed into the parameter estimation problem of the symmetric statistical invariant, and the corresponding estimators are determined by the least squares principle. Specifically, the main contributions of this paper are as follows:
- Based on the uncertain moving average model after deformation and the properties of the uncertain disturbance terms, a symmetric statistical invariant containing the unknown parameters of the uncertain moving average model is constructed.
- The corresponding estimators are determined by the least squares principle, and a numerical algorithm is designed to compute them.
- A numerical example is provided to illustrate the corresponding theoretical results.
2. Literature Review of Uncertain Time Series Analysis
Since the concept of uncertain time series analysis was put forward, a large number of scholars have carried out in-depth research on it, and uncertain time series analysis has gradually developed into a relatively large-scale and mature theoretical branch within the field of uncertain statistics.
In the research of uncertain time series analysis, the primary and core content is the exploration of various uncertain time series models. The earliest uncertain time series model is the uncertain autoregressive model proposed by Yang and Liu [9]. They assumed that the values of the time series are influenced by both the data observed in the past and the disturbance term at the current moment, and they defined the disturbance term as an uncertain variable. This pioneering work established an important theoretical foundation for subsequent research on uncertain time series analysis. Following that, Lu et al. [10] pointed out that the future values of time series data may be affected not only by historical observed data, but jointly by past observed data and historical disturbance terms. To deal with this phenomenon, they proposed the uncertain moving average model and the uncertain autoregressive moving average model, enabling more accurate modeling and analysis of the corresponding time series data. In the meantime, Tang [11] noted the complex characteristics of multivariate time series and proposed the uncertain vector autoregressive model, which better captures the complex relationships between variables. On the basis of the above results, Tang [12] studied time series data evolving from nonlinear relations and proposed the uncertain max autoregression model, whose main purpose is to describe time series systems that are heavily influenced by historical extreme observations. Soon afterwards, Tang [13] proposed the uncertain threshold autoregression model for modeling and analyzing time series data generated by a system with different change rules at different stages. In addition, when the system generating the time series data is completely unknown, that is, when the specific form of the uncertain time series model cannot be determined, the results of Zhang and Gao [14] provide an effective tool for studying non-parametric time series systems. Overall, these research achievements have not only enriched the theoretical framework of uncertain time series analysis, but also provided solid technical support for numerous practical applications.
Another important research direction of uncertain time series analysis is the statistical inference of uncertain time series models, which links theoretical methods with practical applications. Specifically, statistical inference for uncertain time series models encompasses two core tasks. First, the unknown parameters of the model must be estimated so that the model fits the time series data. Second, the uncertain disturbance term of the model must be analyzed quantitatively. For estimating unknown parameters, Yang and Liu [9] first proposed the least squares estimation method, whose core idea is to choose the parameters that minimize the sum of the squares of the deviations between the predicted values of the model and the actual observed values. However, because extreme outliers may exist in actual data and can significantly affect the results of least squares estimation, Yang et al. [15] further proposed the least absolute deviation estimation method, which improves the accuracy of parameter estimation by minimizing the sum of the absolute values of the deviations. Other estimation methods have also been explored. For example, Chen and Yang [16] applied the idea of uncertain maximum likelihood to the estimation of unknown parameters in uncertain autoregressive models. Further parameter estimation methods include lasso estimation (Zhang et al. [17]), ridge estimation (Chen and Yang [18]), Huber estimation (Liu [19]), and moment estimation (Liu and Qin [20]). For estimating the uncertain disturbance terms, the pioneering research of Yang and Liu [9] laid a foundation for the subsequent moment estimation method. Later, Chen and Yang [16] introduced the uncertain maximum likelihood estimation method. However, because outliers in actual data may bias the estimation results, Liu and Liu [21] studied an improved uncertain maximum likelihood estimation method that adapts better to potential data biases. In addition, Liu and Liu [22] studied a least squares estimation method, which estimates the uncertain disturbance terms by minimizing the sum of the squared deviations between the empirical distribution of the observed values and the population distribution.
3. Deformation of Uncertain Moving Average Model
In the field of uncertain time series analysis, a common assumption is that a series of observations indexed by time, called time series data, is available as follows:
$$X_1, X_2, \cdots, X_n. \tag{1}$$
To model time series data in which current values are more likely to be affected by previous disturbance terms, Yang and Ni [23] introduced the uncertain moving average model to smooth such data and then explore their long-term trend and periodic changes. Based on the assumption of a linear relationship between observations and uncertain disturbance terms, the model removes short-term fluctuations and noise components in the data, and has practical applications in fields such as market forecasting and climate change research.
Consider an uncertain moving average model of order
p which is denoted as
Here,
,
, ⋯,
are called the uncertain disturbance terms, and are assumed to be independent and follow the same uncertainty distribution, uncertain normal distribution
, while
,
, ⋯,
are a set of unknown parameters, which reflect the weights given to each uncertain disturbance term. With the help of the backshift operator
B (satisfying
), the
p order uncertain moving average model (
2) can be reformulated as
where
In order to calculate the corresponding disturbance terms from time series data, Xin et al. [24] proved that Formula (3) is invertible, and can be written as
where
Thus, we have
Since the minimum index of the time series data (1) is 1, we set
if
. Then, we have
Based on the above analysis, we can rewrite the p order uncertain moving average model (2) as
for
.
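The recovery of disturbance terms from observations can be illustrated numerically. The following is a minimal sketch rather than the authors' implementation. It assumes, purely for illustration, that the model has the form x_t = a0 + eps_t + a_1 eps_{t-1} + ... + a_p eps_{t-p}; the names `a0`, `a` and `recover_disturbances`, as well as the sign convention, are hypothetical. It also assumes that every disturbance term with an index below 1 is taken as zero.

```python
from typing import List, Sequence


def recover_disturbances(x: Sequence[float], a0: float, a: Sequence[float]) -> List[float]:
    """Recursively recover eps_1..eps_n from observations x_1..x_n.

    Assumed model (illustrative only):
        x_t = a0 + eps_t + sum_{i=1..p} a[i-1] * eps_{t-i},
    with eps_s = 0 for every index s < 1.
    """
    eps: List[float] = []
    for t, x_t in enumerate(x):
        value = x_t - a0
        for lag, a_lag in enumerate(a, start=1):
            if t - lag >= 0:  # terms before the first observation are taken as zero
                value -= a_lag * eps[t - lag]
        eps.append(value)
    return eps


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    print(recover_disturbances([1.0, 2.0, 3.0], a0=1.0, a=[0.5]))
```

Under these assumptions the recursion is equivalent to truncating the inverted operator form at the first observation, because zero pre-sample inputs produce zero pre-sample disturbance terms.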
4. Parameter Estimation of Uncertain Moving Average Model
In this section, we discuss the parameter estimation of the uncertain moving average model based on time series data and the least squares principle.
For a given set of time series data (1), it follows from the deformation expression (4) and the normality assumption of the uncertain disturbance terms that
By defining the following
n real functions with unknown parameters
,
, ⋯,
and
as
it is not difficult to infer that when the unknown parameters
,
, ⋯,
and
in the
p order uncertain moving average model (
2) can take on true values, these
n real functions
can be regarded as
n samples of the standard normal uncertainty distribution
. That is, we have
Here, Equation (6) is the statistical invariant that we want to construct. Because its uncertainty distribution, the standard normal uncertainty distribution $\mathcal{N}(0,1)$, is symmetric about the origin, it is also a symmetric statistical invariant.
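The symmetry property can be checked numerically. The sketch below assumes the normal uncertainty distribution of uncertainty theory, Φ(x) = (1 + exp(π(e − x)/(√3 σ)))^(-1). The function name `normal_uncertainty_cdf` and the overflow guard are choices made here for illustration and do not come from the paper.

```python
import math


def normal_uncertainty_cdf(x: float, e: float = 0.0, sigma: float = 1.0) -> float:
    """Normal uncertainty distribution N(e, sigma) evaluated at x."""
    t = math.pi * (e - x) / (math.sqrt(3.0) * sigma)
    if t > 700.0:  # exp(t) would overflow; the distribution value is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(t))


if __name__ == "__main__":
    # Symmetry about the origin for N(0, 1): Phi(x) + Phi(-x) = 1.
    for x in (0.0, 0.5, 1.3, 4.0):
        total = normal_uncertainty_cdf(x) + normal_uncertainty_cdf(-x)
        print(x, round(total, 12))
```

Symmetry about the origin means that a standardized disturbance term and its negative have the same uncertainty distribution, which is the property the invariant relies on.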
Based on the above analysis, we know that the
n real functions
defined above can be regarded as a set of samples of standard normal uncertainty distribution
when the unknown parameters
,
, ⋯,
and
take true values. Therefore, in order to estimate the values of the unknown parameters, we should find the set of parameters that makes the empirical distribution function of
most close to the uncertainty distribution of
. It should be noted that the empirical distribution corresponding to
is given by
Here,
is the indicator function which takes the value of 1 when
and takes the value of 0 when
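. Moreover, the uncertainty distribution of the standard normal uncertainty distribution $\mathcal{N}(0,1)$ is
$$\Phi(x) = \left(1 + \exp\left(-\frac{\pi x}{\sqrt{3}}\right)\right)^{-1}.$$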
According to the least squares principle, we choose as estimates the parameter values that minimize the sum of squares of the deviations between the empirical distribution and the standard normal uncertainty distribution. That is, the least squares estimation of the p order uncertain moving average model (2) should solve the following minimization problem:
The core challenge in solving the above optimization problem is the nonlinear connection between the decision variables and the objective function: the decision variables appear not only in the normal uncertainty distribution function but also in the indicator function, which makes the problem very difficult to solve directly. To address this issue, we designed a numerical algorithm (Algorithm 1) that approximates the least squares solution of the optimization problem.
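Algorithm 1 Numerical algorithm for least squares estimation
Step 0: Input the time series data.
Step 1: Determine the feasible region of the unknown parameter vector.
Step 2: For each candidate value of the unknown parameters in the feasible region, compute the disturbance terms and then the n real functions defined above.
Step 3: Initialize the running sum and the index.
Step 4: Update the running sum and the index.
Step 5: If the index has not passed the last observation, go to Step 4.
Step 6: Find the parameter values at which the resulting sum reaches its minimum value.
Step 7: Output the parameter values found in Step 6.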
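A grid-based search of the kind described in Algorithm 1 can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB code. It assumes the model form x_t = a0 + eps_t + a_1 eps_{t-1} + ... + a_p eps_{t-p} with eps_s = 0 for s < 1, and it assumes that the objective sums, over the n standardized disturbance terms, the squared gap between their empirical distribution and the standard normal uncertainty distribution evaluated at those same points. All function names, argument names and grids are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def phi(z: float) -> float:
    """Standard normal uncertainty distribution N(0, 1)."""
    t = -math.pi * z / math.sqrt(3.0)
    return 0.0 if t > 700.0 else 1.0 / (1.0 + math.exp(t))


def disturbances(x: Sequence[float], a0: float, a: Sequence[float]) -> List[float]:
    """Assumed model: x_t = a0 + eps_t + sum_i a[i-1] * eps_{t-i}, eps_s = 0 for s < 1."""
    eps: List[float] = []
    for t, x_t in enumerate(x):
        past = sum(a_lag * eps[t - lag]
                   for lag, a_lag in enumerate(a, start=1) if t - lag >= 0)
        eps.append(x_t - a0 - past)
    return eps


def objective(x: Sequence[float], a0: float, a: Sequence[float], sigma: float) -> float:
    """Sum of squared gaps between empirical and N(0, 1) distributions (assumed form)."""
    z = [e / sigma for e in disturbances(x, a0, a)]
    n = len(z)
    total = 0.0
    for z_t in z:
        empirical = sum(1 for z_s in z if z_s <= z_t) / n  # indicator-based count
        total += (empirical - phi(z_t)) ** 2
    return total


def grid_search(x: Sequence[float],
                a0_grid: Sequence[float],
                a_grids: Sequence[Sequence[float]],
                sigma_grid: Sequence[float]) -> Tuple[float, float, Tuple[float, ...], float]:
    """Evaluate the objective on every grid point and keep the smallest value."""
    best = None
    for a0 in a0_grid:
        for a in itertools.product(*a_grids):
            for sigma in sigma_grid:  # sigma values must be positive
                value = objective(x, a0, a, sigma)
                if best is None or value < best[0]:
                    best = (value, a0, a, sigma)
    if best is None:
        raise ValueError("the parameter grid is empty")
    return best


if __name__ == "__main__":
    toy = [1.2, 0.7, 1.9, 1.1, 0.4, 1.6]  # toy data, not from the paper
    print(grid_search(toy, [0.5, 1.0, 1.5], [[-0.5, 0.0, 0.5]], [0.5, 1.0]))
```

An exhaustive grid avoids differentiating through the indicator function, at the cost of a number of evaluations that grows quickly with the model order and the grid resolution.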
Remark 1. The feasible region in the above algorithm is the set of all possible values of the unknown parameter vector. In addition, the function defined above is essentially a process variable used to calculate the objective function in the optimization problem (8); at the final step, it is exactly that objective function.
6. Numerical Example
In this section, we will provide a numerical example to illustrate the proposed parameter estimation methods and the model testing and forecast methods.
Suppose we have a set of time series data that contains 15 observations in total, as shown in Table 1. Next, we describe the time series data in Table 1 with the 1 order uncertain moving average model, and carry out parameter estimation, model testing and forecasting. Specifically, we consider the following 1 order uncertain moving average model,
In this model,
and
are unknown parameters,
,
are a set of uncertain disturbance terms and are assumed to obey the normal uncertainty distribution
.
For the 1 order uncertain moving average model (14), it is easy to infer that
Thus, it follows from the constructed statistical invariants (6) that we can obtain 15 real functions
as
by substituting the time series data from Table 1. Then, according to minimization problem (8), the least squares estimations of
,
, and
should solve the following minimization problem:
By using MATLAB (version 9.13.0.2049777, R2022b) to solve the above optimization problem, we obtain
Thus, the estimated 1 order uncertain moving average model is obtained as
where
,
are a set of uncertain disturbance terms and are assumed to obey the normal uncertainty distribution
.
Next, we use the time series data in Table 1 to test whether the estimated 1 order uncertain moving average model (15) is suitable. By substituting the 15 time series data and the estimated parameters
,
, and
into (5) and then using
we can obtain 15 residuals, as shown in Table 2. Suppose we test the suitability of the estimated 1 order uncertain moving average model (15) at a significance level of
. Then, it follows from
and Equation (10) that the test is
As shown in Table 2, we can verify that none of the residuals falls outside the interval
. Therefore, we have
That is, according to the model test above, we conclude that the estimated 1 order uncertain moving average model (15) fits the time series data in Table 1 well and is reasonable.
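The residual check above can be sketched in code. The sketch assumes that the test takes the usual form of an uncertain hypothesis test for a normal uncertainty distribution N(e, σ): the model is rejected when at least a fraction α of the residuals lies outside the interval between the inverse distribution values at α/2 and 1 − α/2. The function names and this exact rejection rule are assumptions made for illustration, and the significance level is left as an explicit argument.

```python
import math
from typing import Sequence, Tuple


def inverse_normal_uncertainty(alpha: float, e: float, sigma: float) -> float:
    """Inverse uncertainty distribution of N(e, sigma) at level alpha in (0, 1)."""
    return e + sigma * math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))


def suitability_test(residuals: Sequence[float], sigma: float, alpha: float,
                     e: float = 0.0) -> Tuple[float, float, int, bool]:
    """Return (lower, upper, number of residuals outside, rejected)."""
    lower = inverse_normal_uncertainty(alpha / 2.0, e, sigma)
    upper = inverse_normal_uncertainty(1.0 - alpha / 2.0, e, sigma)
    outside = sum(1 for r in residuals if r < lower or r > upper)
    rejected = outside >= alpha * len(residuals)  # assumed rejection rule
    return lower, upper, outside, rejected


if __name__ == "__main__":
    # Toy residuals, not the values of Table 2.
    print(suitability_test([0.1, -0.3, 0.2, 0.0], sigma=1.0, alpha=0.05))
```

Under this rule a single residual outside the interval need not lead to rejection when the sample is large enough for α times the sample size to exceed 1.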
Finally, based on the estimated 1 order uncertain moving average model (15), we forecast the time series data at the next moment, i.e., forecast
. On the basis of Equation (11), the forecast uncertain variable of
can be calculated by
Then, it follows from (12) and (13) that the forecast value of
is
and the
confidence interval is
By putting the time series data, forecast value and forecast interval in the same graph (see Figure 1), it can be seen that the point forecast and interval forecast based on the estimated 1 order uncertain moving average model (15) follow the trend of the existing data and produce an effective forecast.
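The point and interval forecasts can be sketched as follows. The sketch assumes the model form x_{n+1} = a0 + eps_{n+1} + a_1 eps_n + ... + a_p eps_{n+1-p} with eps_{n+1} following N(0, σ), so that the forecast uncertain variable is normal with expected value a0 + a_1 eps_n + ... + a_p eps_{n+1-p}. It also assumes the confidence interval commonly used for normal uncertain variables, mean ± (σ√3/π) ln((1 + level)/(1 − level)). The function name and arguments are hypothetical, and the confidence level is left as an explicit argument.

```python
import math
from typing import Sequence, Tuple


def forecast_next(eps: Sequence[float], a0: float, a: Sequence[float],
                  sigma: float, level: float) -> Tuple[float, Tuple[float, float]]:
    """One-step-ahead point forecast and confidence interval (assumed model form).

    eps   : estimated disturbance terms eps_1..eps_n
    level : confidence level in (0, 1), e.g. 0.95
    """
    mean = a0 + sum(a_lag * eps[-lag]
                    for lag, a_lag in enumerate(a, start=1) if lag <= len(eps))
    half_width = sigma * math.sqrt(3.0) / math.pi * math.log((1.0 + level) / (1.0 - level))
    return mean, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    # Toy values, not the estimates reported in the paper.
    print(forecast_next([0.0, 1.0, 1.5], a0=1.0, a=[0.5], sigma=1.0, level=0.95))
```

The interval is centered at the point forecast because the normal uncertainty distribution is symmetric about its expected value.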
Furthermore, given that only moment estimation (proposed by Liu and Liu [26]) has been studied among the parameter estimation methods of uncertain moving average models based on uncertainty theory, we apply the moment estimation method to the same dataset shown in Table 1 for comparison with the method proposed in this paper. It follows from the work of Liu and Liu [26] that the moment estimations of
,
, and
should solve the following system of equations:
By using Matlab to solve the above system of equations, we obtain
Thus, the estimated 1 order uncertain moving average model by moment estimation method is obtained as
where
,
are a set of uncertain disturbance terms and are assumed to obey the normal uncertainty distribution
. It can be seen from the estimated models (17) and (15) that the standard deviation of the residuals estimated by the moment estimation method is 4.0660, which is much larger than the 1.8473 obtained by the least squares method. This indicates that the residuals estimated by the moment estimation method still contain a large amount of uncertainty, and that its prediction accuracy is worse than that of the method proposed in this paper. We can also calculate the mean absolute error (MAE) and mean square error (MSE) of these two estimated models on the dataset shown in Table 1, as shown in Table 3. Table 3 likewise shows that the prediction error of the least squares method proposed in this paper is much smaller than that of the moment estimation method, which demonstrates the effectiveness of the proposed method.
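For reference, the two error measures used in this comparison can be computed as in the short sketch below. It takes the observed values and the fitted values as given sequences; how the fitted values are produced from each estimated model is not specified here, and the function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """MAE: average of |observed - fitted| over all time points."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(o - f) for o, f in zip(observed, fitted)) / len(observed)


def mean_square_error(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """MSE: average of (observed - fitted)^2 over all time points."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


if __name__ == "__main__":
    # Toy values, not the data of Table 1.
    obs, fit = [1.0, 2.0, 3.0], [1.0, 1.0, 5.0]
    print(mean_absolute_error(obs, fit), mean_square_error(obs, fit))
```

MSE penalizes large deviations more heavily than MAE, so reporting both gives a fuller picture of how the two estimated models differ.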
7. Real Data Example
In this section, we provide a real data example to illustrate the proposed parameter estimation methods and the model testing and forecast methods. The dataset is the global daily carbon dioxide emissions from 1 July 2023 to 31 July 2023, collected by Liu and Liu [26] and shown in Table 4. In the work of Liu and Liu [26], the 3 order uncertain moving average model was obtained as
where
,
are a set of uncertain disturbance terms and are assumed to obey the normal uncertainty distribution
.
Next, we describe the time series data in Table 4 with the 3 order uncertain moving average model and the least squares estimation method, and carry out parameter estimation, model testing and forecasting. For the 3 order uncertain moving average model, it is easy to infer that
Thus, it follows from the constructed statistical invariants (6) that we can obtain 31 real functions
as
by substituting the time series data from Table 4. Then, according to minimization problem (8), the least squares estimations of
,
,
,
, and
should solve the following minimization problem:
By using Matlab to solve the above optimization problem, we obtain
Thus, the estimated 3 order uncertain moving average model is obtained as
where
,
are a set of uncertain disturbance terms and are assumed to obey the normal uncertainty distribution
. It can be seen from the estimated models (18) and (19) that the standard deviation of the residuals estimated by the moment estimation method is 3.494, which is larger than the 3.0614 obtained by the least squares method. This indicates that the residuals estimated by the moment estimation method still contain more uncertainty, and that its prediction accuracy is worse than that of the method proposed in this paper.
Next, we use the time series data in Table 4 to test whether the estimated 3 order uncertain moving average model (19) is suitable. By substituting the 31 time series data and the estimated parameters
,
,
,
, and
into (5) and then using
we can obtain 31 residuals, as shown in Table 5. Suppose we test the suitability of the estimated 3 order uncertain moving average model (19) at a significance level of
. Then, it follows from
and Equation (10) that the test is
As shown in Table 5, we can verify that only
. Therefore, we have
That is, according to the model test above, we conclude that the estimated 3 order uncertain moving average model (19) fits the time series data in Table 4 well and is reasonable.
Finally, based on the estimated 3 order uncertain moving average model (19), we forecast the time series data at the next moment, i.e., forecast
. On the basis of Equation (11), the forecast uncertain variable of
can be calculated by
Then, it follows from (12) and (13) that the forecast value of
is
and the
confidence interval is
By putting the time series data, forecast value and forecast interval in the same graph (see Figure 2), it can be seen that the point forecast and interval forecast based on the estimated 3 order uncertain moving average model (19) follow the trend of the existing data and produce an effective forecast.