Proceeding Paper

Improving Predictive Accuracy in the Context of Dynamic Modelling of Non-Stationary Time Series with Outliers †

by Fernanda Catarina Pereira 1,*, Arminda Manuela Gonçalves 2,‡ and Marco Costa 3,‡

1 Centre of Mathematics, University of Minho, 4710-057 Braga, Portugal
2 Department of Mathematics and Centre of Mathematics, University of Minho, 4710-057 Braga, Portugal
3 Centre for Research and Development in Mathematics and Applications, Águeda School of Technology and Management, University of Aveiro, 3810-193 Aveiro, Portugal
* Author to whom correspondence should be addressed.
† Presented at the 9th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 12–14 July 2023.
‡ These authors contributed equally to this work.
Eng. Proc. 2023, 39(1), 36; https://doi.org/10.3390/engproc2023039036
Published: 29 June 2023
(This article belongs to the Proceedings of The 9th International Conference on Time Series and Forecasting)

Abstract: Most real time series exhibit characteristics that make the choice of model and its specification difficult. The objective of this study is to address the problem of parameter estimation and the accuracy of k-steps-ahead forecasts in non-stationary time series with outliers, in the context of state-space models. In this paper, three methods for detecting and treating outliers are proposed. We also present a comparative study of the proposed methods using data simulated from a local level model with sample sizes of 50 and 500 and with various combinations of parameters, with a 5% contamination rate of the observation equation. The results were evaluated in terms of the accuracy of the model parameter estimates and of the k-steps-ahead forecasts, as well as the detection rate of true outliers. These methodologies are also applied to three real examples. This study shows that the local level model is sufficiently robust even for non-stationary contaminated series, in the sense that it is able to handle non-stationary time series and outliers in a satisfactory way.

1. Introduction

State-space models were originally developed in aerospace engineering in the early 1960s for the purpose of monitoring and correcting the trajectory of a spacecraft headed to the moon. Today, these models have wide applicability in many areas, such as finance [1], ecology [2], machine learning [3], and time series analysis and forecasting [4,5,6,7]. These models, associated with the Kalman filter algorithm [8], are a very powerful tool given their ability to update predictions recursively and in real time as new observations of the time series become available, thus improving the accuracy of predictions. In addition, state-space models are very flexible due to their ability to incorporate fixed effects and stochastic components that can represent different unobserved components, such as periodic structures, trends, seasonality, and temporal correlation. These components describe the structural variation of the time series under study. Furthermore, potential covariates can be added when they are important to explain the process and complement the information introduced by the different stochastic components of the model. These models include two sources of variability: one corresponding to measurement errors and the other to process variations. In this way, it becomes simpler to interpret the two errors separately. One advantage of these models is that they do not require the assumption of stationarity and can handle time series with missing values in a particularly simple way [4,9]. However, the existence of outliers in real data can compromise both the estimation of the parameters and the accuracy of the predictions.
Outliers can be a problem for model specification and prediction accuracy, since the Kalman filter is not generally robust to the presence of outliers. An incorrectly specified model can lead to incorrect covariance matrices of the predictions given by the Kalman filter, and thus there is no way to describe the actual quality of the filter [10]. According to [11], the presence of outliers in a time series can induce non-Gaussian heavy-tailed noise, leading to misspecified models, biased estimates, and inaccurate forecasts. The authors of [12] showed that even simple linear Gaussian state-space models can present estimation problems. Therefore, in this paper, several methods of detecting and treating outliers are discussed. These methods are compared and illustrated with a simulation study that considers a simple Gaussian state-space model with 5% data contamination. To create the non-stationarity scenario, the local level model, which is a particular case of the state-space model, is considered for the sake of simplicity. The performance of the detection and treatment methods is evaluated by the root-mean-square error (RMSE) and the mean absolute error (MAE) of the Gaussian maximum likelihood estimates of the parameters and of the one-step-ahead predictions of the time series variable. Several scenarios are considered, accounting for different combinations of parameters and time series sizes (n = 50 and n = 500). Time series are simulated until 1000 replicates have a fitted state-space model with valid estimates, i.e., estimates within the parameter space.

2. Methodologies

The univariate state-space model can be represented by the observation and state equations, respectively, given by
$Y_t = W_t \beta_t + e_t$
$\beta_t = \mu + \phi \left( \beta_{t-1} - \mu \right) + \varepsilon_t$
where $t$ represents the time, $Y_t$ is the observed data, and $W_t$ is a factor, assumed to be known, that relates the observation $Y_t$ to the latent variable $\beta_t$ at time $t$. The disturbances $e_t$ and $\varepsilon_t$ are independent and identically distributed, following Gaussian distributions with zero mean and variances $\sigma_e^2$ and $\sigma_\varepsilon^2$, respectively, and are uncorrelated with each other.
The state $\beta_t$ is a latent variable and therefore must be estimated. The Kalman filter algorithm [8] provides optimal unbiased linear one-step-ahead and update estimators of the unobservable state $\beta_t$. Let $\Theta = \{\phi, \sigma_e^2, \sigma_\varepsilon^2\}$ be the vector of the model's unknown parameters, let $\hat{\beta}_{t|t-1}$ denote the predictor of $\beta_t$ based on the observations $Y_1, Y_2, \ldots, Y_{t-1}$, and let $P_{t|t-1}$ be its mean square error, i.e., $E[(\hat{\beta}_{t|t-1} - \beta_t)^2]$. The one-step-ahead forecast of the observable variable $Y_t$ is given by $\hat{Y}_{t|t-1} = W_t \hat{\beta}_{t|t-1}$. When, at time $t$, $Y_t$ becomes available, the prediction error or innovation, $\eta_t = Y_t - \hat{Y}_{t|t-1}$, is used to update the estimate of $\beta_t$ (filtering) through the equation
$\hat{\beta}_{t|t} = \hat{\beta}_{t|t-1} + K_t \eta_t,$
where $K_t$ is called the Kalman gain and is given by $K_t = P_{t|t-1} W_t \left( W_t^2 P_{t|t-1} + \sigma_e^2 \right)^{-1}$. The mean square error of the updated estimator $\hat{\beta}_{t|t}$, represented by $P_{t|t}$, verifies the relationship $P_{t|t} = P_{t|t-1} - K_t W_t P_{t|t-1}$. Furthermore, the predictor of $\beta_{t+k}$ at time $t$ is given by
$\hat{\beta}_{t+k|t} = \mu + \phi^k \left( \hat{\beta}_{t|t} - \mu \right),$
and its mean square error is $P_{t+k|t} = \phi^{2k} P_{t|t} + \sum_{i=0}^{k-1} \phi^{2i} \sigma_\varepsilon^2$.
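To make the filter recursions above concrete, the following minimal Python sketch implements the one-step-ahead prediction and update steps for the univariate model defined by the observation and state equations. It is an illustrative sketch, not code from the paper: the function name, argument names, and the large default value of the initial state variance (mimicking the approximate diffuse initialization discussed later) are assumptions.

```python
import numpy as np

def kalman_filter(y, W, phi, mu, sigma2_e, sigma2_eps, beta1=0.0, P1=1e7):
    """Kalman filter for Y_t = W_t beta_t + e_t, beta_t = mu + phi (beta_{t-1} - mu) + eps_t."""
    n = len(y)
    beta_pred, P_pred = np.empty(n), np.empty(n)   # beta_hat_{t|t-1}, P_{t|t-1}
    beta_filt, P_filt = np.empty(n), np.empty(n)   # beta_hat_{t|t},  P_{t|t}
    y_pred = np.empty(n)                           # Y_hat_{t|t-1}
    b, P = beta1, P1                               # prior mean and variance of the state
    for t in range(n):
        beta_pred[t], P_pred[t] = b, P
        y_pred[t] = W[t] * b                       # one-step-ahead forecast of Y_t
        eta = y[t] - y_pred[t]                     # innovation
        K = P * W[t] / (W[t] ** 2 * P + sigma2_e)  # Kalman gain
        beta_filt[t] = b + K * eta                 # filtering (update) step
        P_filt[t] = P - K * W[t] * P
        b = mu + phi * (beta_filt[t] - mu)         # prediction of the next state
        P = phi ** 2 * P_filt[t] + sigma2_eps
    return beta_pred, P_pred, beta_filt, P_filt, y_pred
```

For the local level model used later in this paper, $W_t = 1$ and $\phi = 1$, so $\mu$ drops out of the state recursion and the filter reduces to exponential-smoothing-like updates of a random-walk level.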

Outlier Detection and Treatment Procedures

Three approaches to outlier detection and treatment are presented. The first approach is based on linear interpolation and represents the naive method. The other two are iterative procedures based, respectively, on a robust Kalman filter and on the Kalman filter viewed from a missing-values perspective. A minimal sketch of the shared detect-and-refit loop is given after the list.
1. Linear interpolation (LI)
  • Outlier detection: Observations are considered outliers if they are less than $Q_1 - 1.5\,IQR$ or greater than $Q_3 + 1.5\,IQR$, where $Q_1$ and $Q_3$ denote the first and third quartiles, respectively, and $IQR$ (interquartile range) is the difference between the third and first quartiles (the $IQR$ rule).
  • Outlier treatment: Any outliers that are identified are replaced by LI using the neighbouring observations [13].
2. Iterative method based on the robust Kalman filter (RKF)
  • Outlier detection: Outlier detection is performed by applying the $IQR$ rule to the standardized residuals after fitting a state-space model to the data.
  • Outlier treatment: An alternative to the state estimator $\hat{\beta}_{t|t}$, inspired by the work of [14] and subsequently of [15], is proposed. In this approach, the state update $\hat{\beta}_{t|t}$ is replaced by
    $\hat{\beta}_{t|t}^{*} = \underset{\beta}{\arg\min} \left\{ \left\| \hat{\beta}_{t|t-1} - \beta \right\|^2_{P_{t|t-1}^{-1}} + \left\| Y_t^{\mathrm{out}} - W_t \beta \right\|^2_{\sigma_e^{-2}} \right\},$
    where $Y_t^{\mathrm{out}}$ is an identified outlier, which is replaced by $\hat{Y}_t^{*} = W_t \hat{\beta}_{t|t}^{*}$. This proposal applies the robust version of the Kalman filter only at the moments at which outliers are detected, as opposed to the original work, in which it is applied at all moments. In the end, the model is iteratively fitted $j$ times to the corrected time series until $\left\| \hat{\Theta}_{ML}^{(j)} - \hat{\Theta}_{ML}^{(j-1)} \right\| < \delta$, $j \in \mathbb{N}$, or until a maximum number of iterations is reached.
3. Iterative method based on the Kalman filter for time series with missing values (naKF)
  • Outlier detection: Outlier detection is performed by applying the $IQR$ rule to the standardized residuals after fitting a state-space model to the data.
  • Outlier treatment: The identified outlier observations $Y_t^{\mathrm{out}}$ are treated as missing values, and the state estimator $\hat{\beta}_{t|t}$ and its mean square error $P_{t|t}$ are replaced by $\hat{\beta}_{t|t}^{*} = \hat{\beta}_{t|t-1}$ and $P_{t|t}^{*} = P_{t|t-1}$, respectively. The missing observations $Y_t^{\mathrm{out}}$ are replaced by $\hat{Y}_t^{*} = W_t \hat{\beta}_{t|t}^{*}$, and the state-space model is fitted $j$ times to the corrected time series until $\left\| \hat{\Theta}_{ML}^{(j)} - \hat{\Theta}_{ML}^{(j-1)} \right\| < \delta$, $j \in \mathbb{N}$, or until a maximum number of iterations is reached.
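As referred to above, both RKF and naKF share the same outer loop: fit the model, flag outliers using the $IQR$ rule on the standardized residuals, correct the flagged observations, and refit until the parameter estimates stabilise. The sketch below illustrates this loop with the naKF-style correction (the flagged observation is replaced by its one-step-ahead prediction). It is a sketch under assumptions: `fit_local_level` is a hypothetical routine standing in for maximum likelihood estimation of the local level model, returning the parameter vector, the one-step-ahead predictions, and the innovation variances.

```python
import numpy as np

def iqr_flags(x):
    """IQR rule: flag points below Q1 - 1.5 IQR or above Q3 + 1.5 IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

def naKF_outlier_treatment(y, fit_local_level, max_iter=100, delta=1e-4):
    """Iterative detect-and-treat loop (naKF variant).

    `fit_local_level` is a placeholder: it should return (theta, y_pred, F),
    i.e. ML parameter estimates, one-step-ahead predictions, and innovation variances.
    """
    y_work = np.asarray(y, dtype=float).copy()
    theta_prev = None
    for _ in range(max_iter):
        theta, y_pred, F = fit_local_level(y_work)
        std_resid = (y_work - y_pred) / np.sqrt(F)     # standardized residuals
        flags = iqr_flags(std_resid)
        y_work[flags] = y_pred[flags]                  # treat outliers as missing: use the prediction
        if theta_prev is not None and np.linalg.norm(theta - theta_prev) < delta:
            break                                      # parameter estimates have stabilised
        theta_prev = theta
    return theta, y_work, flags
```

The RKF variant differs only in the correction step, where the flagged observation is replaced by $W_t \hat{\beta}^{*}_{t|t}$ obtained from the robust state update defined above.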
The aim of this paper is to investigate under which conditions the presence of outliers affects the estimation of the parameters and states in the state-space model and to propose competitive approaches for outlier detection and treatment. Thus, we simulate time series of size $n$ ($n = 50$ and $n = 500$), considering in all simulation studies the local level model, a simple and particular case of the state-space model defined above, with $W_t = 1$ for all $t$ and $\phi = 1$, which is used to illustrate the non-stationary case. The local level model is given by:
$Y_t = \beta_t + e_t$
$\beta_t = \beta_{t-1} + \varepsilon_t$
In the literature, some approaches have been proposed for the initialization of the Kalman filter for non-stationary stochastic processes. Perhaps the best known is the diffuse initialization [16]. In this paper, we use the approximate diffuse initialization, assuming a zero mean and a very large variance for the initial state ($\sigma_e^2 \times 10^7$).
This study examines two distinct situations: one characterized by non-contaminated data, i.e., clean data with $e_t \sim N(0, \sigma_e^2)$ and $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$, and the other involving data contaminated at a rate of $p = 0.05$, i.e., $e_t \sim (1-p)\,N(0, \sigma_e^2) + p\,N(10\sigma_e, \sigma_e^2)$ and $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$.
For each scenario, the simulation design considered sample sizes of $n = 50$ and $n = 500$, with $(\sigma_\varepsilon^2, \sigma_e^2)$ combinations taking values among 0.05, 0.10, and 1.00 (see Tables 1 and 2). For each parameter combination, 1000 replicates with valid estimates were considered, i.e., $\hat{\sigma}_\varepsilon^2 > 0$ and $\hat{\sigma}_e^2 > 0$. The convergence criterion adopted was $\left\| \hat{\Theta}_{ML}^{(j)} - \hat{\Theta}_{ML}^{(j-1)} \right\| < 10^{-4}$, up to a maximum of $j = 100$ iterations. To initialize the Kalman filter, $\mu_1 = 0$ and $P_1 = \sigma_e^2 \times 10^7$ were used.
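As an illustration of this design, the following sketch generates one contaminated replicate from the local level model under the scheme above ($p = 0.05$, observation errors shifted by $10\sigma_e$); the function and variable names, and the seed, are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2023)  # illustrative seed

def simulate_contaminated_local_level(n, sigma2_eps, sigma2_e, p=0.05, beta0=0.0):
    """Simulate Y_t = beta_t + e_t, beta_t = beta_{t-1} + eps_t, with a fraction p
    of the observation errors drawn from N(10*sigma_e, sigma2_e)."""
    sigma_e = np.sqrt(sigma2_e)
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), n)
    beta = beta0 + np.cumsum(eps)                      # non-stationary random-walk level
    e = rng.normal(0.0, sigma_e, n)                    # clean observation errors
    is_outlier = rng.random(n) < p                     # roughly 5% of time points contaminated
    e[is_outlier] = rng.normal(10 * sigma_e, sigma_e, is_outlier.sum())
    return beta + e, is_outlier

y, true_outliers = simulate_contaminated_local_level(500, sigma2_eps=0.10, sigma2_e=1.00)
```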
To evaluate the quality of the parameter estimates and of the k-steps-ahead forecasts, the following measures were considered:
  • RMSE$(\Theta) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\Theta_i - \hat{\Theta}_i\right)^2}$;
  • MAE$(\Theta) = \frac{1}{n}\sum_{i=1}^{n}\left|\Theta_i - \hat{\Theta}_i\right|$.
To evaluate the rate of true outliers detected, two rates were used: rate 1 $= A/B$ and rate 2 $= A/C$, where $A$ is the number of true outliers detected, $B$ is the total number of outliers detected by the method (the total number of true and false detections), and $C$ is the total number of true outliers.
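As a concrete reading of these measures, the sketch below computes the RMSE and MAE over a set of replicates and the two detection rates from boolean masks of detected and true outliers (function names are illustrative).

```python
import numpy as np

def rmse(true, est):
    true, est = np.asarray(true, float), np.asarray(est, float)
    return np.sqrt(np.mean((true - est) ** 2))

def mae(true, est):
    true, est = np.asarray(true, float), np.asarray(est, float)
    return np.mean(np.abs(true - est))

def detection_rates(detected, true_outliers):
    """rate 1 = true detections / all detections; rate 2 = true detections / true outliers."""
    detected = np.asarray(detected, bool)
    true_outliers = np.asarray(true_outliers, bool)
    A = np.sum(detected & true_outliers)   # true outliers detected
    B = np.sum(detected)                   # all detections (true + false)
    C = np.sum(true_outliers)              # all true outliers
    rate1 = A / B if B > 0 else np.nan
    rate2 = A / C if C > 0 else np.nan
    return rate1, rate2
```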

3. Results

In this section, the results obtained from the proposed methodologies are presented. The results of the simulation study are presented in the first subsection. In the second subsection, the application of the outlier detection and treatment methodologies is demonstrated via three illustrative examples.

3.1. Simulation Results

Table 1 and Table 2 show the RMSE and MAE of the local level model parameters and of the one-step-ahead forecasts for sample sizes $n = 50$ and $n = 500$, respectively. In most scenarios, the methodologies improved the accuracy of the model parameters and of the one-step-ahead forecasts. However, this improvement was minimal. In fact, there are scenarios where the RMSE and MAE are lower in the non-treated case than when outliers are treated, for example, for the scenario $n = 500$, $\sigma_\varepsilon^2 = 0.10$ and $\sigma_e^2 = 0.05$. In particular, LI performed least favourably in comparison with RKF and naKF, especially in estimating the variance of the observation error $\sigma_e^2$. For example, for $n = 500$, $\sigma_\varepsilon^2 = 0.10$ and $\sigma_e^2 = 1.00$, when outliers were treated by LI the RMSE of $\sigma_e^2$ was 2.0559, while for RKF it was 0.2428 and for naKF it was 0.1168. Overall, naKF was the method that showed the best performance in improving the accuracy of the parameters and of the one-step-ahead forecasts, especially for $n = 500$. The proposed methodologies had difficulty improving the accuracy of the estimates of the level variance $\sigma_\varepsilon^2$. Finally, regarding the detection of outliers, the advantage of identifying outliers on the standardized residuals is clear, as the means of rates 1 and 2 were higher.

3.2. Illustrative Examples

In this subsection, a comparative analysis of the proposed outlier detection and treatment methods using the local level model is presented based on three illustrative examples. The aim is to evaluate the performance of the methodologies from a practical point of view, in terms of outlier detection and treatment and validation of the assumptions (normality and independence of residuals). The three time series that present outliers and are used for illustrative purposes are the following:
  • TS1: Number of earthquakes per year of magnitude 7.0 or greater, between 1900 and 1998 (Figure 1);
  • TS2: Kiewa River at Kiewa, Victoria, Australia, between 1885 and 1954 (Figure 2);
  • TS3: Tree: Beyond Burn, Australia. Pencil Pine, between 1028 and 1975 (Figure 3).
The data is available on GitHub (https://github.com/FinYang/tsdl (accessed on 27 June 2023)) in the Time Series Data Library (TSDL), created by Professor Rob Hyndman.
The data was divided into a training sample (80%) and a test sample (20%). TS1 presents one outlier in the training sample corresponding to the year 1943; TS2 presents one outlier in the training sample (1916) and one in the test sample (1955). TS3 presents 18 outliers in the training sample (16 outliers before 1335 and two outliers corresponding to the years 1770 and 1777, respectively) and three outliers in the test sample, namely 1972, 1973 and 1975.
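The paper does not state which software was used for these fits. As one possible way to reproduce this kind of analysis, the sketch below fits a local level model with statsmodels' UnobservedComponents, flags outliers in the standardized one-step-ahead residuals with the $IQR$ rule, and produces the test-sample forecasts with 95% prediction intervals; `y_train` and `y_test` are assumed to hold the 80%/20% split of one of the series.

```python
import numpy as np
from statsmodels.tsa.statespace.structural import UnobservedComponents

# Fit the local level model to the training sample (assumed array y_train).
res = UnobservedComponents(y_train, level="local level").fit(disp=False)

# IQR rule on the standardized one-step-ahead residuals.
resid = res.standardized_forecasts_error[0]
q1, q3 = np.nanpercentile(resid, [25, 75])
iqr = q3 - q1
outlier_idx = np.where((resid < q1 - 1.5 * iqr) | (resid > q3 + 1.5 * iqr))[0]

# k-steps-ahead forecasts over the test horizon (assumed array y_test) with 95% intervals.
fc = res.get_forecast(steps=len(y_test))
pred_mean = fc.predicted_mean
pred_int = fc.conf_int(alpha=0.05)
```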
The results of the local level model fit to the three time series are shown in Table 3.
After fitting the model to the non-treated data, outliers were detected in the standardized residuals, and these outliers were treated by the two iterative methods, RKF and naKF. In TS1, two outliers were detected (1943 and 1957). In TS2, only the outlier initially identified (1916) was detected. Finally, in TS3, where eighteen outliers were initially detected, the residuals after the adjustment showed eight outliers, of which three (1042, 1158 and 1777) had been initially detected in the time series.
Table 4 shows the evaluation measures between the observed and predicted values in the test sample. The lowest RMSE and MAE values are highlighted, with the naKF method performing best. However, the differences between these values are minimal, especially in the case of TS3; these results are therefore in line with those obtained in the simulation study.
Figure 4, Figure 5 and Figure 6 show TS1, TS2 and TS3 in black, respectively, with the forecasts in red and the 95% prediction intervals obtained using naKF for the treatment of outliers. The amplitude of the prediction intervals for TS1 (Figure 4) and TS3 (Figure 6) shows a considerable increase over time, whereas for TS2 (Figure 5) this increase is minimal and the interval does not cover all the observations in the test sample.
Regarding the analysis of the model assumptions, the residuals should behave similarly to white noise. Normality was verified for all models and all time series, with Kolmogorov–Smirnov p-values between 0.398 (RKF and TS2) and 0.967 (RKF and TS1). The models for TS1 and TS2 verified the independence assumption, with Ljung–Box p-values ranging between 0.314 (non-treated and TS1) and 0.574 (naKF and TS1). However, this assumption was not verified for TS3 (all Ljung–Box p-values were less than 0.003).
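The exact test settings are not reported in the paper; one way to run comparable residual diagnostics on a fitted model (reusing `res` from the previous sketch, with an assumed lag of 10 for the Ljung–Box test) is:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = res.standardized_forecasts_error[0]
resid = resid[~np.isnan(resid)]

# Kolmogorov-Smirnov test of the standardized residuals against a standard normal (normality).
ks_stat, ks_pvalue = stats.kstest(resid, "norm")

# Ljung-Box test for residual autocorrelation (independence assumption).
lb = acorr_ljungbox(resid, lags=[10], return_df=True)
print(ks_pvalue, lb["lb_pvalue"].iloc[0])
```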

4. Discussion

In this work, three methods for detecting and treating outliers in time series were proposed. This study highlighted the problem of contaminated non-stationary time series from a state-space modelling perspective. To study the impact of outliers on the parameter estimates and the observation forecasts, and to make a comparative analysis of the proposed methods, a simulation study was conducted with sample sizes of 50 and 500 and various combinations of parameters, generated using a non-stationary local level model. The data were contaminated at a 5% rate in the observation equation. It was found that the proposed methods overall improved the accuracy of the parameters and forecasts; however, this improvement was small relative to the contaminated data. The treatment of outliers by naKF and RKF was found to be the most favourable, with naKF standing out; LI performed the worst overall. The proposed methodologies were applied to three real time series, where the same conclusion was drawn. In other words, in view of the study's results, state-space models are generally sufficiently robust, given that they are able to handle non-stationary time series and outliers in a satisfactory way.

Author Contributions

F.C.P., A.M.G. and M.C. contributed to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available in GitHub (https://github.com/FinYang/tsdl (accessed on 27 June 2023)).

Acknowledgments

F. Catarina Pereira was funded by national funds through FCT (Fundação para a Ciência e a Tecnologia) through the individual PhD research grant UI/BD/150967/2021 of CMAT-UM. A. Manuela Gonçalves was partially financed by Portuguese Funds through FCT within the Projects UIDB/00013/2020 and UIDP/00013/2020 of CMAT-UM. Marco Costa was partially supported by The Center for Research and Development in Mathematics and Applications (CIDMA-UA) through the Portuguese Foundation for Science and Technology—FCT, references UIDB/04106/2020 and UIDP/04106/2020.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Triantafyllopoulos, K. The State Space Model in Finance. In Bayesian Inference of State Space Models; Springer Texts in Statistics; Springer: Cham, Switzerland, 2021.
2. Auger-Méthé, M.; Newman, K.; Cole, D.; Empacher, F.; Gryba, R.; King, A.A.; Leos-Barajas, V.; Flemming, J.M.; Nielsen, A.; Petris, G.; et al. A guide to state-space modeling of ecological time series. Ecol. Monogr. 2021, 91, 1–38.
3. Wu, H.; Matteson, D.; Wells, M. Interpretable Latent Variables in Deep State Space Models. arXiv 2022, arXiv:2203.02057.
4. Matsuura, K. Time Series Data Analysis with State Space Model. In Bayesian Statistical Modeling with Stan, R, and Python; Springer: Singapore, 2022.
5. Monteiro, M.; Costa, M. Change Point Detection by State Space Modeling of Long-Term Air Temperature Series in Europe. Stats 2023, 6, 7.
6. Pereira, F.C.; Gonçalves, A.M.; Costa, M. Short-term forecast improvement of maximum temperature by state-space model approach: The study case of the TO CHAIR project. Stoch. Environ. Res. Risk Assess. 2023, 37, 219–231.
7. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications: With R Examples; Springer: New York, NY, USA, 2017.
8. Kalman, R. A New Approach to Linear Filtering and Prediction Problems. ASME J. Basic Eng. 1960, 82, 35–45.
9. Harvey, A. Forecasting, Structural Time Series Models and the Kalman Filter; Cambridge University Press: Cambridge, UK, 1990.
10. Teunissen, P.J.G.; Khodab, A.; Psychas, D. A generalized Kalman filter with its precision in recursive form when the stochastic model is misspecified. J. Geod. 2021, 95, 108.
11. Huang, Y.; Zhang, Y.; Zhao, Y.; Shi, P.; Chambers, J.A. A Novel Outlier-Robust Kalman Filtering Framework Based on Statistical Similarity Measure. IEEE Trans. Autom. Control 2021, 66, 2677–2692.
12. Auger-Méthé, M.; Field, C.; Albertsen, C.M.; Derocher, A.E.; Lewis, M.A.; Jonsen, I.D.; Flemming, J.M. State-space models' dirty little secrets: Even simple linear Gaussian models can have estimation problems. Sci. Rep. 2016, 6, 26677.
13. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 2nd ed.; OTexts: Melbourne, Australia, 2018.
14. Cipra, T.; Romera, R. Kalman filter with outliers and missing observations. Test 1997, 6, 379–395.
15. Crevits, R.; Croux, C. Robust estimation of linear state space models. Commun. Stat. Simul. Comput. 2019, 48, 1694–1705.
16. Durbin, J.; Koopman, S.J. Time Series Analysis by State Space Methods, 2nd ed.; Oxford Statistical Science Series; Oxford University Press: Oxford, UK, 2013.
Figure 1. Number of earthquakes per year of magnitude 7.0 or greater, between 1900 and 1998 (TS1).
Figure 2. Kiewa River at Kiewa, Victoria, Australia, between 1885 and 1954 (TS2).
Figure 3. Tree: Beyond Burn, Australia. Pencil Pine, between 1028 and 1975 (TS3).
Figure 4. TS1 (black), the k-steps ahead forecasts (red) and the 95% prediction intervals using naKF (red shadow).
Figure 5. TS2 (black), the k-steps ahead forecasts (red) and 95% prediction intervals using naKF (red shadow).
Figure 6. TS3 (black), the k-steps ahead forecasts (red) and 95% prediction intervals using naKF (red shadow).
Table 1. Root-mean-square error (RMSE), mean absolute error (MAE), rate 1, and rate 2 of Θ with 1000 simulations of non-stationary time series of sample size n = 50, considering Gaussian errors (NC = non-contaminated; C = contaminated; RKF = robust Kalman filter; naKF = Kalman filter for time series with missing values).

| σε² | σe² | Method | RMSE σε² | RMSE σe² | RMSE forecast | MAE σε² | MAE σe² | MAE forecast | Outlier detection | Mean rate 1 | Mean rate 2 |
|------|------|--------|----------|----------|---------------|---------|---------|--------------|-------------------|-------------|-------------|
| 0.10 | 0.05 | NC | 0.0416 | 0.0276 | 0.4271 | 0.0335 | 0.0217 | 0.3399 | – | – | – |
| | | C | 0.0621 | 0.2614 | 0.5243 | 0.0475 | 0.2214 | 0.4033 | – | – | – |
| | | LI | 0.0584 | 0.1772 | 0.4910 | 0.0438 | 0.1286 | 0.3781 | Time series | 84% | 42% |
| | | RKF | 0.0665 | 0.0910 | 0.4910 | 0.0456 | 0.0718 | 0.3781 | Standardized residuals | 74% | 88% |
| | | naKF | 0.0536 | 0.0556 | 0.4667 | 0.0393 | 0.0337 | 0.3607 | Standardized residuals | 74% | 88% |
| 1.00 | 0.10 | NC | 0.3114 | 0.1453 | 1.0734 | 0.2488 | 0.1088 | 0.8539 | – | – | – |
| | | C | 0.4638 | 0.6275 | 1.2216 | 0.3644 | 0.4951 | 0.9507 | – | – | – |
| | | LI | 0.4255 | 0.5723 | 1.2127 | 0.3432 | 0.4499 | 0.9421 | Time series | 45% | 8% |
| | | RKF | 0.4216 | 0.4347 | 1.2048 | 0.3384 | 0.3422 | 0.9387 | Standardized residuals | 61% | 42% |
| | | naKF | 0.4285 | 0.3821 | 1.2210 | 0.3422 | 0.2706 | 0.9383 | Standardized residuals | 61% | 42% |
| 0.10 | 1.00 | NC | 0.0840 | 0.2456 | 1.1675 | 0.0618 | 0.1977 | 0.9326 | – | – | – |
| | | C | 14.5332 | 468.2479 | 1.4690 | 1.3638 | 77.8606 | 1.1298 | – | – | – |
| | | LI | 0.1025 | 0.3266 | 1.1653 | 0.0719 | 0.2373 | 0.9250 | Time series | 91% | 99% |
| | | RKF | 0.3768 | 0.5958 | 1.2860 | 0.1245 | 0.3587 | 0.9876 | Standardized residuals | 78% | 98% |
| | | naKF | 0.4510 | 0.3155 | 1.2844 | 0.1582 | 0.2525 | 0.9620 | Standardized residuals | 78% | 98% |
| 0.05 | 0.10 | NC | 0.0275 | 0.0329 | 0.4413 | 0.0212 | 0.0260 | 0.3517 | – | – | – |
| | | C | 0.0564 | 0.4242 | 0.5416 | 0.0333 | 0.3516 | 0.4180 | – | – | – |
| | | LI | 0.0343 | 0.1501 | 0.4663 | 0.0237 | 0.0830 | 0.3652 | Time series | 91% | 83% |
| | | RKF | 0.0586 | 0.0710 | 0.4914 | 0.0327 | 0.0557 | 0.3798 | Standardized residuals | 75% | 97% |
| | | naKF | 0.0476 | 0.0391 | 0.4714 | 0.0279 | 0.0294 | 0.3635 | Standardized residuals | 75% | 97% |

The "forecast" columns compare the one-step-ahead prediction Ŷ(t given t−1) with the observed Y(t). Outlier detection on the standardized residuals, and the corresponding rates, are common to RKF and naKF.
Table 2. Root-mean-square error (RMSE), mean absolute error (MAE), rate 1, and rate 2 of Θ with 1000 simulations of non-stationary time series of sample size n = 500, considering Gaussian errors (NC = non-contaminated; C = contaminated; RKF = robust Kalman filter; naKF = Kalman filter for time series with missing values).

| σε² | σe² | Method | RMSE σε² | RMSE σe² | RMSE forecast | MAE σε² | MAE σe² | MAE forecast | Outlier detection | Mean rate 1 | Mean rate 2 |
|------|------|--------|----------|----------|---------------|---------|---------|--------------|-------------------|-------------|-------------|
| 0.10 | 0.05 | NC | 0.0138 | 0.0086 | 0.4315 | 0.0109 | 0.0068 | 0.3443 | – | – | – |
| | | C | 0.0170 | 0.2228 | 0.5303 | 0.0137 | 0.2187 | 0.4115 | – | – | – |
| | | LI | 0.0184 | 0.2156 | 0.5561 | 0.0147 | 0.2112 | 0.4193 | Time series | 52% | 4% |
| | | RKF | 0.0189 | 0.0696 | 0.4913 | 0.0146 | 0.0684 | 0.3822 | Standardized residuals | 77% | 91% |
| | | naKF | 0.0181 | 0.0133 | 0.4656 | 0.0137 | 0.0103 | 0.3613 | Standardized residuals | 77% | 91% |
| 1.00 | 0.10 | NC | 0.1156 | 0.0524 | 1.0891 | 0.0934 | 0.0419 | 0.8685 | – | – | – |
| | | C | 0.1376 | 0.4955 | 1.2374 | 0.1112 | 0.4775 | 0.9679 | – | – | – |
| | | LI | 0.1454 | 0.4962 | 1.3117 | 0.1165 | 0.4788 | 0.9915 | Time series | 19% | 1% |
| | | RKF | 0.1366 | 0.3261 | 1.2226 | 0.1102 | 0.3114 | 0.9550 | Standardized residuals | 65% | 41% |
| | | naKF | 0.1643 | 0.2065 | 1.2561 | 0.1272 | 0.1803 | 0.9634 | Standardized residuals | 65% | 41% |
| 0.10 | 1.00 | NC | 0.0235 | 0.0771 | 1.1685 | 0.0188 | 0.0610 | 0.9324 | – | – | – |
| | | C | 0.0351 | 4.7013 | 1.4334 | 0.0275 | 4.6231 | 1.1320 | – | – | – |
| | | LI | 0.0341 | 2.0559 | 1.2754 | 0.0242 | 1.3978 | 1.0019 | Time series | 94% | 68% |
| | | RKF | 0.0299 | 0.2428 | 1.2191 | 0.0227 | 0.2255 | 0.9664 | Standardized residuals | 89% | 100% |
| | | naKF | 0.0423 | 0.1168 | 1.1950 | 0.0254 | 0.0976 | 0.9436 | Standardized residuals | 89% | 100% |
| 0.05 | 0.10 | NC | 0.0086 | 0.0100 | 0.4466 | 0.0068 | 0.0079 | 0.3561 | – | – | – |
| | | C | 0.0125 | 0.4614 | 0.5647 | 0.0098 | 0.4517 | 0.4417 | – | – | – |
| | | LI | 0.0119 | 0.3605 | 0.5628 | 0.0094 | 0.3348 | 0.4290 | Time series | 81% | 23% |
| | | RKF | 0.0116 | 0.0722 | 0.4893 | 0.0088 | 0.0702 | 0.3854 | Standardized residuals | 84% | 99% |
| | | naKF | 0.0104 | 0.0129 | 0.4617 | 0.0077 | 0.0106 | 0.3644 | Standardized residuals | 84% | 99% |

The "forecast" columns compare the one-step-ahead prediction Ŷ(t given t−1) with the observed Y(t). Outlier detection on the standardized residuals, and the corresponding rates, are common to RKF and naKF.
Table 3. Parameter estimates and respective standard errors of the non-stationary state-space model (local level model); LI—linear interpolation; RKF—robustified Kalman filter; naKF—Kalman filter for time series with missing values.

| Series | Method | σε estimate (SE) | σe estimate (SE) | log L |
|--------|--------|------------------|------------------|-------|
| TS1 | Non-treated | 2.7103 (0.6932) | 4.8341 (0.5760) | −192.8515 |
| | LI | 2.6438 (0.6735) | 4.6330 (0.5578) | −190.1958 |
| | RKF | 2.9174 (0.6983) | 4.0890 (0.5653) | −185.7237 |
| | naKF | 3.0671 (0.7237) | 3.8387 (0.5844) | −183.6041 |
| TS2 | Non-treated | 1.6446 (0.8822) | 9.3662 (0.9774) | −170.7793 |
| | LI | 1.2913 (0.7006) | 7.8502 (0.8092) | −161.2743 |
| | RKF | 1.1704 (0.6859) | 7.7522 (0.7959) | −160.3136 |
| | naKF | 1.0999 (0.6905) | 7.7692 (0.7967) | −158.2455 |
| TS3 | Non-treated | 0.0623 (0.0058) | 0.1054 (0.0046) | 1096.8770 |
| | LI | 0.0597 (0.0057) | 0.0971 (0.0045) | 1149.8800 |
| | RKF | 0.0614 (0.0055) | 0.1000 (0.0044) | 1129.9350 |
| | naKF | 0.0601 (0.0055) | 0.1020 (0.0044) | 1124.1500 |
Table 4. Root-mean-square error (RMSE) and mean absolute error (MAE) between the observed and forecasted values via the local level model in the test sample.

| Series | | Non-treated RMSE | Non-treated MAE | LI RMSE | LI MAE | RKF RMSE | RKF MAE | naKF RMSE | naKF MAE |
|--------|--|------------------|-----------------|---------|--------|----------|---------|-----------|----------|
| TS1 | Observed vs. forecast | 7.0245 | 6.0496 | 7.0087 | 6.0353 | 6.8205 | 5.8609 | 6.7342 | 5.7788 |
| | Percentage reduction | – | – | 0.22% | 4.14% | 2.90% | 3.12% | 4.13% | 4.48% |
| TS2 | Observed vs. forecast | 11.4091 | 8.1459 | 11.3624 | 8.1456 | 11.2833 | 8.1455 | 11.2249 | 8.1455 |
| | Percentage reduction | – | – | 0.41% | 0.004% | 1.10% | 0.01% | 1.61% | 0.01% |
| TS3 | Observed vs. forecast | 0.3759 | 0.3231 | 0.3757 | 0.3229 | 0.3756 | 0.3228 | 0.3742 | 0.3213 |
| | Percentage reduction | – | – | 0.05% | 0.06% | 0.08% | 0.09% | 0.45% | 0.56% |

"Observed vs. forecast" refers to the observed values Y(t) compared with the k-steps-ahead forecasts Ŷ(t+k given t) in the test sample.