Decomposition of Dynamical Signals into Jumps , Oscillatory Patterns , and Possible Outliers

In this note, we present a component-wise algorithm combining several recent ideas from signal processing for simultaneous piecewise constants trend, seasonality, outliers, and noise decomposition of dynamical time series. Our approach is entirely based on convex optimisation, and our decomposition is guaranteed to be a global optimiser. We demonstrate the efficiency of the approach via simulations results and real data analysis.


Motivations
The goal of the present work is to propose a simple and efficient scheme for time series decomposition into trend, seasonality, outliers, and noise components.As is well known, estimating the trend and the periodic components play a very important role in time series analysis [1][2][3].Important applications include financial time series [4,5], epidemiology [6], climatology [7], control engineering [8], management [9], and airline data analysis [10].
The trend and seasonality decomposition is often dealt with in a suboptimal sequential scheme and rarely in a joint procedure.Estimating the trend and the seasonality components can be addressed via many different techniques relying on filtering and/or optimisation of various criteria.It might not be always clear what global optimisation problem is actually solved by the successive methods employed.Moreover, we are not aware of any method that incorporates outlier detection into the procedure.
In this short note, we present a joint estimation technique based on recent approaches based on convex optimisation, namely 1 -trend Filtering [11] and nuclear norm penalised low rank approximation of Hankel matrices, combined with an outlier detection scheme.Low rank estimation and outlier detection is performed using robust principal component analysis (PCA) [12].Projection onto Hankel matrices is straightforward using averaging along diagonals.The whole scheme aims at providing a flexible and efficient method for time series decomposition based on convex analysis and fast algorithms, and as such aligns with the philosophy of current research in signal processing and machine learning [13].
Let us now introduce some notations.The time series we will be working with, denoted by X t is assumed to be decomposable into four components where (T t ) t denotes the trend, (S t ) t denotes a signal that can be decomposed into a sum of few complex exponential sums, (O t ) t is a sparse signal representing rare outliers, and (e t ) t denotes the noise term.

Previous Works
More than often, the components of time series are estimated in a sequential manner.

Piecewise Constant Trend Estimation
A nice survey on trend estimation is [14].Traditional methods include model based approaches using, e.g., state space models [15] or nonparametric approaches such as the Henderson, LOESS [16], and Hodrick-Prescott filters [17].Some comparisons can be found in [4].More recent approaches include empirical mode decomposition [14] and wavelet decompositions [18].A very interesting approach called 1 -trend filtering was recently proposed by [11] among others.This method allows for retrieval of piecewise constants or, more generally, piecewise polynomial trends from signals using 1 -penalised least squares.In particular, this type of analysis is the perfect tool for segmenting time series and detecting model changes in the dynamics.These methods have been recently approach by statistical tools in [19][20][21][22].

Seasonality Estimation
Seasonality can also be addressed using various approaches [23,24].Unit root tests sometimes allow for detection of some seasonal components.Periodic autoregressive models have been proposed as a simple and efficient model with many applications in, e.g., business [25], medicine [26], and hydrology [27].The relationship between seasonality and cycle can be hard to formalise when dependencies are present [24].Another line of research is based on Prony's method [28,29].Prony's method is based on the fact that any signal which is the sum of exponential functions can be put into a Hankel matrix, whose rank will be the number of exponential components.Applications are multiple in signal processing [30], electromagnetics [31], antennas [32], medicine [33], etc., although they are not robust to noise.This approach can be enhanced using alternating projections such as in [34].There are now many works on improving Prony's method.For our purposes, we will concentrate on a rank regularised Prony-type approach.The rank regularisation will be performed by using a surrogate such as the nuclear norm for matrices.

Joint Estimation
Seasonality and trend estimation can be performed using, e.g., general linear abstraction of seasonality (GLAS), widely used in econometrics [4].Existing software, e.g., X-12-ARIMA, TRAMO-SEATS, and STAMP, is available for general purpose time series decomposition, but such programs do not provide the same kind of analysis as the decomposition we provide in the present paper.For instance, it is well known that ARMA estimation is based on a non-convex likelihood optimisation step for which no current method can be proved to provide a global optimiser.
Singular spectrum analysis (SSA) is also widely used, and the steps of this algorithm are similar to those of PCA, but especially tailored for time series [14].Neither a parametric model nor stationarity are postulated for the method to work.SSA seeks to decompose the original series into a sum of a small number of interpretable components such as trend, oscillatory components, and noise.It is based on the singular value decomposition of a specific matrix built from the time series.This makes SSA a model-free method and hence enables SSA to have a very wide range of applicability [35].

Our Contribution
The methodology developed in this work brings together several important discoveries from the last 10 years based on sparse estimation, in particular, non-smooth convex surrogates for cardinality and rank-type penalisations.As a result, we obtain a decomposition that is easily interpretable and that detects the location of possible abrupt model changes and of outliers.These features are not supplied by, e.g., the SSA approach, albeit of fundamental importance for quality assessment.

Background on Penalised Filtering, Robust PCA, and Componentwise Optimisation
We start with a short introduction to sparsity promoting penalisation in statistics, based on the main properties of one of the simplests avatars of this approach: the least absolute selection and shrinkage operator (LASSO) [19].
2.1.The Nonsmooth Approach to Sparsity Promoting Penalised Estimation: Lessons from the LASSO All of the techniques that are going to be used in the paper resort to sparse estimation for vectors and spectrally sparse estimation for matrices.Sparse estimation has been a topic of extensive research in the last 15 years [36].Our signal decomposition problem strongly relies on sparse estimation because jumps have sparse derivative [20], piecewise affine signals have sparse second order derivative [11], seasonal components can be estimated using spectrally sparse Hankel matrices (Prony's method and its robustified version [37]), and outliers are sparse matrices [12].
Sparse estimation, the estimation of a sparse vector, matrix, or spectrally sparse matrix with possibly additional structure, usually consists in least squares minimisation under a sparsity constraint.Unfortunately, sparsity constraints are often computationally hard to deal with and, as surveyed in [36], one has to resort to convex relaxations.
Efficient convex relaxations for sparse estimation originally appeared in statistics in the context of regression [38].The main idea is that the number of non-zero components of a vector can be convexified using the Fenchel bi-dual over the ∞ unit-ball.The resulting functional is nothing but the 1 -norm of the vector.Incorporating an 1 -norm penalty into the least-squares scheme is a procedure known as the LASSO in the statistics literature [39][40][41] and for joint variance and regression vector estimation [42]; see also [43] for very interesting modified LASSO-type procedures.Based on the same idea, spectral sparsity constraints can be relaxed using nuclear norm-type of penalisations in the matrix setting [36] and even in the tensor setting [44][45][46].
In the next paragraphs, we describe how these ideas apply to the different components of the signal.

Piecewise Constant Signals
When the observed signal is of the form for t = 1, . . ., n, where (T t ) t∈N is piecewise constant, with less than j 0 jumps, the corresponding estimation problem can be written as min where D is the operator which transforms a sequence into a sequence of successive differences.This problem is unfortunately non-convex, and the standard approach to the estimation of (T t ) t∈N is to replace the • 0 -norm with the • 1 norm of the vector of sucessive differences, which turns out to be known as the total variation penalised least-squares estimator defined as a solution to the following optimisation problem [20]: Here, the penalisation term ∑ n t=2 |T t − T t−1 | is a convexification of the function that counts the number of jumps in T. This estimation/filtering procedure suffers from the same hyper-parameter calibration as the LASSO.Many different procedures have been devised for this problem such as [47].The method of [48] based on online learning can also be adapted to this context.

Prony's Method
We will make great use of Prony's method for our decomposition.We will indeed look for oscillating functions in the seasonal component.This can be expressed as the problem of finding a small sum of exponential functions.In mathematical terms, these functions are where z k , k = −K, . . ., K are complex numbers.
Remark 1.The adjoint operator H * consists of extracting a signal of size n from a Hankel matrix in the most canonical way, i.e. by extracting following the first column and then the last row in the most obvious way.
When the z k values have unit moduli, we obtain sigmoidal functions.Otherwise, a damping factor appears for certain frequencies, and one obtains a very flexible model for a general notion of seasonality.
Consider the sequence (S t ) t=1,...,n defined by (5).It is easy to notice that the matrix is singular.Conversely, if (a 0 , . . ., a r−1 ) belongs to the kernel of S r , then the sequence (S t ) t=1,...,n satisfies a difference equation of the type The important fact to recall now is that the solutions to such difference equations form a complex vector space of dimension 2K + 1 and all the exponential sequences (z t k ) t=1,...,n are independent particular solutions that form a basis of this vector space.
Thus, and this is the cornerstone of Prony-like approaches to oscillating signals modelling, the highly nonlinear problem of finding the exponentials whose combinations reconstructs a given signal is equivalent to a simple problem in linear algebra.The main question now is whether this method is robust to noise.Unfortunately, the answer is no, and some enhancements are necessary in order to use this approach innoisy signals or time series analysis.

Finding a Low Rank Hankel Approximation
We will also need to approximate the Hankel matrix from the previous section by a low rank matrix.If no other constraint is imposed on this approximation problem, it turns out that this task can be efficiently achieved by singular value truncation.This is what is usually done in data analysis under the name of PCA.
From the applied statistics perspective, PCA, defined as a tool for high-dimensional data processing, analysis, compression, and visualisation, has wide applications in scientific and engineering fields.It assumes that the given high-dimensional data lie near a much lower-dimensional linear subspace.The goal of PCA is to efficiently and accurately estimate this low-dimensional subspace.This turns out to be equivalent to truncating the singular value decomposition (SVD), a fact also known as the Eckhart-Young Theorem, and this is exactly what we need for our low rank approximation problem.In mathematical terms, given a matrix H, the goal is to simply find a low rank matrix L, such that the discrepancy between L and H is minimised, leading to the following constrained optimisation L * ≤ ρ 0 and L = H(S).

Main Results
We now describe and analyse our method, which combines the total variation minimisation, the Hankel projection, and the SVD.

Putting It All Together
If one wants to perform the full decomposition and remove the outliers at the same time, we need to find a threefold decomposition of H = L + H(O) + H(T) + E, where L is low rank and Hankel, O is a sparse signal-vector representing the outliers, and E is the error vector.This can be performed by solving the following optimisation problem.min Using the discussions of the former section based on using the 1 norm as a surrogate to the 0 non-convex functional, this problem can be efficiently relaxed into the following convex optimisation problem min where • * denotes the nuclear norm, i.e., the sum of the singular values.This optimisation problem can be treated as a general convex optimisation problem and solved by any off-the-shelf interior point solver, after being reformulated as a semi-definite program.Many standard approaches are available for this problem.In the present paper, we choose an alternating projection method, which performs optimisation with respect to each variable one at a time.

A Component-Wise Optimisation Method
Our goal is to solve the following optimisation problem min Remark 2. The optimisation step with respect to T is just the 1 trend filtering optimisation problem applied to X − L − H(O).
One way to approach this problem is to devise an Alternating Direction of Multipliers Method (ADMM) type of algorithm.For this purpose, let us define the Lagrange function and let X denote the constraint set Let Λ π denote the augmented Lagrange function given by The ADMM consists in putting the subspace alternating method to work, including an update of the Lagrange mutiplier G.It is described in Algorithm 1 below.A very nice presentation of this type of algorithm is given in [50].

Algorithm 1: ADMM-based jump-seasonality-outlier decomposition
Result: The ADMM based time series decomposition Set it = 1 and L (1) = X, O (1) = 0, T (1) = 0, S (1) = 0, G (0) , it = 1, f lag = 0 ; while f lag = 0 do Optimise Λ π in T and extract the piecewise constant term T (it+1) ; Optimise Λ π in O and extract the outliers O (it+1) ; Optimise Λ π in L and obtain the low-rank matrix L (it+1) (this step corresponds to singular value soft-thresholding of H − H(O + T); Optimise Λ π in S and obtain S (it+1) from this low rank matrix; Set For the sake of practical relevance, we also include the following method consisting of successive optimisation steps and can be seen essentially as an acceleration of the ADMM approach.
Each loop successively solves the PCA and the 1 and total variation penalised least-squares optimisation problems, as well as the projection onto the space of Hankel matrices.It is described in Algorithm 2 below.

Convergence of the Algorithm
The convergence of the ADMM Algorithm 2 is guaranteed by the theory provided in [51].Concerning Algorithm 2, one can check that it corresponds to taking G = 0 in the Lagrange function Λ.The convergence of the method is then straightforward.As π is set to to grow to +∞, the respective solutions provided by the two algorithms will become correspondingly closer to each other, up to any pre-specified accuracy.

Numerical Experiments
In this section, we present some preliminary experiments which illustrate the behavior of the method on simulated data and real data.The method we implemented uses additional debiasing, and Frobenius projection instead of 1 projection for better scalability.

Simulated Data
In this section, we provide some illustrations of the algorithm on simulated data.Signals are generated with various seasonal patterns, piece-wise constant components and outliers.In all cases, we assume that the data is subject to Gaussian noise with an associated standard deviation of 2 units.
Figures 1-3 demonstrate the decomposition of a signal with 2, 3, and 5 outliers, respectively.The top block of each figure is the signal itself.The second block is the extracted piecewise constant component.The red line is the true signal, and the blue one is the estimate.The third block is the extracted seasonality component, while the fourth captures information on outliers.As before, the red line is the true signal, and the blue one is the estimate.The final block is the noise component that remains after the other components have been estimated.In all three cases, there is very good agreement between the true signal and the various components extracted using the algorithm.
Figure 4 illustrates the decomposition of a signal with four piecewise constant jumps.Even as the number of jumps increases, there is a good agreement between the true components of the signal and their estimates.Further experiments were run in order to gauge the performance of the algorithm over a range of piecewise constant jump magnitudes and outlier magnitudes.Two hundred fifty signals were randomly generated in each case with minimum jump and spike magnitudes of 3σ to 6σ.In each case, the index sets of jump location and outlier location were compared with those used to generate the data using the k-nearest neighbour distance.Histograms of the k-nearest neighbour distance are shown in Figure 5 for piecewise constant detection and in Figure 6 for outlier detection.A larger distance indicates that the outlier and jump locations detected are further away from those used to generate the data.For both jump locations and outlier locations, the distribution of distance shifts left (towards smaller values of distance) as the minimum jump or outlier magnitude increases from 3σ to 6σ.In general, outliers are detected quite well using the time series decomposition algorithm as the modal bin of the histogram corresponds to very small distances (no distance in most cases).

Real Data
The algorithm has also been used to study time series data relating to ground movement.The data in the first block of Figure 7 seems to have a decreasing trend which is captured by the piecewise affine component in Block 2. The algorithm does not detect any outliers in the data as reflected in Block 4. Seasonal variation is displayed in Block 3. The algorithm seems to converge after as few as 25 iterations.

2 F
s. t. rank(L) ≤ r 0 where r min(m, n) is the target dimension of the subspace.The problem we have to address here is the following generalisation: we have to approximate a Hankel matrix by a low rank Hankel matrix.These additional linear constraints seem innocent, but it nevertheless introduces an additional level of complexity.This problem can be efficiently addressed by solving the following decoupled problem minL,S H − L 2 F s. t. rank(L) ≤ r 0 and L = H(S)where r min(m, n) is the target dimension of the subspace, and π is a sufficiently large positive constant.Approximating this problem with a convex problem can be achieved by solving the following

Figure 1 .
Figure 1.Decomposition of a signal with two outliers.

Figure 2 .
Figure 2. Decomposition of a signal with three outliers.

Figure 3 .
Figure 3. Decomposition of a signal with five outliers.

Figure 4 .
Figure 4. Decomposition of a signal with four piecewise constant jumps.

Figure 7 .
Figure 7. Decomposition of a signal relating to ground movement.