Econometrics 2018, 6(3), 33; doi:10.3390/econometrics6030033

Article
Some Results on $\ell_1$ Polynomial Trend Filtering
Graduate School of Social Sciences, Hiroshima University, 1-2-1 Kagamiyama, Higashi-Hiroshima 739-8525, Japan
* Author to whom correspondence should be addressed.
Received: 22 May 2018 / Accepted: 4 July 2018 / Published: 10 July 2018

## Abstract

$\ell_1$ polynomial trend filtering, a filtering method formulated as an $\ell_1$-norm penalized least-squares problem, is promising because it enables the estimation of a piecewise polynomial trend in a univariate economic time series without prespecifying the number and location of knots. This paper presents some theoretical results on the filtering, one of which is that a small modification of the filtering provides not only trend estimates identical to those of the original filtering but also extrapolations of the trend beyond both sample limits.
Keywords:
ℓ1 trend filtering; Hodrick–Prescott filtering; Whittaker–Henderson method of graduation; Lasso regression; basis pursuit denoising; total variation denoising
MSC:
62G05
JEL Classification:
C22

## 1. Introduction

The $\ell_1$-norm penalized least-squares problem, defined as:
$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda \sum_{t=3}^{T} |\Delta^2 x_t|,$
where $y_1,\dots,y_T$ are observed time-series data, was developed by Kim et al. (2009), who called it $\ell_1$ trend filtering.1 Here, $\lambda > 0$ is a tuning parameter and $\Delta$ denotes the backward difference operator, so that $\Delta x_t = x_t - x_{t-1}$ and, accordingly, $\Delta^2 x_t = \Delta(\Delta x_t) = x_t - 2x_{t-1} + x_{t-2}$. Note that $\sum_{t=3}^{T} |\Delta^2 x_t|$ in (1) is the $\ell_1$-norm of $[\Delta^2 x_3, \dots, \Delta^2 x_T]^\top$. Unlike Hodrick and Prescott (1997) (HP) filtering, which is defined as the following squared $\ell_2$-norm penalized least-squares problem:
$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \psi \sum_{t=3}^{T} (\Delta^2 x_t)^2,$
where $\psi > 0$ is a smoothing/tuning parameter, the solution of $\ell_1$ trend filtering is a continuous piecewise linear trend. The relationship between HP filtering and $\ell_1$ trend filtering parallels that between the ridge regression of Hoerl and Kennard (1970) and the Lasso (least absolute shrinkage and selection operator) regression of Tibshirani (1996)/BPDN (basis pursuit denoising) of Chen et al. (1998). Econometric applications of $\ell_1$ trend filtering include Yamada and Jin (2013), Yamada and Yoon (2014), Winkelried (2016), and Yamada (2017a).
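As a minimal illustration (ours, not part of the original paper), the $\ell_1$ trend filtering objective in (1) can be evaluated directly for any candidate trend; the function names below are our own:

```python
def second_diff(x):
    # Backward second differences Δ²x_t = x_t − 2x_{t−1} + x_{t−2}, t = 3, …, T
    return [x[t] - 2 * x[t - 1] + x[t - 2] for t in range(2, len(x))]

def l1_tf_objective(y, x, lam):
    # Squared residuals plus λ times the ℓ1-norm of [Δ²x_3, …, Δ²x_T]
    fit = sum((yt - xt) ** 2 for yt, xt in zip(y, x))
    return fit + lam * sum(abs(d) for d in second_diff(x))

# A perfectly linear candidate has Δ²x ≡ 0, so only the fit term remains.
y = [1.0, 2.5, 2.9, 4.2, 5.1]
x_lin = [1.0, 2.0, 3.0, 4.0, 5.0]
obj = l1_tf_objective(y, x_lin, lam=10.0)
```

Because the penalty vanishes on any exactly linear candidate, a large $\lambda$ pushes the solution toward piecewise linearity, which is the point of the $\ell_1$ penalty.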
It is well known that HP filtering is a special case of the Whittaker–Henderson (WH) method of graduation, which is defined as:
$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \psi \sum_{t=p+1}^{T} (\Delta^p x_t)^2.$
For historical surveys of the WH method of graduation, see Weinert (2007), Phillips (2010), and Nocon and Scott (2012). Likewise, as shown in Kim et al. (2009), Tibshirani and Taylor (2011), and Tibshirani (2014), $\ell_1$ trend filtering may be generalized as:
$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda \sum_{t=p+1}^{T} |\Delta^p x_t|.$
We refer to this problem as $\ell_1$ polynomial trend filtering.2 This filtering method is promising because it enables us to estimate a piecewise $(p-1)$-th order polynomial trend of a univariate economic time series without prespecifying the number and location of knots. For more details, see Yamada (2017b).
Let $\hat{x}_1,\dots,\hat{x}_T$ denote the solution of (3), and define $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$, where $h$ denotes the length of the extrapolation, by:
$\Delta^p \hat{x}_{T+j} = 0, \quad (j = 1,\dots,h).$
Recently, Yamada and Du (2018) introduced the following three modifications of the WH method of graduation:3
$(\mathrm{a})\quad \min_{x_1,\dots,x_{T+h}}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \psi \sum_{t=p+1}^{T+h} (\Delta^p x_t)^2,$
$(\mathrm{b})\quad \min_{x_1,\dots,x_{T+h}}\ \sum_{t=1}^{T+h}(y_t - x_t)^2 + \psi \sum_{t=p+1}^{T+h} (\Delta^p x_t)^2,$
$(\mathrm{c})\quad \min_{x_1,\dots,x_{T+h}}\ \sum_{t=1}^{T+h}(y_t - x_t)^2 + \psi \sum_{t=p+1}^{T} (\Delta^p x_t)^2,$
where $y_{T+j} = \hat{x}_{T+j}$ for $j = 1,\dots,h$. Denote the solution of (a), (b), and (c) by $\hat{x}_t^{(i)}$ for $i = a, b, c$ and $t = 1,\dots,T+h$. Yamada and Du (2018) showed that, for $i = a, b, c$ and $t = 1,\dots,T+h$:
$\hat{x}_t^{(i)} = \hat{x}_t.$
Among these results, $\hat{x}_t^{(a)} = \hat{x}_t$ is of practical use because it provides not only a smoothed series identical to that of the WH method of graduation but also an extrapolation beyond the sample limit of the current data. In addition, $\hat{x}_t^{(b)} = \hat{x}_t$ is of interest because it shows that $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ based on (5) are of no use for reducing the end-point problem of the WH method of graduation.4 Yamada and Du (2018) also proved that, for $i = a, b, c$ and $t = 1,\dots,T+h$:
$\lim_{\psi \to \infty} \hat{x}_t^{(i)} = \hat{\beta}_0 t^0 + \dots + \hat{\beta}_{p-1} t^{p-1},$
where $(\hat{\beta}_0,\dots,\hat{\beta}_{p-1}) = \arg\min_{\beta_0,\dots,\beta_{p-1}} \sum_{t=1}^{T} (y_t - \beta_0 t^0 - \dots - \beta_{p-1} t^{p-1})^2$.
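To make this $\psi \to \infty$ limit concrete for $p = 2$, the right-hand side is simply the ordinary least-squares line fitted to $y$ over $t = 1,\dots,T$. A small sketch (our own illustration, with our own function name):

```python
def ols_line(y):
    # Least-squares fit of y_t ≈ β0·t⁰ + β1·t¹ over t = 1, …, T:
    # the ψ → ∞ limit of the smoothed series when p = 2.
    T = len(y)
    ts = list(range(1, T + 1))
    tbar = sum(ts) / T
    ybar = sum(y) / T
    b1 = sum((t - tbar) * (v - ybar) for t, v in zip(ts, y)) / \
         sum((t - tbar) ** 2 for t in ts)
    return ybar - b1 * tbar, b1  # (β0, β1)

# Exactly linear data is recovered exactly, so the limiting trend
# extrapolates the same line for t = 1 − g, …, T + h.
b0, b1 = ols_line([3.0, 5.0, 7.0, 9.0, 11.0])
```

Here the data lie on $y_t = 1 + 2t$, so the fit returns intercept 1 and slope 2.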
In this paper, we present three modifications of $\ell_1$ polynomial trend filtering and show that they provide not only trend estimates identical to those of $\ell_1$ polynomial trend filtering but also extrapolations of the trend beyond both sample limits. In addition, we show some further results on the modified filtering, and we provide a MATLAB function for calculating the solution of one of the modified filtering methods.
The paper is organized as follows. In Section 2, we present three modifications of $ℓ 1$ polynomial trend filtering. In Section 3, we state the main results of the paper. In Section 4, we make some remarks on the results provided in Section 3. Section 5 provides some concluding remarks.
Notation. Let $y = [y_1,\dots,y_T]^\top$ and let $I_T$ be the $T \times T$ identity matrix. For an $n$-dimensional column vector $\eta = [\eta_1,\dots,\eta_n]^\top$, $\|\eta\|_1 = \sum_{i=1}^{n} |\eta_i|$, $\|\eta\|_2^2 = \sum_{i=1}^{n} \eta_i^2$, and $\|\eta\|_\infty = \max(|\eta_1|,\dots,|\eta_n|)$. $D_n$ is the $(n-p) \times n$ $p$-th order difference matrix such that $D_n \eta = [\Delta^p \eta_{p+1},\dots,\Delta^p \eta_n]^\top$; we denote $D_T$ by $D$. $\Pi_{g+T+h}$ is the $(g+T+h) \times p$ Vandermonde matrix defined by
$\Pi_{g+T+h} = \begin{bmatrix} (1-g)^0 & (1-g)^1 & \cdots & (1-g)^{p-1} \\ \vdots & \vdots & & \vdots \\ 1^0 & 1^1 & \cdots & 1^{p-1} \\ \vdots & \vdots & & \vdots \\ T^0 & T^1 & \cdots & T^{p-1} \\ \vdots & \vdots & & \vdots \\ (T+h)^0 & (T+h)^1 & \cdots & (T+h)^{p-1} \end{bmatrix},$
and we denote $\Pi_{0+T+0}$, which is a $T \times p$ matrix, by $\Pi$.

## 2. Three Modifications of $\ell_1$ Polynomial Trend Filtering

Let $\tilde{x}_1,\dots,\tilde{x}_T$ denote the solution of (4), and define $\tilde{x}_{1-g},\dots,\tilde{x}_{1-1}$ and $\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}$, where $g$ and $h$ denote the lengths of the backward and forward extrapolations, by:
$\Delta^p \tilde{x}_{p+1-i} = 0, \quad (i = 1,\dots,g),$
$\Delta^p \tilde{x}_{T+j} = 0, \quad (j = 1,\dots,h).$
For example, $\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}$ defined by (12) are, for $p = 1, 2, 3$, explicitly expressed as follows:
$(p=1)\quad \tilde{x}_{T+j} = \tilde{x}_T, \quad (j = 1,\dots,h),$
$(p=2)\quad \tilde{x}_{T+j} = \tilde{x}_T + j(\Delta \tilde{x}_T), \quad (j = 1,\dots,h),$
$(p=3)\quad \tilde{x}_{T+j} = \tilde{x}_T + j(\Delta \tilde{x}_T) + \frac{j(j+1)}{2}(\Delta^2 \tilde{x}_T), \quad (j = 1,\dots,h).$
For a proof of (15), see Appendix A.
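The closed forms (13)–(15) can be generated for any $p$ by solving $\Delta^p \tilde{x}_{T+j} = 0$ recursively for each new value; a sketch of ours using the binomial expansion of $\Delta^p$:

```python
from math import comb

def extrapolate(x, p, h):
    # Append h values so that each new value makes the p-th backward
    # difference vanish: Δᵖx_t = Σ_{k=0}^{p} (−1)^k C(p,k) x_{t−k} = 0,
    # solved for x_t.
    x = list(x)
    for _ in range(h):
        x.append(-sum((-1) ** k * comb(p, k) * x[-k] for k in range(1, p + 1)))
    return x
```

For $p = 1$ this repeats the last level, for $p = 2$ it continues the last slope, and for $p = 3$ it also continues the last curvature, matching (13)–(15).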
Consider the following three modifications of $\ell_1$ polynomial trend filtering:
$(\mathrm{d})\quad \min_{x_{1-g},\dots,x_{T+h}}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda \sum_{t=p+1-g}^{T+h} |\Delta^p x_t|,$
$(\mathrm{e})\quad \min_{x_{1-g},\dots,x_{T+h}}\ \sum_{t=1-g}^{T+h}(y_t - x_t)^2 + \lambda \sum_{t=p+1-g}^{T+h} |\Delta^p x_t|,$
$(\mathrm{f})\quad \min_{x_{1-g},\dots,x_{T+h}}\ \sum_{t=1-g}^{T+h}(y_t - x_t)^2 + \lambda \sum_{t=p+1}^{T} |\Delta^p x_t|,$
where $y_{1-i} = \tilde{x}_{1-i}$ for $i = 1,\dots,g$ and $y_{T+j} = \tilde{x}_{T+j}$ for $j = 1,\dots,h$. Note that (16) reduces to $\ell_1$ polynomial trend filtering if $g = h = 0$. We denote the solution of (d), (e), and (f) by $\tilde{x}_t^{(i)}$ for $i = d, e, f$ and $t = 1-g,\dots,T+h$.
Among (16)–(18), the objective function of (16) may be represented in matrix notation as:
$\|y - S x_{g+T+h}\|_2^2 + \lambda \|D_{g+T+h} x_{g+T+h}\|_1,$
where $S = [0, I_T, 0]$ is a $T \times (g+T+h)$ matrix and $x_{g+T+h}$ is a $(g+T+h)$-dimensional column vector. Let $\tilde{x}_{g+T+h}^{(d)} = [\tilde{x}_g^{(d)\top}, \tilde{x}^{(d)\top}, \tilde{x}_h^{(d)\top}]^\top$, where $\tilde{x}_g^{(d)} = [\tilde{x}_{1-g}^{(d)},\dots,\tilde{x}_{1-1}^{(d)}]^\top$, $\tilde{x}^{(d)} = [\tilde{x}_1^{(d)},\dots,\tilde{x}_T^{(d)}]^\top$, and $\tilde{x}_h^{(d)} = [\tilde{x}_{T+1}^{(d)},\dots,\tilde{x}_{T+h}^{(d)}]^\top$. The following MATLAB function, which depends on CVX, developed by Grant and Boyd (2013), calculates $\tilde{x}_g^{(d)}$, $\tilde{x}^{(d)}$, and $\tilde{x}_h^{(d)}$:
```matlab
function [x_g,x,x_h]=m_l1_pt_filtering(y,lambda,p,g,h)
% y: T-dimensional column vector
% lambda: positive real number
% p, g, h: positive integers
% x_g: g-dimensional column vector
% x: T-dimensional column vector
% x_h: h-dimensional column vector
T=length(y);
S=[sparse(T,g),speye(T),sparse(T,h)];
D=diff(speye(g+T+h),p);
cvx_begin
    variables z(g+T+h)
    minimize(sum((y-S*z).^2)+lambda*norm(D*z,1))
cvx_end
x_g=z(1:g); x=z(g+1:g+T); x_h=z(g+T+1:g+T+h);
end
```
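For readers without MATLAB/CVX, the two matrices the function builds can be mirrored in plain Python (a sketch with our own names; it constructs the matrices only and does not solve the convex program):

```python
from math import comb

def selector_S(T, g, h):
    # S = [0, I_T, 0]: a T×(g+T+h) matrix picking the observed span
    # out of the extended vector x_{g+T+h}
    return [[1 if j == g + i else 0 for j in range(g + T + h)]
            for i in range(T)]

def diff_D(n, p):
    # (n−p)×n p-th order difference matrix; each row carries the
    # stencil a_k = (−1)^{p−k} C(p,k), k = 0, …, p
    a = [(-1) ** (p - k) * comb(p, k) for k in range(p + 1)]
    return [[a[j - i] if 0 <= j - i <= p else 0 for j in range(n)]
            for i in range(n - p)]
```

With $p = 3$ and $g + T + h = 9$, `diff_D(9, 3)` reproduces the $6 \times 9$ matrix displayed in the proof of Theorem 2.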

## 3. Main Results

Theorem 1.
Denote the solution of (d), (e), and (f) by $\tilde{x}_t^{(i)}$ for $i = d, e, f$. Then, for $i = d, e, f$ and $t = 1-g,\dots,T+h$, it follows that:
$\tilde{x}_t^{(i)} = \tilde{x}_t,$
where $\tilde{x}_1,\dots,\tilde{x}_T$ is the solution of (4) and $\tilde{x}_{1-g},\dots,\tilde{x}_{1-1}$ and $\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}$ are defined by (11) and (12).
Proof.
Because the objective function of (4) is coercive and strictly convex with respect to $x_1,\dots,x_T$, its unique global minimizer is $\tilde{x}_1,\dots,\tilde{x}_T$. It follows that:
$\sum_{t=1}^{T}(y_t - x_t)^2 + \lambda \sum_{t=p+1}^{T} |\Delta^p x_t| \ \geq\ \sum_{t=1}^{T}(y_t - \tilde{x}_t)^2 + \lambda \sum_{t=p+1}^{T} |\Delta^p \tilde{x}_t|,$
where the equality holds only if $x_t = \tilde{x}_t$ for $t = 1,\dots,T$.5 In addition, from (11) and (12), $y_{1-i} = \tilde{x}_{1-i}$ for $i = 1,\dots,g$, and $y_{T+j} = \tilde{x}_{T+j}$ for $j = 1,\dots,h$, we have the following inequalities:
$\lambda \sum_{t=p+1-g}^{p} |\Delta^p x_t| \ \geq\ 0 = \lambda \sum_{t=p+1-g}^{p} |\Delta^p \tilde{x}_t|,$
$\lambda \sum_{t=T+1}^{T+h} |\Delta^p x_t| \ \geq\ 0 = \lambda \sum_{t=T+1}^{T+h} |\Delta^p \tilde{x}_t|,$
$\sum_{t=1-g}^{0} (y_t - x_t)^2 \ \geq\ 0 = \sum_{t=1-g}^{0} (y_t - \tilde{x}_t)^2,$
$\sum_{t=T+1}^{T+h} (y_t - x_t)^2 \ \geq\ 0 = \sum_{t=T+1}^{T+h} (y_t - \tilde{x}_t)^2.$
Combining (21)–(23) yields
$\sum_{t=1}^{T}(y_t - x_t)^2 + \lambda \sum_{t=p+1-g}^{T+h} |\Delta^p x_t| \ \geq\ \sum_{t=1}^{T}(y_t - \tilde{x}_t)^2 + \lambda \sum_{t=p+1-g}^{T+h} |\Delta^p \tilde{x}_t|,$
where the equality in (26) holds only if $x_t = \tilde{x}_t$ for $t = 1-g,\dots,T+h$, which proves that $\tilde{x}_t^{(d)} = \tilde{x}_t$ for $t = 1-g,\dots,T+h$. Likewise, combining (21)–(25) proves that $\tilde{x}_t^{(e)} = \tilde{x}_t$, and combining (21), (24), and (25) proves that $\tilde{x}_t^{(f)} = \tilde{x}_t$, for $t = 1-g,\dots,T+h$. ☐
As an illustration of the above theorem, we give a numerical example. Consider the case where $T = 5$, $g = 1$, and $h = 2$, and suppose that we obtained
$\tilde{x}_1 = 3, \quad \Delta \tilde{x}_2 = 2, \quad [\Delta^2 \tilde{x}_3, \Delta^2 \tilde{x}_4, \Delta^2 \tilde{x}_5]^\top = [0, -1, 0]^\top$
by applying $\ell_1$ polynomial trend filtering of order 2 (i.e., $\ell_1$ trend filtering) to $T$-dimensional time-series data.6 Because $2 = \Delta \tilde{x}_2 = \Delta \tilde{x}_3 \neq \Delta \tilde{x}_4 = \Delta \tilde{x}_5 = 1$, the line plot of $(t, \tilde{x}_t)$ for $t = 1,\dots,5$ is a continuous piecewise linear curve with a knot at $(3, \tilde{x}_3)$. Explicitly, $[\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4, \tilde{x}_5]^\top = [3, 5, 7, 8, 9]^\top$. Then, from the above theorem, $\tilde{x}_t^{(i)}$ for $i = d, e, f$ and $t = 1-1,\dots,5+2$ are as follows:
$[\tilde{x}_{1-1}^{(i)}, \tilde{x}_1^{(i)}, \tilde{x}_2^{(i)}, \tilde{x}_3^{(i)}, \tilde{x}_4^{(i)}, \tilde{x}_5^{(i)}, \tilde{x}_{5+1}^{(i)}, \tilde{x}_{5+2}^{(i)}]^\top = [1, 3, 5, 7, 8, 9, 10, 11]^\top.$
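The arithmetic behind this example is easy to check by rebuilding the levels from the stated differences and then extending them with $\Delta^2 \tilde{x} = 0$ (a verification sketch of ours, not code from the paper):

```python
def levels_from_diffs(x1, dx2, d2x):
    # Rebuild x̃_1, …, x̃_T from x̃_1, Δx̃_2, and Δ²x̃_3, …, Δ²x̃_T
    dx = [dx2]
    for d2 in d2x:
        dx.append(dx[-1] + d2)      # Δx̃_t = Δx̃_{t−1} + Δ²x̃_t
    x = [x1]
    for d in dx:
        x.append(x[-1] + d)         # x̃_t = x̃_{t−1} + Δx̃_t
    return x

x = levels_from_diffs(3, 2, [0, -1, 0])     # the stated [3, 5, 7, 8, 9]
head = 2 * x[0] - x[1]                      # backward step from Δ²x̃_2 = 0
tail1 = 2 * x[-1] - x[-2]                   # forward step from Δ²x̃_6 = 0
tail2 = 2 * tail1 - x[-1]                   # forward step from Δ²x̃_7 = 0
full = [head] + x + [tail1, tail2]
```

The extended vector `full` reproduces $[1, 3, 5, 7, 8, 9, 10, 11]^\top$ above.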
Theorem 2.
If $\lambda \geq 2\|(D D^\top)^{-1} D y\|_\infty$, then, for $i = d, e, f$ and $t = 1-g,\dots,T+h$, it follows that
$\tilde{x}_t^{(i)} = \hat{\beta}_0 t^0 + \dots + \hat{\beta}_{p-1} t^{p-1},$
where $(\hat{\beta}_0,\dots,\hat{\beta}_{p-1}) = \arg\min_{\beta_0,\dots,\beta_{p-1}} \sum_{t=1}^{T} (y_t - \beta_0 t^0 - \dots - \beta_{p-1} t^{p-1})^2$.
Proof.
Because $D_{g+T+h}$ is a $(g+T+h-p) \times (g+T+h)$ $(p+1)$-diagonal Toeplitz matrix such that
$D_{g+T+h} = \begin{bmatrix} a_0 & \cdots & a_p & 0 & \cdots & 0 \\ 0 & \ddots & & \ddots & & \vdots \\ \vdots & & \ddots & & \ddots & 0 \\ 0 & \cdots & 0 & a_0 & \cdots & a_p \end{bmatrix},$
where $a_k = (-1)^{p-k} \binom{p}{k}$ for $k = 0,\dots,p$, it may be expressed as
$D_{g+T+h} = \begin{bmatrix} G_1 & G_2 & 0 \\ 0 & D & 0 \\ 0 & H_1 & H_2 \end{bmatrix},$
where $G_1$ is a $g \times g$ upper triangular matrix, $G_2$ is a $g \times T$ matrix, $H_1$ is an $h \times T$ matrix, and $H_2$ is an $h \times h$ unit lower triangular matrix. For example, when $p = 3$, $g = h = 2$, and $T = 5$:
$D_{2+5+2} = \begin{bmatrix} -1 & 3 & -3 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 3 & -3 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 3 & -3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 3 & -3 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 3 & -3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & 3 & -3 & 1 \end{bmatrix}.$
Let $\tilde{x}_g = [\tilde{x}_{1-g},\dots,\tilde{x}_{1-1}]^\top$, $\tilde{x} = [\tilde{x}_1,\dots,\tilde{x}_T]^\top$, $\tilde{x}_h = [\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}]^\top$, and $\tilde{x}_{g+T+h} = [\tilde{x}_g^\top, \tilde{x}^\top, \tilde{x}_h^\top]^\top$, which is a $(g+T+h)$-dimensional column vector. Then, by the definitions of $\tilde{x}_g$ and $\tilde{x}_h$, it follows that:
$G_1 \tilde{x}_g + G_2 \tilde{x} = 0,$
$H_1 \tilde{x} + H_2 \tilde{x}_h = 0,$
$D_{g+T+h} \tilde{x}_{g+T+h} = \begin{bmatrix} 0 \\ D\tilde{x} \\ 0 \end{bmatrix}.$
From Kim et al. (2009), if $\lambda \geq 2\|(D D^\top)^{-1} D y\|_\infty$, then $\tilde{x} = \Pi \hat{\beta}$, where $\hat{\beta} = (\Pi^\top \Pi)^{-1} \Pi^\top y$. Recalling that $D\Pi = 0$, we thus obtain $D_{g+T+h} \tilde{x}_{g+T+h} = 0$ if $\lambda \geq 2\|(D D^\top)^{-1} D y\|_\infty$, which implies that $\tilde{x}_{g+T+h}$ may be represented as $\Pi_{g+T+h} \gamma$. Because $\tilde{x} = \Pi \hat{\beta}$, $\gamma$ must equal $\hat{\beta}$. Therefore, if $\lambda \geq 2\|(D D^\top)^{-1} D y\|_\infty$, then $\tilde{x}_{g+T+h} = \Pi_{g+T+h} \hat{\beta}$. ☐
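The key fact used here, that the $p$-th difference annihilates every column of $\Pi_{g+T+h}$ (so $D_{g+T+h}\Pi_{g+T+h} = 0$), is easy to verify numerically; a sketch of ours:

```python
from math import comb

def pth_diffs(col, p):
    # Δᵖ applied down a sequence: Σ_{k=0}^{p} (−1)^k C(p,k) col[i−k]
    return [sum((-1) ** k * comb(p, k) * col[i - k] for k in range(p + 1))
            for i in range(p, len(col))]

# Columns of Π_{g+T+h} are t^j, j < p, over t = 1−g, …, T+h; each is a
# polynomial of degree below p, hence annihilated by the p-th difference.
g, T, h, p = 2, 5, 2, 3
cols = [[t ** j for t in range(1 - g, T + h + 1)] for j in range(p)]
all_zero = all(d == 0 for c in cols for d in pth_diffs(c, p))
```

This confirms $D_{g+T+h}\Pi_{g+T+h}\gamma = 0$ for any $\gamma$, which is exactly why $D_{g+T+h}\tilde{x}_{g+T+h} = 0$ forces $\tilde{x}_{g+T+h}$ onto the column space of $\Pi_{g+T+h}$.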
Theorem 3.
Suppose that $y = \Pi \alpha$, where $\alpha \neq 0$ is a $p$-dimensional column vector. Then, for $i = d, e, f$, it follows that:
$\tilde{x}_{g+T+h}^{(i)} = \Pi_{g+T+h} \alpha,$
where $\tilde{x}_{g+T+h}^{(i)} = [\tilde{x}_{1-g}^{(i)},\dots,\tilde{x}_{T+h}^{(i)}]^\top$.
Proof.
If $y = \Pi \alpha$, then $\tilde{x} = \Pi \alpha$. Accordingly, $D_{g+T+h} \tilde{x}_{g+T+h} = 0$, which implies that $\tilde{x}_{g+T+h}$ may be represented as $\Pi_{g+T+h} \gamma$. Because $\tilde{x} = \Pi \alpha$ when $y = \Pi \alpha$, $\gamma$ must equal $\alpha$. Therefore, we obtain $\tilde{x}_{g+T+h} = \Pi_{g+T+h} \alpha$ if $y = \Pi \alpha$. ☐
Corollary 1.
Let $\tilde{x}_{g+T+h}^{(i)} = [\tilde{x}_{1-g}^{(i)},\dots,\tilde{x}_{T+h}^{(i)}]^\top$ for $i = d, e, f$.
(i)
Denote the $(j+1)$-th columns of $\Pi$ and of $\Pi_{g+T+h}$ by $\tau_j$ and $\tau_{g+T+h,j}$, respectively, for $j = 0,\dots,p-1$. If $y = \tau_j$, then $\tilde{x}_{g+T+h}^{(i)} = \tau_{g+T+h,j}$ for any $\lambda > 0$.
(ii)
Let $z$ be a $T$-dimensional column vector. If $y = \Pi(\Pi^\top \Pi)^{-1} \Pi^\top z$, then $\tilde{x}_{g+T+h}^{(i)} = \Pi_{g+T+h}(\Pi^\top \Pi)^{-1} \Pi^\top z$ for any $\lambda > 0$.

## 4. Some Remarks on the Main Results

First, we make a remark on Theorem 1. Because $|G_1| = (-1)^{g \cdot p}$, it follows from (29) that $\tilde{x}_g$ may be expressed in terms of $\tilde{x}$ as $\tilde{x}_g = -G_1^{-1} G_2 \tilde{x}$. Likewise, because $|H_2| = 1$, it follows from (30) that $\tilde{x}_h = -H_2^{-1} H_1 \tilde{x}$. Thus, the modified $\ell_1$ polynomial trend filtering (16) may be characterized as a filtering that calculates
$\begin{bmatrix} -G_1^{-1} G_2 \\ I_T \\ -H_2^{-1} H_1 \end{bmatrix} \tilde{x}$
from $y$.7 In addition, from Kim et al. (2009), $\tilde{x} \to y$ as $\lambda \to 0$. Therefore, we obtain:
$\tilde{x}_{g+T+h}^{(d)} \to \begin{bmatrix} -G_1^{-1} G_2 \\ I_T \\ -H_2^{-1} H_1 \end{bmatrix} y, \quad (\lambda \to 0).$
Second, we provide a remark on Theorems 2 and 3. Yamada (2017b) recently showed that:
$\tilde{x} = \Pi \hat{\beta} + x \tilde{\phi},$
where $x = D^\top (D D^\top)^{-1}$ and $\tilde{\phi}$, a $(T-p)$-dimensional column vector, is the solution of the following Lasso regression/BPDN:
$\min_{\phi}\ \|y - x\phi\|_2^2 + \lambda \|\phi\|_1.$
Because $x^\top \Pi = 0$, $\Pi \hat{\beta} + x \tilde{\phi}$ in (35) is an orthogonal decomposition of $\tilde{x}$. Here, we show that Theorems 2 and 3 may also be proved using (35) and (36). Premultiplying (35) by $D$ yields $D\tilde{x} = \tilde{\phi}$. We accordingly obtain:
$D_{g+T+h} \tilde{x}_{g+T+h} = \begin{bmatrix} 0 \\ \tilde{\phi} \\ 0 \end{bmatrix}.$
(i)
From (Osborne et al. 2000, p. 324), if $\lambda \geq 2\|x^\top y\|_\infty$, then $\tilde{\phi} = 0$. Therefore, we obtain $\tilde{x} = \Pi \hat{\beta}$ and $D_{g+T+h} \tilde{x}_{g+T+h} = 0$, which proves Theorem 2.
(ii)
If $y = \Pi \alpha$, where $\alpha \neq 0$, then $x^\top y = 0$, so that every $\lambda > 0$ satisfies $\lambda \geq 2\|x^\top y\|_\infty = 0$. Again, from Osborne et al. (2000), we obtain $\tilde{\phi} = 0$ if $y = \Pi \alpha$. Therefore, if $y = \Pi \alpha$, it follows that $\tilde{x} = \Pi \hat{\beta} = \Pi \alpha$ and $D_{g+T+h} \tilde{x}_{g+T+h} = 0$, which proves Theorem 3.
Third, we give an example of Corollary 1 (i). For the case where $y = [1,\dots,5]^\top$ and $p = g = h = 2$, it follows that $\tilde{x}_{2+5+2}^{(d)} = [-1, 0, 1,\dots,5, 6, 7]^\top$ for any $\lambda > 0$.
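This example can be reproduced directly: because $y$ here is exactly linear, Corollary 1 gives $\tilde{x} = y$ for any $\lambda > 0$, and the extension with $\Delta^2 x = 0$ continues the line in both directions. A verification sketch of ours (no solver needed under that assumption):

```python
def extend_linear(x, g, h):
    # Backward/forward extension with Δ²x = 0 (the p = 2 case of (11)-(12)):
    # straight-line continuation beyond both sample limits.
    x = list(x)
    for _ in range(g):
        x.insert(0, 2 * x[0] - x[1])    # x_{t−1} = 2x_t − x_{t+1}
    for _ in range(h):
        x.append(2 * x[-1] - x[-2])     # x_{t+1} = 2x_t − x_{t−1}
    return x

ext = extend_linear([1, 2, 3, 4, 5], g=2, h=2)
```

The same function also reproduces the extended vector of the Theorem 1 example with $g = 1$ and $h = 2$, since the fitted trend there is linear at both ends.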

## 5. Concluding Remarks

The $\ell_1$ polynomial trend filtering method is a promising piecewise polynomial curve-fitting method because it does not require prespecifying the number and location of knots. We have shown some theoretical results on this method. One of them is that a small modification of the filtering provides not only identical trend estimates but also extrapolations of the trend beyond both sample limits. Another is that $\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}$ based on (12) are of no use for improving the trend estimates of $\ell_1$ polynomial trend filtering. We also provided a MATLAB function for calculating the solution of one of the modified filtering methods. The main results of the paper are summarized in Theorems 1–3 and Corollary 1.
Finally, we remark that applying the modified $\ell_1$ polynomial trend filtering (16)–(18) requires specifying the value of $\lambda$. For this purpose, the methods proposed in Yamada and Yoon (2016) and Yamada (2018) are applicable.

## Author Contributions

H.Y. contributed mainly to the paper. R.D. joined the project and contributed to completing it.

## Funding

This work was supported in part by the Japan Society for the Promotion of Science KAKENHI Grant Number 16H03606.

## Acknowledgments

We appreciate two anonymous referees for their valuable suggestions and comments. An earlier draft entitled “A Small But Practically Useful Modification to the $ℓ 1$ Trend Filtering” was presented at the 12th International Symposium on Econometric Theory and Applications & 26th New Zealand Econometric Study Group 2016 in Hamilton, New Zealand, 17–19 February 2016. Our thanks to the participants for their useful comments. The usual caveat applies.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix A. Proof of (15)

Because $\Delta^3 \tilde{x}_{T+j} = \Delta^2 \tilde{x}_{T+j} - \Delta^2 \tilde{x}_{T+j-1}$, it follows from $\Delta^3 \tilde{x}_{T+j} = 0$ for $j = 1,\dots,h$ that $\Delta^2 \tilde{x}_{T+k} = \Delta^2 \tilde{x}_T$ for $k = 1,\dots,h$. Then, because $\sum_{k=1}^{l} (\Delta^2 \tilde{x}_{T+k}) = l(\Delta^2 \tilde{x}_T)$ for $l = 1,\dots,h$ and $\sum_{k=1}^{l} (\Delta^2 \tilde{x}_{T+k}) = \Delta \tilde{x}_{T+l} - \Delta \tilde{x}_T$, it follows that
$\Delta \tilde{x}_{T+l} = \Delta \tilde{x}_T + l(\Delta^2 \tilde{x}_T), \quad (l = 1,\dots,h).$
Furthermore, because $\sum_{l=1}^{j} (\Delta \tilde{x}_{T+l}) = j(\Delta \tilde{x}_T) + \bigl(\sum_{l=1}^{j} l\bigr)(\Delta^2 \tilde{x}_T)$ for $j = 1,\dots,h$ and $\sum_{l=1}^{j} (\Delta \tilde{x}_{T+l}) = \tilde{x}_{T+j} - \tilde{x}_T$, we finally obtain:
$\tilde{x}_{T+j} = \tilde{x}_T + j(\Delta \tilde{x}_T) + \frac{j(j+1)}{2}(\Delta^2 \tilde{x}_T), \quad (j = 1,\dots,h).$

## References

1. Beck, Amir. 2014. Introduction to Nonlinear Optimization Theory, Algorithms, and Applications with MATLAB. Philadelphia: SIAM. [Google Scholar]
2. Chen, Scott Shaobing, David L. Donoho, and Michael A. Saunders. 1998. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing 20: 33–61. [Google Scholar] [CrossRef]
3. Grant, Michael, and Stephen Boyd. 2013. CVX: Matlab Software for Disciplined Convex Programming, Version 2.0 Beta. Available online: http://cvxr.com/cvx (accessed on 9 July 2018).
4. Harchaoui, Zaïd, and Céline Lévy-Leduc. 2010. Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association 105: 1480–93. [Google Scholar] [CrossRef]
5. Hodrick, Robert J., and Edward C. Prescott. 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit and Banking 29: 1–16. [Google Scholar] [CrossRef]
6. Hoerl, Arthur E., and Robert W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12: 55–67. [Google Scholar] [CrossRef]
7. Kim, Seung-Jean, Kwangmoo Koh, Stephen Boyd, and Dimitry Gorinevsky. 2009. ℓ1 trend filtering. SIAM Review 51: 339–60. [Google Scholar] [CrossRef]
8. Koenker, Roger, Pin Ng, and Stephen Portnoy. 1994. Quantile smoothing splines. Biometrika 81: 673–80. [Google Scholar] [CrossRef]
9. Miller, Morton D. 1946. Elements of Graduation. Philadelphia: Actuarial Society of America and American Institute of Actuaries. [Google Scholar]
10. Mohr, Matthias F. 2005. A trend-Cycle(-Season) Filter. European Central Bank Working Paper, No. 499. Frankfurt am Main, Germany: European Central Bank. [Google Scholar]
11. Nocon, Alicja S., and William F. Scott. 2012. An extension of the Whittaker–Henderson method of graduation. Scandinavian Actuarial Journal 2012: 70–79. [Google Scholar] [CrossRef]
12. Osborne, Michael R., Brett Presnell, and Berwin A. Turlach. 2000. On the lasso and its dual. Journal of Computational and Graphical Statistics 9: 319–37. [Google Scholar]
13. Phillips, Peter C. B. 2010. Two New Zealand pioneer econometricians. New Zealand Economic Papers 44: 1–26. [Google Scholar] [CrossRef]
14. Schuette, Donald R. 1978. A linear programming approach to graduation. Transactions of Society of Actuaries 30: 407–31. [Google Scholar]
15. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B 58: 267–88. [Google Scholar]
16. Tibshirani, Robert, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. 2005. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B 67: 91–108. [Google Scholar] [CrossRef]
17. Tibshirani, Ryan J., and Jonathan Taylor. 2011. The solution path of the generalized lasso. Annals of Statistics 39: 1335–71. [Google Scholar] [CrossRef]
18. Tibshirani, Ryan J. 2014. Adaptive piecewise polynomial estimation via trend filtering. The Annals of Statistics 42: 285–323. [Google Scholar] [CrossRef]
19. Winkelried, Diego. 2016. Piecewise linear trends and cycles in primary commodity prices. Journal of International Money and Finance 64: 196–213. [Google Scholar] [CrossRef]
20. Weinert, Howard. 2007. Efficient computation for Whittaker–Henderson smoothing. Computational Statistics and Data Analysis 52: 959–74. [Google Scholar] [CrossRef]
21. Yamada, Hiroshi. 2017a. Estimating the trend in US real GDP using the ℓ1 trend filtering. Applied Economics Letters 24: 713–16. [Google Scholar] [CrossRef]
22. Yamada, Hiroshi. 2017b. A trend filtering method closely related to ℓ1 trend filtering. Empirical Economics. [Google Scholar] [CrossRef]
23. Yamada, Hiroshi. 2017c. A small but practically useful modification to the Hodrick–Prescott filtering: A note. Communications in Statistics–Theory and Methods 46: 8430–34. [Google Scholar] [CrossRef]
24. Yamada, Hiroshi. 2018. A new method for specifying the tuning parameter of ℓ1 trend filtering. Studies in Nonlinear Dynamics and Econometrics. [Google Scholar] [CrossRef]
25. Yamada, Hiroshi, and Ruixue Du. 2018. A modification of the Whittaker–Henderson method of graduation. Communications in Statistics–Theory and Methods. forthcoming. [Google Scholar]
26. Yamada, Hiroshi, and Lan Jin. 2013. Japan’s output gap estimation and ℓ1 trend filtering. Empirical Economics 45: 81–88. [Google Scholar] [CrossRef]
27. Yamada, Hiroshi, and Gawon Yoon. 2014. When Grilli and Yang meet Prebisch and Singer: Piecewise linear trends in primary commodity prices. Journal of International Money and Finance 42: 193–207. [Google Scholar] [CrossRef]
28. Yamada, Hiroshi, and Gawon Yoon. 2016. Selecting the tuning parameter of the ℓ1 trend filter. Studies in Nonlinear Dynamics and Econometrics 20: 97–105. [Google Scholar] [CrossRef]
• 1. $\ell_1$ trend filtering is supported in several standard software packages, such as MATLAB, R, Python, and EViews.
• 2. Problem (4) with $p = 1$ is known as total variation denoising in signal processing and may be regarded as a form of the fused Lasso of Tibshirani et al. (2005). Harchaoui and Lévy-Leduc (2010) proposed using this filtering to detect multiple change points. More generally, (4) may be regarded as a form of the generalized Lasso of Tibshirani and Taylor (2011). In addition, we note that there exist some pioneering works on filtering with an $\ell_1$-norm penalty. (Miller 1946, sct. 1.7) mentioned that $\sum_{t=p+1}^{T} |\Delta^p x_t|$ could be an alternative measure of smoothness to $\sum_{t=p+1}^{T} (\Delta^p x_t)^2$; Schuette (1978) introduced a filtering method defined as:
$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T} |y_t - x_t| + \lambda \sum_{t=p+1}^{T} |\Delta^p x_t|;$
and Koenker et al. (1994) presented an $\ell_1$-norm penalized quantile smoothing spline. Incidentally, Schuette (1978) and Koenker et al. (1994) motivate us to consider a penalized quantile regression, obtainable by replacing the quadratic loss function in (4) with the check loss function:
$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T} \rho_\tau(y_t - x_t) + \lambda \sum_{t=p+1}^{T} |\Delta^p x_t|,$
where, for $\tau \in (0,1)$,
$\rho_\tau(u) = \begin{cases} \tau |u|, & u \geq 0, \\ (1-\tau)|u|, & u < 0, \end{cases}$
as suggested in (Kim et al. 2009, sct. 7.3).
• 5. In the objective function of (4), $\sum_{t=1}^{T}(y_t - x_t)^2$ is coercive because it is a quadratic function whose Hessian matrix is positive definite. See, e.g., (Beck 2014, Lemma 2.42).
• 6. In this case, $[\Delta^2 \tilde{x}_3, \Delta^2 \tilde{x}_4, \Delta^2 \tilde{x}_5]^\top$ is expected to be sparse, as in the numerical example, because $\sum_{t=3}^{5} |\Delta^2 x_t|$ is included as a penalty.
• 7. Let us calculate $-H_2^{-1} H_1 \tilde{x}$ for the case where $p = 3$, $g = h = 2$, and $T = 5$. From (28), it follows that
$-H_1 \tilde{x} = \begin{bmatrix} \tilde{x}_{T-2} - 3\tilde{x}_{T-1} + 3\tilde{x}_T \\ \tilde{x}_{T-1} - 3\tilde{x}_T \end{bmatrix} = \begin{bmatrix} \tilde{x}_T + (\Delta \tilde{x}_T) + (\Delta^2 \tilde{x}_T) \\ -2\tilde{x}_T - (\Delta \tilde{x}_T) \end{bmatrix},$
so that
$-H_2^{-1} H_1 \tilde{x} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} \tilde{x}_T + (\Delta \tilde{x}_T) + (\Delta^2 \tilde{x}_T) \\ -2\tilde{x}_T - (\Delta \tilde{x}_T) \end{bmatrix} = \begin{bmatrix} \tilde{x}_T + (\Delta \tilde{x}_T) + (\Delta^2 \tilde{x}_T) \\ \tilde{x}_T + 2(\Delta \tilde{x}_T) + 3(\Delta^2 \tilde{x}_T) \end{bmatrix},$
which coincides with $[\tilde{x}_{T+1}, \tilde{x}_{T+2}]^\top$ given by (15).