Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator

Aydın, Dursun; Ahmed, Syed Ejaz; Yılmaz, Ersin

doi:10.3390/e23121586



by Dursun Aydın ¹, Syed Ejaz Ahmed ² and Ersin Yılmaz ¹,*

¹ Department of Statistics, Faculty of Science, Mugla Sitki Kocman University, Kotekli 48000, Turkey

² Department of Mathematics and Statistics, Faculty of Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON L2S 3A1, Canada

* Author to whom correspondence should be addressed.

Entropy 2021, 23(12), 1586; https://doi.org/10.3390/e23121586

Submission received: 19 October 2021 / Revised: 20 November 2021 / Accepted: 22 November 2021 / Published: 27 November 2021

(This article belongs to the Special Issue Nonparametric Statistical Inference with an Emphasis on Information-Theoretic Methods)


Abstract

This paper focuses on adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Typically, two main problems need to be solved in such a case: dealing with the censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic data approach based on the Kaplan–Meier estimator. Although the synthetic data technique is one of the most widely used solutions for right-censored observations, by its nature it distorts the structure of the transformed data, especially for heavily censored datasets. In this paper, we introduce a modified semiparametric estimator based on the A-spline approach to overcome this data irregularity with minimal information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator is used as a benchmark to gauge the performance of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real data example are used to evaluate the performance of the proposed estimator and to make a practical comparison.

Keywords:

adaptive splines; B-splines; right-censored data; semiparametric regression; synthetic data transformation; time series

1. Introduction

Time series datasets are right-censored under specific conditions, such as a detection limit or an incomplete observation process. Consider a device that cannot measure values above a certain point, known as its detection limit. Since the device cannot determine the true value of an observation above this limit, such observations are recorded as right-censored data points. The hourly cloud ceiling height data collected by the National Center for Atmospheric Research (NCAR) and modelled in [1,2] are an example of a right-censored time series. Although right-censored time series are encountered frequently in practice, very few studies in the literature address their estimation. This may be because censoring is an unwanted data irregularity for researchers and is therefore often ignored or handled by outdated techniques.

To solve the censoring problem before modelling the time series, reference [1] used a Gaussian imputation technique to estimate the series with modified ARMA models. Similarly, references [2,3] solved the censoring problem using data imputation techniques. What these studies have in common is the use of imputation and data augmentation methods to estimate regression models with autoregressive errors for right-censored time series. An easier way to handle censoring is synthetic data transformation: although data imputation techniques have some merits, they are generally based on iterative algorithms and are computationally costly. Reference [4] estimated temporally correlated, right-censored series nonparametrically with the Nadaraya–Watson estimator, solving the censoring problem with a data transformation technique. Various data transformation (or synthetic data) methods have been proposed and studied for independent and identically distributed (i.i.d.) datasets; see, for example, [5,6,7]. However, because synthetic data transformation distorts the data structure, it is no longer the preferred technique for right-censored time series. This paper aims to propose a method that overcomes this disadvantage of synthetic data transformation.

Note that the studies mentioned above model time series data with parametric or nonparametric methods. Real-world time series are generally not well suited to parametric modelling, because it requires rigid assumptions to produce reasonable estimates. Single-index nonparametric models, on the other hand, are very flexible, which is an important advantage over parametric methods, and there are valuable studies on the subject [2,8,9]. However, nonparametric approaches lose statistical efficiency as the number of covariates increases. In addition, when a time series dataset is right-censored, the weaknesses of both approaches are amplified.

Considering these issues, this paper adopts a semiparametric regression model for estimating right-censored time series. Although several researchers have introduced semiparametric estimators for time series data, such as [10,11], there remains a significant gap in the modelling of right-censored time series. To fill this gap, we propose a modified semiparametric A-spline (AS) estimator based on synthetic data transformation. In this way, the flexibility of the semiparametric model in both its parametric and nonparametric components is exploited, and the censoring problem is handled effectively.

The paper is organized as follows. Section 2 presents the methodology and fundamental ideas of the right-censored semiparametric time series model with autoregressive errors and of the synthetic data transformation method. Section 3 introduces a modified AS estimator for the parametric and nonparametric components of the right-censored time series model, together with a semiparametric B-spline (BS) estimator as a benchmark. Section 4 presents the statistical properties and evaluation criteria for both the modified AS and the benchmark BS methods. Section 5 gives additional information about the penalty term of the semiparametric AS approach. Sections 6 and 7 contain a detailed Monte Carlo simulation study and a real-world data example, respectively. Conclusions are presented in Section 8.

2. Background

The classical semiparametric model can be defined as a hybrid model with a finite-dimensional parametric component and a nonparametric component involving an infinite-dimensional nuisance parameter; see [12,13,14,15] for additional information. In both theory and practice, the semiparametric model brings a new perspective to data modelling, since it includes both parametric and nonparametric components. As mentioned in the previous section, this makes it well suited to time series data: it brings the flexibility of nonparametric smoothing to time series analysis while retaining a parametric component for the covariates.

Suppose that a time series dataset $\{(Z_t, x_t, s_t),\ t = 1, 2, \dots, n\}$ satisfies an uncensored semiparametric time series model of the form:

$$Z_t = x_t\beta + f(s_t) + \varepsilon_t, \qquad a = s_1 < \dots < s_n = b, \qquad (1)$$

where the $Z_t$'s are the observations of a stationary time series, $x_t = (x_{t1}, \dots, x_{tp})$, $t = 1, \dots, n$, are known $p$-dimensional vectors of explanatory variables, $\beta = (\beta_1, \beta_2, \dots, \beta_p)'$ is an unknown $p$-dimensional vector of regression coefficients to be estimated, $f(\cdot)$ is an unknown smooth function that describes the relationship between $Z_t$ and a nonparametric temporal covariate $s_t$, and, finally, the $\varepsilon_t$'s are stationary autoregressive error terms generated by:

$$\varepsilon_t = \rho_1\varepsilon_{t-1} + \dots + \rho_k\varepsilon_{t-k} + u_t, \qquad (2)$$

where $\rho_1, \dots, \rho_k$ are the autoregressive coefficients and $u_t$ denotes independent and identically distributed random error terms with mean zero and constant variance. Model (1) does not include lagged $Z_t$'s but has autocorrelated errors, which makes it suitable for the semiparametric regression analysis of certain kinds of time series.
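To make model (1)–(2) concrete, the following Python sketch simulates it with first-order autoregressive errors ($k = 1$). The Gaussian design for $x_t$, the equally spaced $s_t$ on $[0, 1]$, the choice $f(s) = \sin(2\pi s)$ and the helper name `simulate_semiparametric_ar` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_semiparametric_ar(n, beta, f, rho, sigma_u, rng):
    """Draw (Z_t, x_t, s_t) from model (1) with AR(1) errors (k = 1 in Eq. (2)).

    The covariate design, f and the stationary start are illustrative assumptions.
    """
    p = len(beta)
    x = rng.normal(size=(n, p))                  # explanatory variables x_t
    s = np.linspace(0.0, 1.0, n)                 # temporal covariate, a = 0 < ... < b = 1
    u = rng.normal(scale=sigma_u, size=n)        # i.i.d. innovations u_t
    eps = np.empty(n)
    eps[0] = u[0] / np.sqrt(1.0 - rho**2)        # start from the stationary distribution
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]         # eps_t = rho * eps_{t-1} + u_t
    z = x @ beta + f(s) + eps                    # model (1)
    return z, x, s, eps

rng = np.random.default_rng(2021)
z, x, s, eps = simulate_semiparametric_ar(
    n=5000, beta=np.array([1.5, -0.8]), f=lambda s: np.sin(2 * np.pi * s),
    rho=0.6, sigma_u=0.5, rng=rng)
lag1 = np.corrcoef(eps[:-1], eps[1:])[0, 1]      # sample lag-1 autocorrelation, near rho
print(round(lag1, 3))
```

The stationary start $\varepsilon_1 \sim N(0, \sigma_u^2/(1 - \rho^2))$ avoids a burn-in period, so every simulated error already has the stationary variance.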

A common problem in practice is that the observations $Z_t$ cannot be fully collected because of limitations such as the detection limit of a measuring device or the end time of the study. To express this algebraically, we assume that the $Z_t$'s are censored from the right by a non-negative random variable $C_t$ representing the detection limit. Therefore, instead of observing $Z_t$, we observe:

$$Y_t = \min(Z_t, C_t) \quad \text{and} \quad \delta_t = \begin{cases} 1 & \text{if } Z_t \le C_t \text{ (uncensored)}, \\ 0 & \text{if } Z_t > C_t \text{ (censored)}, \end{cases} \qquad (3)$$

where the $\delta_t$'s carry the censoring information. Suppose that we are interested in estimating the mean semiparametric regression function. The distribution of the observable random variables does not identify the mean regression function uniquely; however, this problem can be solved as follows.
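The observation scheme in Equation (3) amounts to one line of arithmetic per series. The sketch below, with a hypothetical helper `right_censor` and a made-up fixed detection limit of 1.2, produces the observed pairs $(Y_t, \delta_t)$ and the resulting censoring rate.

```python
import numpy as np

def right_censor(z, c):
    """Return the observed pair (Y_t, delta_t) of Equation (3)."""
    z = np.asarray(z, dtype=float)
    c = np.asarray(c, dtype=float)
    y = np.minimum(z, c)              # Y_t = min(Z_t, C_t)
    delta = (z <= c).astype(int)      # 1 = uncensored, 0 = censored
    return y, delta

# Hypothetical example: a device with a fixed detection limit C_t = 1.2
y, delta = right_censor([0.5, 2.0, 1.2, 0.9], [1.2, 1.2, 1.2, 1.2])
censoring_rate = 1.0 - delta.mean()
print(y, delta, censoring_rate)
```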

Let $F_Z(\alpha) = P(Z \le \alpha)$, $G_C(\alpha) = P(C \le \alpha)$ and $H_Y(\alpha) = P(Y \le \alpha)$, for $\alpha \in \mathbb{R}$, be the cumulative distribution functions of the non-negative random variables $Z_t$, $C_t$ and $Y_t$, respectively. If $Z_t$ and $C_t$ are independent, then the survival function $\bar{H}_Y(\alpha)$ of the observed response variable $Y_t$ follows from the basic relationship between $F_Z$ and $G_C$:

$$\bar{H}_Y(\alpha) = 1 - H_Y(\alpha) = \left(1 - F_Z(\alpha)\right)\left(1 - G_C(\alpha)\right). \qquad (4)$$
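Equation (4) can be checked by simulation. The sketch below assumes, purely for illustration, exponential $Z$ (mean 2) and independent exponential $C$ (mean 3), so that both survival functions have closed forms, and compares the empirical $P(Y > \alpha)$ with the product on the right-hand side of Equation (4).

```python
import numpy as np

rng = np.random.default_rng(11)
n = 400_000
z = rng.exponential(scale=2.0, size=n)   # Z ~ exponential with mean 2 (assumed)
c = rng.exponential(scale=3.0, size=n)   # C ~ exponential with mean 3, independent of Z
y = np.minimum(z, c)                     # observed Y = min(Z, C)

a = 1.5                                  # evaluation point alpha
empirical = np.mean(y > a)               # Monte Carlo estimate of bar{H}_Y(alpha)
closed_form = np.exp(-a / 2.0) * np.exp(-a / 3.0)  # (1 - F_Z(alpha)) * (1 - G_C(alpha))
print(empirical, closed_form)
```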

Given a random sample from the distribution of $(Y_t, x_t, s_t, \delta_t)$, it is of interest to examine the effect of the explanatory variables on the time series observations (i.e., the response variable) by estimating the regression function $E(Z_t \mid x_t, s_t) = x_t\beta + f(s_t)$, the conditional mean of the time series, while accounting for the survival function $\bar{H}_Y(\alpha) = P(Y > \alpha)$ of the observed data. However, because of censoring, ordinary methods cannot be applied directly to estimate the regression function. To handle the censored observations, a data transformation technique is needed. One of the most widely used techniques is the synthetic data transformation, detailed below.

Synthetic Data

To extend the penalized sum of squares approach to right-censored semiparametric regression, we adapt the synthetic data approach developed in [5]. The first step is to create an unbiased synthetic response variable whose expectation equals that of the original response, and then to obtain the penalized least squares estimator from this synthetic variable. The main goal of this transformation is to account for the effect of censoring on the distribution of the response variable. The synthetic data approach has also been used for censored data by the authors of [16,17].

In the synthetic approach, we replace the observed variable $Y_t$ with the transformed data $Y_{tG}$, a transformation that preserves the conditional expectation of the original variable $Z_t$. To describe this, it is easiest to work directly with the cumulative distributions, as in Lemma 1 below. Note also that if $G_C$ is known, the observed data $\{(Y_t, \delta_t), t = 1, \dots, n\}$ can be transformed into unbiased synthetic data given by:

$$Y_{tG} = \frac{\delta_t Y_t}{1 - G_C(Y_t)}, \qquad (5)$$

where $G_C(\cdot)$ is the distribution function of the censoring variable $C_t$, as defined before. In practice, $G_C$ is rarely known. In that case, we use the Kaplan–Meier estimator defined by:

$$1 - \hat{G}_C(y) = \prod_{t=1}^{n} \left(\frac{n-t}{n-t+1}\right)^{I[Y_{(t)} \le y,\ \delta_{(t)} = 0]}, \qquad y \ge 0, \qquad (6)$$

where $Y_{(1)} \le \dots \le Y_{(n)}$ are the ordered values of $Y_1, \dots, Y_n$ and $\delta_{(t)}$ is the censoring indicator $\delta_t$ associated with $Y_{(t)}$. Equation (5) has the following properties: (a) if the distribution $G_C$ is selected arbitrarily, some $Y_{(i)}$ can be tied, in which case the ordering of $Y_1, \dots, Y_n$ into $Y_{(1)}, \dots, Y_{(n)}$ is not unique; however, the Kaplan–Meier estimator allows the ordering of the $Y_t$ to be defined uniquely; (b) $\hat{G}_C(\cdot)$ has jumps only at the censored observations of the time series (see [18]).

Substituting $\hat{G}_C(\cdot)$ for $G_C(\cdot)$ in Equation (5), we construct the following synthetic data:

$$Y_{t\hat{G}} = \frac{\delta_t Y_t}{1 - \hat{G}_C(Y_t)}. \qquad (7)$$

One practical consequence of the following lemma is that the synthetic data $Y_{t\hat{G}}$ and the completely observed responses $Z_t$ have approximately the same conditional expectation, as claimed before.
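Equations (6) and (7) translate directly into code. The sketch below, with hypothetical helper names `km_censoring_survival` and `synthetic_response`, evaluates $1 - \hat{G}_C$ as the product in Equation (6) and then forms $Y_{t\hat{G}}$. Ties are broken by a stable sort, and a synthetic value is set to zero when the estimated censoring survival is zero; both are assumptions of this sketch.

```python
import numpy as np

def km_censoring_survival(y, delta):
    """Return a function y0 -> 1 - G_hat(y0), the product in Equation (6)."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = y.size
    order = np.argsort(y, kind="stable")
    y_sorted, d_sorted = y[order], delta[order]
    t = np.arange(1, n + 1)
    # Factor (n - t) / (n - t + 1) enters only at censored points (delta_(t) = 0)
    factors = np.where(d_sorted == 0, (n - t) / (n - t + 1), 1.0)
    cum = np.cumprod(factors)            # product over all Y_(t) up to each ordered point

    def surv(y0):
        idx = np.searchsorted(y_sorted, y0, side="right")  # number of Y_(t) <= y0
        return np.where(idx == 0, 1.0, cum[np.maximum(idx - 1, 0)])
    return surv

def synthetic_response(y, delta):
    """Synthetic data Y_{t G_hat} = delta_t Y_t / (1 - G_hat(Y_t)), Equation (7)."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    s = km_censoring_survival(y, delta)(y)   # 1 - G_hat(Y_t)
    y_g = np.zeros_like(y)
    ok = (delta == 1) & (s > 0)
    y_g[ok] = y[ok] / s[ok]                  # censored points get synthetic value 0
    return y_g

print(synthetic_response([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
```

Uncensored observations beyond a censored one are inflated by $1/(1 - \hat{G}_C)$, which compensates in expectation for the zeros assigned to censored points; this inflation is also the source of the distortion under heavy censoring discussed in Section 1.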

Lemma 1.

Consider time series data $Z_t$ treated as the response variable. If the data are censored by a random censoring variable $C$ with distribution $G_C$, transform the observed series $Y_t = \min(Z_t, C_t)$ into the unbiased form $Y_{tG} = \delta_t Y_t / \{1 - G_C(Y_t)\}$, as defined in Equation (5). It can then be verified that $E[Y_{tG} \mid x_t, s_t] = E[Z_t \mid x_t, s_t] = x_t\beta + f(s_t)$. However, $G_C$ is generally unknown, as mentioned before, so $Y_{t\hat{G}}$ defined in Equation (7) is used instead of $Y_{tG}$. Because $\hat{G}_C \to G_C$ as $n \to \infty$ (see [5]), it follows that $E[Y_{t\hat{G}} \mid x_t, s_t] \cong E[Y_{tG} \mid x_t, s_t] = x_t\beta + f(s_t)$.
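The unbiasedness claimed in Lemma 1 can be illustrated without covariates. In the sketch below, $Z$ is exponential with mean 1 and $C$ is an independent exponential with mean 4 (both assumptions), so $1 - G_C(y) = e^{-y/4}$ is known exactly; the synthetic mean from Equation (5) recovers $E[Z] = 1$, while the naive mean of the censored $Y_t$ is biased toward $E[\min(Z, C)] = 0.8$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
theta_c = 4.0                               # censoring mean (assumed)
z = rng.exponential(scale=1.0, size=n)      # latent responses, E[Z] = 1
c = rng.exponential(scale=theta_c, size=n)  # independent censoring variables
y = np.minimum(z, c)
delta = (z <= c).astype(int)
one_minus_g = np.exp(-y / theta_c)          # 1 - G_C(y) for an exponential C
y_g = delta * y / one_minus_g               # Equation (5) with G_C known

naive_mean = y.mean()                       # biased: E[min(Z, C)] = 0.8 here
synthetic_mean = y_g.mean()                 # close to E[Z] = 1
print(naive_mean, synthetic_mean)
```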

Let $\tau_{H_Y} = \sup\{\alpha : H_Y(\alpha) < 1\}$, where $H_Y(\cdot)$ is defined right after Equation (3). In the literature, the convergence rate of the Kaplan–Meier estimator is examined in two settings: (i) a restricted time interval $[0, \alpha]$ with $\alpha < \tau_{H_Y}$; and (ii) the extended time interval $[0, \tau_{H_Y}]$ (see [19] for a detailed discussion). Here, the convergence rate of the Kaplan–Meier estimator is considered for case (ii). However, the interval $[0, \tau_{H_Y}]$ cannot be used without the following strong conditions:

(i): $G(\tau_{H_Y}) < 1 = F(\tau_{H_Y})$;
(ii): $\tau_{H_Y} < \infty$;
(iii): $\int_{0}^{\tau_{H_Y}} \frac{1}{1 - G(\alpha)}\, dF < \infty$.

Details about conditions (i)–(iii) were studied in [20]. Under these conditions, the convergence $\hat{G} \to G$ over the interval $[0, \tau_{H_Y}]$ can be established: reference [19] shows both strong and weak convergence at the rate $n^{-\vartheta}$, where $0 \le \vartheta \le 1/2$.

The proof of Lemma 1 is given in Appendix A.

The main aim of this paper is to overcome the censoring problem and to estimate the semiparametric time series model efficiently. To achieve this, we use two approaches, the BS and the modified AS estimators. In the following section, we apply these approaches to the transformed data to estimate time series observations under random right-censoring.

3. Estimating the Semiparametric Model Based on the BS Estimator

We first introduce the BS estimator used to estimate the components of model (1). A univariate B-spline is a piecewise polynomial function of degree $q$ whose derivatives up to order $q - 1$ are continuous at each knot $r_1, \dots, r_k$. The space of splines of degree $q$ over the knots $r = (r_1, \dots, r_k)$ is a vector space of dimension $q + k + 1$. Here, $k$ denotes the number of interior knots, while $q \ge 0$ denotes the polynomial degree. For example, $q = 0, 1, 2$ and $3$ give constant, linear, quadratic and cubic BS basis functions, respectively. If the knots are equally spaced (i.e., separated by the same distance $h = r_{k+1} - r_k$), the knots and the corresponding BSs are called uniform.

Definition 1.

Given an ordered knot vector $r = \{r_1 \le r_2 \le \dots \le r_k\}$ in the domain of the covariate $s_t$, the $i$th BS basis functions $\{B_{i,q}(s_t),\ i = 1, 2, \dots, q + k + 1\}$ of degree $q = 0$ and $q > 0$ can be defined recursively, respectively, as:

$$B_{i,0}(s) = \begin{cases} 1, & \text{if } r_i \le s < r_{i+1}, \\ 0, & \text{otherwise}, \end{cases} \qquad (8)$$

$$B_{i,q}(s) = \frac{s - r_i}{r_{i+q} - r_i} B_{i,q-1}(s) + \frac{r_{i+q+1} - s}{r_{i+q+1} - r_{i+1}} B_{i+1,q-1}(s). \qquad (9)$$

Note that if a denominator in Equation (9) is zero, the corresponding term is taken to be zero. From Equations (8) and (9), the set of $q + k + 1$ basis functions has the following important properties:

(a) The BS basis functions form a partition of unity, $\sum_{i=1}^{q+k+1} B_{i,q}(s) = 1$;

(b) For all values of the covariate $s_t$, $B_{i,q}(s) \ge 0$; and

(c) $B_{i,q}(s)$ is nonzero only on the interval $[r_i, r_{i+q+1})$.

Reference [21] proposes an algorithm for evaluating Equation (9); see also [22] for more detailed discussions of BS approximation. Note also that a BS curve can be uniquely represented as a linear combination of the BS basis functions in Equation (9), as shown in the next section. References [23,24] are recent studies on BSs.
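The recursion in Equations (8) and (9) can be checked numerically. The following sketch builds the $n \times (q + k + 1)$ basis matrix by this (Cox–de Boor) recursion, assuming, as is common but not stated above, that the boundary knots $a$ and $b$ are repeated $q + 1$ times, and then confirms properties (a) and (b). The name `bspline_basis` is hypothetical.

```python
import numpy as np

def bspline_basis(s, interior, a, b, q=3):
    """n x (q + k + 1) matrix of B_{i,q}(s) from the recursion (8)-(9).

    Assumption: boundary knots a and b are repeated q + 1 times (clamped knots).
    """
    s = np.asarray(s, dtype=float)
    r = np.concatenate([np.repeat(a, q + 1), np.sort(interior), np.repeat(b, q + 1)])
    m0 = len(r) - 1
    # Degree 0, Equation (8): indicator of [r_i, r_{i+1})
    B = np.zeros((s.size, m0))
    for i in range(m0):
        if r[i] < r[i + 1]:
            B[:, i] = (r[i] <= s) & (s < r[i + 1])
    last = np.max(np.nonzero(r[:-1] < r[1:])[0])
    B[s == b, last] = 1.0                      # close the last interval at s = b
    # Degrees 1..q, Equation (9); terms with a zero denominator are set to zero
    for d in range(1, q + 1):
        B_new = np.zeros((s.size, m0 - d))
        for i in range(m0 - d):
            den1 = r[i + d] - r[i]
            den2 = r[i + d + 1] - r[i + 1]
            t1 = (s - r[i]) / den1 * B[:, i] if den1 > 0 else 0.0
            t2 = (r[i + d + 1] - s) / den2 * B[:, i + 1] if den2 > 0 else 0.0
            B_new[:, i] = t1 + t2
        B = B_new
    return B

B = bspline_basis(np.linspace(0.0, 1.0, 101), interior=[0.25, 0.5, 0.75], a=0.0, b=1.0, q=3)
print(B.shape, np.allclose(B.sum(axis=1), 1.0))
```

With $k = 3$ interior knots and cubic splines ($q = 3$) the matrix has $q + k + 1 = 7$ columns, matching the dimension stated above.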

3.1. BS Estimator

As noted previously, in this paper we fit the semiparametric time series model (1) to right-censored data. For this purpose, the BS estimator can be used as an approximation method. Using the synthetic data in Equation (7), we estimate the parametric and nonparametric components of model (1) so that the sum of squared differences between the synthetic time series values $Y_{t\hat{G}}$ and $x_t\beta + f(s_t)$ is minimized. Assume that $f(\cdot)$ is a smooth function that can be approximated by a linear combination of the BS basis functions in Equations (8) and (9):

$$f(s) \cong \sum_{i=1}^{m} \alpha_i B_{i,q}(s) = B\alpha, \qquad m = q + k + 1, \qquad (10)$$

where $m = q + k + 1$ is the total number of BS basis functions used, the $\hat{\alpha}_i$'s are the estimated coefficients (or control points) of each BS, $B$ is an $(n \times m)$-dimensional matrix whose columns are the BSs defined by Equation (9), and $\alpha = (\alpha_1, \dots, \alpha_m)'$ is the parameter vector of the BS function. Note also that the autoregressive errors in model (1) follow an $n$-dimensional multivariate normal distribution with zero mean and a stationary $(n \times n)$ covariance matrix $\Sigma$, that is, $(\varepsilon_1, \dots, \varepsilon_n)' \sim N_n(0, \Sigma)$, where $\Sigma$ is a symmetric positive definite matrix; for first-order autoregressive errors ($k = 1$ in Equation (2), $\rho = \rho_1$), its elements are:

$$\Sigma = \frac{\sigma_u^2}{1 - \rho^2} R, \qquad R(t, j) = \rho^{|t - j|}, \quad 1 \le t, j \le n. \qquad (11)$$
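For AR(1) errors, both $\Sigma$ in Equation (11) and $V = \Sigma^{-1}$ have simple closed forms; the inverse is tridiagonal. The sketch below (hypothetical helpers `ar1_covariance` and `ar1_precision`) builds both for known $\rho$ and $\sigma_u$ and checks that they are inverses. In practice $\rho$ and $\sigma_u$ are unknown and are estimated iteratively, as described next.

```python
import numpy as np

def ar1_covariance(n, rho, sigma_u=1.0):
    """Sigma of Equation (11): sigma_u^2 / (1 - rho^2) * rho^{|t - j|}."""
    idx = np.arange(n)
    R = rho ** np.abs(idx[:, None] - idx[None, :])
    return sigma_u**2 / (1.0 - rho**2) * R

def ar1_precision(n, rho, sigma_u=1.0):
    """Closed-form V = Sigma^{-1}: tridiagonal, so no dense inversion is needed."""
    V = np.zeros((n, n))
    i = np.arange(n)
    V[i, i] = 1.0 + rho**2                 # interior diagonal entries
    V[0, 0] = V[n - 1, n - 1] = 1.0        # first and last diagonal entries
    V[i[:-1], i[:-1] + 1] = -rho           # super-diagonal
    V[i[:-1] + 1, i[:-1]] = -rho           # sub-diagonal
    return V / sigma_u**2

Sigma = ar1_covariance(6, rho=0.6, sigma_u=1.5)
V = ar1_precision(6, rho=0.6, sigma_u=1.5)
print(np.allclose(Sigma @ V, np.eye(6)))
```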

Throughout the paper, we use the notation $V = \Sigma^{-1}$. Note that $V$ is generally unknown; however, its elements can be estimated by generalized least squares (GLS) through an iterative process. Then, as in [25], a penalized BS study combining BSs with difference penalties, the components of the semiparametric model (1) are estimated by minimizing the penalized sum of squares ($PSS$) criterion:

$$PSS = \sum_{t=1}^{n} V\Big\{Y_{t\hat{G}} - \sum_{j=1}^{p} x_{tj}\beta_j - \sum_{i=1}^{m} \alpha_i B_{i,q}(s_t)\Big\}^2 + \lambda \sum_{i=q+1}^{m} \left(\Delta^q\alpha_i\right)^2, \qquad (12)$$

where $\Delta\alpha_i = \alpha_i - \alpha_{i-1}$ is the first-order difference penalty on the BS coefficients. Higher-order differences are defined as follows:

$$\Delta^2\alpha_i = \Delta(\Delta\alpha_i) = (\alpha_i - \alpha_{i-1}) - (\alpha_{i-1} - \alpha_{i-2}) = \alpha_i - 2\alpha_{i-1} + \alpha_{i-2}, \qquad (13)$$

and, similarly,

$$\Delta^q\alpha_i = \Delta(\Delta^{q-1}\alpha_i). \qquad (14)$$

Note that if the degree $q = 0$ in Equation (12), we obtain semiparametric ridge regression based on BSs. When $\lambda = 0$, Equation (12) reduces to the (generalized) least squares criterion for a regression with correlated errors. If $\lambda > 0$, the penalty affects only the main diagonal and $q$ sub-diagonals on each side of it in the banded system, owing to the limited overlap of the BSs.

We rewrite the minimization criterion in Equation (12) in matrix–vector notation:

$$PSS = (Y_{\hat{G}} - X\beta - B\alpha)' V (Y_{\hat{G}} - X\beta - B\alpha) + \lambda \|D\alpha\|^2, \qquad (15)$$

where $\|\cdot\|$ denotes the Euclidean norm, $X = (x_1, \dots, x_n)'$, $Y_{\hat{G}} = (Y_{1\hat{G}}, \dots, Y_{n\hat{G}})'$ is the synthetic response vector defined in Equation (7), $\lambda > 0$ is a smoothing parameter, and $D$ denotes the matrix form of the difference operator $\Delta^q$ defined in Equations (13) and (14). For example, for the second-order difference penalty, $D$ is the $(m - 2) \times m$ banded matrix:

$$D = \begin{pmatrix} 1 & -2 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -2 & 1 \end{pmatrix}. \qquad (16)$$
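The banded structure of Equation (16) and the identity $\lambda\|D\alpha\|^2 = \lambda\alpha' D'D\alpha$ are easy to verify numerically. The sketch below (hypothetical helper `difference_matrix`) builds $D$ by applying NumPy's `np.diff` to an identity matrix and checks it on a coefficient vector whose second differences are known.

```python
import numpy as np

def difference_matrix(m, order):
    """Matrix form of the difference operator Delta^order acting on alpha in R^m."""
    return np.diff(np.eye(m), n=order, axis=0)   # shape (m - order) x m

D = difference_matrix(5, order=2)
alpha = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # alpha_i = i^2
second_diffs = D @ alpha                        # alpha_i - 2 alpha_{i-1} + alpha_{i-2}
penalty = D.T @ D                               # D'D: banded, two sub-diagonals per side
print(D)
print(second_diffs)
```

Because each row of $D$ touches only three neighbouring coefficients, $D'D$ is pentadiagonal, which is what keeps the penalized system in Equation (17) banded.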

Straightforward algebra shows that the solution to the minimization problem in Equation (15) satisfies the block matrix equation:

$$\begin{pmatrix} X'VX & X'VB \\ B'VX & B'VB + \lambda D'D \end{pmatrix} \begin{pmatrix} \beta \\ \alpha \end{pmatrix} = \begin{pmatrix} X' \\ B' \end{pmatrix} V Y_{\hat{G}}. \qquad (17)$$

Given $\lambda > 0$, the corresponding BS-based estimators of the vectors $\beta$ and $\alpha$ are obtained as:

$$\hat{\alpha}_{BS} = \left[B'VB + \lambda D'D\right]^{-1} B'V (Y_{\hat{G}} - X\hat{\beta}_{BS}), \qquad (18)$$

and

$$\hat{\beta}_{BS} = \left[(X'V - A_{BS})X\right]^{-1} (X'V - A_{BS}) Y_{\hat{G}}, \qquad (19)$$

where $A_{BS} = X'VB\left[B'VB + \lambda D'D\right]^{-1}B'V$.

The estimates of the unknown regression function in the censored semiparametric model are then obtained as:

$$\hat{f}_{BS} = B\hat{\alpha}_{BS} = [\hat{f}(s_1), \dots, \hat{f}(s_n)]'. \qquad (20)$$

From Equations (19) and (20), the fitted values of the dependent time series can be written as:

$$\hat{\mu}_{BS} = X\hat{\beta}_{BS} + \hat{f}_{BS} = H_{BS} Y_{\hat{G}} = \hat{E}[Y \mid X, s], \qquad (21)$$

where $H_{BS}$ is the hat matrix of the BS estimator, computed as:

$$H_{BS} = M_{BS} + (I - M_{BS}) X \left[(X'V - A_{BS})X\right]^{-1} (X'V - A_{BS}), \qquad (22)$$

with $M_{BS} = B\left[B'VB + \lambda D'D\right]^{-1}B'V$.
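As a check on Equations (17)–(22), the sketch below computes the closed-form BS estimates and hat matrix and compares them with a direct solve of the block system (17). Assumptions not taken from the paper: the basis is degree-1 B-splines (hat functions) on 8 equally spaced knots, $D$ is the second-order difference matrix, and $V$ is taken as $R^{-1}$ from Equation (11), which differs from $\Sigma^{-1}$ only by a constant factor that can be absorbed into $\lambda$. The helper name `fit_semiparametric_bs` and the simulated data are hypothetical.

```python
import numpy as np

def fit_semiparametric_bs(X, B, y, V, lam, D):
    """Closed-form estimates of Equations (18), (19) and hat matrix (22)."""
    n = len(y)
    S_BtV = np.linalg.solve(B.T @ V @ B + lam * D.T @ D, B.T @ V)  # [B'VB + lam D'D]^{-1} B'V
    M = B @ S_BtV                                # M_BS
    G = X.T @ V - X.T @ V @ M                    # X'V - A_BS, since A_BS = X'V M_BS
    beta = np.linalg.solve(G @ X, G @ y)         # Equation (19)
    alpha = S_BtV @ (y - X @ beta)               # Equation (18)
    H = M + (np.eye(n) - M) @ X @ np.linalg.solve(G @ X, G)  # Equation (22)
    return beta, alpha, H

rng = np.random.default_rng(0)
n, p, m, rho, lam = 60, 2, 8, 0.5, 1.0
s = np.linspace(0.0, 1.0, n)
X = rng.normal(size=(n, p))
knots = np.linspace(0.0, 1.0, m)
B = np.maximum(0.0, 1.0 - np.abs(s[:, None] - knots[None, :]) / (knots[1] - knots[0]))
D = np.diff(np.eye(m), n=2, axis=0)
idx = np.arange(n)
V = np.linalg.inv(rho ** np.abs(idx[:, None] - idx[None, :]))    # V = R^{-1}
y = X @ np.array([1.0, -0.5]) + np.sin(2 * np.pi * s) + rng.normal(scale=0.3, size=n)

beta, alpha, H = fit_semiparametric_bs(X, B, y, V, lam, D)

# Direct solution of the block system (17) for comparison
lhs = np.block([[X.T @ V @ X, X.T @ V @ B],
                [B.T @ V @ X, B.T @ V @ B + lam * D.T @ D]])
rhs = np.concatenate([X.T @ V @ y, B.T @ V @ y])
theta = np.linalg.solve(lhs, rhs)
print(np.allclose(theta[:p], beta), np.allclose(theta[p:], alpha))
```

`np.linalg.solve` is used instead of explicit inverses for numerical stability; the fitted values $H_{BS}Y_{\hat{G}}$ coincide with $X\hat{\beta}_{BS} + B\hat{\alpha}_{BS}$.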

3.2. AS Estimator

The adaptive spline (AS) applies an adaptive ridge penalty to the BS method, which makes knot selection more flexible. The AS concept was introduced in [26] in a nonparametric context; in this paper, we generalize it to the semiparametric setting based on synthetic response observations. The location and number of knots are crucially important for synthetic data transformation; this issue is discussed in detail in Section 4.3. The point here is that a more efficient estimator based on synthetic responses is needed, as most existing smoothing techniques (spline smoothing, kernel smoothing, etc.) cannot properly handle synthetic data. This article aims to address this issue with the AS estimator.

When a BS defined on the knots $r_1 \le r_2 \le \dots \le r_k$ satisfies $\Delta^q\alpha_i = 0$ for some $i$th knot, it can be reparametrized as a BS on the knots $r_1, r_2, \dots, r_{i-1}, r_{i+1}, \dots, r_k$. Accordingly, with $m = q + k + 1$, we want to penalize the number of non-zero differences:

$$\lambda \sum_{i=q+1}^{m} \|\Delta^q\alpha_i\|_0, \qquad (23)$$

where $\Delta^q\alpha_i$ is the $q$th-order difference operator, $\|\Delta^q\alpha_i\|_0$ is the $L_0$-norm of the differences, that is, $\|\Delta^q\alpha_i\|_0 = 0$ if $\Delta^q\alpha_i = 0$ and $\|\Delta^q\alpha_i\|_0 = 1$ otherwise, and $\lambda$ is a positive penalty parameter that controls the tradeoff between goodness of fit and smoothness of the fitted curve. This penalty enables us to remove a knot $r_i$ that is not relevant to the smoothing problem, to merge the neighboring intervals $[r_{i-1}, r_i)$ and $[r_i, r_{i+1})$, and to continue fitting with a BS defined over the remaining knots. Note also that as $\lambda \to 0$, the fitted curve becomes a BS with knots $r_i$, $i = 1, 2, \dots, k$, and as $\lambda \to \infty$, the fitted function becomes a polynomial of degree $q$.

It should be emphasized that the penalty in Equation (23) is not differentiable because of the $L_0$-norm, which makes the fitting process numerically intractable. Approximate solutions for dealing with the $L_0$-norm are provided by [27,28]. Following these authors, we approximate the $L_0$-norm with an iterative procedure, referred to as the "adaptive ridge", applied to synthetic data. The new criterion is the following weighted penalized sum of squares:

$$WPSS = (Y_{\hat{G}} - X\beta - B\alpha)' V (Y_{\hat{G}} - X\beta - B\alpha) + \lambda \sum_{i=q+1}^{m} w_i \left(\Delta^q\alpha_i\right)^2, \qquad (24)$$

where the $w_i$'s are positive weights. The penalty approximates the $L_0$-norm of the differences when the weights are computed iteratively from the BS parameter vector $\alpha$ as:

$$w_i = \left[(\Delta^q\alpha_i)^2 + \gamma^2\right]^{-1}, \qquad \gamma > 0, \qquad (25)$$

where $\gamma$ is a constant chosen appropriately by the researcher.

Remark 1.

There are a few important points about the selection of $\gamma$. If $|\Delta^q\alpha_i| \ll \gamma$, the weight $w_i$ becomes large, which pushes $\Delta^q\alpha_i$ toward zero, so that $\Delta^q\alpha_i \cong 0$ and the penalty term $w_i(\Delta^q\alpha_i)^2 \cong 0$. Conversely, if $|\Delta^q\alpha_i| \gg \gamma$, then $w_i(\Delta^q\alpha_i)^2 \cong \|\Delta^q\alpha_i\|_0 = 1$. This weighted difference therefore measures how relevant the $i$th knot is. In practice, one possible choice, suggested by [28], is $\gamma = 10^{-5}$; those authors select the knots (denoted as $r_{i^*}$) whose weighted difference exceeds 0.99. The number of parameters of the selected BS is $m_\lambda = q + k_\lambda + 1$, where $k_\lambda$ denotes the number of selected knots.
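Equation (25) and the 0.99 selection rule of Remark 1 can be illustrated on a hand-made coefficient vector. In the sketch below, `adaptive_ridge_weights` is a hypothetical helper, and the coefficients, a piecewise-linear sequence with a single change of slope, are chosen only for illustration; with second-order differences, only the difference at the change of slope passes the rule.

```python
import numpy as np

def adaptive_ridge_weights(alpha, order, gamma=1e-5):
    """Weights w_i = [(Delta^order alpha_i)^2 + gamma^2]^{-1} of Equation (25)."""
    diffs = np.diff(alpha, n=order)
    return 1.0 / (diffs**2 + gamma**2), diffs

alpha = np.array([0.0, 1.0, 2.0, 3.0, 5.0, 7.0, 9.0])  # slope changes once, at alpha[3]
w, d = adaptive_ridge_weights(alpha, order=2)
weighted = w * d**2                  # close to 0 or 1: a smooth surrogate of the L0-norm
selected = np.nonzero(weighted > 0.99)[0]
print(np.round(weighted, 3), selected)
```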

Note that reference [28] provides a figure showing the effect of different norm degrees ($q$) on the quality of estimation. It shows that estimation performance does not change for different values of $\gamma$ when the norm degree is zero ($q = 0$), but is seriously affected when $q > 0$.

For $\lambda > 0$ and non-negative weights, the $WPSS$ of Equation (24) can be rewritten as:

$$WPSS = (Y_{\hat{G}} - X\beta - B\alpha)' V (Y_{\hat{G}} - X\beta - B\alpha) + \lambda\alpha' K\alpha, \qquad (26)$$

where $K = D'WD$ is the penalty matrix, with $W = \mathrm{diag}(w_{q+1}, \dots, w_m)$ and $D$ the matrix form of the difference operator $\Delta^q$ defined in Equations (13) and (14). Straightforward algebra shows that the solution to the minimization of the $WPSS$ in Equation (26) satisfies the block matrix equation:

$$\begin{pmatrix} X'VX & X'VB \\ B'VX & B'VB + \lambda K \end{pmatrix} \begin{pmatrix} \beta \\ \alpha \end{pmatrix} = \begin{pmatrix} X' \\ B' \end{pmatrix} V Y_{\hat{G}}. \qquad (27)$$

By arguments similar to those for the BS approach, the estimators $\hat{\alpha}_{AS}$ and $\hat{\beta}_{AS}$ of $\alpha$ and $\beta$ for the right-censored semiparametric time series model (1) with correlated errors are obtained, respectively, as:

$$\hat{\alpha}_{AS} = \left[B'VB + \lambda K\right]^{-1} B'V (Y_{\hat{G}} - X\hat{\beta}_{AS}), \qquad (28)$$

and

$$\hat{\beta}_{AS} = \left[(X'V - A_{AS})X\right]^{-1} (X'V - A_{AS}) Y_{\hat{G}}, \qquad (29)$$

where $A_{AS} = X'VB\left[B'VB + \lambda K\right]^{-1}B'V$. The proofs and derivations of Equations (28) and (29) are given in Appendix B. The estimates of the nonparametric part of the semiparametric model (1) are obtained from Equation (28) as:

$$\hat{f}_{AS} = B\hat{\alpha}_{AS} = [\hat{f}(s_1), \dots, \hat{f}(s_n)]'. \qquad (30)$$

From Equations (29) and (30), the fitted values of the dependent time series are:

$$\hat{\mu}_{AS} = X\hat{\beta}_{AS} + \hat{f}_{AS} = H_{AS} Y_{\hat{G}} = \hat{E}[Y \mid X, s], \qquad (31)$$

where $H_{AS}$ denotes the hat matrix:

$$H_{AS} = M_{AS} + (I - M_{AS}) X \left[(X'V - A_{AS})X\right]^{-1} (X'V - A_{AS}), \qquad (32)$$

with $M_{AS} = B\left[B'VB + \lambda K\right]^{-1}B'V$.

To keep the computation efficient, the penalty term $D'WD$ is updated within the iterative process rather than by determining the matrix $D$ and the knot set separately. The iterative procedure is given in Algorithm 1 below.

Algorithm 1. Iterative algorithm for the modified A-spline (AS) estimator $\hat{\alpha}_{AS}$.

Input: $X$, $s$, $Y_{\hat{G}}$.
Output: $\hat{\beta}_{AS} = (\hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_p)'$ and $\hat{\alpha}_{AS} = (\hat{\alpha}_1, \hat{\alpha}_2, \dots, \hat{\alpha}_{q+k+1})'$.

1: Begin
2: Set the initial values $\beta^{(0)} = 1_p$, $\alpha^{(0)} = 0_{q+k+1}$ and $W^{(0)} = I$ to start the iterative process
3: Repeat for $i = 1, 2, \dots$ until the weighted differences converge to the $L_0$-norm:
4: $\hat{\beta}_{AS}^{(i)} = [(X'V - A_{AS})X]^{-1}(X'V - A_{AS})Y_{\hat{G}}$, with $K = D'W^{(i-1)}D$ in $A_{AS}$
5: $\hat{\alpha}_{AS}^{(i)} = [B'VB + \lambda K]^{-1}B'V(Y_{\hat{G}} - X\hat{\beta}_{AS}^{(i)})$
6: Set $\gamma = 10^{-5}$
7: $w_j^{(i)} = [(\Delta^q\hat{\alpha}_j^{(i)})^2 + \gamma^2]^{-1}$
8: $\hat{\beta}_{AS} = \hat{\beta}_{AS}^{(i)}$, $\hat{\alpha}_{AS} = \hat{\alpha}_{AS}^{(i)}$, $W^{(i)} = \mathrm{diag}(w_j^{(i)})$
9: End repeat
10: Select the knots $r_{i^*}$ for which $w_j^{(i)}(\Delta^q\hat{\alpha}_j^{(i)})^2 > 0.99$
11: Return $\hat{\beta}_{AS}$ and $\hat{\alpha}_{AS}$
12: End

Remark 2.

For the constant value $\gamma = 10^{-5}$, the iteration repeats steps 3 to 9 until the pre-determined tolerance $\delta = 10^{-4}$ is reached, where $\delta = n^{-1}\sum_{i=1}^{n} |Y_i - \hat{Y}_{i\hat{G}}|$. In our experience, about 20 iterations are typically needed to achieve convergence.
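A compact sketch of the adaptive-ridge loop of Algorithm 1 follows. It is illustrative only and rests on several assumptions not fixed by the text: $V$ is treated as known, the difference order is a user choice, the loop stops when $\hat{\alpha}$ stops changing (instead of the criterion in Remark 2), and the demo uses degree-1 B-splines (hat functions) on 21 equally spaced knots, a hinge-shaped $f$ and independent errors ($V = I$). The name `fit_semiparametric_as` and the demo data are hypothetical.

```python
import numpy as np

def fit_semiparametric_as(X, B, y, V, lam, order=2, gamma=1e-5, tol=1e-8, max_iter=100):
    """Adaptive-ridge iteration in the spirit of Algorithm 1 (illustrative sketch).

    Assumptions: V known; stopping rule = no change in alpha (not Remark 2's rule).
    """
    m = B.shape[1]
    D = np.diff(np.eye(m), n=order, axis=0)          # matrix form of Delta^order
    w = np.ones(D.shape[0])                          # W^(0) = I
    BtV, XtV = B.T @ V, X.T @ V
    BtVB = BtV @ B
    alpha_old = np.zeros(m)                          # alpha^(0) = 0
    for _ in range(max_iter):
        K = D.T @ (w[:, None] * D)                   # K = D' W D
        S_BtV = np.linalg.solve(BtVB + lam * K, BtV) # [B'VB + lam K]^{-1} B'V
        G = XtV - XtV @ B @ S_BtV                    # X'V - A_AS
        beta = np.linalg.solve(G @ X, G @ y)         # step 4, Equation (29)
        alpha = S_BtV @ (y - X @ beta)               # step 5, Equation (28)
        d = D @ alpha
        w = 1.0 / (d**2 + gamma**2)                  # step 7, Equation (25)
        if np.max(np.abs(alpha - alpha_old)) < tol:
            break
        alpha_old = alpha
    selected = np.nonzero(w * d**2 > 0.99)[0]        # step 10: relevant differences
    return beta, alpha, selected

# Hypothetical demo: hinge-shaped f with a kink at s = 0.5, hat-function basis
rng = np.random.default_rng(42)
n = 200
s = np.linspace(0.0, 1.0, n)
X = rng.normal(size=(n, 2))
f_true = 2.0 * np.maximum(0.0, s - 0.5)
y = X @ np.array([1.0, -0.5]) + f_true + rng.normal(scale=0.05, size=n)
knots = np.linspace(0.0, 1.0, 21)
B = np.maximum(0.0, 1.0 - np.abs(s[:, None] - knots[None, :]) / (knots[1] - knots[0]))
beta, alpha, selected = fit_semiparametric_as(X, B, y, np.eye(n), lam=0.1)
print(beta, selected)
```

Each pass solves one $m \times m$ system, so the cost per iteration is dominated by forming $B'VB$ and $X'VB$; with a banded $V$ (for example, the AR(1) precision matrix) these products can be computed more cheaply than with the dense matrices used here.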

The complexity and efficiency of Algorithm 1 can be analyzed from the following aspects:

(i) Number of local searches: the algorithm does not involve a local search procedure, which is an advantage for its speed;

(ii) Number of nested loops: there is only a single iteration loop (without nested loops), so the algorithm's order of growth is $O(n)$;

(iii) Asymptotic behavior: as noted in (ii), Algorithm 1 is $O(n)$, which means that its convergence speed compares favorably with that of the alternative BS method.

As mentioned at the beginning of this section, both the semiparametric BS and AS estimators require the choice of an optimal smoothing parameter $\lambda$. For this purpose, we use the improved Akaike information criterion ($AIC_c$) proposed by [29]:

$$AIC_c(\lambda) = \log(\hat{\sigma}^2) + 1 + \frac{2\{\mathrm{tr}(H) + 1\}}{n - \mathrm{tr}(H) - 2}, \qquad (33)$$

where $\hat{\sigma}^2$ is the estimate of the model variance, given for each method separately in the next section, and $H$ denotes the hat matrix of the method used: $H_{AS}$ for the AS method and $H_{BS}$ for the BS method, respectively.
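Selecting $\lambda$ by Equation (33) amounts to a grid search. The sketch below applies it to a purely nonparametric penalized spline to stay short; the grid, the hat-function basis, the simulated data, the assumption $V = I$ and the helper names `aicc` and `penalized_spline_hat` are illustrative choices, and $\hat{\sigma}^2$ follows the residual-based form given in Section 4.1. For the semiparametric estimators, $H$ would be replaced by $H_{BS}$ or $H_{AS}$.

```python
import numpy as np

def aicc(H, y):
    """Improved AIC of Equation (33); sigma^2 = ||(I - H) y||^2 / tr[(I - H)'(I - H)]."""
    n = len(y)
    R = np.eye(n) - H
    sigma2 = np.sum((R @ y) ** 2) / np.trace(R.T @ R)
    tr_h = np.trace(H)
    return np.log(sigma2) + 1.0 + 2.0 * (tr_h + 1.0) / (n - tr_h - 2.0)

def penalized_spline_hat(B, D, lam):
    """Hat matrix B (B'B + lam D'D)^{-1} B' (purely nonparametric, V = I: an assumption)."""
    return B @ np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)

rng = np.random.default_rng(3)
n = 100
s = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * s) + rng.normal(scale=0.2, size=n)
knots = np.linspace(0.0, 1.0, 25)
B = np.maximum(0.0, 1.0 - np.abs(s[:, None] - knots[None, :]) / (knots[1] - knots[0]))
D = np.diff(np.eye(len(knots)), n=2, axis=0)

grid = 10.0 ** np.arange(-3, 4)            # candidate smoothing parameters
scores = [aicc(penalized_spline_hat(B, D, lam), y) for lam in grid]
best_lam = grid[int(np.argmin(scores))]
print(best_lam)
```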

4. Statistical Properties of the Estimators

In this paper, we introduced the semiparametric AS and BS estimators for the estimation of the right-censored time series model. It should be noted that these two methods are used here for the first time in a time series estimation setting. Their statistical properties are therefore examined in this section: the error terms obtained from the estimates of both methods and the estimators of the parametric and nonparametric components are inspected, and their properties are derived.

4.1. Properties of the Semiparametric BS Estimator

Firstly, the parametric component is inspected. As is well known, in a parametric context the estimation error can be decomposed into bias and variance terms, which together determine the quality of the estimator. Accordingly, the estimator ${\hat{β}}_{B S}$ of the parametric coefficients vector is expanded as follows:

{\hat{β}}_{B S} = {[(X^{'} V - A_{B S}) X]}^{- 1} (X^{'} V - A_{B S}) Y_{\hat{G}} = β + {[(X^{'} V - A_{B S}) X]}^{- 1} (X^{'} V - A_{B S}) f,

(34)

where the matrices $V$, $A_{B S}$, and $M_{B S}$ are as defined in Section 3.1 and $f = {[f (s_{1}), f (s_{2}), \dots, f (s_{n})]}^{'}$. From here, the bias $B ({\hat{β}}_{B S})$ and the variance-covariance matrix $V ({\hat{β}}_{B S})$ of the estimator ${\hat{β}}_{B S}$ can be computed as follows:

B ({\hat{β}}_{B S}) = E ({\hat{β}}_{B S}) - β = {[(X^{'} V - A_{B S}) X]}^{- 1} (X^{'} V - A_{B S}) f,

(35)

V ({\hat{β}}_{B S}) = σ^{2} {[(X^{'} V - A_{B S}) X]}^{- 1} (X^{'} V - A_{B S}) X {[(X^{'} V - A_{B S}) X]}^{- 1},

(36)

where $σ^{2}$ is the variance of the fitted semiparametric model. Since this variance is generally unknown, its BS-based estimate, denoted by ${\hat{σ}}_{B S}^{2}$, is used instead of $σ^{2}$. It can be computed from the residual sum of squares (RSS) as follows:

{\hat{σ}}_{B S}^{2} = \frac{R S S}{t r {(I - H_{B S})}^{2}} = \frac{{‖ (I - H_{B S}) {\hat{Y}}_{{\hat{G}}_{B S}} ‖}^{2}}{t r [{(I - H_{B S})}^{'} (I - H_{B S})]},

(37)

where $t r {(I - H_{B S})}^{2} = n - 2 t r (H_{B S}) + t r (H_{B S}^{'} H_{B S})$ denotes the degrees of freedom. In addition, computing $t r (H_{B S}^{'} H_{B S})$ requires $O (n)$ algebraic operations. In the context of the BS, if the data are normally distributed, ${\hat{σ}}_{B S}^{2}$ is asymptotically unbiased.
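Equations (35) and (37) can be checked numerically. The Python sketch below assumes, purely for illustration, that the matrix playing the role of $(X^{'} V - A_{B S})$ is $X^{'} (I - S)$ for an identity-basis P-spline smoother $S$ with $V = I$ (a Speckman-type partial-residual form); the paper's own $A_{B S}$ is defined in Section 3.1 and may differ. It compares the analytic bias $[(X^{'} V - A) X]^{- 1} (X^{'} V - A) f$ with a Monte Carlo average, verifies the degrees-of-freedom identity, and computes the variance estimate with the residual vector taken as $(I - H) Y$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
s = np.linspace(0, 1, n)
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5])
f = np.sin(2 * np.pi * s)

# Hypothetical smoother S (identity-basis P-spline, 2nd-order differences, V = I).
D = np.diff(np.eye(n), n=2, axis=0)
S = np.linalg.inv(np.eye(n) + 10.0 * D.T @ D)

M = X.T @ (np.eye(n) - S)          # plays the role of (X'V - A_BS) in this sketch
L = np.linalg.solve(M @ X, M)      # beta_hat = L Y, and L X = I
bias_formula = L @ f               # Equation (35)

# Monte Carlo check: average of beta_hat - beta over repeated noise draws.
draws = np.array([L @ (X @ beta + f + rng.normal(size=n)) - beta
                  for _ in range(2000)])
bias_mc = draws.mean(axis=0)

# Hat matrix of the fit y_hat = X beta_hat + S (Y - X beta_hat).
H = X @ L + S @ (np.eye(n) - X @ L)
I_H = np.eye(n) - H
df = np.trace(I_H.T @ I_H)         # equals n - 2 tr(H) + tr(H'H)

Y = X @ beta + f + rng.normal(size=n)
sigma2_hat = np.sum((I_H @ Y) ** 2) / df   # Equation (37), residuals (I - H) Y
```

Because $L X = I$, the term $Xβ$ passes through the estimator unchanged, so any bias comes only from the part of $f$ that leaks into $L f$.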

Secondly, the properties of the estimated nonparametric component ${\hat{α}}_{B S} = {({\hat{α}}_{1}, {\hat{α}}_{2}, \dots, {\hat{α}}_{q + k + 1})}^{'}$ are given. The bias of $\hat{α}$ is one of the quality measures for the estimated model. It is obtained from the conditional expectation $E [\hat{α} | s_{t}]$, given by:

E [{\hat{α}}_{B S} | s_{t}] = {(B^{'} V B + λ D^{'} D)}^{- 1} B^{'} V B α .

(38)

From that, the bias is given by:

B i a s ({\hat{α}}_{B S}) = E [{\hat{α}}_{B S} | s_{t}] - α = {[(B^{'} V B + λ D^{'} D)]}^{- 1} B^{'} V^{'} f - {[(B^{'} V B + λ D^{'} D)]}^{- 1} B^{'} V^{'} X {[(X^{'} V - A_{B S}) X]}^{- 1} (X^{'} V - A_{B S}) f - {[(B^{'} V B + λ D^{'} D)]}^{- 1} B^{'} V^{'} f = - {[(B^{'} V B + λ D^{'} D)]}^{- 1} B^{'} V^{'} X {[(X^{'} V - A_{B S}) X]}^{- 1} (X^{'} V - A_{B S}) f .

(39)

Accordingly, the covariance of ${\hat{α}}_{B S}$ can be computed as:

C o v ({\hat{α}}_{B S}) = {\hat{σ}}_{B S}^{2} \frac{1}{n} {(B^{'} V B + λ D^{'} D)}^{- 1} (B^{'} V B) {(B^{'} V B + λ D^{'} D)}^{- 1},

(40)

where ${\hat{σ}}_{B S}^{2}$ is defined by Equation (37). In addition, to assess the performance of ${\hat{f}}_{B S} = B {\hat{α}}_{B S}$, the root mean squared error $R M S E (f, {\hat{f}}_{B S})$ is used:

R M S E (f, {\hat{f}}_{B S}) = \sqrt{n^{- 1} \sum_{t = 1}^{n} {[f (s_{t}) - {\hat{f}}_{B S} (s_{t})]}^{2}} = \sqrt{n^{- 1} {(f - {\hat{f}}_{B S})}^{'} (f - {\hat{f}}_{B S})} .

(41)
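Equations (38) and (40) have a useful consequence that a short Python sketch can make concrete: if α lies in the null space of $D$ (for second-order differences, coefficients that are linear in the knot index), then $(B^{'} V B + λ D^{'} D) α = B^{'} V B α$, so the conditional mean in Equation (38) equals α and the penalty introduces no bias; curvature, by contrast, is shrunk. The sketch assumes $V = I$ and a linear (hat-function) B-spline basis on equally spaced knots; the function names are hypothetical, and `cov_alpha` reproduces Equation (40) as written, including its $1/n$ factor.

```python
import numpy as np


def expected_alpha(B, D, lam, alpha, V=None):
    """Conditional mean of alpha_hat, Equation (38): (B'VB + lam D'D)^{-1} B'VB alpha."""
    V = np.eye(B.shape[0]) if V is None else V
    BVB = B.T @ V @ B
    return np.linalg.solve(BVB + lam * D.T @ D, BVB @ alpha)


def cov_alpha(B, D, lam, sigma2, V=None):
    """Covariance of alpha_hat as written in Equation (40), with the 1/n factor."""
    n = B.shape[0]
    V = np.eye(n) if V is None else V
    BVB = B.T @ V @ B
    A_inv = np.linalg.inv(BVB + lam * D.T @ D)
    return sigma2 / n * A_inv @ BVB @ A_inv


# Linear B-spline (hat-function) basis on equally spaced knots.
s = np.linspace(0, 1, 101)
knots = np.linspace(0, 1, 11)
h = knots[1] - knots[0]
B = np.maximum(0.0, 1.0 - np.abs(s[:, None] - knots[None, :]) / h)
D = np.diff(np.eye(len(knots)), n=2, axis=0)

alpha_lin = 2.0 + 3.0 * knots              # lies in the null space of D
alpha_curved = np.sin(2 * np.pi * knots)   # has non-zero second differences
bias_lin = expected_alpha(B, D, 5.0, alpha_lin) - alpha_lin
bias_curved = expected_alpha(B, D, 5.0, alpha_curved) - alpha_curved
C = cov_alpha(B, D, 5.0, sigma2=1.0)
```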

4.2. Properties of the Semiparametric AS Estimator

As in Section 4.1, the same properties of the parametric and nonparametric components are given here for the AS estimator. To derive the bias and variance of ${\hat{β}}_{A S}$, the estimator is expanded as follows:

{\hat{β}}_{A S} = {[(X^{'} V - A_{A S}) X]}^{- 1} (X^{'} V - A_{A S}) Y_{\hat{G}} = β + {[(X^{'} V - A_{A S}) X]}^{- 1} (X^{'} V - A_{A S}) f,

(42)

where $A_{A S}$ and $M_{A S}$ are given in Section 3.2. The bias and the covariance matrix of the estimator ${\hat{β}}_{A S}$ are then given by:

B ({\hat{β}}_{A S}) = E ({\hat{β}}_{A S}) - β = {[(X^{'} V - A_{A S}) X]}^{- 1} (X^{'} V - A_{A S}) f,

(43)

V ({\hat{β}}_{A S}) = σ^{2} {[(X^{'} V - A_{A S}) X]}^{- 1} (X^{'} V - A_{A S}) X {[(X^{'} V - A_{A S}) X]}^{- 1},

(44)

where $σ^{2}$ is the variance of the fitted semiparametric model. As in Equation (37), the unknown model variance is replaced by its estimate ${\hat{σ}}_{A S}^{2}$, obtained as follows:

{\hat{σ}}_{A S}^{2} = \frac{R S S}{t r {(I - H_{A S})}^{2}} = \frac{{‖ (I - H_{A S}) {\hat{Y}}_{{\hat{G}}_{A S}} ‖}^{2}}{t r [{(I - H_{A S})}^{'} (I - H_{A S})]} .

(45)

The properties of the estimated nonparametric component ${\hat{α}}_{A S} = {({\hat{α}}_{1}, {\hat{α}}_{2}, \dots, {\hat{α}}_{q + k + 1})}^{'}$ for the AS method are described below. The bias and the covariance of the AS estimator ${\hat{α}}_{A S}$ are given, respectively, by:

B i a s ({\hat{α}}_{A S}) = E [{\hat{α}}_{A S} | s_{t}] - α = {[(B^{'} V B + λ D^{'} W D)]}^{- 1} B^{'} V^{'} f - {[(B^{'} V B + λ D^{'} W D)]}^{- 1} B^{'} V^{'} X {[(X^{'} V - A_{A S}) X]}^{- 1} (X^{'} V - A_{A S}) f - {[(B^{'} V B + λ D^{'} W D)]}^{- 1} B^{'} V^{'} f = - {[(B^{'} V B + λ D^{'} W D)]}^{- 1} B^{'} V^{'} X {[(X^{'} V - A_{A S}) X]}^{- 1} (X^{'} V - A_{A S}) f,

(46)

and

C o v ({\hat{α}}_{A S}) = {\hat{σ}}_{A S}^{2} \frac{1}{n} {(B^{'} V B + λ D^{'} W D)}^{- 1} (B^{'} V B) {(B^{'} V B + λ D^{'} W D)}^{- 1} .

(47)

Thus, the value of $R M S E (f, {\hat{f}}_{A S})$ for ${\hat{f}}_{A S} = B {\hat{α}}_{A S}$ is calculated, similarly to Equation (41), as follows:

R M S E (f, {\hat{f}}_{A S}) = \sqrt{n^{- 1} \sum_{t = 1}^{n} {[f (s_{t}) - {\hat{f}}_{A S} (s_{t})]}^{2}} = \sqrt{n^{- 1} {(f - {\hat{f}}_{A S})}^{'} (f - {\hat{f}}_{A S})} .

(48)

4.3. Quality Measures for the Fitted Model

After assessing the parametric and nonparametric components of the model in Section 4.1 and Section 4.2, this section introduces several measures of overall model performance. In the literature on time series modelling, the mean absolute percentage error ($M A P E$), mean absolute error ($M A E$), and mean squared error ($M S E$) are the most commonly used performance criteria. Among these, $M A P E$ is preferred in this study. In addition, the median absolute error ($M e d A E$) is used, which allows us to account for missing or censored data. The generalized $M S E$ ($G M S E$) and the ratio of $G M S E$ ($R G M S E$), proposed by [30] and [2], respectively, are used to measure the quality of the fitted time series model. These criteria are defined as follows:

M A P E (Y_{t \hat{G}}, {\hat{Y}}_{t \hat{G}}) = n^{- 1} \sum_{t = 1}^{n} \frac{| Y_{t \hat{G}} - {\hat{Y}}_{t \hat{G}} |}{| Y_{t \hat{G}} |},

M e d A E (Y_{\hat{G}}, {\hat{Y}}_{\hat{G}}) = M e d i a n (| Y_{\hat{G}} - {\hat{Y}}_{\hat{G}} |),

G M S E (Y_{\hat{G}}, {\hat{Y}}_{\hat{G}}) = {(Y_{\hat{G}} - {\hat{Y}}_{\hat{G}})}^{'} E (Y_{\hat{G}} Y_{\hat{G}}^{'}) (Y_{\hat{G}} - {\hat{Y}}_{\hat{G}}),

where ${\hat{Y}}_{t \hat{G}}$ and ${\hat{Y}}_{\hat{G}}$ denote, respectively, the fitted values and the fitted vector of the dependent variable for any estimation method. Here, ${\hat{Y}}_{t \hat{G}}$ and ${\hat{Y}}_{\hat{G}}$ are replaced by ${\hat{Y}}_{t {\hat{G}}_{B S}}$ and ${\hat{Y}}_{{\hat{G}}_{B S}}$ for the BS, and by ${\hat{Y}}_{t {\hat{G}}_{A S}}$ and ${\hat{Y}}_{{\hat{G}}_{A S}}$ for the AS. In addition, to allow a more direct comparison between the AS and BS estimators, $R G M S E$ is defined below.

Definition 2.

The ratio of GMSE can be defined as follows:

R G M S E (Y_{{\hat{G}}_{B S}}, {\hat{Y}}_{{\hat{G}}_{A S}}) = \frac{G M S E ({\hat{Y}}_{{\hat{G}}_{A S}})}{G M S E ({\hat{Y}}_{{\hat{G}}_{B S}})} .

(49)

Regarding the $RGMSE$ criterion, if $RGMSE (Y_{{\hat{G}}_{BS}}, {\hat{Y}}_{{\hat{G}}_{AS}}) < 1$, then the model fitted by the AS method performs better than the model fitted by the BS method.
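The criteria of Section 4.3 are straightforward to compute. The Python sketch below is illustrative: the function names are hypothetical, `mape` skips observations whose synthetic response is exactly zero (an assumption, since censored points can map to zero and would otherwise cause division by zero), and `gmse` takes a user-supplied matrix `S` standing in for $E (Y_{\hat{G}} Y_{\hat{G}}^{'})$, which in practice has to be estimated; the identity is used below only for illustration.

```python
import numpy as np


def mape(y, y_hat):
    """Mean absolute percentage error; points with y == 0 are skipped (assumption)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    nz = y != 0
    return np.mean(np.abs(y[nz] - y_hat[nz]) / np.abs(y[nz]))


def medae(y, y_hat):
    """Median absolute error."""
    return np.median(np.abs(np.asarray(y, float) - np.asarray(y_hat, float)))


def gmse(y, y_hat, S):
    """Quadratic form (y - y_hat)' S (y - y_hat); S stands in for E(Y Y')."""
    e = np.asarray(y, float) - np.asarray(y_hat, float)
    return e @ S @ e


def rgmse(gmse_as, gmse_bs):
    """Equation (49): values below 1 favour the AS fit."""
    return gmse_as / gmse_bs


y = np.array([1.0, 2.0, 4.0])
y_hat = np.array([1.1, 1.8, 4.0])
print(mape(y, y_hat), medae(y, y_hat), gmse(y, y_hat, np.eye(3)))
```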

5. Further Information for Adaptive-Ridge Penalty

The semiparametric AS estimator proposed for the right-censored time series model aims, through its adaptive nature, to produce reliable estimates despite censoring. Because the weighted penalty term approximates the $L_{0}$-norm given in Equation (23), the most suitable knot locations can be selected. Thus, the model avoids the disadvantages of the synthetic data transformation, which assigns larger magnitudes to uncensored observations.

This section is designed to inspect some of the large sample properties of the modified AS estimator under right-censored data. It should be noted that adaptive ridge penalty in the setting of regression has been studied by many authors; see for example [25,26,28]. However, the aforementioned studies consider adaptive ridge penalty individually, not as a part of a semiparametric time series model. This section provides basic information for the large sample properties of the proposed AS estimator in the context of a semiparametric time series model.

As previously stated, the AS approximation is a modified version of the P-spline (penalized BS) estimator proposed by [31]. The AS method differs from BSs mainly through the $L_{0}$-norm in the penalty term. The AS estimator is obtained by an iterative process that determines the weights, as described in Section 3.2. In addition, unlike its previous uses in the literature, the AS method is applied here to the modelling of censored time series. For these reasons, several important assumptions are needed. The large sample properties are derived under the following assumptions:

Assumption 1.

The minimization problem for the semiparametric AS is given in Equation (26). To make this expression more general, it can be rewritten as follows:

P S S (α; λ) = \sum_{t = 1}^{n} V \{Y_{t \hat{G}} - \sum_{l = 1}^{p} x_{t l} β_{l} - \sum_{j = 1}^{v} α_{j} B_{j, q} (s_{t})\}^{2} + λ \sum_{j = q + 1}^{q + k + 1} {‖ Δ^{q} α_{j} ‖}_{τ},

(50)

where ${‖ Δ^{q} α_{j} ‖}_{τ}$ represents the $τ$-norm of the penalty term. The first assumption is $τ \to 0$, which allows the $L_{0}$-norm to be approximated through the weights obtained in the iterative process. Otherwise, the $L_{0}$-norm requires overly complex calculations, which makes the method impractical to use. To our knowledge of the literature, when $τ \to 0$, as in Equation (26), the minimization of Equation (50) works by penalizing the non-zero coefficients $α_{j}$’s, as shown by [32].
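The role of $τ \to 0$ in Assumption 1 can be seen numerically: $|a|^{τ}$ tends to the indicator $I (a \neq 0)$, which is exactly the $L_{0}$ count. The Python sketch below compares this with an adaptive-ridge surrogate, assuming (as in the usual adaptive-ridge scheme, not necessarily the paper's exact Equation (24)) weights $w = (a^{2} + γ^{2})^{- 1}$ evaluated at the current value, so that $w a^{2} = a^{2} / (a^{2} + γ^{2})$.

```python
import numpy as np

alpha = np.array([0.0, 1e-3, 0.5, -2.0])

# tau-"norm" |a|^tau: tends to the indicator a != 0 as tau -> 0 (0**tau stays 0).
powers = {tau: np.abs(alpha) ** tau for tau in (1.0, 0.5, 0.1, 0.01)}

# Adaptive-ridge surrogate with weights w = 1 / (a^2 + gamma^2) (assumed form):
# w * a^2 = a^2 / (a^2 + gamma^2), which is close to 1 unless |a| is near gamma.
gamma = 1e-5
surrogate = alpha ** 2 / (alpha ** 2 + gamma ** 2)
l0 = (alpha != 0).astype(float)
```

For $τ = 0.01$ the small coefficient $10^{- 3}$ still contributes only about 0.93, whereas the weighted surrogate already gives about 0.9999, which illustrates why the iterative weighting is a practical route to the $L_{0}$ penalty.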

Assumption 2.

When ${\hat{α}}_{A S}$ is examined asymptotically, the objective function of Equation (26) may not have a global minimum, since it is not necessarily convex. However, if we assume that:

R_{n} = \frac{1}{r} \sum_{i = 1}^{r} B_{i} B_{i}^{'} \to R,

(51)

then some important aspects of asymptotic consistency can be established. Therefore, it is presumed that $R$ is a non-negative definite matrix and:

\frac{1}{q + k + 1} \underset{1 \leq i \leq r}{m a x} B_{i}^{'} B_{i} \to 0,

(52)

where the diagonal elements satisfy $d i a g (R_{i}) = 1$.

Assumption 3.

$B_{j}^{T} B_{j}$, ${(B_{j}^{T} B_{j})}^{- 1}$, and $R$ are assumed to be full-rank matrices. Under the assumptions given above, to examine the asymptotic consistency of ${\hat{α}}_{A S}$ and ${\hat{β}}_{A S}$, the following function can be obtained from Equation (50):

M_{n} ({\hat{α}}_{A S n}, {\hat{β}}_{A S n}) = \sum_{t = 1}^{n} V \{Y_{t \hat{G}} - \sum_{l = 1}^{p} x_{t l} {\hat{β}}_{A S n l} - \sum_{j = 1}^{r} {\hat{α}}_{A S n j} B_{j, q} (s_{t})\}^{2} + λ_{n} \sum_{i = q + 1}^{q + k + 1} {‖ Δ^{q} {\hat{α}}_{A S n_{i}} ‖}_{τ},

(53)

where $({\hat{α}}_{A S n}, {\hat{β}}_{A S n})$ denotes the limiting case of the estimators for $λ_{n} = O (n)$. Note that Equation (52) is ensured by Theorem 1 below.

Theorem 1.

Under Assumptions 1–3, if $λ_{n} \to λ \geq 0$, then $({\hat{β}}_{A S_{n}}, {\hat{α}}_{A S_{n}}) \overset{d}{\to} a r g m i n (M_{n})$, where:

M_{n} ({\hat{β}}_{A S_{n}}, {\hat{α}}_{A S_{n}}) = {[{({\hat{β}}_{A S_{n}} {\hat{α}}_{A S_{n}})}^{'} - {(β α)}^{'}]}^{'} R [{({\hat{β}}_{A S_{n}} {\hat{α}}_{A S_{n}})}^{'} - {(β α)}^{'}] + λ_{n} \sum_{i = q + 1}^{m = q + k + 1} {‖ Δ^{q} α_{i} ‖}_{τ} .

(54)

Therefore, for an optimal $λ_{n} = O (1)$, the pair $({\hat{β}}_{A S_{n}}, {\hat{α}}_{A S_{n}})$ can be regarded as a consistent AS estimator of $(β, α)$. In this context, as $n \to \infty$, $({\hat{β}}_{A S_{n}}, {\hat{α}}_{A S_{n}}) \to (β, α)$.

For the proof of Theorem 1, see Appendix C.

To clearly indicate the place of Assumptions 1–3 in the estimation process, the following explanations are given for each assumption.

Assumption 1 does not depend on the data; it is made to provide a practical solution when minimizing Equation (50). Therefore, in both the simulation and real data studies, this assumption imposes nothing on the dataset, but it is necessary to reduce the computational complexity.
In real data studies, to ensure Assumption 2, the matrix $B$ obtained from the nonparametric covariate needs to have linearly independent columns. Because $(B^{'} B)$ should be identifiable and the ill-posed problem should be avoided, $(B^{'} B)$ must be a full-rank matrix.
Assumption 3 confirms Assumption 2. Thus, asymptotic consistency can be confirmed through Assumption 3, which means that Assumption 3 depends indirectly on the dataset.
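In practice, the full-rank requirement on $B$ can be checked before fitting. The Python sketch below is illustrative: it uses a linear (hat-function) B-spline basis on equally spaced knots rather than the paper's basis, and the function names are hypothetical. It shows how a gap in the observed covariate values (no data between 0.4 and 0.7) leaves two basis columns empty, so that $B^{'} B$ becomes singular.

```python
import numpy as np


def hat_basis(s, knots):
    """Linear B-spline (hat-function) basis on equally spaced knots."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(s[:, None] - knots[None, :]) / h)


def column_rank(B):
    """Return (numerical rank of B, number of columns of B)."""
    return np.linalg.matrix_rank(B), B.shape[1]


knots = np.linspace(0, 1, 11)
s_full = np.linspace(0, 1, 120)
s_gap = s_full[(s_full <= 0.4) | (s_full >= 0.7)]   # no data between 0.4 and 0.7

rank_full, p_full = column_rank(hat_basis(s_full, knots))
rank_gap, p_gap = column_rank(hat_basis(s_gap, knots))
print(rank_full, p_full)   # full column rank
print(rank_gap, p_gap)     # rank deficient: knots 0.5 and 0.6 have no data
```

When such a deficiency appears, the knots (or the number of basis functions) would need to be adjusted so that every basis function is supported by data.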

5.1. Asymptotic Distribution and Consistency of the Proposed Estimator

In this section, the estimate ${\hat{β}}_{A S}$ of the parametric component is inspected in terms of asymptotic consistency and asymptotic distribution.

Assume the following regularity conditions:

(i): $F_{n} = n^{- 1} (X_{i}^{T} V - A) X_{i} \to F$ for a non-negative definite matrix $F$;
(ii): $n^{- 1} \max_{1 \leq t \leq n} (X_{i}^{T} V - A) X_{i} \to 0;$
(iii): Autoregressive errors $ε_{t}$ ’s given in Equation (2) are stationary with independent and identically distributed random error terms $u_{t}$ ’s that have zero mean and finite variance $0 < σ^{2} < \infty$ ;
(iv): $F_{n}^{- 1} = n^{- 1} {[(X_{i}^{T} V - A) X_{i}]}^{- 1}$ exists.

Here, condition (ii) indicates that the diagonal elements of $F$ and $F_{n}$ are identical and equal to one, because the covariates are scaled. To obtain the asymptotic distribution of ${\hat{β}}_{A S}$, “nearly-singular” designs are considered for $F_{n}$ because $τ \to 0$. Thus, it can be ensured that $F_{n} \to F$ asymptotically. On the other hand, $F_{n}$ and $F$ are assumed to be non-singular throughout Section 5.1.

To show the consistency and asymptotic normality of the semiparametric AS estimator when conditions (i), (ii), and (iii) hold with a non-singular $F$, the case $τ \geq 1$ is considered first, followed by the case $τ < 1$.

Let ${\hat{β}}_{A S_{n}}$ be an asymptotic estimator. The consistency of ${\hat{β}}_{A S_{n}}$ can be established by using the following minimization function:

ψ_{n} ({\hat{β}}_{A S_{n}}, \hat{f} (s_{t})) = n^{- 1} \sum_{t = 1}^{n} {[Y_{t} - X_{t} {\hat{β}}_{A S_{n}} - \hat{f} (s_{t})]}^{2} + λ_{n} n^{- 1} \sum_{j = 1}^{p} {| {\hat{β}}_{(j) A S_{n}} |}^{τ} .

(55)

The following theorem shows the consistency of ${\hat{β}}_{A S_{n}}$ under the additional assumption $λ_{n} = O (n)$.

Theorem 2.

Assume that $F$ is non-singular, $\hat{f} (s_{t})$ is stable, and $λ_{n} n^{- 1} \to λ_{0} \geq 0$. Then, as $n \to \infty$:

{\hat{β}}_{A S_{n}} \overset{d}{\to} β,

(56)

where ${\hat{β}}_{A S_{n}}$ is a consistent estimator of $β$. The proof of this theorem is given in Appendix D. For $λ_{n} = O (n)$, $a r g m i n (ψ) = β$, and therefore ${\hat{β}}_{A S_{n}}$ is a consistent estimator.
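The consistency statement can be illustrated in the simplest case $τ = 2$, where the minimizer of Equation (55) has a closed form, $(X^{'} X + λ_{n} I)^{- 1} X^{'} (Y - \hat{f})$. The Python sketch below is an illustration under strong assumptions: $\hat{f}$ is taken as the true $f$ (so it cancels), errors are i.i.d. rather than autoregressive, $λ_{n} = \sqrt{n}$ (so $λ_{n} / n \to 0$), and the function name is hypothetical. The estimation error should shrink as $n$ grows.

```python
import numpy as np


def bridge2_estimate(X, y_minus_f, lam):
    """Minimiser of Equation (55) for tau = 2 (ridge), with f_hat treated as known."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y_minus_f)


rng = np.random.default_rng(0)
beta = np.array([3.0, 0.5, -1.0])
errors = {}
for n in (50, 500, 5000, 50000):
    X = rng.normal(5.0, 1.0, size=(n, 3))
    eps = rng.normal(size=n)
    beta_hat = bridge2_estimate(X, X @ beta + eps, lam=np.sqrt(n))
    errors[n] = np.linalg.norm(beta_hat - beta)
print(errors)
```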

It should be emphasized that the condition $λ_{n} = O (n)$ is sufficient for the consistency of ${\hat{β}}_{{AS}_{n}}$. However, the asymptotic behavior depends on the rate of growth of $λ_{n}$. When $λ_{n}$ grows more slowly, the limiting distribution of $\sqrt{n} ({\hat{β}}_{{AS}_{n}} - β)$ exists. It is clear from Theorem 2 that, for a consistent ${\hat{β}}_{{AS}_{n}}$, the mean of the limiting distribution of $\sqrt{n} ({\hat{β}}_{{AS}_{n}} - β)$ is zero. In addition, its asymptotic variance can be obtained from conditions (i) and (iv) as $σ^{2} F^{- 1}$. Accordingly, the asymptotic distribution of the semiparametric AS estimator is:

θ = \sqrt{n} ({\hat{β}}_{{AS}_{n}} - β) \overset{d}{\to} N [0, σ^{2} F^{- 1}] .

(57)

However, the limiting distribution depends on whether $τ < 1$ or $τ \geq 1$. In the context of this paper, Theorem 3 gives the limiting distribution of ${\hat{β}}_{{AS}_{n}}$ when $τ < 1$.

Theorem 3.

Assume that $τ < 1$ and $λ_{n} / n^{\frac{τ}{2}} \to λ_{0} \geq 0$. Then:

θ = \sqrt{n} ({\hat{β}}_{A S_{n}} - β) \overset{d}{\to} a r g m i n (ξ),

(58)

where $ξ (θ) = - 2 θ^{T} F + θ^{T} F θ + λ_{0} \sum_{j = 1}^{p} {‖ θ_{j} ‖}^{τ} I (β_{j} = 0)$. The proof of Theorem 3 is given in Appendix E.
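In the normal case of Equation (57), approximate Wald intervals follow from $Var({\hat{β}}_{{AS}_{n}}) \approx σ^{2} F^{- 1} / n$. The Python sketch below is a hypothetical helper, not part of the paper: it plugs in an estimate of $σ^{2}$ (for instance ${\hat{σ}}_{A S}^{2}$ from Equation (45)) and $F_{n}$ from condition (i). It does not apply to coefficients with $β_{j} = 0$ when $τ < 1$, where Theorem 3 gives a non-normal limit.

```python
import numpy as np
from statistics import NormalDist


def wald_intervals(beta_hat, F_n, sigma2_hat, n, level=0.95):
    """Normal-approximation intervals implied by Equation (57)."""
    se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(F_n)) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return np.column_stack([beta_hat - z * se, beta_hat + z * se])


ci = wald_intervals(np.array([1.0, 2.0]), np.eye(2), sigma2_hat=1.0, n=100)
print(ci)
```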

6. Simulation Study

In this section, a simulation study was conducted to inspect the finite-sample behavior and performance of the two semiparametric estimators $({\hat{α}}_{B S}, {\hat{β}}_{B S})$ and $({\hat{α}}_{A S}, {\hat{β}}_{A S})$ under right-censored time series. These estimators were then compared through the quality measures given in Section 4. The simulation scenarios are designed as follows:

(a): We use the model $Z_{t} = X_{t} β + f (s_{t}) + ε_{t}, t = 1, 2, \dots, n$ to generate datasets in the simulation experiments.
(b): The unknown smooth regression function $f (s_{t})$ is constructed by combining the functions $\{S_{j}, j = 1, \dots, 5\}$ that represent seasonal effects in the data, that is, $f (s_{t}) = \bigcup_{j = 1}^{5} S_{j} (s_{i})$, where $S_{j} (s_{i}) = s_{i} \sin^{2} (s_{i})$ with $s_{i} = \frac{(i - 0.5)}{n / 5}$, $i = 1, \dots, n / 5$.
(c): The design matrix is generated from a normal distribution, $X_{t} \sim N (μ_{x} = 5, σ_{x}^{2} = 1)$, where $X_{t}$ is the $(n \times p)$-dimensional matrix for $p = 3$. Note that the distribution need not be normal; a uniform or another distribution could also be considered. The vector of regression coefficients is $β = (3, 0.5, - 1)$.
(d): The autoregressive error terms are generated from a first-order (one-lag) process $ε_{t} = ρ ε_{t - 1} + u_{t}$ with $ρ = 0.5$ and $u_{t} \sim N (0, 1)$.
(e): Thus, as stated in (a), the completely observed dependent time series $Z_{t}$’s are generated as the sum of the parametric, nonparametric, and error terms using (b), (c), and (d).
(f): To produce the right-censored variable $Y_{t}$, as specified in Equation (3), we generate the censoring variable $C_{t}$ from the binomial distribution with censoring proportions, or censoring levels (CLs), of 5%, 20%, and 40%. Algorithm 2 below demonstrates how the censoring variable is created.
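Scenarios (a) to (e) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name is hypothetical, it assumes $n$ is divisible by 5 and that the five seasonal blocks $S_{j}$ are identical copies of $s \sin^{2}(s)$, as item (b) states, and it starts the AR(1) errors from their stationary distribution, which the text does not specify.

```python
import numpy as np


def simulate_complete_series(n, beta=(3.0, 0.5, -1.0), rho=0.5, seed=None):
    """Generate Z_t = X_t beta + f(s_t) + eps_t following scenarios (a)-(e)."""
    rng = np.random.default_rng(seed)
    m = n // 5
    s = (np.arange(1, m + 1) - 0.5) / m            # s_i = (i - 0.5) / (n / 5)
    f = np.tile(s * np.sin(s) ** 2, 5)             # concatenation of S_1, ..., S_5
    X = rng.normal(5.0, 1.0, size=(n, len(beta)))  # X_t ~ N(5, 1), p = 3
    u = rng.normal(0.0, 1.0, size=n)
    eps = np.empty(n)
    eps[0] = u[0] / np.sqrt(1 - rho ** 2)          # stationary start (assumption)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]           # AR(1) errors, item (d)
    Z = X @ np.asarray(beta) + f + eps
    return Z, X, f, eps


Z, X, f, eps = simulate_complete_series(100, seed=1)
```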

Algorithm 2. Generation of the censoring variable $C_{t}$.

Input: Completely observed $Z_{t}$

Output: Right-censored dependent variable $Y_{t}$

1: For a given censoring level (CL), generate $δ_{t} = I (Z_{t} \leq C_{t})$ from the binomial distribution
2: for ($t$ in 1 to $n$)
3: if ($δ_{t} = 0$)
4: while ($Z_{t} \leq C_{t}$)
5: generate $C_{t} \sim N (μ_{Z}, σ_{Z}^{2})$
6: else
7: $C_{t} = Z_{t}$
8: end (for loop in Step 2)
9: for ($t$ in 1 to $n$)
10: if ($Z_{t} \leq C_{t}$)
11: $Y_{t} = Z_{t}$
12: else
13: $Y_{t} = C_{t}$
14: end (for loop in Step 9)
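Algorithm 2 can be written in Python roughly as follows. This is a sketch, not the authors' implementation: the function name is hypothetical, $μ_{Z}$ and $σ_{Z}$ are taken as the sample mean and standard deviation of $Z_{t}$ (an assumption), and the `while` loop of steps 4 and 5 is started from $C_{t} = \infty$ so that at least one draw is made.

```python
import numpy as np


def censor_series(Z, cl, seed=None):
    """Right-censor Z at censoring level `cl` following Algorithm 2.

    Returns (Y, C, delta) with Y = min(Z, C) and delta = 1 for uncensored points.
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    mu, sd = Z.mean(), Z.std()                       # mu_Z, sigma_Z (assumption)
    delta = rng.binomial(1, 1.0 - cl, size=len(Z))   # Step 1: P(delta_t = 0) = CL
    C = np.empty_like(Z)
    for t in range(len(Z)):                          # Steps 2-8
        if delta[t] == 0:
            C[t] = np.inf
            while Z[t] <= C[t]:                      # redraw until C_t < Z_t
                C[t] = rng.normal(mu, sd)
        else:
            C[t] = Z[t]
    Y = np.where(Z <= C, Z, C)                       # Steps 9-14
    return Y, C, delta
```

Because censored points need a draw below $Z_{t}$, the redraw loop can take many iterations for unusually small $Z_{t}$; a truncated-normal draw would avoid this, but that would be a change to the algorithm as stated.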

(a): To deal with the censored observations in $Y_{t}$ obtained with Algorithm 2, we use the synthetic data values $Y_{t \hat{G}}$ obtained through the Kaplan–Meier estimator [18], as described in Equation (6).
(b): The AR(1) model is used as a naïve benchmark for estimating the right-censored time series, as in [1,2], so that the finite-sample performance of the introduced methods can be compared against it.
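Equation (6) is outside this section; assuming it takes the common Koul–Susarla–Van Ryzin form $Y_{t \hat{G}} = δ_{t} Y_{t} / (1 - \hat{G}(Y_{t}))$, where $\hat{G}$ is the Kaplan–Meier estimator of the censoring distribution, the transformation can be sketched in Python as below. The function names are hypothetical. The sketch also shows why the synthetic response contains zeros: every censored point is mapped to 0, while uncensored points are inflated.

```python
import numpy as np


def km_censoring_survival(y, delta):
    """Kaplan-Meier estimate of 1 - G at each observation, where the
    censorings (delta == 0) play the role of events."""
    y, delta = np.asarray(y, float), np.asarray(delta, int)
    n = len(y)
    order = np.lexsort((1 - delta, y))   # sort by y; uncensored first within ties
    surv = np.empty(n)
    running = 1.0
    for rank, i in enumerate(order):
        at_risk = n - rank
        if delta[i] == 0:
            running *= (at_risk - 1) / at_risk
        surv[i] = running
    return surv


def synthetic_response(y, delta):
    """Synthetic data delta * y / (1 - G_hat(y)); censored points map to 0."""
    surv = km_censoring_survival(y, delta)
    out = np.zeros(len(y))
    unc = np.asarray(delta) == 1
    out[unc] = np.asarray(y, float)[unc] / surv[unc]
    return out


print(synthetic_response([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
```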

For each CL in the simulation experiments, we generated 1000 random samples for sizes $n = 50, 100$, and $200$.

The results of the simulation study were divided into three parts: parametric components, nonparametric components, and overall model performance. Accordingly, the outcomes of the estimated models, comparative results, and corresponding comments are given together in the following tables and figures. To illustrate the simulated datasets and scenarios, examples of some of the simulation configurations are given in Figure 1. Panel (a) shows a dataset with a small sample size and low censoring. Panel (b) shows the case of a very high censoring level. Panels (c) and (d) show the cases of medium and large sample sizes with censoring levels of 20% and 40%, respectively.

6.1. Assessing the Parametric Component

In this section, the performances of the two methods were compared in terms of the parametric components of the right-censored semiparametric linear models generated by the simulation. It should also be noted that in this simulation study, 54 different configurations were analyzed to provide a broad perspective on the adequacy of each method. The results for the parametric components in the simulation study are displayed in Table 1 and Figure 2. Note that bold scores indicate the best (minimum) scores.

A careful inspection of Table 1 shows that the behaviors of the BS and AS change noticeably across scenarios. Consider the low and medium CLs for $n = 50$: under these conditions, the BS is clearly superior to the AS. This can be interpreted as the BS fitting the data better when the data’s structure is less distorted by censoring. However, for $C L = 40 \%$, that is, when the data are heavily censored, the AS method gives better scores.

As the sample size increases, the bias and variance values of the two methods become closer, yet the AS provides more efficient estimates of the parametric component. Regarding the parametric component, it should be emphasized that the AS behaves as expected and gives the best scores for cases of heavy censoring.

In general, the best scores of each method can be evaluated in terms of bias and variance. Regarding the bias of the regression coefficients, the AS method gives the best score in only 12 of 27 configurations, while the BS method gives the best score in 15. However, regarding the variances, the AS gives the best score in 18 of 27 configurations, while the BS is superior in only 9. In Figure 2, Panels (a–c) show the calculated biases for each simulation repetition for small, medium, and large sample sizes, respectively.

6.2. Evaluating the Nonparametric Component

As in the case of the parametric components, we constructed 1000 estimates of the regression function $f (.)$, which is the nonparametric component of model (1). For each method, 1000 replications were carried out, and the estimated bias, variance, and RMSE values were computed for each estimator. This section presents the simulation results related to the nonparametric component.

The results in Table 2 show that the AS method is efficient for estimating the nonparametric component when time series data are moderately to heavily censored. On the other hand, for $C L = 5 \%$, the BS method gives better results for all sample sizes according to our evaluation metrics. One of the main reasons for this is that the BS adapts to the knots more than the AS does. Consequently, when the data points are distorted by censoring, these knots force the BS to make inefficient estimates. At this point, the knot determination of the AS based on the weights given in Equation (24) diminishes the effect of censoring, which is why the AS method performs better on moderately and heavily censored time series data.

Figure 3, consisting of four panels (a), (b), (c), and (d), illustrates the performance of the AS and BS methods in nonparametric curve estimation under different simulation configurations. Panel (a) shows the estimated curves for a small sample size and a medium censoring level. Panel (b) shows the case of a medium sample size and a high censoring level. Panel (c) shows the estimated curves for a small sample size and a low censoring level. Finally, Panel (d) shows the estimated curves for a large sample size and a medium censoring level. Comparing panels (a) and (c) reveals the effect of the censoring level. At first glance, the distortion of both curves is noticeable; however, the BS method represents the censored time series less adequately than the AS method. In addition, panel (b) shows that when the data are heavily censored, the BS curve is pulled towards zero due to the zero values in the synthetic response variable. Finally, panel (d) indicates that although the time series contains censored data points, the quality of the estimates for both the AS and BS methods improves as the sample size increases.

6.3. Assessing the Performances of Methods

This section presents the results of the overall model estimation obtained from the AS and BS methods. Although results for the parametric and nonparametric components were given in the previous sections, a separate review of the whole model estimation is required for a fair comparison. Accordingly, the performance scores for $M A P E$, $M e d A E$, and $G M S E$ are given in Table 3, and Figure 4 illustrates the $R G M S E$ values.

When Table 3 is examined, it can be seen that the results obtained for the model estimates differ slightly from the previous results, as expected. The total error accumulated from the estimation of the parametric and nonparametric components is one of the reasons for this discrepancy. In addition, the difference is easier to understand when considering the situations where the two methods produce extremely similar scores. Note that the AR(1) model performs poorly, which is attributable to its parametric and linear structure, although for the large sample size ($n = 200$) the scores of the models are close to each other. Nevertheless, the AS and BS methods are clearly much better at estimating right-censored time series.

As can be seen from the bolded scores, the AS method generally performs better. Table 3 shows that the $M A P E$ values obtained by the BS are better for $n = 50$. However, as mentioned earlier, this study also uses the $M e d A E$ criterion, which is not frequently used for time series data, to measure the robustness of the predictions. The scores of this criterion confirm, as stated at the beginning of the study, that the BS method gives more successful estimates under low censoring levels, whereas the AS method is superior under medium and high censoring levels.

Figure 4 shows the $R G M S E$ scores for the AS and BS methods, formed by the ratio of the $G M S E$ values of the two methods. In Figure 4, the difference between the qualities of the estimates is clearly very small for $C L = 5 \%$, but it becomes more pronounced for $C L = 20 \%$ and $C L = 40 \%$. Note that for $C L = 5 \%$, the BS method gives smaller ratio values, which confirms the results given in Table 3. As stated before, the AS method is demonstrably superior at higher censoring levels, which can be seen in Figure 4 for all sample sizes.

7. Real-World Data

This section shows how the newly introduced semiparametric AS estimator and the benchmark BS method behave on a real right-censored time series dataset. For this purpose, we consider unemployment duration data consisting of the monthly unemployment duration rates for Turkey between 2004 and 2019; this dataset is available at https://ec.europa.eu/eurostat/databrowser/view/UNE_RT_M__custom_1635127/default/table?lang=en. In the dataset, the values for the last three months of 2004 and the last three months of 2019 cannot be observed correctly. Therefore, these data points can be treated as right-censored at the detection limit zero, because none of the data points are negative. Accordingly, the introduced semiparametric methods, AS and BS, can be used for this time series analysis. In addition, as in the simulation study, the results of the AR model are given in the following tables. However, unlike the simulation study, an AR(2) model was used for the real data study, because the optimal lag is determined as 2 from Table 4. Before the modelling procedure, the stationarity of the time series was tested with the augmented Dickey–Fuller (ADF) test, and the suitable lag was determined under the null hypothesis $H_{0}$: $y_{t}$ is non-stationary. The test results are given in Table 4 below:

Table 4 shows that the second lag is suitable for modelling this time series. From this information, the semiparametric time series model can be written as:

U E D_{t} = β_{1} U E D_{(t - 1)} + β_{2} U E D_{(t - 2)} + f (s_{t}) + ε_{t}, t = 1, \dots, 186,

(59)

where $U E D_{t}$ represents the dependent time series of the unemployment duration ratio, $U E D_{(t - 1)}$ and $U E D_{(t - 2)}$ denote the first and second lags of $U E D_{t}$, which are used as covariates, $s_{t} = {(1, \dots, n)}^{T}$ denotes the seasonality, and $ε_{t}$’s are the stationary autoregressive error terms given in Equation (2). Model (59) is estimated by both the AS and BS methods, and the results are presented in Table 5, Table 6, and Figure 5.
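Preparing the data for model (59) amounts to aligning the response with its first two lags and attaching the index $s_{t}$. The Python sketch below uses a toy series because the Eurostat values are not reproduced here; the function name is hypothetical, and dropping the first two observations (whose lags are unavailable) is one common convention rather than a detail confirmed by the paper.

```python
import numpy as np


def lagged_design(series, lags=2):
    """Build UED_t, the lag covariates UED_{t-1}, ..., UED_{t-lags}, and s_t.

    The first `lags` observations are dropped because their lags are unavailable.
    """
    x = np.asarray(series, dtype=float)
    y = x[lags:]
    X = np.column_stack([x[lags - k: len(x) - k] for k in range(1, lags + 1)])
    s = np.arange(1, len(y) + 1)      # s_t = (1, ..., n)^T
    return y, X, s


y, X, s = lagged_design([10, 11, 12, 13, 14])
print(y, X, s, sep="\n")
```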

Table 5 contains the bias and variance values for the estimated regression coefficients $\hat{β} = ({\hat{β}}_{1}, {\hat{β}}_{2})$ and $\hat{α} = {({\hat{α}}_{1}, {\hat{α}}_{2}, \dots, {\hat{α}}_{q + k + 1})}^{T}$. Accordingly, the AS method gives smaller bias and variance values than the BS method for $\hat{β}$. Moreover, the AS method has better bias values for $\hat{α}$, but the BS method gives smaller variance values for $\hat{α}$ than the AS method. Overall, the AS and BS methods give similar values, because the dataset has $n = 186$ and $C L = 8.1 \%$. Thus, the results for the unemployment duration data are consistent with the simulation outputs.

In addition, the outcomes obtained from the estimated model (59) are given in Table 6, together with the $R M S E$ scores for the estimated nonparametric function $f (s_{t})$. The results clearly show that the AS method produces the best scores. It should be emphasized that the largest difference between the methods is in $M e d A E$, which indicates the strength of the AS method under censoring. Table 6 also shows that, as in the simulation study, the AR(2) model performs worse than the other two methods. Note that because the sample size of the real data ($n = 186$) is close to the simulation configurations with $n = 200$, the scores are relatively close to each other. Figure 5 compares the AS and BS methods in representing the data under censoring.

As can be seen in Figure 5, the estimated curves are quite similar because of the large sample size and low CL of the data. The effect of the synthetic data transformation is evident from the zero values in the figure. As in the simulation study, the BS method is affected by these zero values more than the AS method. The reason is that the knots of the AS method are determined by iteratively calculated weights, so the optimal knot sequence diminishes the effect of censoring.

8. Concluding Remarks

This paper demonstrated the estimation of right-censored time series data using a newly introduced semiparametric AS estimator, compared with the BS method as a benchmark. The results obtained from both a simulation study and a real data example showed that the introduced AS method achieves superior modelling of right-censored time series data in a semiparametric context. The comparative outcomes also show that the AS method provides better performance scores than the BS method in most simulation configurations and in the real data example. The most important factor in the success of the AS method is its adaptive nature, based on iteratively calculated weights. In the AS method, the weights determine and control the penalty term and, through it, the optimal knot points. Accordingly, our findings show that the proposed method provides an advantage over the benchmark in modelling right-censored time series.

The simulation study examined the performance of the methods in three parts: the outcomes for the estimated parametric component (Table 1 and Figure 2), the nonparametric component (Table 2 and Figure 3), and the whole semiparametric model (Table 3 and Figure 4). The estimation of the unemployment data was evaluated in terms of bias and variance (Table 5) and with the criteria MAPE, MedAE, GMSE, and RGMSE (Table 6). Given the outcomes of the simulation study and the real data example, our general and detailed conclusions are as follows:

As expected, the estimation quality of both the AS and BS methods changes with the CL and the sample size. The performance of both methods deteriorates as the CL increases and improves for larger sample sizes, as Table 1, Table 2 and Table 3 clearly show.
The results for the unemployment duration data agree with those of the corresponding simulation configuration ($n = 200$, CL $= 20\%$).
One of the striking results of this paper is that, as Table 1, Table 2 and Table 3 demonstrate, the AS method gives worse results than the BS method at low censorship levels but considerably better results at medium and high censorship levels. This supports the central claim of the paper: using the AS method reduces the distortion introduced by the synthetic data transformation.
Across all simulation and real data results, the AS method outperforms the BS method except in the low-CL configurations, which supports the targeted conclusion.
Unemployment duration data were modelled by the BS and AS methods using two lagged parametric components and the seasonal effect as a nonparametric component. Table 5 and Table 6 report each method's scores using four evaluation metrics, which indicate the superiority of the AS method. Figure 5 shows that the curves estimated by the two methods are similar. However, the AS curve is less affected by the zero values of the synthetic data, so it gives more satisfactory estimates for the right-censored time series model than the BS method.
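For reference, MAPE and MedAE from Tables 3 and 6 can be computed as follows. This sketch assumes the common definitions: the mean of $|y_t - \hat{y}_t| / |y_t|$ and the median of $|y_t - \hat{y}_t|$. The paper's exact formulas, including those for GMSE and RGMSE, are given in Section 4.3 and are not reproduced here. The function names are hypothetical.

```python
import numpy as np


def mape(y, y_hat):
    """Mean absolute percentage error, returned as a fraction (not x100)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    if np.any(y == 0):
        # Zero targets (e.g. censored synthetic values) make MAPE undefined.
        raise ValueError("MAPE is undefined when a target value is zero")
    return float(np.mean(np.abs(y - y_hat) / np.abs(y)))


def medae(y, y_hat):
    """Median absolute error."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.median(np.abs(y - y_hat)))


y_obs = [1.0, 2.0, 4.0]
y_fit = [1.1, 1.8, 4.4]
print(mape(y_obs, y_fit), medae(y_obs, y_fit))
```

MedAE is less sensitive than MAPE to a few large residuals, which is consistent with the text's use of MedAE to judge robustness under censorship.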

Finally, the results of this paper as a whole indicate that the AS method is superior to the BS method for estimating right-censored time series, both in theory and in practice.

Author Contributions

Conceptualization, S.E.A. and D.A.; methodology, S.E.A. and D.A.; software, D.A. and E.Y.; validation, S.E.A., D.A. and E.Y.; formal analysis, D.A. and E.Y.; investigation, D.A. and E.Y.; resources, S.E.A. and D.A.; data curation, E.Y.; writing—original draft preparation, S.E.A., D.A. and E.Y.; writing—review and editing, S.E.A., D.A. and E.Y.; visualization, E.Y.; supervision, S.E.A.; funding acquisition, S.E.A. All authors have read and agreed to the published version of the manuscript.

Funding

Professor Ahmed's research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We consider unemployment duration data consisting of monthly unemployment rates for Turkey between 2004 and 2019; this dataset is available at https://ec.europa.eu/eurostat/databrowser/view/UNE_RT_M__custom_1635127/default/table?lang=en.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 1.

Lemma 1 follows from the standard censoring assumption that $Z_{t}$ and $C_{t}$ are independent. The proof is as follows:

\begin{array}{c} E [Y_{t G} | x, s] = E [\frac{δ_{t} Z_{t}}{1 - G (Z_{t})} | x, s] = E [\frac{δ_{t} Z_{t}}{\bar{G} (Z_{t})} | x, s] = E [\frac{I (Z_{t} \leq C_{t}) \min (Z_{t}, C_{t})}{\bar{G} [\min (Z_{t}, C_{t})]} | x, s] = \\ E [I (Z_{t} \leq C_{t}) \frac{Z_{t}}{\bar{G} (Z_{t})} | x, s] = E [\frac{Z_{t}}{\bar{G} (Z_{t})} E [I (Z_{t} \leq C_{t}) | Z_{t}, x, s] | x, s] = E [\frac{Z_{t}}{\bar{G} (Z_{t})} \bar{G} (Z_{t}) | x, s] = \\ E [Z_{t} | x, s] = x_{t} β + f (s_{t}) \end{array}

(A1)

Thus, Lemma 1 is proven. Here, $\bar{G}(\cdot) = 1 - G(\cdot)$. In general, the distribution $G(\cdot)$ is unknown, so it is replaced by its Kaplan–Meier estimator $\hat{G}(\cdot)$, which is given in Equation (5). □
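The synthetic data transformation $Y_{t\hat{G}} = δ_{t} Z_{t} / (1 - \hat{G}(Z_{t}))$ can be sketched as follows. This is a minimal illustration that assumes the common product-limit form $1 - \hat{G}(Z_{(i)}) = \prod_{j \le i} ((n-j)/(n-j+1))^{1-δ_{(j)}}$. Equation (5) in the main text remains the definitive form. At ties, uncensored values are placed before censored ones (an assumption). Censored observations ($δ_{t} = 0$) become exactly zero, which produces the zero values discussed for Figure 5. The function name `synthetic_response` is hypothetical.

```python
import numpy as np


def synthetic_response(z, delta):
    """Synthetic data Y_G = delta * Z / (1 - G_hat(Z)), G_hat from Kaplan-Meier.

    z     : observed values min(Z_t, C_t)
    delta : 1 if the value is uncensored, 0 if censored
    At ties, uncensored values are ordered before censored ones (assumption).
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = z.size
    order = np.lexsort((1 - delta, z))       # sort by z, uncensored first
    j = np.arange(1, n + 1)
    # Product-limit factors for the censoring distribution: censored points
    # (delta = 0) are the "events" here; uncensored points contribute 1.
    factors = ((n - j) / (n - j + 1.0)) ** (1 - delta[order])
    surv = np.empty(n)
    surv[order] = np.cumprod(factors)        # 1 - G_hat at each observed z
    y = np.zeros(n)
    obs = delta == 1
    y[obs] = z[obs] / surv[obs]              # censored values stay at 0
    return y, surv


y_syn, surv = synthetic_response([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
print(y_syn)
```

Uncensored values that follow a censored one are inflated, here by a factor of 3/2. By Lemma 1, this inflation compensates on average for the zeros assigned to censored points.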

Appendix B

Derivations of Equations (29) and (30).

To derive Equations (29) and (30), we start from the two normal equations obtained from Equation (27):

\begin{aligned} (X^{'} V X) β + X^{'} V B α &= X^{'} V Y_{\hat{G}}, \\ B^{'} V X β + (B^{'} V B + λ K) α &= B^{'} V Y_{\hat{G}}. \end{aligned}

(A2)

From the second equation in (A2), $\hat{α}_{AS}$ is obtained through the following algebraic steps:

\begin{aligned} (B^{'} V B + λ K) α &= B^{'} V Y_{\hat{G}} - B^{'} V X β \\ &= B^{'} V (Y_{\hat{G}} - X β). \end{aligned}

(A3)

Thus, if $β$ is replaced by $\hat{β}_{AS}$, then $\hat{α}_{AS}$ can be written as:

\hat{α}_{AS} = {[B^{'} V B + λ K]}^{-1} B^{'} V (Y_{\hat{G}} - X \hat{β}_{AS}) .

(A4)

This gives the expression for $\hat{α}_{AS}$. Next, $\hat{β}_{AS}$ is obtained by substituting this expression into the first equation in (A2):

\begin{aligned} & (X^{'} V X) β + X^{'} V B {[B^{'} V B + λ K]}^{-1} B^{'} V (Y_{\hat{G}} - X β) = X^{'} V Y_{\hat{G}}, \\ & (X^{'} V X) β + X^{'} V B {[B^{'} V B + λ K]}^{-1} B^{'} V Y_{\hat{G}} - X^{'} V B {[B^{'} V B + λ K]}^{-1} B^{'} V X β = X^{'} V Y_{\hat{G}}, \\ & [X^{'} V X - X^{'} V B {[B^{'} V B + λ K]}^{-1} B^{'} V X] β = X^{'} V Y_{\hat{G}} - X^{'} V B {[B^{'} V B + λ K]}^{-1} B^{'} V Y_{\hat{G}}. \end{aligned}

(A5)

To simplify the notation, let $A_{AS} = X^{'} V B {[B^{'} V B + λ K]}^{-1} B^{'} V$. Then

\begin{aligned} [(X^{'} V - A_{AS}) X] β &= (X^{'} V - A_{AS}) Y_{\hat{G}}, \\ \hat{β}_{AS} &= {[(X^{'} V - A_{AS}) X]}^{-1} (X^{'} V - A_{AS}) Y_{\hat{G}}. \end{aligned}

(A6)

This completes the derivations of Equations (29) and (30).
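As a numerical check of the algebra in (A2) to (A6), the sketch below computes $\hat{β}_{AS}$ and $\hat{α}_{AS}$ from the closed forms and compares them with a direct solve of the stacked normal equations (A2). The inputs are illustrative assumptions, not the paper's setup: $B$ is a degree-1 (hat-function) B-spline basis on equally spaced knots, $V$ is a random positive diagonal matrix, $K = D^{'} D$ uses second-order differences, and `y` stands in for $Y_{\hat{G}}$. The function names are hypothetical.

```python
import numpy as np


def closed_form(X, B, V, K, y, lam):
    """beta_hat from (A6) and alpha_hat from (A4)."""
    S_inv = np.linalg.inv(B.T @ V @ B + lam * K)
    A_as = X.T @ V @ B @ S_inv @ B.T @ V        # A_AS, a p x n matrix
    M = X.T @ V - A_as                           # (X'V - A_AS)
    beta = np.linalg.solve(M @ X, M @ y)
    alpha = S_inv @ B.T @ V @ (y - X @ beta)
    return beta, alpha


def joint_solve(X, B, V, K, y, lam):
    """Solve the two normal equations (A2) as one block linear system."""
    p = X.shape[1]
    lhs = np.block([[X.T @ V @ X, X.T @ V @ B],
                    [B.T @ V @ X, B.T @ V @ B + lam * K]])
    rhs = np.concatenate([X.T @ V @ y, B.T @ V @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]


rng = np.random.default_rng(1)
n, p, q, lam = 60, 2, 8, 0.5
s = np.linspace(0.0, 1.0, n)
centers = np.linspace(0.0, 1.0, q)
h = centers[1] - centers[0]
B = np.clip(1.0 - np.abs(s[:, None] - centers[None, :]) / h, 0.0, None)  # hat basis
X = rng.normal(size=(n, p))
V = np.diag(rng.uniform(0.5, 1.5, size=n))
D2 = np.diff(np.eye(q), n=2, axis=0)             # second-order differences
K = D2.T @ D2
y = X @ np.array([1.5, -0.7]) + np.sin(2 * np.pi * s) + rng.normal(0.0, 0.2, n)

beta_cf, alpha_cf = closed_form(X, B, V, K, y, lam)
beta_js, alpha_js = joint_solve(X, B, V, K, y, lam)
print(np.allclose(beta_cf, beta_js), np.allclose(alpha_cf, alpha_js))
```

The closed form eliminates $α$ through a Schur complement, so both routes must agree up to rounding error. This agreement is a quick way to catch transcription errors in (A4) to (A6).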

Appendix C

Proof of Theorem 1.

To prove Theorem 1, the following condition is required:

\sup_{{\hat{α}}_{A S n} \in Q} | M_{n} ({\hat{α}}_{A S n}) - M ({\hat{α}}_{A S n}) - σ_{ε}^{2} | \overset{p}{\to} 0,

(A7)

where $σ_{ε}^{2}$ is the error variance of the model defined in Equation (7) and $Q$ is a compact set in a metric space. Using Equations (54)–(57), it then follows that:

| {\hat{α}}_{A S n} | \to α, as n \to \infty .

(A8)

See [33] for more details. □

Appendix D

Proof of Theorem 2.

Under regularity conditions (i)–(iv), $\mathrm{plim}(\hat{β}_{AS_n})$ can be written as follows:

\begin{aligned} \mathrm{plim}(\hat{β}_{AS_n}) &= β + \mathrm{plim}\{{[n^{-1} (X^{'} V - A_{AS}) X]}^{-1} n^{-1} (X^{'} V - A_{AS}) f\} + \mathrm{plim}\{{[n^{-1} (X^{'} V - A_{AS}) X]}^{-1} n^{-1} (X^{'} V - A_{AS}) ε\} \\ &= β + \mathrm{plim}\{{[n^{-1} (X^{'} V - A_{AS}) X]}^{-1}\} \, \mathrm{plim}\{n^{-1} (X^{'} V - A_{AS}) (f + ε)\}. \end{aligned}

(A9)

Because $f$ can be treated as a nuisance parameter, assumptions (i) and (ii) give $\mathrm{plim}\{{[n^{-1} (X^{'} V - A_{AS}) X]}^{-1}\} = F_{n}^{-1}$ and $\mathrm{plim}\{n^{-1} (X^{'} V - A_{AS}) (f + ε)\} = o(1)$. Therefore, the second term on the right-hand side of (A9) goes to zero, and it follows that:

argmin (ψ_{n}) \overset{p}{\to} argmin (ψ), {\hat{β}}_{A S_{n}} \overset{d}{\to} β .

(A10)

Note that the results above hold for $τ \geq 1$, in which case $ψ_{n}$ is convex (see [34,35]). However, the proposed AS estimator also covers the case $τ < 1$, for which $ψ_{n}$ is not convex. In that case, Equation (A10) is handled differently:

ψ_{n} ({\hat{β}}_{A S_{n}}, \hat{f} (s_{t})) \geq n^{- 1} \sum_{t = 1}^{n} {[Y_{t} - X_{t} {\hat{β}}_{A S_{n}} - \hat{f} (s_{t})]}^{2} = ψ_{n}^{(0)} ({\hat{β}}_{A S_{n}}, \hat{f} (s_{t}))

(A11)

Note that (A11) holds for all $\hat{β}_{AS_n}$. Moreover, $\mathrm{argmin}(ψ_{n}) = O_{p}(1)$ because $\mathrm{argmin}(ψ_{n}^{(0)}) = O_{p}(1)$. □

Appendix E

Proof of Theorem 3.

Because the minimization criterion is not convex for $τ < 1$, the proof of Theorem 3 requires the following expression for the criterion $ξ_{n}$:

ξ_{n} (θ) = \sum_{t = 1}^{n} [{(ε_{t} - \frac{θ^{T} X_{t}}{\sqrt{n}})}^{2} - ε_{t}^{2}] + λ_{n} \sum_{j = 1}^{p} [{| β_{j} + \frac{θ_{j}}{\sqrt{n}} |}^{τ} - {| β_{j} |}^{τ}] .

(A12)

Since $λ_{n} = O(n^{τ/2}) = o(\sqrt{n})$, the following result is obtained, similarly to Theorem 3 in [33]:

λ_{n} \sum_{j = 1}^{p} [{| β_{j} + \frac{θ_{j}}{\sqrt{n}} |}^{τ} - {| β_{j} |}^{τ}] \overset{d}{\to} λ_{0} \sum_{j = 1}^{p} {| θ_{j} |}^{τ} I (β_{j} = 0) .

(A13)

The following convergence then holds:

a r g m i n (ξ_{n}) \overset{d}{\to} a r g m i n (ξ) .

(A14)

This completes the proof. Note that for $τ < 1$, the non-zero regression coefficients of the model can be estimated without asymptotic bias, while the zero coefficients are shrunk to exactly zero with positive probability. □
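The closing remark can be illustrated in the simplest one-parameter case. The sketch below is an illustrative assumption, not the AS estimator. It minimizes $(z - b)^2 + λ|b|^{τ}$ over a fine grid. For $τ < 1$, small inputs are set exactly to zero while large inputs are shrunk only slightly. In the ridge case $τ = 2$, every input is shrunk by the factor $1/(1+λ)$. The function name `bridge_estimate` is hypothetical.

```python
import numpy as np


def bridge_estimate(z, lam, tau, n_grid=20001):
    """Minimise (z - b)^2 + lam * |b|**tau over b by grid search.

    The grid always contains b = 0 exactly, so a thresholded (zero) solution
    is returned whenever it is the global minimiser.
    """
    half_width = abs(z) + 1.0
    grid = np.concatenate(([0.0], np.linspace(-half_width, half_width, n_grid)))
    objective = (z - grid) ** 2 + lam * np.abs(grid) ** tau
    return float(grid[np.argmin(objective)])


for z in (0.1, 0.5, 10.0):
    print(z, bridge_estimate(z, lam=1.0, tau=0.5), bridge_estimate(z, lam=1.0, tau=2.0))
```

With $τ = 0.5$, the input 10 is estimated at about 9.92, which is nearly unbiased, while 0.1 and 0.5 are set to zero. With $τ = 2$, the input 10 is halved. The penalty is non-convex for $τ < 1$, which is why the sketch uses a global grid search instead of a gradient method.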

References

Park, J.W.; Genton, M.G.; Ghosh, S.K. Censored time series analysis with autoregressive moving average models. Can. J. Stat. 2007, 35, 151–168.
Aydin, D.; Yilmaz, E. Censored nonparametric time-series analysis with autoregressive error models. Comput. Econ. 2020, 58, 169–202.
Hopke, P.K.; Liu, C.; Rubin, D.B. Multiple imputation for multivariate data with missing and below-threshold measurements: Time-series concentrations of pollutants in the arctic. Biometrics 2001, 57, 22–33.
El Ghouch, A.; Van Keilegom, I. Non-parametric Regression with Dependent Censored Data. Scand. J. Stat. 2008, 35, 228–247.
Koul, H.; Susarla, V.; Van Ryzin, J. Regression Analysis with Randomly Right-Censored Data. Ann. Stat. 1981, 9, 1276–1285.
Leurgans, S. Linear models, random censoring and synthetic data. Biometrika 1987, 74, 301–309.
Zhou, M. Asymptotic Normality of the ‘Synthetic Data’ Regression Estimator for Censored Survival Data. Ann. Stat. 1992, 20, 1002–1021.
Linton, O.; Nielsen, J.P.; Nielsen, S.F. Non-parametric regression with a latent time series. Econom. J. 2010, 12, 187–207.
Vogt, M. Nonparametric regression for locally stationary time series. Ann. Stat. 2012, 40, 2601–2633.
Gao, J. Nonlinear Time Series: Semiparametric and Nonparametric Methods; CRC Press: Boca Raton, FL, USA, 2007.
Chen, J.; Gao, J.; Li, D. Semiparametric trending panel data models with cross-sectional dependence. J. Econom. 2012, 171, 71–85.
Engle, R.F.; Granger, C.W.J.; Rice, J.; Weiss, A. Semiparametric Estimates of the Relation between Weather and Electricity Sales. J. Am. Stat. Assoc. 1986, 81, 310–320.
Härdle, W. Applied Nonparametric Regression (No. 19); Cambridge University Press: Cambridge, UK, 1990.
Green, P.J.; Silverman, B.W. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach; CRC Press: Boca Raton, FL, USA, 1994.
Ruppert, D.; Wand, M.P.; Carroll, R.J. Semiparametric Regression (No. 12); Cambridge University Press: Cambridge, UK, 2003.
Guessoum, Z.; Ould-Said, E. On nonparametric estimation of the regression function under random censorship model. Stat. Decis. 2009, 26, 159–177.
Aydin, D.; Yilmaz, E. Modified estimators in semiparametric regression models with right-censored data. J. Stat. Comput. Simul. 2018, 88, 1470–1498.
Kaplan, E.L.; Meier, P. Nonparametric Estimation from Incomplete Observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
Chen, K.; Lo, S.H. On the rate of uniform convergence of the product-limit estimator: Strong and weak laws. Ann. Stat. 1997, 25, 1050–1087.
Gu, M.G.; Lai, T.L. Functional laws of the iterated logarithm for the product-limit estimator of a distribution function under random censorship or truncation. Ann. Probab. 1990, 18, 160–189.
De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 1978.
Lyche, T.; Manni, C.; Speleers, H. Foundations of spline theory: B-splines, spline approximation, and hierarchical refinement. In Splines and PDEs: From Approximation Theory to Numerical Linear Algebra; Springer: Cham, Switzerland, 2018; pp. 1–76.
Jin, H.; Guan, Y.; Yao, L. Minimum entropy active fault tolerant control of the non-Gaussian stochastic distribution system subjected to mean constraint. Entropy 2017, 19, 218.
Havrylenko, Y.; Kholodniak, Y.; Halko, S.; Vershkov, O.; Miroshnyk, O.; Suprun, O.; Dereza, O.; Shchur, T.; Śrutek, M. Representation of a Monotone Curve by a Contour with Regular Change in Curvature. Entropy 2021, 23, 923.
Eilers, P.; De Menezes, R. Quantile smoothing of array CGH data. Bioinformatics 2005, 21, 1146–1153.
Ahmed, S.E.; Aydın, D.; Yılmaz, E. Imputation Method Based on Sliding Window for Right-Censored Data. In International Conference on Management Science and Engineering Management; Springer: Cham, Switzerland, 2020; pp. 433–446.
Rippe, R.C.; Meulman, J.J.; Eilers, P.H. Visualization of genomic changes by segmented smoothing using an L₀ penalty. PLoS ONE 2012, 7, e38230.
Frommlet, F.; Nuel, G. An adaptive ridge procedure for L₀ regularization. PLoS ONE 2016, 11, e0148620.
Hurvich, C.M.; Simonoff, J.S.; Tsai, C.L. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Stat. Soc. Ser. B 1998, 60, 271–293.
Li, R.; Liang, H. Variable selection in semiparametric regression modeling. Ann. Stat. 2008, 36, 261–286.
Eilers, P.H.; Marx, B.D. Flexible smoothing with B-splines and penalties. Stat. Sci. 1996, 11, 89–121.
Frank, I.E.; Friedman, J.H. A statistical view of some chemometrics regression tools. Technometrics 1993, 35, 109–135.
Knight, K.; Fu, W. Asymptotics for lasso-type estimators. Ann. Stat. 2000, 28, 1356–1378.
Andersen, P.K.; Gill, R.D. Cox’s regression model for counting processes: Large sample study. Ann. Stat. 1982, 10, 1100–1120.
Pollard, D. Asymptotics for least absolute deviation regression estimators. Econom. Theory 1991, 7, 186–199.

Figure 1. Some of the datasets generated using Algorithm 2 including both fully observed and censored data points for different censoring levels and sample sizes.

Figure 2. Boxplots of bias values for the AS and BS methods for all configurations. On the x-axis, b1, b2, and b3 denote $β_{1}$, $β_{2}$, and $β_{3}$; A1, A2, and A3 denote biases obtained from the AS method for CLs of 5%, 20%, and 40%. Similarly, B1, B2, and B3 denote biases for the BS method when CLs are 5%, 20%, and 40%.

Figure 3. Data points, real regression functions, and curves fitted by two methods. In the legend of the plots, f(A) and f(B) represent function estimates obtained from the AS and BS methods, respectively.

Figure 4. 360° bar chart of the RGMSEs of all simulation combinations.

Figure 5. Estimated curves for the seasonality $f(s_{t})$ obtained from the AS and BS methods.

Table 1. Bias and variance of the regression coefficients estimated by the AS and B-spline (BS) methods.

		β₁ = 3				$β_{2} = 0.5$				β₃ = −1
		$Bias ({\hat{β}}_{1})$		$Var ({\hat{β}}_{1})$		$Bias ({\hat{β}}_{2})$		$Var ({\hat{β}}_{2})$		$Bias ({\hat{β}}_{3})$		$Var ({\hat{β}}_{3})$
n	C.L.	AS	BS	AS	BS	AS	BS	AS	BS	AS	BS	AS	BS
50	5	0.887	0.870	0.936	0.842	0.809	0.786	0.922	0.845	0.867	0.837	0.884	0.804
	20	0.852	0.895	1.180	1.290	0.888	0.892	1.210	1.358	0.963	0.949	1.191	1.336
	40	0.999	1.172	1.455	1.641	0.916	1.108	1.431	1.657	0.946	1.145	1.453	1.674
100	5	0.510	0.470	0.440	0.425	0.539	0.434	0.433	0.422	0.515	0.467	0.439	0.431
	20	0.514	0.610	0.583	0.609	0.538	0.579	0.583	0.609	0.527	0.599	0.590	0.618
	40	0.535	0.433	0.619	0.689	0.525	0.622	0.619	0.689	0.535	0.610	0.629	0.692
200	5	0.285	0.271	0.260	0.253	0.290	0.272	0.255	0.255	0.294	0.271	0.252	0.254
	20	0.310	0.324	0.333	0.355	0.311	0.300	0.325	0.351	0.304	0.296	0.328	0.353
	40	0.314	0.333	0.338	0.352	0.321	0.337	0.332	0.356	0.307	0.336	0.332	0.363

The bolded values indicate the best scores.

Table 2. Outcomes from the fitted nonparametric components.

		$Bias (\hat{α})$		$Var (\hat{α})$		$RMSE (f, \hat{f})$
n	CLs	AS	BS	AS	BS	AS	BS
50	5	1.085	0.629	0.048	0.022	1.135	0.883
	20	1.128	1.498	0.066	0.075	1.099	2.061
	40	1.287	2.510	0.079	0.095	2.511	3.127
100	5	0.961	0.851	0.022	0.025	0.824	0.664
	20	1.040	1.217	0.030	0.041	1.255	1.779
	40	1.070	1.302	0.037	0.070	1.815	2.331
200	5	0.891	0.813	0.009	0.008	0.670	0.435
	20	0.928	0.959	0.013	0.021	1.547	1.871
	40	0.995	1.070	0.017	0.028	2.397	2.882

The bolded values indicate the best scores.

Table 3. Performance scores of the AS, BS, and AR(1) methods.

		MAPE			MedAE			GMSE
n	CLs	AS	BS	AR(1)	AS	BS	AR(1)	AS	BS	AR(1)
50	5	0.166	0.157	0.322	0.419	0.383	0.999	3.119	3.510	4.915
	20	0.358	0.348	0.388	0.737	0.896	1.052	4.468	4.920	5.142
	40	0.584	0.688	1.980	1.030	1.519	1.971	7.762	9.542	10.751
100	5	0.154	0.186	0.303	0.323	0.320	0.860	1.001	0.928	3.614
	20	0.333	0.336	0.365	0.668	0.750	0.914	1.870	1.988	4.147
	40	0.514	0.528	1.476	1.025	1.831	1.891	3.663	4.182	6.798
200	5	0.111	0.096	0.283	0.264	0.251	0.717	0.983	0.761	1.935
	20	0.312	0.332	0.364	0.552	0.606	0.847	2.065	2.497	3.411
	40	0.499	0.508	0.654	1.008	1.086	1.501	2.759	2.816	3.131

The bolded values indicate the best scores.

Table 4. Augmented Dickey–Fuller (ADF) test results for the stationarity of time series data and the determination of the appropriate lag.

No. Lag	ADF Test Results	p-Value
0	−2.61	0.318
1	−3.27	0.077
2	−3.52	0.041
3	−3.33	0.066
4	−3.30	0.072

Bold scores are significant at the 95% confidence level.
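Table 4 reports ADF statistics for 0 to 4 lagged differences. The sketch below computes the ADF statistic as the $t$-ratio of $γ$ in the OLS regression $Δy_t = c + δ t + γ y_{t-1} + \sum_{i=1}^{k} φ_i Δy_{t-i} + e_t$. This section does not state whether the authors included the trend term, so the sketch includes it by default; that choice is an assumption. Computing $p$-values requires MacKinnon's response-surface tables, so they are not calculated here. In practice, a library routine such as `statsmodels.tsa.stattools.adfuller` would normally be used. The function `adf_statistic` is hypothetical, and the data below are simulated, not the unemployment series.

```python
import numpy as np


def adf_statistic(y, k, trend=True):
    """t-ratio of gamma in the augmented Dickey-Fuller regression with k lags."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                              # dy[t] = y[t+1] - y[t]
    T = dy.size - k                              # usable observations
    cols = [np.ones(T)]                          # intercept c
    if trend:
        cols.append(np.arange(T, dtype=float))   # linear trend
    cols.append(y[k:-1])                         # lagged level y_{t-1}
    for i in range(1, k + 1):
        cols.append(dy[k - i:dy.size - i])       # lagged differences
    Z = np.column_stack(cols)
    target = dy[k:]
    coef, _, _, _ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    sigma2 = resid @ resid / (T - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    g = 2 if trend else 1                        # column index of gamma
    return float(coef[g] / np.sqrt(cov[g, g]))


rng = np.random.default_rng(42)
e = rng.normal(size=200)       # white noise: stationary
walk = np.cumsum(e)            # random walk: unit root
for k in range(5):
    print(k, round(adf_statistic(e, k), 2), round(adf_statistic(walk, k), 2))
```

More negative statistics give stronger evidence against a unit root. Adding lags usually moves the statistic toward zero, which is why the lag order is chosen alongside the test, as in Table 4.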

Table 5. The performances of the BS and AS methods for the estimation of both parametric and nonparametric components.

Measurement	Bias		Variance
	AS	BS	AS	BS
${\hat{β}}_{1}$	1.941	2.682	1.272	1.703
${\hat{β}}_{2}$	0.915	1.139	1.562	1.624
$\hat{α}$	3.628	4.566	0.067	0.058

The bolded values indicate the best scores.

Table 6. Scores of performance measures for the AS and BS methods obtained from the whole model estimation.

Method	MAPE	MedAE	GMSE	RGMSE	$RMSE (f, \hat{f})$
AS	0.623	0.510	1.275	0.824	1.154
BS	1.315	1.166	1.546	1.212	1.385
AR(2)	1.856	4.506	3.702	2.775	-

The bolded values indicate the best scores.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Aydın, D.; Ahmed, S.E.; Yılmaz, E. Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator. Entropy 2021, 23, 1586. https://doi.org/10.3390/e23121586