Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator

This paper focuses on the adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Typically, there are two main problems that need to be solved in such a case: dealing with censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic data approach based on the Kaplan–Meier estimator. In practice, although the synthetic data technique is one of the most widely used solutions for right-censored observations, the transformed data’s structure is distorted, especially for heavily censored datasets, due to the nature of the approach. In this paper, we introduced a modified semiparametric estimator based on the A-spline approach to overcome data irregularity with minimum information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator was used as a benchmark method to gauge the success of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real data sample were carried out to evaluate the performance of the proposed estimator and to make a practical comparison.


Introduction
Time series datasets are censored from the right under specific conditions, such as a detection limit or an insufficient observation process. Consider a device which cannot measure values above a certain point, which is known as a detection limit. Since the device cannot determine the real value of an observation above its detection limit, such observations are recorded as right-censored data points. The hourly observed cloud ceiling heights data collected by the National Center for Atmospheric Research (NCAR) and modelled by [1,2] can be used as an example of a right-censored time series. Although right-censored time series are encountered frequently in the real world, in the literature, there are truly few studies completed on the estimation of right-censored time series. This may be because censorship is an unwanted data irregularity for the researchers, and it is therefore often ignored or solved by outdated techniques.
To solve the censorship problem before modelling the time series, reference [1] used the Gaussian imputation technique to estimate the series using modified ARMA models. In a similar manner, references [2,3] solved the censorship problem by using data imputation techniques. The common ground of these studies is the use of imputation and data augmentation methods to estimate the regression models with autoregressive errors for right-censored time series. On the other hand, there is an easier way to handle the censorship problem called synthetic data transformation. Although data imputation techniques have some merits, they are generally based on iterative algorithms and their calculations are costly. Reference [4] estimated the temporally correlated and right-censored series by Nadaraya-Watson estimator nonparametrically, solving the censorship problem using a data transformation technique. Various data transformation (or synthetic data) methods have been proposed and studied in the literature for independent and identically distributed (i.i.d.) datasets; for example, see [5][6][7]. Because synthetic data transformation manipulates the data structure, which is disadvantageous, this solution method is no longer the preferred technique for right-censored time series. This paper aims to propose a method which can overcome the disadvantage of the synthetic data transformation method.
Note that the studies mentioned above consider the modeling of time series data using parametric or nonparametric methods. The data structure of a time series in the real world is generally not suitable for parametric modelling, because it requires rigid assumptions to reach reasonable estimates. Single-index nonparametric models, on the other hand, are very flexible, which is an important advantage over parametric methods and there are valuable studies on the subject [2,8,9]. However, nonparametric approaches lose their statistical efficiencies, when the number of covariates increases. In addition, it should be noted that, when a time series dataset is right-censored, the weaknesses of both methods are further increased.
Considering the issues mentioned above, this paper adopts semiparametric regression model for estimating right-censored time series. Although several researchers have introduced different types of semiparametric estimators for time series data, such as [10,11], there remains a significant gap in the research regarding the modelling of right-censored time series data. To address this absence, our paper proposes a modified semiparametric A-spline (AS) estimator based on synthetic data transformation. Thus, the bidirectional flexibility of the semiparametric model will be used, and the censorship problem will be effectively solved.
The paper is designed as follows: the methodology and fundamental ideas about rightcensored semiparametric time series model with autoregressive errors and the synthetic data transformation method are given in Section 2. Section 3 introduces a modified AS estimator for parametric and nonparametric components of the right-censored time series model, and a semiparametric B-spline (BS) is given as a benchmark. Section 4 involves the statistical properties and evaluation criteria for both the modified AS and benchmark BS methods. Section 5 introduces some additional information about the penalty term of the semiparametric AS approach. Sections 6 and 7 contain a detailed Monte Carlo simulation study and a real-world data example, respectively. Conclusions are presented in Section 8.

Background
The classical semiparametric model can be defined as a hybrid model with a finite dimensional parametric component and a nonparametric component having an infinite dimensional nuisance parameter. See [12][13][14][15] for additional information. In both theory and practice, the semiparametric model brings a new perspective to data modeling, since it includes both parametric and nonparametric components. As mentioned in the previous section, it is well-suited to time series data, because it brings the advantages of the semiparametric model to time series analysis.
Suppose that a time series dataset {Z t , x t , s t , t = 1, 2, . . . , n } satisfies an uncensored semiparametric time series model of the form: where Z t s are the observations of stationary time series, x t = x t1 , . . . , x tp and x 1 , . . . , x n are known p-dimensional vectors of the explanatory variables, β = β 1 , β 2 , . . . , β p is an unknown p-dimensional vector of the regression coefficients to be estimated, f (.) is an unknown smooth function that describes the relationship between Z t and a nonparametric temporal covariate s t , and finally, ε t 's are the stationary autoregressive error terms generated by: where ρ 1 , . . . , ρ k are the autoregressive coefficients, and u t denotes the independent and identically distributed random error terms with mean zero and a constant variance.
Model (1) does not include lagged Z t s and has auto-correlated errors. This expression makes it a suitable model for the semiparametric regression analysis of certain kinds of time series. A common problem in practice is that dependent observations Z t s cannot be perfectly collected due to limitations including the detection limit of an evaluation tool or the end time for the study. To express this situation algebraically, we assume that Z t s are censored from the right by a non-negative random variable representing detection limit C t . Therefore, instead of observing the values of Z t , we now observe: where δ t 's denote the censoring information. Suppose that we are interested in estimating the mean semiparametric regression function. The distribution of the observable random variables does not identify the mean regression function uniquely. However, this problem can be solved as follows.
for α ∈ R be cumulative distribution functions of non-negative random variables Z t , C t , and Y t , respectively. If random variables Z t and C t are independent, then the survival function H Y (α) for observed response variable Y t can be defined from the basic relationship between F Z and G C : Given a random sample from the distribution of (Y t , X t , s t , δ t ), it is of interest to examine the explanatory variables' effect on the observations of time series (i.e., response variable) by estimating the survival function H Y (α) = P(Y > α), which is the regression function E(Y t |x t , s t ) = x t β + f (s t ), the conditional mean of time series Y t . However, because of the censoring, ordinary methods cannot be applied directly to estimate the regression function. To overcome censored observations, a data transformation technique should be used. One of the most widely used techniques is the synthetic data transformation, detailed in the section below.

Synthetic Data
To extend the penalized sum of squares approach to right-censored semiparametric regression analysis, we updated the synthetic data approach developed by [5]. The first step is to create an unbiased synthetic response variable of which the expectation is equal to the original and then to obtain the penalized squares estimator by means of this synthetic variable. The main goal of this transaction is to consider the censoring effect on the distribution of response variable. In the case of censored data, the authors of [16,17] used the synthetic data approach.
In the synthetic approach, we replace observed variable Y t with transformed data Y tG ; a transformation maintains the conditional expectation of original variable Z t . To describe this situation, it is easier to proceed directly using the cumulative distributions given in Lemma 1 below. Note also that if G C is known then it is possible to transform observed data {(Y t , δ t ), t = 1, . . . , n} into unbiased synthetic data, given by: where G C (.) is the distribution function of the censoring time C t , as defined before. It should be noted that the distribution of G C is rarely known. In this case, we use the Kaplan-Meier estimator defined by: where Equation (5) has the following properties: (a) if distribution G C is selected arbitrarily, some Y (i) can be identical. In this case, the ranking of Y 1 , . . . , Y n into Y (1) . . . Y (n) is not unique. However, the Kaplan-Meier estimator allows us to define the ranking of Y t uniquely; (b)Ĝ C (.) has jumps only at the censored observations of the time series (see [18]). SubstitutingĜ C (.) for G C (.) in Equation (5), we construct the following synthetic data, given by: Then, one practical consequence of the following Lemma is that synthetic data Y tĜ and completely observed response times Z t have the same conditional expectations, as claimed in before.

Lemma 1.
Consider time series data Z t denoted as a response variable. If the data is censored by random censoring variable C with distribution G C , transform observed series However, generally, G C is unknown as mentioned before. Therefore, Y tĜ is used which is defined in Equation (7), instead of Y tG . Because ofĜ c → G when n → ∞ , (see [5] Let us consider that τ H Y = sup{α : H Y (α) < 1}, where H Y (.) is defined right after Equation (3). In the literature, the convergence rate of the Kaplan-Meier estimator is examined in two classes: (i) restriction of time-interval as [0, α] with α < τ H Y ; (ii) extension of time-interval 0, τ H Y (see [19] for more detailed discussions). Here, the convergence rate of the Kaplan-Meier estimator is inspected with regard to case (ii). However, 0, τ H Y cannot be used without some strong conditions that can be given by: Details about conditions (i)-(iii) were studied by [20]. The convergence ofĜ → G over the interval 0, τ H Y can be provided. Reference [19] clearly shows both strong and weak convergences at the rate n −ϑ where 0 ≤ ϑ ≤ 1/2.
The proof of Lemma 1 is given in Appendix A.
The major concern of this paper is to overcome the censoring problem and to estimate the semiparametric time series model efficiently. To achieve this goal, we used two different approaches, BS and modified AS estimators. In the following section, we applied these approaches to the transformed data to estimate time series observations under random right-censorship.

Estimating the Semiparametric Model Based on the BS Estimator
We first introduce the BS considered for estimating the components of model (1). A univariate B-spline is constructed by a piecewise polynomial function of degree q such that its derivatives up to order (q − 1) is continuous at each knot point r 1 , . . . , r k . The set of BSs of degree q over the real numbers (r 1 , . . . , r k ) = r is a vector space of dimension q + k + 1. In addition, note that k denotes the number of interior knots, while q ≥ 0 indicates the polynomial order. For example, the polynomials of order q = 0, 1, 2, and 3 are defined as constant, linear, quadratic, and cubic BS basis functions, respectively. If the knots are equally spaced (i.e., separated by same distance h = (r k+1 − r k )), the knot points and the corresponding BSs are called uniform. Definition 1. Given an ordered knot vector r = {r 1 ≤ r 2 ≤ . . . ≤ r k } in the domain of covariate s t , then i th BS basis functions B i,q (s t ), i = 1, 2, . . . , q + k + 1 of degree q = 0 and q > 0 can be defined in recursive series, respectively, as: Note that if the denominator of Equation (9) is equal to zero, then the BS basis function is assumed to be zero. From Equations (8) and (9), a set of (q + k + 1) basis functions have the following important properties: Reference [21] proposes an algorithm to solve equation (9). See also the work of [22] for more detailed discussions on the BS approximation. Note also that the BS curve can be uniquely represented as a linear combination of the BSs basis functions in Equation (9), as given in the next section. Note that references [23,24] could be counted as recent studies about BSs.

BS Estimator
As previously noted, in this paper, we fit semiparametric time series model (1) with right-censored data. For this purpose, the BS estimator can be used as an approximation method. Using the synthetic data in Equation (7), we estimated the parametric and nonparametric components of model (1). Therefore, the sum of the squares of the differences between the censored time series values Y tĜ and (x t β + f (s t )) are minimum. Assume that f (.) is a smooth function that can be approximated by a linear combination of the BSs basis functions in Equations (8) and (9): where m = (q + k + 1) is the total number of BS basis functions being used,α i s are estimated coefficients (or control points) for each BS, B is an (n × m)-dimensional matrix which includes BSs as defined by Equation (9) and α = (α 1 , . . . , α m ) is a parameter vector of the BS function. Note also that the autoregressive errors in model (1) follow an n-dimensional multivariate normal distribution with a zero mean and stationary (n × n) covariance matrix Σ, that is, (ε 1 , . . . , ε n ) T ∼ N n (0, Σ ), where the covariance matrix Σ is a symmetric and positive definite matrix with elements: Throughout the paper, the notation is used as Σ −1 = V. Note that V is generally unknown. However, its elements can be obtained by the generalized least squares (GLS) based on an iterative process. Then, as in [25] which is a penalized BS study combining BS and difference penalties, the estimates of the components of semiparametric model (1) were obtained by minimizing the penalized sum of squares (PSS) criterion: where ∆α i = (α i − α i−1 ) is the first-order difference penalty on the coefficients of the BSs. The other differences can be defined as follows: and similarly: Note that if degree q = 0 in Equation (12), we obtain semiparametric ridge regression based on BSs. When λ = 0 in Equation (12), we have the minimization equation of ordinary least squares regression with a correlated error. If λ > 0, the penalty only influences the main diagonal and q sub-diagonals (on both sides of the main diagonal elements) of the banded structure system due to the limited overlap of the BSs.
We rewrite the minimization criterion described as Equation (12) in a matrix and vector notation: where . denotes Euclidean norm, X = (x 1 , . . . , x n ) , YĜ = Y 1Ĝ , . . . , Y tĜ is the synthetic response vector defined in Equation (7), λ > 0 is a smoothing parameter, and D denotes the matrix notation of the difference operator (∆ q ) defined in Equation (13). For example, D is an (n − 2) × n-dimensional banded matrix that corresponds to the second-order difference penalty, given by: From simple algebraic operations, it follows that the solution to the minimization problem in Equation (15) satisfies the following block matrix equation: Given a parameter λ > 0, the corresponding estimators based on BSs for vectors β and α can be easily obtained by: and:β where It should be noted that the estimates of the unknown regression function in a censored semiparametric model are obtained by: From Equations (19) and (20), we see that the fitted values of dependent time series data can be written as: where H BS is a hat matrix for BSs and computed as follows: where

AS Estimator
The adaptive spline (AS) applies an adaptive ridge penalty to the BS method, which makes it more flexible for knot determination. The AS concept is explained in [26] in a nonparametric context. However, in this paper, we generalized this estimation concept to the semiparametric environment based on synthetic response observations. It should be noted that the location and number of knots have crucial importance in terms of synthetic data transformation. This issue is discussed in detail in Section 4.3. The point here is that a more efficient estimator based on synthetic responses is needed, as most of the existing smoothing techniques (spline smoothing, kernel smoothing, etc.) cannot properly handle synthetic data. This article aims to solve this issue with the AS estimator.
When a BS is defined on the knots r 1 ≤ r 2 ≤ . . . ≤ r k such that ∆ q α i = 0 for some i th knot, it may be reparametrized as a BS on the knots r 1 , r 2 , . . . , r i−1 , r i+1 , . . . , r k . Accordingly, when m = (q + k + 1), we want to put a penalty on the number of non-zero differences indicated as below: where ∆ q α i is the q th -order difference operator and ∆ q α i 0 is the L 0 -norm of the differences, that is, ∆ q α i 0 = 0 if ∆ q α j = 0, otherwise, ∆ q α i 0 = 1, and λ is a positive penalty parameter that ensures the tradeoff between the goodness of fit to the data and the smoothness of the fitted curve. This penalty enables us to remove knot r i that is not related to the smoothing problem, to join the neighbor intervals [r i−1 , r i ) and [r i , r i+1 ), and to carry on fitting with a BS described over the remaining knot points. Note also that when λ → 0 , the fitted curve becomes a BS with knots r i , i = 1, 2, . . . , k and when λ → ∞, the fitted function becomes a polynomial of degree q. It should be emphasized that one of the important points about the adaptive ridge penalty is that Equation (23) cannot be differentiated due to the L 0 -norm. As a result, the fitting process is made numerically untraceable. An approximate solution to dealing with the L 0 -norm is provided by [27,28]. Following the studies of these authors, we approximate the L 0 -norm by using an iterative process referred to as an "adaptive ridge" based on synthetic data. The new criterion function is expressed by the following weighted penalized sum of squares: where w i 's denote the positive weights. It should be noted that the penalty is close to the L 0 -norm of the differences when the weights are iteratively calculated from the parameter vector α of BS following the equation: where γ is a constant properly determined by the researcher.

Remark 1.
There are a few important points to know about the selection of γ. If (∆ q α i ) < γ, then the magnitudes of w i 's might be quite large, resulting in (∆ q α i ) ∼ = 0 and the penalty term This convergence gives us a measure of how relevant the i th knot point is. In practice, one possible choice, suggested by [28], is γ = 10 −5 . They select the knots (denoted as r i * ) with a weighted difference bigger than 0.99. The number of parameters of the chosen BS is m λ = q + k λ + 1, where k λ denotes the number of selected knot points.
Note that reference [28] provides a figure to show the effects of different norm degrees (q) on the quality of estimation. It is seen from that the performance of estimation does not change for different values of γ when norm degree is zero (q = 0). However, it affects the performance seriously if q > 0.
For some λ > 0 and non-negative weights, the WPSS of Equation (26) can be rewritten as: where K is a penalty matrix and written as K = D WD, where W = diag w q+1 , . . . , w m and D is the matrix form of the difference operator ∆ q , as defined in Equation (13). Simple algebraic operations show that the solution to the minimization problem WPSS in Equation (26) satisfies the block matrix equation: By similar arguments as in the case of the BS approach, the corresponding estimatorŝ α AS andβ AS of α and β, based on the right-censored semiparametric time series model (1) with correlated data, can be easily obtained, respectively, as: and:β where A AS = X VB B VB + λK −1 B V . The proofs and derivations of Equations (28) and (29) are given in Appendix B. Notice that the estimates corresponding to the nonparametric part of the semiparametric model (1) are obtained using Equation (28) as described in the following equation:f From Equations (29) and (30), we can see that the fitted values of the dependent time series data can be obtained as: where H AS denotes the hat matrix, given by: with To make the computation process efficient, all penalty terms D T WD are calculated by using the iteration process instead of finding matrix D and knot set individually. The iterative algorithm is given in Algorithm 1 below.

Algorithm 1. Iterative algorithm process for the modified A-spline (AS) estimatorα AS .
Input: X, s, YĜ. Output:β 1: Begin 2: Give initial values, β (0) = 1 p , α (0) = 0 q+k+1 and W (0) = I to start iterative process 3: do until converges weighted differences to L 0 -norm For the constant value of γ = 10 −5 , the iteration process repeats between step 3 and step 9 until the pre-determined tolerance value δ = 10 −4 is obtained where From our experience, the expected number of iterations is observed as no.iteration = 20 to achieve the convergence.
Notice that the complexity and efficiency of Algorithm 1 is analyzed from different aspects that are given by: (i) Number of local searches: algorithm does not involve a local search procedure which is an advantage for the speed of Algorithm 1; (ii) Number of nested loops: due to the fact that there is only an iteration loop (without nested loops), if an algorithm does not include nested loops, its "order of growth" will be O(n); (iii) Asymptotic behaviors: as the former inference mentioned, Algorithm 1 has O(n) which means that the limiting case of its convergence speed is considerable when it is compared with its alternative BS method on this issue.
As mentioned at the beginning of this section, the choice of an optimum smoothing parameter λ is required for both semiparametric BS and AS estimators. In this context, the improved Akaike information criterion (AIC c ) proposed by [29] is used, which is computed with the following equation: whereσ 2 is the estimate of the model variance, which is estimated for both methods separately in the next section, and H denotes the hat matrix for any of two methods. It is replaced by H AS for the AS method and H BS for the BS method, respectively.

Statistical Properties of the Estimators
In this paper, we introduced the semiparametric AS and BS estimators for the estimation of the right-censored time series model. It should be noted that these two methods were used for the first time in the setting of a time series estimation procedure. Inferences were therefore carried out about their statistical properties. For example, among these, the error terms obtained from the estimates of both methods and the estimators of parametric and nonparametric components were inspected and their properties were extracted.

Properties of the Semiparametric BS Estimator
Firstly, the parametric component was inspected. As is known, in a parametric context, errors can be decomposed into the bias and the variance terms that provide the quality of the estimator. Accordingly, the estimatorβ BS of the parametric coefficients vector is expanded as follows: (34) where V, A BS and M BS matrices are as defined in Section 3.1 and From here, bias B β BS and variance-covariance V β BS of estimatorβ BS can be computed as follows: where σ 2 is the variance of the fitted semiparametric model. Since the variance is not generally known, instead of σ 2 , the estimation (denoted byσ 2 BS ) based on the BS is used. It can be computed from the residuals sum of squares (RSS) using error terms: where In the context of the BS, if the data have a normal distribution,σ 2 BS is asymptotically unbiased. Secondly, the properties of estimated nonparametric componentα BS = α 1 ,α 2 , . . . ,α q+k+1 are given here. The bias ofα is one of the quality measurements for the estimated model. The bias is denoted as conditional expectation E[α|s t ], given by: From that, the bias is given by: Accordingly, the covariance ofα BS can be computed as: whereσ 2 BS is defined by Equation (36). In addition, to reveal the performance off BS = Bα BS , the root square of mean squared error RMSE f,f BS is used:

Properties of the Semiparametric AS Estimator
Similar to in Section 4.1, the same properties for parametric and nonparametric components are given for the AS estimator here. The necessary expansion is written as follows to derivate the bias and variance ofβ AS : where A AS and M AS are given in Section 3.2. Now, the bias and the covariance matrix of the estimatorβ AS can be provided by: where σ 2 is the variance of the fitted semiparametric model. Similar to Equation (40), instead of the model variance,σ 2 AS is obtained as follows: The properties of estimated nonparametric componentα AS = α 1 ,α 2 , . . . ,α q+k+1 for the AS method are described below. The bias and the variance of the AS estimatorα AS can be given, respectively, as: and Cov(α AS ) =σ 2 Thus, the value of RMSE f,f AS forf AS = Bα AS , similar to Equation (41), is calculated as follows: (48)

Quality Measures for the Fitted Model
After assessing the parametric and nonparametric components of the model in Sections 4.1 and 4.2, several measurements are introduced in this section to evaluate the overall model performance. In the literature on time series modelling, mean absolute percentage error (MAPE), mean absolute error (MAE), and mean squared error (MSE) are the most commonly used performance criteria. To represent these criteria, MAPE is preferred in this study. In addition, median absolute error (MedAE) was used, which allowed us to account for missing or censored data. Generalized MSE (GMSE) and the ratio of GMSE (RGMSE) proposed by [30] and [2], respectively, were used to measure the quality of the fitted time series model. The aforementioned criteria can be defined as follows: whereŶ tĜ andŶĜ denote the fitted dependent variable values and vector for any estimation method. Here,Ŷ tĜ andŶĜ are replaced byŶ tĜ BS andŶĜ BS for the BS, andŶ tĜ A andŶĜ A for the AS. In addition, to make a more considerable comparison between the AS and BS estimators, RGMSE is defined below.

Definition 2.
The ratio of GMSE can be defined as follows: Regarding the RGMSE criterion, if RGMSE (YĜ BS ,ŶĜ AS ) < 1, then it can be seen that the fitted model by the AS method shows better performance then the BS method.

Further Information for Adaptive-Ridge Penalty
The semiparametric AS estimator proposed for the right-censored time series model, with its adaptive nature, aims for qualified estimations despite the censorship. To approach the L 0 -norm given in Equation (23), the most suitable knot locations can be chosen due to the weighted penalty term. Thus, the model avoids the disadvantages of synthetic data transformation, which gives higher magnitudes to uncensored observations. This section is designed to inspect some of the large sample properties of the modified AS estimator under right-censored data. It should be noted that adaptive ridge penalty in the setting of regression has been studied by many authors; see for example [25,26,28]. However, the aforementioned studies consider adaptive ridge penalty individually, not as a part of a semiparametric time series model. This section provides basic information for the large sample properties of the proposed AS estimator in the context of a semiparametric time series model.
As previously stated, the AS approximation is a modified version of the P-splines (penalized BSs) estimator proposed by [31]. Note also that the AS method diverges from BSs with a significant difference of the L 0 -norm in the penalty term. The AS estimator is obtained by an iterative process with determining weights, as expressed in Section 3.2. In addition, apart from the usage of the AS method in the literature, it is also used for modelling censored time series. For these reasons, we can make several important assumptions. The large sample properties are written based on the assumptions given below: Assumption 1. The minimization problem for the semiparametric AS is given in Equation (26). To make this expression more general, it can be rewritten as follows: where ∆ q α j τ represents the τ-norm of the penalty term. The first assumption is τ → 0 , which allows approximation to the L 0 -norm with the acquisition of weights via the iterative process. Otherwise, the L 0 -norm needs overly complex calculations, which leads to the loss of practicality when using the method. From our knowledge of the literature, when τ → 0, such as in Equation (26), the minimization of Equation (50) works by penalizing the non-zero coefficients α j 's, as shown by [32].

Assumption 2.
Whenα AS is examined asymptotically, the objective function of Equation (26) may not have a global minimum, since it is not clearly convex. However, if we assume that: then it is possible to point out some important aspects of asymptotic consistency. Therefore, it should be presumed that R is a non-negative definite matrix and: where elements of diag(R i ) = 1.
, and R are assumed to be full rank matrices. Under the assumptions given above, to see asymptotic consistency ofα AS andβ AS , an equation can be obtained from Equation (50) as follows: where α ASn ,β ASn denote the limiting case of the estimators for λ n = O(n). Note that Equation (52) is ensured by following Theorem 1.
For the proof of Theorem 1, see Appendix C.
To clearly indicate the place of Assumptions 1-3 in the estimation process, the following explanations are given for each assumption.
• Assumption 1 is independent from the data. We assume that to provide a practical solution when minimizing Equation (50). Therefore, in both empirical and real data studies, this assumption does not impose anything to the dataset, but it is necessary to reduce the computational complexity.

•
In real data studies, to ensure Assumption 2, "B" matrix obtained by using the nonparametric covariate needs to have independent columns. Because B B should be identifiable and avoid the ill-posed problem, B B must be a full-ranked matrix.
• Assumption 3 confirms Assumption 2. Thus, it can be seen that asymptotic consistency can be confirmed by Assumption 3. From that it can be said that Assumption 3 is indirectly depended on the dataset.

Asymptotic Distribution and Consistency of the Proposed Estimator
In this section, the estimate of parametric componentβ AS is inspected in terms of asymptotic consistency and asymptotic distribution.
Assume the following regularity conditions: Autoregressive errors ε t 's given in Equation (2) are stationary with independent and identically distributed random error terms u t 's that have zero mean and finite variance 0 < σ 2 < ∞; exists.
Here, condition (ii) indicates that the diagonal elements of F and F n are identical and one, because the covariates are scaled. To obtain the asymptotic distribution ofβ AS , "nearly-singular" designs are performed due to τ → 0 for F n . Thus, it can be ensured that F n → F asymptotically. On the other hand, F n and F are assumed as non-singular in Section 5.1.
To show the consistency and asymptotic normality of the semiparametric AS estimator when conditions (i), (ii), and (iii) are ensured with non-singular F, first the case of τ ≥ 1 is considered, followed by the case of τ < 1.
Letβ AS n be an asymptotic estimator. The consistency ofβ AS n can be reached by using following minimization function: The following theorem shows the consistency ofβ AS n for validated additional assumption λ n = O(n).

Theorem 2.
Assume that F is non-singular,f (s t ) behaves stable, and λ n n −1 → λ 0 ≥ 0 . It can then be said that as n → ∞ :β whereβ AS n is a consistent estimator of β. The proofs of this theorem are given in Appendix D. For λ n = O(n), argmin(ψ) = β and thereforeβ AS n is a consistent estimator.
It should be emphasized that the consistency ofβ AS n is sufficient to show that λ n = O(n). However, this depends on the magnitude of growth of λ n . When λ n grows more slowly, then a limiting distribution √ n β AS n − β exists. It is clear from Theorem 2 that the mean of the limiting distribution of √ n β AS n − β converges to zero for the consistency ofβ AS n . In addition, its asymptotic variance can be obtained based on conditions (i) and (iv) as σ 2 F −1 . Accordingly, the asymptotic distribution of the semiparametric AS estimator is written as: However, the limiting distribution depends on whether τ < 1 or τ ≥ 1. In the context of this paper, Theorem 3 is given for the limiting distribution ofβ AS n when τ < 1.

Simulation Study
In this section, a simulation study was conducted to inspect the finite-sample behaviors and performances of the two semiparametric estimators α BS ,β BS and α AS ,β AS under right-censored time series. These estimators were then compared through the quality measurements given in Section 4. The simulation scenarios are designed as follows: (a) We use the model Z t = X t β + f (s t ) + ε t , t = 1, 2, . . . , n to generate datasets in the simulation experiments. (b) The unknown smooth regression function f (s t ) is constructed by combining the functions S j , j = 1, . . . , 5 that denote seasonal effects on the data, that is, f (s t ) = U 5 j = 1 S j (s i ), where S j (s i ) = s i sin 2 (s i ) with s i = (i−0.5) (c) The design matrix is generated from a normal distribution: X t ∼ N µ x = 5, σ 2 x = 1 , where X t is the (n × p) dimensional matrix for p = 3. Note also that the distribution may not be normal, and that one can thus consider a uniform or other distributions. The vectors of the regression coefficients are β = (3, 0.5, −1).
(d) The autoregressive error terms are generated from a one-lagged process ε t = ρε t−1 + u t with ρ = 0.5 and u t ∼ N(0, 1). (e) Thus, as stated in (a), completely observed dependent time series Z t 's are generated from the sum of the parametric, nonparametric, and error terms using (b), (c), and (d). (f) To produce the right censored variable Y t , as specified in Equation (3), we generate the censoring variable C t from the binomial distribution with proportions or censoring levels (CLs) at 5%, 20%, and 40%. The Algorithm 2, given below, demonstrates how the censoring variable is created.

Algorithm 2. Generation of censoring variable C t .
Input: Completely observed Z t Output: Right-censored dependent variable Y t 1: For given censoring level (CL), produce δ t = I(Z t ≤ C t ) from the binomial distribution 2: for (t in 1 to n) 3: C t = Z t 8: end (for loop in step 2) 9: for (t in 1 to n) 10: Else 13: Y t = C t 14: end (for loop in Step 9) (a) To deal with censored observations in Y t obtained with Algorithm 2, we use synthetic data values Y tĜ obtained through the Kaplan and Meier estimator [18], as described in Equation (6). (b) AR(1) model is used as a naïve model to estimate the right-censored time-series as in [1,2]. Thus, the finite sample performance of the introduced methods can be made.
For each CL in the simulation experiments, we generated 1000 random samples for size n = 50, 100, and 200.
The results of the simulation study were divided into three parts for parametric components, nonparametric components, and overall model performance. Accordingly, the outcomes of the estimated models, comparative results, and corresponding comments are given together in the following tables and figures. To understand the simulated datasets and the scenarios, examples of some of the simulation configurations are given in Figure 1. Panel (a) shows the dataset for small sample size and low censorship. Panel (b) is drawn to show the case when the censoring level is really high. Panels (c)-(d) indicates the cases for medium and large sample sized data with censoring levels 20% and 40% respectively.

Assessing the Parametric Component
In this section, the performances of the two methods were compared in terms of the parametric components of the right-censored semiparametric linear models generated by the simulation. It should be also noted that in this simulation study, 54 different configurations were analyzed to provide a broad perspective of the adequacy of each method. The results from the parametric components in the simulation study are displayed in Table 1 and Figure 2. Note that bold colored scores indicate the best (minimum) scores.
From the careful inspection of Table 1, it can be demonstrated that the behaviors of the BS and AS change noticeably in different scenarios. Let us look at low and medium CLs for n = 50; under these conditions, the BS has remarkable superiority over the AS. This can be interpreted as the BS fitting the data better when the data's structure is distorted less by censorship. However, for CL = 40%, which means the data are heavily censored, the AS method gives better scores.
As the sample size increases, although the bias and variance values from the methods are obtained more closely, the AS provides more efficient performance in estimating the parametric component. Regarding the parametric component, it should be emphasized that the AS behaves as expected and gives the best scores for cases of heavy censorship.
In general, the best scores for each method can be evaluated in terms of bias and variance results. When we examined the bias results of the regression coefficients, the AS method gives the best score in only 12 out of 27 configurations while the BS method gives the best score in 15. However, regarding the variances, the AS gives the best score in 18 of 27 configurations, while the BS is superior in only 9 configurations. In Figure 2, Panels (a-c) shows the calculated biases for each simulation repetition for all cases when sample size is small, medium, and large.

Assessing the Parametric Component
In this section, the performances of the two methods were compared in terms of the parametric components of the right-censored semiparametric linear models generated by the simulation. It should be also noted that in this simulation study, 54 different configurations were analyzed to provide a broad perspective of the adequacy of each method. The results from the parametric components in the simulation study are displayed in Table 1 and Figure 2. Note that bold colored scores indicate the best (minimum) scores.
From the careful inspection of Table 1, it can be demonstrated that the behaviors of the BS and AS change noticeably in different scenarios. Let us look at low and medium CLs for = 50; under these conditions, the BS has remarkable superiority over the AS. This can be interpreted as the BS fitting the data better when the data's structure is distorted less by censorship. However, for = 40%, which means the data are heavily censored, the AS method gives better scores.  Figure 1. Some of the datasets generated using Algorithm 2 including both fully observed and censored data points for different censoring levels and sample sizes.  The bolded values indicate the best scores.

Evaluating the Nonparametric Component
As in the case of parametric components, we constructed 1000 estimates of the regression function f (.), which is the nonparametric component of model (1). For each method, 1000 replications were carried out, and the estimated bias, variance and RMSE values were computed for each estimator. This section is designed to show the simulated results related to the nonparametric component.
The results in Table 2 showed that the AS method proves its efficiency for the estimation of the nonparametric component when time series data are moderately to heavily censored. On the other hand, for CL = 5%, the BS method gives better results for all sample sizes according to our evaluation metrics. One of the main reasons for this is that the BS adapted to the knots more than the AS. Consequently, when the data points are manipulated by censorship, these knots force the BS to make inefficient estimates. At this point, the knot determination of the AS based on the weights given in Equation (24) diminishes the effect of the censorship. That is why the AS method performs better under moderately and heavily censored time series data. 25   20 0.310 0.324 0.333 0.355 0.311 0.300 0.325 0.351 0.304 0.296 0.328 0.353  40 0.314 0.333 0.338 0.352 0.321 0.337 0.332 0.356 0.307 0.336 0.332 0.363 The bolded values indicate the best scores.

of
As the sample size increases, although the bias and variance values from the methods are obtained more closely, the AS provides more efficient performance in estimating the parametric component. Regarding the parametric component, it should be emphasized that the AS behaves as expected and gives the best scores for cases of heavy censorship. In the xaxis, b1, b2, and b3 denote , , and ; A1, A2, and A3 denote biases obtained from the AS method for CLs of 5%, 20%, and 40%. Similarly, B1, B2, and B3 denote biases for the BS method, when CLs are 5%, 20%, and 40%.
In general, the best scores for each method can be evaluated in terms of bias and variance results. When we examined the bias results of the regression coefficients, the AS In the x-axis, b1, b2, and b3 denote β 1 , β 2 , and β 3 ; A1, A2, and A3 denote biases obtained from the AS method for CLs of 5%, 20%, and 40%. Similarly, B1, B2, and B3 denote biases for the BS method, when CLs are 5%, 20%, and 40%.  shows the estimated curves when sample size is large and censoring level is medium. When panels (a) and (c) are analyzed comparatively, the effect of the censorship level can be seen. At the first glance, the distortion of both curves is noticeable. However, the BS method is insufficient to represent censored time series compared to the AS method. In addition, panel (b) shows that when data are heavily censored, the BS curve is drawn towards the x = 0 line, due to the presence of zero values in the synthetic response variable. Finally, panel (d) indicates that although the time series contains censored data points, the qualities of the estimates for both the AS and BS methods become better as the sample size increases.

Assessing the Performances of Methods
This section involves the results for overall model estimations obtained from the AS and BS methods. Although results are given for parametric and nonparametric components in the previous sections, a separate review for the whole model estimation is required for a healthy comparison. Accordingly, the performance scores for , , and are given in Table 3, and Figure 4 is drawn to illustrate the values. Table 3. The values of performances from the AS and BS methods.

Assessing the Performances of Methods
This section involves the results for overall model estimations obtained from the AS and BS methods. Although results are given for parametric and nonparametric components in the previous sections, a separate review for the whole model estimation is required for a healthy comparison. Accordingly, the performance scores for MAPE, MedAE, and GMSE are given in Table 3, and Figure 4 is drawn to illustrate the RGMSE values. As can be seen from the bolded scores, the AS method generally performs better. From Table 3, it can be seen that the values obtained by BS are better for = 50.
However, as mentioned earlier, in this study, the criterion, which is not frequently used for time series data, is used to measure the durability of the predictions. When the scores of this criterion are examined, it is understood that, as stated from the beginning of the study, the BS method has more successful estimates under low censorship levels, but the AS method is superior for medium and high censorship levels. Figure 4 includes the scores for both the AS and BS methods that are formed by the ratio of the values of each method. In Figure 4, the difference between the qualities of the estimates is clearly very small for = 5%. However, the difference becomes more significant for = 20% and = 40%. Note that for = 5%, the BS method gives smaller ratio values, which confirms the results given in Table 3. As stated before, the AS method is demonstrably superior at higher censorship levels, which can be seen in Figure 4 for all sample sizes.

Real-World Data
This section is designed to show how the newly introduced semiparametric estimator AS and benchmark BS method behave with a real right-censored time series dataset. For this purpose, we consider unemployment duration data involving the monthly unemployment period rates years between 2004 and 2019 for Turkey; this dataset is available at https://ec.europa.eu/eurostat/databrowser/view/UNE_RT_M__custom_1635127/default/table?lang=en. In the dataset, the last three months of 2004 and the last three months of 2019 cannot be observed correctly. Therefore, these data points can be censored from the right by the detection limit zero, because none of the data points are negative values. Accordingly, the introduced semiparametric methods, AS and BS, can be used for this time series analysis. In addition, as in the simulation study, the results of the AR model are given in the following tables. However, different from the simulation study, AR(2) model was used for the real data study, because the optimal lag values is determined as = 2 from Table 4. Before the modelling procedure, the stationarity of the time series data was tested with the augmented Dickey-Fuller (ADF) test, the suitable lag is determined under null hypothesis : − . The test results are given in Table 4 below: When Table 3 is examined, it can be seen that the results obtained for the model estimates are slightly different from the previous results, as expected. The total error obtained from the estimation of parametric and nonparametric components is one of the reasons for this discrepancy. In addition, considering the situations where the two methods produce extremely similar scores, this difference can be understood better. Note that AR(1) model shows poor performance, which depends on its parametric and linear structure. However, for the large sample size (n = 200), the scores of models obtained are close to each other. However, it is clearly seen that the AS and BS methods are much better on the estimation of right-censored time series.
As can be seen from the bolded scores, the AS method generally performs better. From Table 3, it can be seen that the MAPE values obtained by BS are better for n = 50. However, as mentioned earlier, in this study, the MedAE criterion, which is not frequently used for time series data, is used to measure the durability of the predictions. When the scores of this criterion are examined, it is understood that, as stated from the beginning of the study, the BS method has more successful estimates under low censorship levels, but the AS method is superior for medium and high censorship levels. Figure 4 includes the RGMSE scores for both the AS and BS methods that are formed by the ratio of the GMSE values of each method. In Figure 4, the difference between the qualities of the estimates is clearly very small for CL = 5%. However, the difference becomes more significant for CL = 20% and CL = 40%. Note that for CL = 5%, the BS method gives smaller ratio values, which confirms the results given in Table 3. As stated before, the AS method is demonstrably superior at higher censorship levels, which can be seen in Figure 4 for all sample sizes.

Real-World Data
This section is designed to show how the newly introduced semiparametric estimator AS and benchmark BS method behave with a real right-censored time series dataset. For this purpose, we consider unemployment duration data involving the monthly unemployment period rates years between 2004 and 2019 for Turkey; this dataset is available at https://ec.europa.eu/eurostat/databrowser/view/UNE_RT_M__custom_163512 7/default/table?lang=en. In the dataset, the last three months of 2004 and the last three months of 2019 cannot be observed correctly. Therefore, these data points can be censored from the right by the detection limit zero, because none of the data points are negative values. Accordingly, the introduced semiparametric methods, AS and BS, can be used for this time series analysis. In addition, as in the simulation study, the results of the AR model are given in the following tables. However, different from the simulation study, AR(2) model was used for the real data study, because the optimal lag values is determined as lag = 2 from Table 4. Before the modelling procedure, the stationarity of the time series data was tested with the augmented Dickey-Fuller (ADF) test, the suitable lag is determined under null hypothesis H 0 : y t is non − stationary. The test results are given in Table 4 below:  Table 4 shows that the second lag for this time series is suitable for the modelling. From this information, the semiparametric time series model can be given by: where UED t s represent the dependent time series of the unemployment duration ratio, UED (t−1) and UED (t−2) denote the first and second lags of the dependent series UED t that are used as covariates, respectively, s t = (1, . . . , n) T denotes the seasonality, and finally, ε t 's are the stationary autoregressive error terms as given in Equation (2). The estimation of model (6.1) is realized by both the AS and BS methods, and then, results are presented in Tables 5 and 6 and Figure 5.  size of the real data of = 186 which is close to the simulation configurations when = 200, scores are relatively close to each other. Figure 5 is given to compare the AS and BS methods in representing data under censorship. As can be seen in Figure 5, the estimated curves are quite similar due to the data properties of a large sample size and a low CL. The effect of synthetic data manipulation is obvious in the figure with zero values. Like the simulation study, the BS method is affected by these zero values more than the AS method. The reason for this is that the knots of the AS method are determined by iteratively calculated weights. Therefore, the optimal knot sequence diminishes the effect of censorship.

Concluding Remarks
This paper demonstrated the estimation of right-censored time series data using a newly introduced semiparametric AS estimator and making a comparison with the BS method as a benchmark. The results obtained from both a simulation study and a real data example proved that the introduced method (AS) achieves the superior modelling of rightcensored time series data in a semiparametric context. Comparative outcomes also support that the AS method provides better performance scores over the BS method in most simulation configurations and the real data example. The most important factor in the success of the AS method is the adaptive nature of the method based on iteratively calculated weights. In the AS method, weights are responsible for determining and controlling the penalty term and for dependently obtaining the optimal knot points. Accordingly, our findings showed that the proposed method provides an advantage in modelling right-censored time series over the benchmark.
The simulation study examined the performance of the methods in three parts: the outcomes for the estimated parametric component (Table 1 and Figure 2), the nonparametric component (Table 2 and Figure 3), and the whole semiparametric model (Table 3 and Figure 4). The unemployment data estimation was evaluated for bias and variance (Table  5) using the criteria of  ,  , , and (Table 6). Given the outcomes of the simulation study and the real data example, our general and detailed conclusions are as follows: • As expected, the estimation qualities for both the AS and BS methods change for different CLs and sample sizes. The performances of the methods are affected negatively by increasing CLs, and they give better results for larger sample sizes. This claim is seen clearly from Tables 1-3.

•
When unemployment duration data were analyzed, it can be seen that the results agreed with the corresponding configuration ( = 200; = 20%) of the simulation study.  Table 5 involves the bias and variance values for estimated regression coefficientŝ β = β 1 ,β 2 andα = α 1 ,α 2 , . . . ,α q+k+1 T . Accordingly, the AS method gives smaller bias and variance values than the BS method regardingβ. Moreover, the AS method has better bias values forα, but the BS method gives smaller variance values forα than the AS method. In overview, the AS and BS methods give similar values, because the data properties are n = 186 and CL = 8.1%. Thus, it can be seen that the results of the unemployment duration data ensure the simulation outputs.
In addition, it should be noted that the outcomes obtained from estimated model (7.1) are given in Table 6 with RMSE scores for the estimated nonparametric function f (s t ). Upon close inspection, it is obviously seen from the results that the AS method produces the best scores. It should be emphasized that the largest difference between the methods regarding performance criteria is in MedAE, which indicates the strength of the AS method under censorship. Table 6 indicates the results of AR(2) model that are worse than the results of the other two as in the simulation study. Note that because of the sample size of the real data of n = 186 which is close to the simulation configurations when n = 200, scores are relatively close to each other. Figure 5 is given to compare the AS and BS methods in representing data under censorship.
As can be seen in Figure 5, the estimated curves are quite similar due to the data properties of a large sample size and a low CL. The effect of synthetic data manipulation is obvious in the figure with zero values. Like the simulation study, the BS method is affected by these zero values more than the AS method. The reason for this is that the knots of the AS method are determined by iteratively calculated weights. Therefore, the optimal knot sequence diminishes the effect of censorship.

Concluding Remarks
This paper demonstrated the estimation of right-censored time series data using a newly introduced semiparametric AS estimator and making a comparison with the BS method as a benchmark. The results obtained from both a simulation study and a real data example proved that the introduced method (AS) achieves the superior modelling of right-censored time series data in a semiparametric context. Comparative outcomes also support that the AS method provides better performance scores over the BS method in most simulation configurations and the real data example. The most important factor in the success of the AS method is the adaptive nature of the method based on iteratively calculated weights. In the AS method, weights are responsible for determining and controlling the penalty term and for dependently obtaining the optimal knot points. Accordingly, our findings showed that the proposed method provides an advantage in modelling right-censored time series over the benchmark.
The simulation study examined the performance of the methods in three parts: the outcomes for the estimated parametric component (Table 1 and Figure 2), the nonparametric component ( Table 2 and Figure 3), and the whole semiparametric model (Table 3 and Figure 4). The unemployment data estimation was evaluated for bias and variance (Table 5) using the criteria of MAPE, MedAE, GMSE, and RGMSE (Table 6). Given the outcomes of the simulation study and the real data example, our general and detailed conclusions are as follows:

•
As expected, the estimation qualities for both the AS and BS methods change for different CLs and sample sizes. The performances of the methods are affected negatively by increasing CLs, and they give better results for larger sample sizes. This claim is seen clearly from Tables 1-3.

•
When unemployment duration data were analyzed, it can be seen that the results agreed with the corresponding configuration (n = 200; CL = 20%) of the simulation study.

•
One of the striking results of this paper is that, as Tables 1-3 demonstrate, while the AS method gives worse results at low censorship levels than the BS method, it provides significantly better results at medium and high censorship levels. This conclusion proves the claim of the paper, which is that using the AS method reduces the effect of the data manipulation of synthetic data transformation. • When all the results obtained from simulation and real data studies were inspected, the AS method gives better results than the BS method, except in the configurations for low CLs, which supports the targeted conclusion. • Unemployment duration data were modelled by the BS and AS methods using two lagged parametric components and the seasonal effect as a nonparametric component. Tables 5 and 6 show each method's scores using four evaluation metrics, which indicate the superiority of the AS method. Figure 5 shows the estimated curves for both methods, which are similar. However, the estimated curves show that the AS method is less affected by zero values of synthetic data and thus gives more satisfying estimates for the right-censored time series model than the BS method.
Finally, as can be understood from the whole paper, the AS method is superior for estimating right-censored time series over the BS method in both theory and practice.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
We consider unemployment duration data involving the monthly unemployment period rates years between 2004 and 2019 for Turkey; this dataset is available at https://ec.europa.eu/eurostat/databrowser/view/UNE_RT_M__custom_1635127/default/table? lang=en.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Proof of Lemma 1. Lemma 1 can be ensured based on the common censorship assumption that Z t and C t are independent. From that, the proof can be written as follows: Thus, Lemma 1 is proven. Here, G(.) = 1 − G(.). Generally, distribution G(.) is unknown. Therefore, its Kaplan-Meier estimatorĜ(.) is used instead of G(.), which is given in Equation (5).
To show the derivations of Equations (29) and (30), two equations obtained from Equation (27)  Thus, if β is replaced byβ AS , thenα AS can be written as: Therefore, Equation (27) can be derived. Accordingly, the derivation ofβ AS can be obtained by using (B1): X VX β + X VB B VB + λK The derivations of Equations (29) and (30) are thus completed.