Penalty and Shrinkage Strategies Based on Local Polynomials for Right-Censored Partially Linear Regression

This study proposes modified semiparametric estimators based on six penalty and shrinkage strategies for estimating a right-censored semiparametric regression model. The estimators are obtained with the ridge, lasso, adaptive lasso, SCAD, MCP, and elasticnet penalty functions. The most important contribution distinguishing this article from its peers is the use of the local polynomial method as the smoothing method. The theoretical estimation procedures for the resulting estimators are explained. In addition, a simulation study is performed to examine the behavior of the estimators and compare them in detail, and hepatocellular carcinoma data are analyzed as a real-data example. The study shows that the estimators based on the adaptive lasso and SCAD were more resistant to censoring and outperformed the other four estimators.


Introduction
Consider the partially linear (or semiparametric) regression model
$$z_i = x_i^\top \beta + f(t_i) + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1)$$
where the $z_i$'s are the observations of the response variable, $x_i = (x_{i1}, \dots, x_{ik})^\top$ is a known $k$-dimensional vector of explanatory variables, $t_i \in [a, b]$ is the value of an extra explanatory variable $t$, $\beta = (\beta_1, \dots, \beta_k)^\top$ is an unknown $k$-dimensional parameter vector to be estimated, $f(\cdot)$ is an unknown univariate smooth function, and the $\varepsilon_i$'s are supposed to be independent random variables with mean zero and finite variance $\sigma_\varepsilon^2 = E(\varepsilon^2)$. Partially linear models, through the nonparametric component, are flexible enough to cover many situations; in fact, these models may be an appropriate choice when it is suspected that the response variable $z$ depends linearly on $x$, indicating parametric effects, but nonlinearly on $t$, denoting nonparametric effects. Note that model (1) can be expressed in matrix and vector form as
$$Z = X\beta + f + \varepsilon, \qquad (2)$$
where $Z = (z_1, \dots, z_n)^\top$, $X = [x_1, \dots, x_n]^\top$ is an $(n \times k)$ design matrix with $x_i^\top = (x_{i1}, \dots, x_{ik})$ denoting its $i$th row, $f = (f(t_1), \dots, f(t_n))^\top$, and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^\top$ is a random error vector with $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2 I_n$. For more discussion of model (1), see [1][2][3], among others.
In this paper, we are interested in estimating the parametric and nonparametric components of model (1) when the observations of the response variable are incompletely observed and right-censored by a random censoring variable $c_i$, $i = 1, 2, \dots, n$, while $x_i$ and $t_i$ are completely observed. When the $z_i$'s are right-censored, standard estimation procedures cannot be applied to them directly. It should be emphasized that Assumptions 1(i) and 1(ii) are commonly accepted assumptions for right-censored models and survival analysis (see [15,16]). Assumption 1(i) is an independence condition that ensures identifiability of the model. Assumption 1(ii) indicates that the covariates provide the same information about the response variable regardless of the existence of censoring (see [17]).
Because of the censoring, the classical methods for estimating the parametric and nonparametric components of model (3) are inapplicable. The most important reason is that the censored observations $z_i$ and the updated random observations $y_i$ have different expectations. This problem can be overcome by using so-called synthetic data, as in censored linear models; we refer, for example, to the studies of [6,12], among others. In this context, when the distribution $G$ is known, we use the synthetic data transformation
$$y_{iG}^* = \delta_i y_i \left( \bar{G}(y_i) \right)^{-1}, \quad i = 1, 2, \dots, n,$$
where $\bar{G}(\cdot) = 1 - G(\cdot)$ and $G(\cdot)$ denotes the distribution function of the censoring variable $C$, as defined in the introduction to this section. The nature of the synthetic data method ensures that $(Y_{iG}^*, X_i, W_i)$, $i = 1, 2, \dots, n$, are independent random variables with $E(y_{iG}^* \,|\, x_i, t_i) = E(z_i \,|\, x_i, t_i)$, as described in Lemma 1.

Lemma 1.
If, instead of the response observations $z_i$, only $\{(y_i, \delta_i)\}_{i=1}^n$ is observed in the context of the semiparametric regression model and the censoring distribution $G$ is known, then the regression function (or mean vector) $\mu = x_i^\top \beta + f(t_i)$ is recovered as a conditional expectation; that is, $E(y_{iG}^* \,|\, x_i, t_i) = E(z_i \,|\, x_i, t_i) = \mu$.
A proof of Lemma 1 is given in Appendix A.1. Lemma 1 cannot be applied directly to the estimation of $f(\cdot)$ if the distribution $G$ is unknown. To overcome this difficulty, [6] recommends replacing $G$ with its Kaplan-Meier estimator [18]:
$$1 - \hat{G}(t) = \prod_{i:\, y_{(i)} \le t} \left( \frac{n - i}{n - i + 1} \right)^{1 - \delta_{(i)}}, \quad t \ge 0, \qquad (5)$$
where $y_{(1)} \le \dots \le y_{(n)}$ are the order statistics of $y_1, \dots, y_n$ and $\delta_{(i)}$ is the indicator corresponding to $y_{(i)}$, as defined in previous sections. In this case, that is, when the distribution $G$ is unknown, we consider the following synthetic data transformation:
$$y_{i\hat{G}}^* = \delta_i y_i \left( 1 - \hat{G}(y_i) \right)^{-1}, \quad i = 1, 2, \dots, n. \qquad (6)$$
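The synthetic data transformation (5)-(6) can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code: the function names and the numerical guard against division by zero are ours, and the Kaplan-Meier product is the standard order-statistic form displayed above.

```python
import numpy as np

def km_censoring_survival(y, delta):
    """Kaplan-Meier estimate of the censoring survival 1 - G, evaluated at each y_i.

    y     : observed times y_i
    delta : censoring indicators (1 = uncensored, 0 = censored)
    """
    n = len(y)
    order = np.argsort(y)
    y_s, d_s = y[order], delta[order]
    surv = np.ones(n)
    for t_idx in range(n):
        t = y[t_idx]
        p = 1.0
        # product over ranks i with y_(i) <= t of ((n-i)/(n-i+1))^(1 - delta_(i))
        for i in range(n):
            if y_s[i] <= t:
                p *= ((n - i - 1) / (n - i)) ** (1 - d_s[i])
        surv[t_idx] = p
    return surv

def synthetic_response(y, delta):
    """Synthetic data transform y*_i = delta_i * y_i / (1 - G_hat(y_i))."""
    surv = km_censoring_survival(y, delta)
    surv = np.maximum(surv, 1e-10)  # guard: 1 - G_hat can hit 0 at the largest y
    return delta * y / surv
```

Note that censored observations are mapped to zero while uncensored ones are inflated, which is exactly the behavior discussed for Figure 1 in the simulation section.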

Local Polynomial Estimator
Consider the semiparametric regression model defined in (1). Here, we approximate the regression function $f(t_i)$ locally by a polynomial of order $p$ (see [19]). Using a Taylor series expansion of $f$ in a neighborhood of a fixed point $t_{i0}$, the $p$th-degree polynomial approximation of $f(t_i)$ yields
$$f(t_i) \approx \sum_{j=0}^{p} \frac{f^{(j)}(t_{i0})}{j!} (t_i - t_{i0})^j = \sum_{j=0}^{p} b_j (t_i - t_{i0})^j. \qquad (7)$$
Note that the fixed $t_{i0}$ is determined in the range $t_{i0} \in [t_i - \delta, t_i + \delta]$ for a small real-valued $\delta$ and is used to estimate $f$ at $t_i$ locally (see [8] for details). The idea is to estimate the components of the semiparametric model by minimizing the local weighted least squares criterion
$$\min_{b, \beta} \sum_{i=1}^{n} \left\{ y_{i\hat{G}}^* - \sum_{j=0}^{p} b_j (t_i - t_{i0})^j - x_i^\top \beta \right\}^2 K\!\left( \frac{t_i - t_{i0}}{h} \right), \qquad (8)$$
where the $y_{i\hat{G}}^*$'s are the values of the synthetic variable defined in (6), $K(\cdot)$ is a kernel function assigning a weight to each point, and $h$ is the bandwidth parameter controlling the size of the local neighborhood of $t_{i0}$. In vector and matrix notation, (8) can be written as
$$\min_{b, \beta} \left( y_{\hat{G}}^* - Tb - X\beta \right)^\top W_n \left( y_{\hat{G}}^* - Tb - X\beta \right), \qquad (9)$$
where $T$ is the $n \times (p+1)$ matrix with $i$th row $\left(1, (t_i - t_{i0}), \dots, (t_i - t_{i0})^p\right)$, $X$ is the design matrix defined in Section 1, and $W_n = \operatorname{diag}\{K((t_i - t_{i0})/h)\}$ is an $n \times n$ weight matrix whose properties are provided in Assumption 4. The minimization problem (9) has a unique solution. For technical convenience, assume first that $\beta$ is the known true parameter. Then the solution of (9) is
$$\hat{b} = \left( T^\top W_n T \right)^{-1} T^\top W_n \left( y_{\hat{G}}^* - X\beta \right). \qquad (10)$$
It can be seen from the Taylor series expansion given in (7) that one needs to select the first element of the vector $\hat{b} = (\hat{b}_0, \dots, \hat{b}_p)^\top$ in order to obtain $\hat{f}(t_0) = \hat{b}_0$. Then, for the fixed neighborhood of $t_0$, the deconvoluted local polynomial estimator of the regression function can be written as
$$\hat{f}(t_0) = \omega_1^\top \left( T^\top W_n T \right)^{-1} T^\top W_n \left( y_{\hat{G}}^* - X\beta \right) = S_h \left( y_{\hat{G}}^* - X\beta \right), \qquad (11)$$
where $S_h = \omega_1^\top (T^\top W_n T)^{-1} T^\top W_n$ denotes the deconvoluted local polynomial smoother matrix, $\omega_1 = (1, 0, \dots, 0)^\top \in \mathbb{R}^{p+1}$ is the vector having 1 in the first position and 0 otherwise, and the matrices $T$ and $W_n$ are as defined above.
Equations (10) and (11) are established for known $\beta$; in practice, both model components $(\beta, f)$ are unknown. To obtain the local polynomial-based estimates $(\hat{\beta}_L, \hat{f}_L)$, the smoother matrix $S_h$ given right after Equation (11) is used to calculate the following partial residuals in matrix form:
$$\tilde{y}_{\hat{G}}^* = (I_n - S_h)\, y_{\hat{G}}^*, \qquad \tilde{X} = (I_n - S_h)\, X. \qquad (12){-}(13)$$
Thus, we obtain a transformed set of data based on local residuals. Considering these partial residuals, the vector $\beta$ is obtained from the following least squares criterion instead of (9):
$$\min_{\beta} \sum_{i=1}^{n} \left( \tilde{y}_{i\hat{G}}^* - \tilde{x}_i^\top \beta \right)^2, \qquad (14)$$
where $\tilde{x}_i^\top$ is the $i$th row of the matrix $\tilde{X}$. Under Assumptions 2-4, applying the least squares technique to (14) yields a "modified local polynomial estimator" $\hat{\beta}_L$ for the parametric part of the semiparametric model (3), given by
$$\hat{\beta}_L = \left( \tilde{X}^\top \tilde{X} \right)^{-1} \tilde{X}^\top \tilde{y}_{\hat{G}}^*. \qquad (15)$$
Correspondingly, a "modified local polynomial estimator" $\hat{f}_L$ of the function $f(\cdot)$ for the nonparametric part of the semiparametric model (3) is defined as
$$\hat{f}_L = S_h \left( y_{\hat{G}}^* - X \hat{\beta}_L \right). \qquad (16)$$
The implementation details of Equations (15) and (16) are given in Appendix A.2. We conclude this section with the assumptions necessary to obtain the main results; these assumptions are quite general and easily fulfilled.
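The smoother matrix and the partial-residual estimators can be sketched as follows. This is a minimal illustration under our own assumptions: the Epanechnikov kernel and the function names are our choices (the paper does not fix a particular kernel here), and no attempt is made at numerical efficiency.

```python
import numpy as np

def lp_smoother(t, h, p=1):
    """Local polynomial smoother matrix S_h (n x n) at the design points t.

    Row i holds the weights w1' (T' W T)^(-1) T' W built at t0 = t_i, with
    Epanechnikov kernel weights W = diag(K((t_j - t0)/h)).
    """
    n = len(t)
    S = np.zeros((n, n))
    w1 = np.zeros(p + 1); w1[0] = 1.0            # selects the fitted intercept b_0
    for i in range(n):
        u = (t - t[i]) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov kernel
        T = np.vander(t - t[i], N=p + 1, increasing=True)     # [1, (t_j - t0), ...]
        TW = T.T * w                                          # T' W
        S[i] = w1 @ np.linalg.solve(TW @ T, TW)
    return S

def semiparametric_fit(X, t, ystar, h, p=1):
    """Modified local polynomial estimators: beta from the partial residuals
    (I - S)y*, (I - S)X, and f from smoothing the residual y* - X beta."""
    S = lp_smoother(t, h, p)
    Xt = X - S @ X
    yt = ystar - S @ ystar
    beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]
    fhat = S @ (ystar - X @ beta)
    return beta, fhat
```

On noiseless data generated from a partially linear model, both components are recovered closely, which is a quick sanity check for the partial-residual construction.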
where $v_{ij}$ is a sequence of real numbers satisfying suitable regularity conditions. Note that Assumption 2 generalizes the conditions of [20,21], where the $(x_{ij}, t_i)$ are fixed design points for a partially linear model with uncensored data. Assumption 3 is required to establish asymptotic normality at an observed value $t_i$. Assumption 4. The weight functions $W_n$ satisfy the following conditions: $I(\cdot)$ is an indicator function, $a_n$ satisfies $\limsup_{n\to\infty} n a_n^3 < \infty$, and $b_n$ satisfies $\limsup_{n\to\infty} n b_n^3 < \infty$.

Ridge-Type Local Polynomial Estimator
In this paper, we confine ourselves to the local polynomial estimators of the parameter vector $\beta$ and the unknown smooth function $f(\cdot)$ in the semiparametric model. For a given bandwidth parameter $h$, the corresponding estimators of $\beta$ and $f$ based on model (2) are described by (15) and (16), respectively. Multiplying both sides of model (2) by $(I_n - S)$, we obtain
$$\tilde{Z} = \tilde{X}\beta + \tilde{\varepsilon}, \qquad (17)$$
where $\tilde{Z} = (I_n - S)Z$, $\tilde{\varepsilon} = \tilde{f} + \varepsilon^*$, $\tilde{f} = (I_n - S)f$, and $\varepsilon^* = (I_n - S)\varepsilon$, similar to (12) and (13). This consideration turns model (17) into an optimization problem for estimating the vector $\beta$ corresponding to the parametric part of the semiparametric model (2). In this context, the model leads to the following penalized least squares (PLS) criterion for the ridge regression problem:
$$\min_{\beta} \left\| \tilde{y}_{\hat{G}}^* - \tilde{X}\beta \right\|^2 + \lambda \sum_{j=1}^{k} \beta_j^2, \qquad (18)$$
where $\lambda$ is a positive shrinkage parameter that controls the magnitude of the penalty. The solution of this minimization problem provides the following Theorem 1.

Theorem 1.
The ridge-type local polynomial estimator of $\beta$ is denoted by $\hat{\beta}_{RL}$ and is expressed in terms of the local polynomial smoothing matrix $S$ as
$$\hat{\beta}_{RL} = \left( \tilde{X}^\top \tilde{X} + \lambda I_k \right)^{-1} \tilde{X}^\top \tilde{y}_{\hat{G}}^*, \qquad (19a)$$
where $\tilde{y}_{\hat{G}}^*$ is the vector of updated response observations, as defined in Equations (6) and (13).
A proof of Theorem 1 is given in Appendix A.3. As shown in Theorem 1, when $\lambda = 0$, the ridge-type local polynomial estimator reduces to the ordinary least squares estimator based on the local residuals defined in Equations (12) and (13). It should be noted that, in order to estimate the unknown function $f$, we imitate Equation (16) and define
$$\hat{f}_{RL} = S \left( y_{\hat{G}}^* - X \hat{\beta}_{RL} \right). \qquad (19b)$$
Thus, the estimator (19b) is the ridge-type local polynomial estimator of the unknown function $f$ in the semiparametric model (2).
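The closed form in Theorem 1 is a one-line computation on the partial residuals. A minimal sketch (function name is ours):

```python
import numpy as np

def ridge_local(Xt, yt, lam):
    """Ridge-type estimator on the partial residuals:
    beta_RL = (Xt' Xt + lam * I_k)^(-1) Xt' yt, as in Theorem 1.
    With lam = 0 this reduces to the OLS estimator on the local residuals."""
    k = Xt.shape[1]
    return np.linalg.solve(Xt.T @ Xt + lam * np.eye(k), Xt.T @ yt)
```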

Penalty Estimation Strategies Based on Local Polynomial
Several penalty functions have been discussed for linear and generalized regression models in the literature (see [22]). In this paper, we study the minimax concave penalty (MCP), the least absolute shrinkage and selection operator (lasso), the smoothly clipped absolute deviation (SCAD) method, the adaptive lasso, and the elasticnet method, a regularized regression technique that linearly combines the $L_1$ and $L_2$ penalties of the lasso and ridge regression methods, respectively.
In this paper, we suggest local polynomial estimators based on different penalties for the components of the semiparametric regression model. For a given penalty function $P_\lambda(\beta)$ and tuning parameter $\lambda$, the general form of the penalized least squares ($PLS_G$) criterion for the penalty estimators can be expressed as
$$\min_{\beta} \left\| \tilde{y}_{\hat{G}}^* - \tilde{X}\beta \right\|^2 + P_\lambda(\beta). \qquad (20)$$
Note that the vector $\hat{\beta}$ minimizing (20) under the lasso and ridge penalties is known as a bridge estimator, proposed by [23]. On the other hand, the elasticnet, SCAD, MCP, and adaptive lasso involve different penalties, which are inspected in the remainder of this paper. It should be emphasized that for the mentioned penalty functions, the penalty term $P_\lambda(\beta) = \lambda \sum_j |\beta_j|^q$ is based on the $L_q$ norm of the regression coefficients $\beta_j$ (see [24][25][26]). Thus, the different penalty estimators corresponding to the parametric and nonparametric components of the semiparametric model can be defined for different values of the degree $q$ and the shrinkage parameter $\lambda$.

Estimation Procedure for the Parametric Component
From (20), we see that for $q = 2$, the ridge estimator of the parametric component is obtained by minimizing the penalized residual sum of squares
$$\hat{\beta}_{RL} = \arg\min_{\beta} \sum_{i=1}^{n} \left( \tilde{y}_{i\hat{G}}^* - \tilde{x}_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{k} \beta_j^2, \qquad (21)$$
where $\tilde{y}_{i\hat{G}}^*$ is the $i$th synthetic observation of $\tilde{y}_{\hat{G}}^*$ and $\tilde{x}_i^\top$ is the $i$th row of the matrix $\tilde{X}$. Notice that the solution of (21) is the same regularized estimator stated in (19a). It should also be noted that setting $q = 1$ in (20) gives the estimator known as the lasso.
Lasso: Proposed by [24], the lasso is a penalized least squares method for simultaneous estimation and variable selection that estimates with the $L_1$ penalty. The modified local polynomial estimator based on the lasso penalty can be defined as
$$\hat{\beta}_{LL} = \arg\min_{\beta} \sum_{i=1}^{n} \left( \tilde{y}_{i\hat{G}}^* - \tilde{x}_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{k} |\beta_j|. \qquad (22)$$
Although Equation (22) appears to be a small modification of (21), the absolute-value penalty term makes an analytical solution for the lasso impossible. Lasso solutions were initially obtained through quadratic programming.
Adaptive lasso: Zou [25] suggested modifying the lasso penalty by placing adaptive weights on the $L_1$ penalties of the regression coefficients. This weighted lasso, which has oracle properties, is referred to as the adaptive lasso. The local polynomial estimator $\hat{\beta}_{aLL}$ using the adaptive lasso penalty can be defined as follows:
$$\hat{\beta}_{aLL} = \arg\min_{\beta} \sum_{i=1}^{n} \left( \tilde{y}_{i\hat{G}}^* - \tilde{x}_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{k} \hat{w}_j |\beta_j|, \qquad (23)$$
where $\hat{w}$ is a weight vector given by
$$\hat{w}_j = \frac{1}{|\hat{\beta}_j^*|^q}.$$
It should be noted that $\hat{\beta}^*$ is an appropriate initial estimator of $\beta$ here; for example, an ordinary least squares (OLS) estimate can be used as the reference value. To obtain the adaptive lasso estimates in (23), it is necessary to choose $q > 0$ and compute the weights after obtaining the OLS estimate.
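The paper does not prescribe a particular solver for (22)-(23); coordinate descent with soft-thresholding is one standard choice, sketched here with our own function names. With unit weights the routine solves the lasso, and with OLS-based weights it solves the adaptive lasso.

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def weighted_lasso_cd(X, y, lam, w=None, n_iter=200):
    """Coordinate descent for min ||y - X b||^2 + lam * sum_j w_j |b_j|."""
    n, k = X.shape
    if w is None:
        w = np.ones(k)
    beta = np.zeros(k)
    col_ss = (X**2).sum(axis=0)
    r = y.copy()                       # current residual y - X beta
    for _ in range(n_iter):
        for j in range(k):
            r += X[:, j] * beta[j]     # remove coordinate j from the fit
            z = X[:, j] @ r
            beta[j] = soft_threshold(z, lam * w[j] / 2) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta

def adaptive_lasso(X, y, lam, q=1.0):
    """Adaptive lasso with OLS-based weights w_j = 1 / |b_ols_j|^q."""
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / np.maximum(np.abs(b_ols), 1e-8) ** q   # guard against b_ols_j = 0
    return weighted_lasso_cd(X, y, lam, w)
```

Note how the adaptive weights penalize small OLS coefficients heavily (driving them to zero) while leaving large coefficients nearly unshrunk, which is the bias-reduction behavior remarked on below.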
SCAD: A disadvantage of the lasso method is that the penalty term is linear in the size of the regression coefficient, so it tends to give highly biased estimates of large regression coefficients. To account for this bias, [26] proposed the SCAD penalty, obtained by replacing $\lambda|\beta_j|$ in (22) with $P_{\alpha,\lambda}(|\beta_j|)$. The modified local estimator $\hat{\beta}_{SL}$ based on the SCAD penalty can be described as
$$\hat{\beta}_{SL} = \arg\min_{\beta} \sum_{i=1}^{n} \left( \tilde{y}_{i\hat{G}}^* - \tilde{x}_i^\top \beta \right)^2 + \sum_{j=1}^{k} P_{\alpha,\lambda}(|\beta_j|), \qquad (24a)$$
where $P_{\alpha,\lambda}(\cdot)$ is the SCAD penalty, defined through its derivative
$$P_{\alpha,\lambda}'(t) = \lambda \left\{ I(t \le \lambda) + \frac{(\alpha\lambda - t)_+}{(\alpha - 1)\lambda}\, I(t > \lambda) \right\}. \qquad (24b)$$
It should be stated that here $\lambda > 0$ and $\alpha > 2$ are the penalty parameters, $I(\cdot)$ is the indicator function, and $(t)_+ = \max(t, 0)$. In addition, (24b) is equivalent to the $L_1$ penalty for $\alpha = \infty$.
Elasticnet: The elasticnet, proposed by [27], is a penalized least squares regression technique widely used for regularization and automatic variable selection, particularly for selecting groups of correlated variables. Note that the elasticnet method linearly combines the $L_1$ penalty term, which enforces sparsity of the elasticnet estimator, and the $L_2$ penalty term, which ensures appropriate selection of correlated variable groups. Accordingly, the modified local estimator $\hat{\beta}_{ENL}$ using the elasticnet penalty is the solution of the following minimization problem:
$$\hat{\beta}_{ENL} = \arg\min_{\beta} \sum_{i=1}^{n} \left( \tilde{y}_{i\hat{G}}^* - \tilde{x}_i^\top \beta \right)^2 + \lambda_1 \sum_{j=1}^{k} |\beta_j| + \lambda_2 \sum_{j=1}^{k} \beta_j^2, \qquad (25)$$
where $\lambda_1$ and $\lambda_2$ are positive regularization parameters. Equation (25) yields the estimates corresponding to the parametric part of the semiparametric regression model (2), as with the other methods.
MCP: Introduced by [28], the MCP is an alternative method for obtaining less biased estimates of the nonzero regression coefficients in a sparse model. For given regularization parameters $\lambda > 0$ and $\alpha > 0$, the local polynomial estimator $\hat{\beta}_{MCL}$ based on the MCP penalty can be defined as
$$\hat{\beta}_{MCL} = \arg\min_{\beta} \sum_{i=1}^{n} \left( \tilde{y}_{i\hat{G}}^* - \tilde{x}_i^\top \beta \right)^2 + \sum_{j=1}^{k} P_{\alpha,\lambda}(|\beta_j|), \qquad (26)$$
where $P_{\alpha,\lambda}(\cdot)$ is the MCP penalty, given by
$$P_{\alpha,\lambda}(t) = \begin{cases} \lambda t - \dfrac{t^2}{2\alpha}, & t \le \alpha\lambda, \\[4pt] \dfrac{\alpha\lambda^2}{2}, & t > \alpha\lambda. \end{cases}$$
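For reference, the SCAD and MCP penalty values can be coded directly. A hedged sketch: since SCAD is defined above through its derivative (24b), the integrated closed form used below is the standard one from the literature, and the defaults α = 3.7 (SCAD) and α = 3 (MCP) are conventional choices, not values taken from this paper.

```python
import numpy as np

def scad_penalty(t, lam, alpha=3.7):
    """SCAD penalty P_{alpha,lam}(|t|), the antiderivative of (24b); alpha > 2."""
    t = np.abs(t)
    p1 = lam * t                                             # t <= lam
    p2 = (2 * alpha * lam * t - t**2 - lam**2) / (2 * (alpha - 1))  # lam < t <= alpha*lam
    p3 = lam**2 * (alpha + 1) / 2                            # constant for t > alpha*lam
    return np.where(t <= lam, p1, np.where(t <= alpha * lam, p2, p3))

def mcp_penalty(t, lam, alpha=3.0):
    """MCP penalty: lam*t - t^2/(2*alpha) for t <= alpha*lam, else alpha*lam^2/2."""
    t = np.abs(t)
    return np.where(t <= alpha * lam, lam * t - t**2 / (2 * alpha), alpha * lam**2 / 2)
```

Both penalties coincide with the $L_1$ penalty near zero and flatten to a constant for large coefficients, which is exactly why they shrink large coefficients less than the lasso.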

The Estimation Procedure for the Nonparametric Component
Equations (21)-(26) provide modified local polynomial estimates based on different penalties for the parametric part of the semiparametric model (2). Similar in spirit to (19b), the vector of estimated parametric coefficients $\hat{\beta}_{LL}$ given in (22) can be used to construct the estimator of the nonparametric part of the same model. In this case, we obtain the modified local estimator of the unknown function based on the lasso penalty, given by
$$\hat{f}_{LL} = S \left( y_{\hat{G}}^* - X \hat{\beta}_{LL} \right), \qquad (27)$$
as defined in the previous section.
Note that when $\hat{\beta}_{aLL}$ defined in (23) is substituted for $\hat{\beta}_{LL}$ in Equation (27), local estimates of the nonparametric part based on the adaptive lasso penalty are obtained, denoted symbolically by $\hat{f}_{aLL}$. Similarly, replacing $\hat{\beta}_{LL}$ in (27) with $\hat{\beta}_{SL}$, $\hat{\beta}_{ENL}$, and $\hat{\beta}_{MCL}$ yields the modified local polynomial estimators $\hat{f}_{SL}$, $\hat{f}_{ENL}$, and $\hat{f}_{MCL}$ based on the SCAD, elasticnet, and MCP penalties, respectively, for the nonparametric part of the right-censored semiparametric model (2).

Some Remarks on the Penalties
Remarks on the penalties can be stated as follows:

• The regularization based on the $L_1$ norm produces sparse solutions and performs feature selection, whereas the $L_2$ norm produces nonsparse solutions and performs no feature selection.

• Although all of the regularization methods shrink most of the coefficients towards zero, SCAD, MCP, and the adaptive lasso apply less shrinkage to nonzero coefficients. This is known as bias reduction.

• As noted earlier, the tuning parameter $\alpha$, used in the SCAD and MCP estimations, controls how quickly the penalty rate goes to zero. This affects the bias and stability of the estimates: the more concave the penalty becomes, the greater the chance that more than one local minimum exists.

• As $\alpha \to \infty$, the MCP and SCAD penalties converge to the $L_1$ norm penalty. Conversely, as $\alpha \to 0$, the bias is minimized but both MCP and SCAD estimates become unstable; lower values of $\alpha$ thus produce more variable but less biased estimates.

• The elasticnet penalty is designed to handle highly correlated covariates more intelligently than other sparse penalties such as the lasso: the lasso tends to choose only one variable among a group of highly correlated ones, while the elasticnet uses them all.

Performance Indicators
Several performance measures are described in this section to evaluate the six modified semiparametric local polynomial estimators based on penalty functions: ridge (RL), lasso (LL), adaptive lasso (aLL), SCAD (SL), MCP (MCL), and elasticnet (ENL); the abbreviations in parentheses denote the estimators. The performance of the estimators is examined separately for the parametric component, the nonparametric component, and the overall estimated model. The evaluation metrics are given below.

Measures for the Parametric Component
Root mean squared error (RMSE) of the estimated regression coefficients $\beta$:
$$\operatorname{RMSE}(\hat{\beta}) = \sqrt{\frac{1}{k} \sum_{j=1}^{k} \left( \hat{\beta}_j - \beta_j \right)^2},$$
where $\hat{\beta}$ is the estimate of $\beta$ obtained by any of the six introduced methods; it is replaced by $\hat{\beta}_{RL}$, $\hat{\beta}_{LL}$, $\hat{\beta}_{aLL}$, $\hat{\beta}_{SL}$, $\hat{\beta}_{ENL}$, and $\hat{\beta}_{MCL}$ to obtain the RMSE score of each estimator.
Coefficient of determination $R^2$ for the estimated models, which summarizes the overall model performance of the six methods:
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}.$$
Sensitivity, specificity, and accuracy scores obtained from a confusion matrix. If the true values of the variable of interest are available, the confusion matrix can be obtained as in Table 1; it allows us to measure the performance of the penalty functions for right-censored data. Accordingly,
$$\text{sensitivity} = \frac{TP}{TP + FN}, \quad \text{specificity} = \frac{TN}{TN + FP}, \quad \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
G-score, calculated as the geometric mean of the sensitivity and specificity:
$$G = \sqrt{\text{sensitivity} \times \text{specificity}}.$$
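The selection metrics above can be sketched compactly. This is a minimal illustration under one assumption of ours: a coefficient counts as "selected" when its estimate exceeds a small tolerance in absolute value.

```python
import numpy as np

def selection_metrics(beta_hat, beta_true, tol=1e-8):
    """Sensitivity, specificity, accuracy, and G-score for variable selection,
    treating |beta_j| > tol as 'selected'."""
    sel = np.abs(beta_hat) > tol
    true = np.abs(beta_true) > tol
    tp = np.sum(sel & true)    # nonzero, correctly selected
    tn = np.sum(~sel & ~true)  # zero, correctly excluded
    fp = np.sum(sel & ~true)
    fn = np.sum(~sel & true)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(beta_true)
    g = np.sqrt(sens * spec)
    return sens, spec, acc, g
```

Because the ridge estimator never sets a coefficient exactly to zero, its specificity under this definition is always zero, which matches the remark about RL in the results section.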

Measures for the Nonparametric Component
Mean squared error (MSE) is used to measure the performance of the estimated nonparametric component for the six methods $\hat{f}_{RL}$, $\hat{f}_{LL}$, $\hat{f}_{aLL}$, $\hat{f}_{SL}$, $\hat{f}_{ENL}$, and $\hat{f}_{MCL}$. If $\hat{f}$ is the fitted nonparametric function obtained from any of the six methods, the MSE is computed as
$$\operatorname{MSE}(\hat{f}) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{f}(t_i) - f(t_i) \right)^2.$$
Relative MSE (ReMSE) is used to compare the performance of the six methods in estimating the nonparametric component; it relates each method's MSE to those of all $n_m$ methods, where $n_m = 6$ for this paper.
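These two measures can be sketched as follows. The ReMSE normalization used here (each method's MSE divided by the total over the $n_m$ methods) is one plausible reading of the definition, since the paper's display is not reproduced above.

```python
import numpy as np

def mse(f_hat, f_true):
    """Mean squared error of a fitted nonparametric function."""
    return np.mean((f_hat - f_true)**2)

def relative_mse(mse_scores):
    """ReMSE: each method's MSE relative to the total over the n_m methods
    (an assumed normalization; lower is better, scores sum to 1)."""
    mse_scores = np.asarray(mse_scores, dtype=float)
    return mse_scores / mse_scores.sum()
```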

Simulation Study
We carried out an intensive simulation study to evaluate the finite-sample performance of the six introduced semiparametric estimators for a right-censored partially linear model. These estimators are compared with each other to evaluate their respective strengths and weaknesses in handling the right-censoring problem. For reproducibility, the simulation codes and functions are provided at the following GitHub link: https://github.com/yilmazersin13?tab=repositories (accessed on 31 August 2022). The estimators are computed using the formulations in Section 3. The simulation design and data generation are described as follows:
Simulation Design: Two main scenarios are considered for generating the zero and nonzero coefficients of the model, because the focus of the penalty functions is on making an accurate variable selection. In each scenario, simulation runs are made for
• Three sample sizes: n = 50, 150, 300
• Two censoring levels: CL = 10%, 30%
• Two numbers of parametric covariates: k = 15, 40
All possible simulation configurations are repeated 1000 times. To evaluate the performance of the methods, the performance indicators described in Section 4 are used. The scenarios are defined in the data generation section below.
Data Generation: Regarding model (1), the components of the model are generated as described below. The true values of the regression coefficients are determined for both Scenarios 1 and 2 such that there are 10 nonzero $\beta_j$'s to be estimated and $(k - 10)$ sparse coefficients. The main purpose of using these two scenarios is to measure the capacity of the estimators to select the nonzero coefficients when the $\beta_j$'s are close to zero. In addition, these scenarios make it possible to see how the censoring level (CL) affects the estimators' performance, and they allow us to inspect the convergence of the estimated coefficients to the true ones as the sample size grows under censored data, which can be counted as an important contribution of this paper.
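A data-generation step along these lines can be sketched as follows. This is an illustrative reconstruction, not the paper's generator: the specific nonzero coefficient values, the smooth function, the error scale, and the quantile-based calibration of the censoring mean are all our assumptions; only the overall structure (10 nonzero coefficients, $k-10$ zeros, normal censoring variable, target censoring level) follows the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_data(n=150, k=15, censoring_level=0.30):
    """Generate a right-censored partially linear data set (illustrative)."""
    X = rng.normal(size=(n, k))
    t = np.sort(rng.uniform(0, 1, n))
    # 10 nonzero coefficients, k - 10 sparse ones (values are assumptions)
    beta = np.concatenate([np.linspace(3.0, 0.5, 10), np.zeros(k - 10)])
    f = np.sin(2 * np.pi * t)                      # assumed smooth function
    z = X @ beta + f + rng.normal(scale=0.5, size=n)  # latent (complete) response
    # censoring variable ~ Normal; its mean is shifted so that roughly the
    # requested fraction of observations is censored (a heuristic calibration)
    mu_c = np.quantile(z, 1 - censoring_level)
    c = rng.normal(loc=mu_c, scale=np.std(z), size=n)
    y = np.minimum(z, c)                           # observed response
    delta = (z <= c).astype(int)                   # 1 = uncensored, 0 = censored
    return X, t, y, delta, beta

X, t, y, delta, beta = generate_data()
```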
Regarding the censored data, the censoring variable $c_i$ is generated as $c_i \sim N(\mu_y, \sigma_y^2)$, independently of the initially observed variable $y_i$; an algorithm is provided by [29]. Another important point for this study is the selection of the shrinkage parameters and of the bandwidth parameter of the local polynomial approach for the six introduced estimators. In this study, the improved Akaike information criterion ($AIC_c$), proposed by [30], is used. Here $\hat{\beta}(\lambda, h)$ is the estimated coefficient vector based on the shrinkage parameter $\lambda > 0$ and the bandwidth $h > 0$, $\sigma^2$ is the variance of the model, and $\sum_{r \le k} I(\hat{\beta}_r(\lambda, h) \neq 0)$ denotes the number of nonzero regression coefficients. Note that, because the introduced estimation procedures (except ridge regression) cannot be written in terms of a projection (hat) matrix, the number of nonzero coefficients is used in place of the hat matrix.
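The tuning-parameter search can be sketched as follows. A hedged note: the $AIC_c$ expression below follows the standard Hurvich-Simonoff-Tsai form with the number of nonzero coefficients playing the role of the degrees of freedom, as the paper indicates; the exact constants may differ from the paper's (elided) display, and `fit` is a hypothetical user-supplied estimator.

```python
import numpy as np

def aicc(residuals, df):
    """Improved AIC: log(sigma2_hat) + 1 + 2(df + 1)/(n - df - 2),
    with df taken as the number of nonzero coefficients."""
    n = len(residuals)
    sigma2 = np.mean(residuals**2)
    return np.log(sigma2) + 1 + 2 * (df + 1) / (n - df - 2)

def select_lambda(X, y, fit, lambdas):
    """Pick the lambda minimizing AICc; fit(X, y, lam) returns a coefficient vector."""
    scores = []
    for lam in lambdas:
        beta = fit(X, y, lam)
        df = int(np.sum(np.abs(beta) > 1e-8))  # nonzero count as degrees of freedom
        scores.append(aicc(y - X @ beta, df))
    return lambdas[int(np.argmin(scores))]
```

The same grid search extends to $(\lambda, h)$ jointly by looping over bandwidths as well.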
The results of the simulation study are presented individually for the parametric and nonparametric components below. Before that, Figure 1 provides some information about the generated data.
In Figure 1, two plots (i-ii) are presented for different configurations of Scenarios 1 and 2. Plot (i) shows scatter plots of the data points for n = 50, CL = 10%, and k = 15. In panel (b) of (i), the working procedure of the synthetic data transformation can be seen clearly: it sets the right-censored observations to zero and increases the magnitude of the observed data points, thereby equating the expected values of the synthetic and original response variables, as indicated in Section 2. Similarly, plot (ii) shows scatter plots of the data points for n = 150, CL = 30%, and k = 40, which makes it possible to see a heavily censored case. In panel (b) of (ii), due to heavy censoring, the increments in the observed data points are larger than in (i), which is the working principle of the synthetic data. This is a disadvantage because it significantly alters the data structure, although it still solves the censoring problem.
To describe the generated data set further, Figure 2 shows the nonparametric component of the right-censored semiparametric model. Panel (a) shows the smooth function for the small sample size (n = 50), low censoring level (CL = 10%), and low number of covariates (k = 15). Panel (b) shows the nonparametric smooth function for n = 150, k = 15, and CL = 10%. It should be emphasized that the censoring level and the number of parametric covariates do not affect the shape of the nonparametric component, so there is no need to show all of the generated functions here; they affect the nonparametric component only indirectly, through the estimation process.
As previously mentioned, this paper introduces six modified estimators based on penalty and shrinkage strategies. In Figure 3, the shrinkage paths of the estimators as a function of the shrinkage parameter "lambda" are provided in panels (i) and (ii). Panel (i) shows the shrunken regression coefficients for Scenario 1, n = 150, CL = 30%, and k = 40; panel (ii) is drawn for Scenario 2, n = 50, CL = 10%, and k = 15. Careful inspection of Figure 3 shows that in panel (i), due to the high censoring level and large number of covariates, shrinking the coefficients is more challenging than in panel (ii). One reason for this is that the coefficients in Scenario 2 are set smaller than those in Scenario 1 when generating the data. In both panels, the SCAD and MCP methods behave similarly and, as expected, shrink the coefficients more quickly than the others. Additionally, lasso and elasticnet appear close to each other in both panels, whereas the adaptive lasso differs from the others. The reason for this is discussed with the results given in Section 5.1.

Analysis of Parametric Component
In this section, the estimation of the parametric component of the right-censored semiparametric model is analyzed, and results are presented for all simulation scenarios in Tables 2-5 and Figures 4-7. The performance of the estimators is evaluated using the metrics given in Section 4: RMSE, $R^2$, sensitivity, specificity, accuracy, and G-score. In addition to these performance criteria, the selection ratio of each method is calculated for the estimators.
Selection Ratio: the proportion of the 1000 simulation repetitions in which the corresponding estimator selects the true nonzero coefficients, where $\hat{\beta}_j^{(i)}$ denotes the coefficient estimated in the $i$th simulation run by any of the introduced estimators. Results are given in Figures 6 and 7.
Tables 1 and 2 report the RMSE scores of the estimated coefficients and the $R^2$ of the model for Scenarios 1 and 2; in the tables, the best scores are indicated in bold, and the second-best G-scores and accuracy values are marked with an asterisk (*). Careful inspection of the two tables allows two observations about the performance of the methods in both scenarios. Regarding Scenario 1, when the sample size is small (n = 50), the ENL and SL estimators give smaller RMSE scores and higher $R^2$ values than the other four methods. On the other hand, when the sample size becomes larger (n = 150, 300), aLL takes the lead in estimation performance. The results for different censoring levels show that aLL is less affected by censoring than SL and the other methods, which can be observed clearly in Table 1.
The G-scores of the methods decrease when the censoring level is high and the number of covariates (k) is large. In addition, there is an increasing trend from the small to the large sample sizes for LL, aLL, SL, and MCL. This trend is most evident for the aLL line, which makes aLL distinguishable. Interestingly, however, ENL is not influenced by the change in sample size, and the G-scores of ENL never exceed 0.5. In general, aLL, SL, and LL provide the highest G-scores. All G-scores for the simulation study are provided in Tables 4 and 5, together with the accuracy values of the methods.
Tables 4 and 5 present the accuracy rates and G-scores of all methods and simulation configurations for both Scenarios 1 and 2. Note that, because the ridge penalty is unable to shrink the estimated coefficients to exactly zero, the specificity of RL is always zero, and RL therefore has no G-score. Examination of the tables clearly shows that the prominent methods are aLL, LL, and SL. The aLL produces satisfactory results in every simulation configuration, while the other two methods, LL and SL, give good results under different conditions: SL produces better results when k = 15, and LL, together with aLL, when k = 40. The level of censoring and the sample size do not change this pattern, apart from increasing or decreasing the corresponding scores.

In Table 3, RMSE and R 2 scores are provided for all simulation configurations of Scenario 2. The results can be distinguished from those in Table 2 by the higher R 2 values obtained from the modified ridge estimator. However, the RMSE scores of the modified ridge estimator are the largest. This can be explained by the fact that the ridge penalty uses all covariates, whether sparse or nonsparse; therefore, the estimated model based on the ridge penalty has larger R 2 values. On the other hand, the RMSE scores show that for small sample sizes (n = 50), ENL and aLL perform satisfactorily. Moreover, as in Scenario 1, when the sample size becomes larger, aLL gives the most satisfying performance. The SL- and LL-based estimators also show good performance in the general frame. If Table 3 is inspected in detail, it can be seen that when k = 15, the SL method comes to the front for both low (CL = 10%) and high (CL = 30%) censoring levels. The same is true for the aLL method regarding its strength against censorship. Figure 4 presents the line plots of the RMSE scores for all simulation cases. As expected, the negative effect of an increase in the censoring level and the positive effect of growth in the sample size can be clearly observed from panels (a) and (b). For both scenarios, a peak can be seen when the sample size is small (n = 50) and the censorship level increases from 10% to 30%. The methods most affected by censorship are MCL, RL, and SL; the least affected are aLL, LL, and ENL. Thus, Figure 4 supports the results and inferences obtained from Tables 2 and 3.
Together with the RMSE scores, one other important metric for evaluating the performance of the parametric component estimation is the G-score, which measures the true selection made by the estimators for the sparse and nonzero subsets based on the confusion matrix given in Table 1. In this context, Figure 5 illustrates the G-scores of the methods for all simulation combinations using line plots. Note that the G-score ranges over [0,1], and methods whose lines lie closer to 1 are better at successfully determining sparsity. Figure 5 is formed by two panels: Scenario 1 (left) and Scenario 2 (right). As expected, for all methods except RL (which does not involve any sparse subset and is therefore not shown in Figure 5 and Table 4), the G-scores diminish when the censoring level is high and the number of covariates (k) is large. In addition, there is an increasing trend from the small to large sample sizes for LL, aLL, SL, and MCL. This trend is most evident for the aLL line, which makes aLL distinguishable. However, interestingly, ENL is not influenced by the change in sample size, and the G-scores of ENL do not take a value greater than 0.5. In general, aLL, SL, and LL provide the highest G-scores. All G-scores for the simulation study are provided in Tables 4 and 5 together with the accuracy values of the methods.
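Assuming the G-score is the usual geometric mean of sensitivity and specificity computed on the zero/nonzero classification of coefficients (the paper's exact definition is not reproduced in this section), the confusion-matrix-based metrics can be sketched as:

```python
import numpy as np

def selection_scores(beta_true, beta_hat, tol=1e-8):
    """Sensitivity, specificity, accuracy, and G-score for sparsity
    recovery, treating truly nonzero coefficients as the positive class."""
    true_nz = np.abs(np.asarray(beta_true)) > tol
    est_nz = np.abs(np.asarray(beta_hat)) > tol
    tp = int(np.sum(true_nz & est_nz))    # nonzero correctly kept
    tn = int(np.sum(~true_nz & ~est_nz))  # zero correctly dropped
    fp = int(np.sum(~true_nz & est_nz))
    fn = int(np.sum(true_nz & ~est_nz))
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(true_nz)
    return sens, spec, acc, (sens * spec) ** 0.5

# Toy example: 2 true nonzeros, 2 true zeros; one miss and one false pick
sens, spec, acc, g = selection_scores([1.0, 0.5, 0.0, 0.0],
                                      [0.9, 0.0, 0.0, 0.1])
print(sens, spec, acc, g)  # 0.5 0.5 0.5 0.5
```

This also makes RL's behavior concrete: since ridge never sets a coefficient to exactly zero, its specificity is zero and the geometric mean degenerates, which is why RL has no meaningful G-score.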
Tables 4 and 5 present the accuracy rates and G-scores for all methods and simulation configurations for both Scenarios 1 and 2. Note that, because the ridge penalty is unable to shrink the estimated coefficients to zero, the specificity of RL is always calculated as zero. Thus, RL does not have a G-score. When the tables are examined, it can be clearly seen that the prominent methods are aLL, LL, and SL. The aLL produces satisfactory results for each simulation configuration. On the other hand, the other two methods, LL and SL, give good results under different conditions. When this situation is examined in detail, it can be seen that the SL method produces better results when k = 15, and the LL method when k = 40 together with aLL. In addition, the level of censorship and the sample size do not change this pattern beyond an increase or decrease in the values. These inferences apply to both scenarios; the difference between the scenarios emerges in the magnitude of the G-scores and accuracy values, with the values obtained for Scenario 1 slightly smaller than those for Scenario 2.
Unlike the evaluation criteria given in Section 4, the frequency of choosing the correct coefficients for each method in the simulation study is analyzed, and the results are presented for both scenarios in Figures 6 and 7 with bar plots. Figure 6 presents two panels (i and ii), which demonstrate the impact of censorship, one of the main purposes of this article. As expected, as censorship increases, the frequency of selection decreases for the nonzero coefficients. The point here is to reveal which methods are less affected by this. It can be observed in Figure 6 that the MCL and ENL methods are less affected by censorship in terms of the frequency of selection of nonzero coefficients. However, since these methods are less efficient than the SL, LL, and aLL methods in determining ineffective coefficients, their overall performance is lower (see Tables 4 and 5). On the other hand, the SL, LL, and aLL methods make a balanced selection for both subsets (no effect-non-zero), which indirectly makes them more resistant to censorship. Figure 7 presents the different configurations for Scenario 2. It shows the effects of both sample size and censoring level increases, with bar plots for k = 40. The methods' detection of nonzero coefficients is less affected than in Figure 6. However, the selection of the ineffective set plays a decisive role in the performance of each method. For example, MCL and ENL performed poorly in the correct determination of ineffective coefficients when the censorship level increased, but LL, aLL, and SL were able to make the right choice under heavy censorship.

Analysis of Nonparametric Component
This section shows the behavior of the six introduced estimators in estimating the nonparametric component. Performance scores of the methods are given in Tables 6 and 7, using the MSE and ReMSE metrics. Additionally, Figures 8 and 9 show the real smooth function versus all estimated curves for individual simulation repetitions. These figures provide information about the variation of the estimates according to both the scenarios and the censoring effects. Finally, in Figure 10, the estimated curves obtained from all methods are inspected under four different configurations. Tables 6 and 7 include the MSE and ReMSE values for the two scenarios. For Scenario 1, the aLL method dominates the others, followed by SL and LL. As expected, RL shows the worst performance; however, the difference from the others is small. Note that, when the sample size becomes larger, all methods begin to give similar results. Owing to this similarity, the ReMSE scores become closer to one, which is an expected result; thus, even if the censoring level increases, the ReMSE scores may decrease. If the tables are inspected carefully, as mentioned in Section 5.1, aLL overcomes the censorship problem better than the others in Scenario 1, in which the contributions of the covariates are high. However, in Scenario 2, SL shows better performance at high censoring levels, especially for small and medium sample sizes. Additionally, it is clearly observed that the number of covariates (k) affects the performances. In Table 7, when k = 15, the LL and SL methods show good performances. Figure 8 shows two different simulation configurations for Scenario 1. The purpose of this figure is to illustrate the effect of censorship on curve estimation; therefore, panel (i) is obtained for 10% censorship and panel (ii) for 30% censorship.
As can be seen at a glance, the minimum and maximum points of the predictions obtained from all simulations around the real curve are shown with vertical lines, revealing the range of variation of the estimators. Accordingly, when the difference between panel (i) and panel (ii) is examined, it can be seen how the range of variation widens with censorship. With the help of the values in Table 6, it can be said that the estimator with the least expansion is aLL and the one with the most is RL. It should also be noted that the SL and LL methods show satisfactory results.
Figure 9 shows the effect of censorship on the estimated curves for Scenario 2 with a large sample size and relatively few covariates (k = 15). Because there are too many data points, the lines appear as a black area. Compared with Figure 8, the effect of censorship is smaller, and the estimators obtain curves closer to the true curve. In addition, due to the large sample size, the methods estimate curves that are very close to one another, which can be clearly seen in Table 7: the obtained performance values are very close to each other. It can therefore be said that the six introduced estimators produce satisfactory results for large samples, and they are relatively less affected by censorship in this scenario. Figure 10 consists of four panels (a)-(d) containing four different simulation cases. The first two panels (a and b) show the estimated curves of Scenario 1 for different sample sizes, censorship levels, and numbers of explanatory variables. It can be clearly seen that the curves in panel (a) are smoother than those in panel (b). This can be explained by the messy scattering of the synthetic data, which can be observed in all panels; the censorship level increases the corruption of the data structure. Similarly, panels (c) and (d) are obtained for Scenario 2, but only to observe the effect of the change in censorship level. The effect of the large sample size is clearly visible, and the curves appear smooth in panel (d), despite the deterioration in the data structure. If examined carefully, the aLL method gives the closest curve to the true curve. At the same time, the other methods show satisfactory results in representing the data.
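The two curve metrics used above can be sketched as follows; the ratio-to-a-reference form of ReMSE is an assumption about the paper's definition (consistent with "ReMSE scores become closer to one" when methods behave alike), and the grid, true function, and constant offsets are illustrative.

```python
import numpy as np

def curve_mse(f_true, f_hat):
    """Mean squared error between the true smooth-function values and an
    estimated curve, both evaluated on the same grid of t-points."""
    f_true, f_hat = np.asarray(f_true, float), np.asarray(f_hat, float)
    return float(np.mean((f_hat - f_true) ** 2))

def remse(f_true, f_hat, f_ref):
    """Relative MSE of one estimator against a reference estimator's
    curve; values near 1 indicate near-identical performance."""
    return curve_mse(f_true, f_hat) / curve_mse(f_true, f_ref)

t = np.linspace(0.0, 1.0, 200)
f = np.sin(2 * np.pi * t)        # hypothetical true smooth function
fhat_a = f + 0.05                # hypothetical fitted curves
fhat_b = f + 0.10
print(curve_mse(f, fhat_a))      # ≈ 0.0025
print(remse(f, fhat_a, fhat_b))  # ≈ 0.25
```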

Hepatocellular Carcinoma Dataset
This section contains the estimation of a right-censored partially linear model for real data, the hepatocellular carcinoma dataset, by the six introduced estimators (RL, LL, aLL, SL, MCL, and ENL). Their performances are compared, and the results are presented in Table 8 and Figures 11-14. The dataset was collected by [31] to study CXCL17 gene expression for hepatocellular carcinoma.
The dataset involves 227 data points and 13 explanatory variables, including age, recurrence-free survival (RFS i ), gender (Gen i ), and HBsAg (surface antigen of the hepatitis B virus, HB i ). Some variables obtained from blood tests to measure liver damage are ALT i (alanine aminotransferase), AST i (aspartate aminotransferase), and AFP i (alpha-fetoprotein). The covariates of tumors detected in the liver are tumor size (TS i ), TNM i (tumor, node, and metastasis), and BCLC i (Barcelona Clinic Liver Cancer Staging System), together with the values of genes related to liver cancer: CXCL17T (CXCT i ), CXCL17P (CXCP i ), and CXCL17N (CXCN i ). Note that the logarithm of the overall survival time (OS i ) is used as the response variable. Note also that the age variable is used as the nonparametric covariate because of its nonlinear structure. The remaining 12 explanatory variables are added to the parametric component of the model.
Accordingly, the right-censored partially linear model can be written as log(OS i ) = x i β + f (age i ) + ε i , where X = [x 1 , . . . , x n ] is the (228 × 12)-dimensional covariate matrix for the parametric component of the model, and β = (β 1 , . . . , β 12 ) T is the (12 × 1)-dimensional vector of the regression coefficients to be estimated. Note that in the estimation process, log(OS i ) cannot be used directly because of censoring. Therefore, the synthetic data transformation is applied to log(OS i ) as in (6). Note also that the dataset includes 84 right-censored survival times, which means that the censoring level is CL = 37%. This ratio corresponds to a heavy censoring level in the simulation study. Therefore, the results of the real data example are expected to be in harmony with the results of the corresponding simulation configurations (n = 150, 300, k = 15, CL = 30%).
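The transformation in (6) is not reproduced explicitly in this chunk; a common choice with exactly the described behavior (censored responses set to zero, uncensored ones inflated so the mean is preserved) is the Koul-Susarla-Van Ryzin transform z iG = δ i z i / (1 − Ĝ(z i )), with Ĝ the Kaplan-Meier estimate of the censoring distribution. A rough sketch, assuming this matches the paper's Eq. (6) and ignoring ties:

```python
import numpy as np

def km_censoring_survival(z, delta):
    """Left-continuous Kaplan-Meier estimate of the censoring survival
    function 1 - G(t), evaluated at each observed time z_i.
    delta = 1 marks an observed lifetime, delta = 0 a censored one."""
    z, delta = np.asarray(z, float), np.asarray(delta, int)
    n = len(z)
    order = np.argsort(z, kind="stable")
    surv = np.ones(n)
    s = 1.0
    for rank, idx in enumerate(order):
        surv[idx] = s                 # value just before z_(rank)
        if delta[idx] == 0:           # censoring acts as the "event" here
            at_risk = n - rank
            s *= (at_risk - 1) / at_risk
    return surv

def synthetic_response(z, delta):
    """Synthetic responses z_iG = delta_i * z_i / (1 - G_hat(z_i)):
    censored points become zero, uncensored ones are inflated."""
    z, delta = np.asarray(z, float), np.asarray(delta, int)
    return delta * z / km_censoring_survival(z, delta)

print(synthetic_response([1.0, 2.0, 3.0], [1, 0, 1]))  # [1. 0. 6.]
```

With no censoring the transform leaves the responses unchanged, which is the sanity check that motivates using it as a drop-in replacement for the unobservable complete response.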
To describe the hepatocellular carcinoma dataset, Figures 11 and 12 are provided. Figure 11 is constructed from two panels, (a) and (b). In panel (a), a scatter plot of the data points can be seen with censored and noncensored points; as can be observed, there are many right-censored points in the dataset. To handle this, the synthetic data transformation is applied, and the result is shown in panel (b). The synthetic data assign the value zero to right-censored points and increase the magnitude of the remaining data; the aim is to equate the expected values of Y iĜ and the completely observed response Y (which is unknown in real cases). Figure 12 presents the plot of the response variable log(OS) versus the nonparametric covariate age to show the nonlinear relationship between them. Accordingly, a hypothetical curve is presented, which supports our claim of a nonlinear relationship.

General outcomes for the analysis of the hepatocellular carcinoma dataset are presented in Table 8, which involves the performance scores of the six estimators. Note that, here, the G-score cannot be calculated because the real regression coefficients are unknown. In Table 8, RL gives the highest R 2 value because it uses all 12 covariates in model estimation, from both the sparse and nonzero subsets. The aLL and SL methods provide satisfying values with fewer covariates, especially aLL. Regarding the estimation of the nonparametric component, SL gives the best estimation, which supports our earlier inference. In addition, aLL gives a smaller MSE value than the other four estimators.
The estimated coefficients are shown with bar plots in Figure 13 to illustrate how the methods work and to allow a sound comparison. In panels (b) and (c), the similar behavior of aLL and SL can be observed clearly. The ENL and RL methods also look similar to each other, which can be understood from Table 8. Figure 14 involves the six fitted curves obtained by the introduced estimators. At first glance, all the fitted curves are very close to each other, which is reflected in the MSE scores given in Table 8. However, the difference between RL and the other five methods can be easily observed. Because the data structure has excessive variation, the local polynomial method gains importance in the modeling process, since it takes the local densities into account. This can be counted as one of the important contributions of this paper.
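The local polynomial smoothing step itself can be sketched with a degree-one (local linear) fit; the Gaussian kernel and the fixed bandwidth below are illustrative assumptions, and in practice the bandwidth would be chosen by a data-driven rule such as cross-validation.

```python
import numpy as np

def local_linear(t_grid, t, y, h):
    """Degree-one local polynomial (local linear) smoother with a
    Gaussian kernel: at each evaluation point, fit a weighted
    least-squares line and return its intercept as the estimate."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    fhat = np.empty(len(t_grid))
    for j, t0 in enumerate(t_grid):
        u = (t - t0) / h
        w = np.exp(-0.5 * u ** 2)                   # kernel weights
        X = np.column_stack([np.ones_like(t), t - t0])
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted normal equations
        fhat[j] = beta[0]                           # intercept = f-hat(t0)
    return fhat

# A local linear fit reproduces an exactly linear signal for any bandwidth
t = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * t
print(np.allclose(local_linear(t, t, y, 0.2), y))  # True
```

Because each fit is weighted by the kernel, regions with denser t-values contribute more local information, which is the property referred to above as accounting for local densities.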

Conclusions
The results of the paper obtained from the simulation study are given in Tables 2-7 and Figures 4-10. The analysis is made for both the parametric and nonparametric components of the model individually. The advantage of the simulation study is that the real values of the regression coefficients are known; the accuracy and sensitivity of the estimators are thus evaluated using the confusion matrix in Table 1. From the results, the aLL and SL estimators showed the best performance and gave satisfactory results for the model estimation. In addition, the behavior of the methods is inspected for three factors: sample size (n = 50, 150, 300), number of covariates (k = 15, 40), and censoring level (CL = 10%, 30%). These effects are also observed in the figures. A real data example using the hepatocellular carcinoma dataset is analyzed using the introduced estimators, and its results are compared with the related simulation configurations. Based on these results, the concluding remarks are as follows: • From the simulation results regarding the parametric component estimation, Tables 2-5 show that the aLL and SL methods give satisfying results in terms of the metrics RMSE, R 2 , accuracy, and G-score. In more detail, for small sample sizes and low censoring levels, SL generally shows better performance than the other five methods. However, for the problematic scenarios, the aLL estimator is the best in both estimation performance and making a true selection between the zero and nonzero subsets. Figures 4 and 5 support these inferences.

•
In addition to introduced evaluation metrics, the selection frequency of the estimators is inspected for the simulation study, and results are shown in bar plots given in Figures 6 and 7. These figures demonstrate the consistency of the estimators in terms of their selection of the sparse and nonzero subsets for each coefficient. Under heavy censorship, it can be seen that LL, aLL, and SL gave the best performances. The ENL and MCL estimators did not show a good performance in this case.

•
The introduced six estimators provide closer performances on the estimation of the nonparametric component. Corresponding results are given in Tables 6 and 7 and Figures 8-10. Note that Figures 8 and 9 are drawn to show the individual fitted curves obtained from each simulation, which provides information about the variation of the estimators. Although the estimators give similar evaluation scores and closer fitted curves (which is seen in Figure 10), aLL and SL are the best.

•
In the hepatocellular carcinoma dataset analysis, the outcomes are found to be in harmony with the corresponding simulation scenarios. The results are provided in Table 8 and Figures 13 and 14. Similar to the simulation study, SL and aLL show the best performance. However, from Figure 14, it can be seen that the fitted curves are very close to each other, which can be explained by the large sample size. The real data study demonstrates that all six estimators produce considerably good model estimates, which underlines the value of the paper's contribution.
Finally, from the results of both the simulation and real data studies, the introduced six estimators for the right-censored partially linear models based on penalty and shrinkage strategies are compared, and results are presented. It is found that the adaptive lasso (aLL) and SCAD (SL) methods are more resistant than the other four estimators against the effects of censorship and the number of covariates. In general, the ridge (RL) estimator showed poor performance. On the other hand, the lasso (LL), MCP (MCL), and elasticnet (ENL) methods provided good performance for both the parametric and nonparametric components. This study recommends the aLL and SL estimators for the problematic scenarios.