Penalty Strategies in Semiparametric Regression Models

Alhassan, Ayuba Jack; Ahmed, S. Ejaz; Aydin, Dursun; Yilmaz, Ersin

doi:10.3390/mca30030054

Open AccessArticle

Penalty Strategies in Semiparametric Regression Models

¹

Department of Statistics, Mugla Sıtkı Kocman University, Mugla 48000, Turkey

²

Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada

^*

Author to whom correspondence should be addressed.

Math. Comput. Appl. 2025, 30(3), 54; https://doi.org/10.3390/mca30030054

Submission received: 4 February 2025 / Revised: 28 April 2025 / Accepted: 30 April 2025 / Published: 12 May 2025

Download

Browse Figures

Review Reports Versions Notes

Abstract

This study includes a comprehensive evaluation of six penalty estimation strategies for partially linear models (PLRMs), focusing on their performance in the presence of multicollinearity and their ability to handle both parametric and nonparametric components. The methods under consideration include Ridge regression, Lasso, Adaptive Lasso (aLasso), smoothly clipped absolute deviation (SCAD), ElasticNet, and minimax concave penalty (MCP). In addition to these established methods, we also incorporate Stein-type shrinkage estimation techniques that are standard and positive shrinkage and assess their effectiveness in this context. To estimate the PLRMs, we consider a kernel smoothing technique grounded in penalized least squares. Our investigation involves a theoretical analysis of the estimators’ asymptotic properties and a detailed simulation study designed to compare their performance under a variety of conditions, including different sample sizes, numbers of predictors, and levels of multicollinearity. The simulation results reveal that aLasso and shrinkage estimators, particularly the positive shrinkage estimator, consistently outperform the other methods in terms of Mean Squared Error (MSE) relative efficiencies (RE), especially when the sample size is small and multicollinearity is high. Furthermore, we present a real data analysis using the Hitters dataset to demonstrate the applicability of these methods in a practical setting. The results of the real data analysis align with the simulation findings, highlighting the superior predictive accuracy of aLasso and the shrinkage estimators in the presence of multicollinearity. The findings of this study offer valuable insights into the strengths and limitations of these penalty and shrinkage strategies, guiding their application in future research and practice involving semiparametric regression.

Keywords:

shrinkage estimators; penalty functions; kernel smoothing; semiparametric regression

1. Introduction

The Partially Linear Regression Model (PLRM), introduced by [1], serves as a versatile tool for utilizing the advantage of both parametric linear models and nonparametric regression models. It allows for the modeling of complex relationships by incorporating both linear and nonlinear components, thereby offering greater flexibility and interpretability across various scientific disciplines, including social sciences, biology, and economics. Formally, the PLRM is defined as:

y_{i} = x_{i}^{'} β + f (t_{i}) + ε_{i}, i = 1, \dots, n

(1)

where

y_{i}' s

are the observations of the response variable,

x_{i} = (x_{i 1}, \dots, x_{i k})

and

t_{i}' s

are the values of the explanatory variable,

β = {(β_{1}, \dots, β_{k})}^{'}

is an unknown k-dimensional parameter vector to be estimated,

f (.)

is an unknown smooth function, and

ε_{i}' s

are assumed to be uncorrelated random variables with mean zero and a common variance

σ^{2}

and independent of the explanatory variables. Note also that the observations of

y_{i}

depend linearly on the entries of

x_{i}

and nonlinearly on the values of a univariate variable

t_{i}

. The PLRM can be re-expressed in matrix and vector form:

y = X β + f + ε

(2)

where

y = {(y_{1}, \dots, y_{n})}^{'}

,

X = {[x_{1}, \dots, x_{n}]}^{'}

is an (

n \times k

) design matrix with

{x_{i} = (x_{i 1}, \dots, x_{i k})}^{'}

denoting the i.th

k -

dimensional row vector of the design matrix X, and

f = {(f (t_{1}), \dots, f (t_{n}))}^{'}

and

ε = (ε_{1}, . ε_{2}, \dots, ε_{n})

are random error vectors with

E (ε) = 0

and

V a r (ε) = σ^{2} I_{n}

. For more discussion on Model (1), see [2,3,4], among others. Since it was initially presented in [1], this model has been very popular and is frequently used in the social, biological, and economic sciences. The PLRM generalizes both the parametric linear regression model and nonparametric regression model. When the

f = 0

, Model (1) reduces to the linear regression model

y_{i} = x_{i} β + ε_{i}

, when the

β = 0

model (1) becomes a nonparametric regression model

y_{i} = f (t_{i}) + ε_{i}

(3)

with a univariate covariate. In addition, because PLRMs contain both linear and nonlinear components, they are more flexible than linear models. In this work, we are interested in estimating parameter vector

β

and function

f

.

Estimating the parameters and the smooth function in PLRM has been a focal point of research, with various methodologies proposed to enhance estimation accuracy and model interpretability. Early contributions [4,5] utilized kernel smoothing techniques for estimating the nonparametric component, laying the groundwork for subsequent advancements.

The advent of regularization methods introduced in [6,7] revolutionized parameter estimation by addressing issues such as multicollinearity and high-dimensionality through penalty functions. In particular, the authors of [8,9] explored penalized least squares in semiparametric models, emphasizing the integration of penalty functions with kernel smoothing to enhance estimator performance (see also [10,11,12,13]).

Adaptive methods, such as the Adaptive Lasso (aLasso) introduced in [14], have been incorporated into PLRM frameworks to improve variable selection consistency. Moreover, SCAD [15] and MCP [16] penalties have been employed to mitigate the bias inherent in

L_{1}

penalties, providing more accurate parameter estimates in complex models. Recent studies [17,18] highlight the asymptotic properties of penalized estimators in semiparametric regression, emphasizing the significance of penalty strategies in PLRM. Challenges in balancing bias and variance persist, particularly in sparsity contexts with multicollinearity.

In regression problems, multicollinearity emerges when variables are highly correlated with each other and the dependent variable. This situation inflates confidence intervals, weakens parameter estimates, and leads to less reliable predictions. It also raises sampling variance, reducing accuracy for both inference and prediction. To counter these challenges in PLRMs, we integrate shrinkage estimation with penalty functions to bolster robustness and manage both multicollinearity and variable selection:

We tackle variable selection and multicollinearity in Model (1), which features strongly correlated covariates, using six penalty functions: Ridge, Lasso, aLasso, SCAD, Elastic Net, and MCP.
We then review and compare these methods, highlighting their strengths, limitations, and uses within PLRMs.
To further enhance parameter estimation, we incorporate shrinkage estimation techniques—both standard and positive shrinkage estimators—into the penalty-based framework.
Finally, kernel smoothing via penalized least squares is employed to estimate both the parametric (linear) and nonparametric (smooth) components of the model.

The remainder of this paper is structured as follows. At the end of this section, we summarized the background on penalty functions and partially linear models and their integration. Section 2 describes the kernel smoothing method. Section 3 introduces the kernel-type ridge estimator. In Section 4, we discuss penalty functions and shrinkage estimators based on kernel smoothing. Section 5 presents the evaluation metrics for both parametric and nonparametric components. Section 6 outlines the estimators’ asymptotic properties. Section 7 details the Monte Carlo simulations. Section 8 compares the performance of our proposed estimators on real data. Finally, Section 9 provides concluding remarks and includes supplementary technical materials.

Background

In this section, we present a relevant literature review of partially linear regression models (PLRMs) and show why penalty functions and shrinkage estimators are essential for handling challenges like multicollinearity, high-dimensional data, and complex nonlinearities. We also explain how kernel smoothing complements these approaches by capturing intricate patterns in the data based on the previous studies. Altogether, these methods offer a strong foundation for building reliable and flexible statistical models.

Partially Linear Regression Models (PLRMs) are a cornerstone of semiparametric statistics, offering a powerful way to model relationships that are not fully captured by purely parametric or nonparametric approaches. They blend the interpretability of linear models with the flexibility of nonparametric methods, making them ideal for data with both linear and nonlinear components. Their ability to isolate certain covariate effects while accommodating nonlinearities has led to widespread use in fields such as economics, biostatistics, and environmental science [19]. Recent developments have expanded their scope to high-dimensional, longitudinal, and survival data [20,21].

Regularization methods, particularly those using penalty functions, have revolutionized regression analysis by tackling issues like multicollinearity, high dimensionality, and model selection. By constraining coefficient magnitudes, these methods shrink coefficients toward zero and improve model stability. Different penalty functions yield diverse estimators suited to various data characteristics and research goals.

Ridge regression, pioneered in [22], is well-established, yet its application in high-dimensional contexts and its ties to Bayesian methods continue to be studied [23]. Lasso has also evolved substantially [6], with research on its consistency in variable selection, high-dimensional settings, and extensions to generalized linear models [24]. Ahmed and colleagues contributed to Lasso’s development in PLRMs using Stein-type shrinkage estimators, especially under multicollinearity [25].

Building on the adaptive Lasso (aLasso) [26], researchers have explored data-driven weights for improving variable selection and estimation. For SCAD, recent work has focused on high-dimensional data, quantile regression, and robust estimation [27]. The researchers applied SCAD along with the other penalty functions and with shrinkage estimators in PLRMs, showing its effectiveness in handling outliers [10]. MCP, initially studied in [16], has inspired efficient algorithms and further theoretical exploration [25]. Elastic Net, proposed in [26], has also gained traction in areas like genomics, imaging, and finance, where it handles correlated predictors well [7]. Its benefits are used for balancing bias and variance in PLRMs [11].

Beyond these penalty-based approaches, shrinkage estimation techniques—particularly Stein-type estimators—offer another route. The authors of [28] investigated partially linear models by splitting the coefficient vector into main and nuisance effects and comparing shrinkage, pretest, and absolute penalty estimators. Their results show that a Stein-type shrinkage estimator uniformly outperforms the conventional least squares approach, especially when the nuisance parameter space is large, while the pretest estimator only offers benefits in a limited region. These methods shrink coefficients toward a chosen target, thereby reducing variance and mitigating noise [29,30,31]. Further, Ahmed’s work demonstrates that these techniques can improve estimation accuracy and interpretability in partially linear regression models [8,32,33]. Meanwhile, kernel smoothing remains central to nonparametric estimation for capturing nonlinear relationships, with advances such as adaptive bandwidth selection [34] and its integration with penalty and shrinkage methods [35]. The authors of [36] showed via simulation that applying Stein-type shrinkage to the parametric components of the semi-nonparametric regression model [37] enhanced accuracy. The authors of [38] proposed a two-stage approach using LASSO/Elastic-Net for variable selection followed by post-selection shrinkage estimation to improve predictions in high-dimensional sparse linear regression models, while the authors of [10] introduced shrinkage-type estimators for reliability and stress–strength parameters from progressively censored data that outperformed MLEs in simulations and industrial applications. Additional insights on shrinkage estimators are provided in [31,39,40,41,42,43,44].

Combining penalty functions with kernel smoothing has become a potent strategy for complex data in PLRMs. This approach refines the parametric part through penalty-driven selection and shrinkage, while kernel smoothing captures nonlinearities. Recent studies highlight theoretical and practical benefits, including improved estimation accuracy and consistency in variable selection under multicollinearity or high dimensionality [45]. The advanced penalized kernel smoothing in PLRMs using shrinkage estimators is shown in [32], showing strong performance in simulations and real data. That work illustrates the advantages of using penalties like aLasso and shrinkage estimation alongside kernel smoothing, especially with high-dimensional covariates and intricate nonlinearities.

2. Estimation Based on Kernel Smoothing

Let us first consider the nonparametric estimation of the unknown regression function

f (t)

in Model (1). For simplicity, we assume that

β

is known from Equation (1). The relationship between

y_{i} - X_{i} β

and

t_{i}

in this instance may be represented as

(y_{i} - X_{i} β) = f (t_{i}) + ε_{i,} i = 1, \dots, n

(4)

The nonparametric part of the semiparametric model is equivalent to Equation (4). This results in the Nadaraya-Watson estimator, commonly known as the kernel estimator, which was proposed in [46,47], as mentioned in [4]:

\hat{f} (h, t) = \sum_{i = 1}^{n} w_{i} (h, t_{i}) (y_{i} - X_{i} β) = W_{h} (y - X β)

(5)

where

W_{h}

is a kernel smoothing matrix with

j

-th entries

w_{h i}

and where

h

is a smoothing parameter (or bandwidth), as shown by:

w_{i} (h, t_{i}) = K (\frac{t - t_{i}}{h}) / \sum_{i = 1}^{n} K (\frac{t - t_{i}}{h}) = K (u_{i}) / \sum_{i} K (u_{i})

(6)

To estimate

f (t)

, kernel smoothing (regression) uses appropriate weights

w_{i} (h, t)

, as shown in Equation (4). The kernel function

K (u)

with a smoothing parameter

h

, which defines the size of the neighborhood around

t

, directs the weights assigned to the observations

t_{i}

[48,49]. The kernel or weight function

K (u)

in Equation (6) has the properties of

\int K (u) d u = 1

and

K (u) = K (- u)

. The kernel function is chosen to give the closest observations the most weight and the farthest from

t

the least weight.

The following partial residuals may be obtained via the matrix and vector forms of Model (2) in matrix form:

ε = y - X β - \hat{f} = (I - W_{h}) (y - X β) = \tilde{y} - \tilde{X} β

(7)

where

\tilde{X} = (I - W_{h}) X

and

\tilde{y} = (I - W_{h}) y .

Hence, a transformed set of data based on kernel residuals is what we obtain as a result. The following weighted least squares (WLS) criterion is obtained by considering these partial residuals for the vector

β

:

W L S (β; h) {= ((I - W_{h}) (y - X β))}^{T} (I - W_{h}) (y - X β {)) = (\tilde{y} - \tilde{X} β)}^{'} (\tilde{y} - \tilde{X} β)

(8)

The result for the criterion

W L S (β; h)

provided in Equation (8) is easily demonstrated to be similar to the solution of ordinary least squares (OLS)

{{\hat{β}}_{K S} (h) = ({\tilde{X}}^{'} X)}^{- 1} {\tilde{X}}^{'} \tilde{y}

(9)

Additionally, revising the steps for

f (t)

reduces Equation (5) to the form:

\hat{f} (h, t) = \sum_{i = 1}^{n} \frac{K (\frac{t - t_{i}}{h})}{\sum_{i = 1}^{n} K (\frac{t - t_{i}}{h})} (y_{i} - x_{i} \hat{β})

(10)

Another way to express Equation (10) in a matrix is as follows:

{\hat{f}}_{K S} = W_{h} (y - X {\hat{β}}_{K S}) .

(11)

Thus, our estimation for

y_{K S}

becomes:

{\hat{y}}_{K S} = X {\hat{β}}_{K S} + {\hat{f}}_{K S} = X ({\tilde{X}}^{'} {\tilde{X})}^{- 1} {\tilde{X}}^{'} \tilde{y} + W_{h} (y - X {\hat{β}}_{K S}) = H_{K S} (h) y

for the hat (projection) matrix

H_{K S} (h) = W_{h} + \tilde{X} {({\tilde{X}}^{'} X)}^{- 1} {\tilde{X}}^{'} (I - W_{h})

(12)

3. Kernel Type Ridge Estimator in Semiparametric Model

For the sake of this paper, we limit our attention to kernel smoothing estimators of vector parameter

β

and unidentified smooth function

f (.)

in a semiparametric model. The corresponding estimators

β

and

f

are based on Model (2) for a given bandwidth parameter

h

. By multiplying

(I_{n} - W_{h})

on both sides of Model (2), we obtain the following model:

\tilde{y} = \tilde{X} β + \tilde{ε}

(13)

where

\tilde{y} = (I_{n} - W_{h}) y

,

\tilde{ε} = \tilde{f} + ε^{*},

\tilde{f} = (I_{n} - W_{h}) f

and

ε^{*} = (I_{n} - W_{h}) ε

.

These considerations show that Model (13) is the optimal solution for obtaining vector

β

estimator, corresponding to the parametric part of the semiparametric Model (2). For the ridge regression problem, this model provides the following penalized least squares (PLS) criterion:

P L S_{R K} = \underset{β}{arg min} (\tilde{y} - \tilde{X} β)' (\tilde{y} - \tilde{X} β) + λ β' β

(14)

where

λ

denotes a positive shrinkage parameter that regulates the severity of the penalty. The solution of the minimization problem (14) can be written as follows as a ridge-type kernel smoothing estimator (see [18]), similar to [22], but modified based on partial residuals obtained using smoothing matrix

W_{h} :

{{\hat{β}}_{R K} (λ) = ({\tilde{X}}^{'} \tilde{X} + λ I_{k})}^{- 1} {\tilde{X}}^{'} \tilde{y}

(15)

The ridge-type kernel smoothing estimate simplifies to an ordinary least squares estimation problem when

λ = 0

on the basis of the local residuals specified in Equations (9) and (11). Additionally, to estimate the unknown function

f

, we imitate Equation (11) and define it.

{\hat{f}}_{R K} = W_{h} (y - X {\hat{β}}_{R K} (λ))

(16)

Thus, in semiparametric Model (2), the estimator (16) is defined as the ridge type kernel estimator of unknown function

f

.

4. Penalty Functions and Shrinkage Estimators

Many penalty functions for linear and generalized regression models have been studied [11,32]. Here, we consider the MCP, Lasso, SCAD, aLasso, and ElasticNet functions and shrinkage techniques additionally. Note that ElasticNet is a regularized regression technique that linearly combines the

L_{1}

and

L_{2}

penalties of the Lasso and ridge regression methods, respectively.

In this paper, we introduce the kernel smoothing estimators based on several penalties for the semiparametric regression model’s components. For a given penalty function and tuning parameter

λ

, the general form of the penalty estimators is defined by the following penalized least squares

(P L S)

criterion:

P L S = \underset{β}{a r g m i n} \{\sum_{i = 1}^{n} {({\tilde{y}}_{i} - {\tilde{x}}_{i}^{'} β)}^{2}\} + P_{λ} (β) = {(\tilde{y} - \tilde{X} β)}^{'} (\tilde{y} - \tilde{X} β) + P_{λ} (β)

(17)

where

P_{λ} (β)

denotes the penalty function based on the shrinkage (or tuning) parameter

λ .

This parameter needs to be selected with any selection criteria.

Note that the bridge estimator [50] is the vector

\hat{β}

that minimizes the Lasso and Ridge penalties in Model (17). However, SCAD, ElasticNet, aLasso, and MCP have different penalties, which are introduced later in this study. Note that

{|β_{j}|}^{q}

fulfills the

L_{q}

norm of the regression coefficients

β_{j}

because of the penalty functions used.

Hence, for various values of the shrinkage parameter

λ

, different penalty estimators that correspond to the parametric and nonparametric parts of the semiparametric model can be defined.

4.1. Estimation Procedure for the Parametric Component

According to Model (17), it is possible to obtain ridge estimates for the parametric component for

q = 2

by minimizing the following penalized residual sum of squares.

{\hat{β}}_{R K} = \underset{β}{a r g m i n} \{\sum_{i = 1}^{n} {({\tilde{y}}_{i} - {\tilde{x}}_{i}^{'} β)}^{2} + λ \sum_{j = 1}^{k} {(β_{j})}^{2}\}

(18)

where

{\tilde{x}}_{i}^{'}

is the

i

-th row of the matrix

\tilde{X}

, and

{\tilde{y}}_{i}

is the i-th observation of

\tilde{y}

. Note that the regularization estimate in solution (18) matches that in (15). Alternative methods to ridge are described below.

Lasso: The authors of [6] proposed Lasso, a regularization method that uses the

L_{1}

penalty for estimation and variable selection. The modified kernel smoothing estimators based on the Lasso penalty can be defined as

{\hat{β}}_{L K} = \underset{β}{a r g m i n} \{\sum_{i = 1}^{n} {({\tilde{y}}_{i} - {\tilde{x}}_{i}^{'} β)}^{2} + λ \sum_{j = 1}^{k} |β_{j}|\}

(19)

The absolute penalty term prevents Lasso from having an analytical solution, despite the apparent simplicity of Equation (19).

aLasso: The authors of [14] proposed using adaptive weights on

L_{1}

penalties on the regression coefficients to modify the Lasso penalty. The term “aLasso” refers to this weighted Lasso with oracle properties. The kernel smoothing estimator

{\hat{β}}_{a L K}

using aLasso penalty is defined as follows:

{\hat{β}}_{a L K} = \underset{β}{a r g m i n} \{\sum_{i = 1}^{n} {({\tilde{y}}_{i} - {\tilde{x}}_{i}^{'} β)}^{2} + λ \sum_{j = 1}^{k} {\hat{w}}_{j} |β_{j}|\}

(20)

where

w

is a weight function given by

{\hat{w}}_{j} = \frac{1}{{|{\hat{β}}^{*}|}^{q}}, q > 0

Notably,

{\hat{β}}^{*}

is an appropriate estimator of

β

. In addition, an ordinary least squares (OLS) estimate can be used as a reference value. After the OLS estimate is obtained,

q > 0

is selected, and the weights are computed to obtain the adaptive-Lasso estimates in (20).

SCAD: The Lasso method’s penalty term increases linearly with the regression coefficient’s size, leading to highly biased estimates for coefficients with large values. To address this issue, the authors of [15] proposed a SCAD penalty that can be obtained by replacing

|β_{j}|

in (18) with

P_{α, λ} |β_{j}| .

A modified kernel smoothing estimator

{\hat{β}}_{S K}

based on the SCAD penalty can be expressed as

{\hat{β}}_{S K} = \underset{β}{a r g m i n} \{\sum_{i = 1}^{n} {({\tilde{y}}_{i} - {\tilde{x}}_{i}^{'} β)}^{2} + \sum_{j = 1}^{k} P_{α, λ} |β_{j}|\}

(21)

where

P_{α, λ} (.)

is the SCAD penalty defined by

P_{α, λ} = λ \{I (|β|) \leq λ + \frac{{(α λ - |β|)}_{+}}{(α - 1) λ} I (|β| < λ)\}, f o r β \geq 0

(22)

The penalty parameters in this case are

λ > 0

and

α > 2

, where

{(t)}_{+} = m a x (t, 0)

and

I (.)

is the indicator function. When

α = \infty

, (22) is equivalent to the

L_{1}

penalty.

ElasticNet: ElasticNet is a penalized least squares regression method proposed in [26] that is widely used for regularization and automatic variable selection to choose groups of correlated variables. It combines the

L_{1}

and the

L_{2}

penalty term, where the

L_{1}

penalty term ensures sparsity, whereas the

L_{2}

penalty term selects correlated variable groups. Accordingly, the modified kernel-smoothing estimator

{\hat{β}}_{E N K}

using an ElasticNet penalty is the solution to the following minimization problem:

{\hat{β}}_{E N K} = \underset{β}{a r g m i n} \{\sum_{i = 1}^{n} {({\tilde{y}}_{i} - {\tilde{x}}_{i}^{'} β)}^{2} + λ_{1} \sum_{j = 1}^{k} {|β_{j}|}^{2} + λ_{2} \sum_{j = 1}^{k} |β_{j}|\}

(23)

In the expression above, the positive regularization parameters are

λ_{1}

and

λ_{2}

. Note that (23) ensures that the estimates correspond to the parametric part of the semiparametric regression Model (2).

MCP: The authors of [16] introduced MCP as a different method to obtain less biased estimates of nonzero regression coefficients in a sparse model. For given regularization parameters

λ > 0

and

α > 0

, the kernel smoothing estimator

{\hat{β}}_{M C K}

based on the MCP penalty can be defined as

{\hat{β}}_{M C K} = \underset{β}{a r g m i n} \{\sum_{i = 1}^{n} {({\tilde{y}}_{i} - {\tilde{x}}_{i}^{'} β)}^{2} + \sum_{j = 1}^{k} P_{α, λ} (|β_{j}|)\}

(24)

where

P_{α, λ} (.)

is the MCP penalty expressed as

P_{α, λ} (β) = \int_{0}^{|β|} {(\frac{λ - x}{α})}_{+} d x = (λ |β| - \frac{β^{2}}{2 α}) I (0 \leq |β| < λ α) + \frac{λ^{2} γ}{2} I (|β| \geq λ α)

4.2. Shrinkage Estimators

Shrinkage estimation offers a robust method for parameter estimation, especially in sparse environments where traditional methods may face high variance or multicollinearity. This section introduces standard and positive shrinkage estimators to improve accuracy and stability. Shrinkage estimators balance full model estimators

({\hat{β}}^{F M})

, which use extensive predictors, with sub model estimators

({\hat{β}}^{S M})

, which focus on fewer, significant predictors using any of the introduced penalty functions (see [32]). To better understand the shrinkage process, it is necessary to partition design matrix X into subsets that correspond to significant and less significant predictors. Let X represent the n × k design matrix, where n is the number of observations, and k is the total number of predictors. We partition X into two submatrices:

\tilde{X} = [{\tilde{X}}_{1}, {\tilde{X}}_{2}]

where

{\tilde{X}}_{1}

corresponds to the subset of significant nonzero predictors (

p_{1}

predictors) and

{\tilde{X}}_{2},

contains the remaining predictors (

p - p_{1}

) where

p = p_{1} + p_{2}

. This partitioning ensures that the shrinkage process selectively regularizes the coefficient associated with less significant predictors while preserving the interpretability of the model.

After optimizing

{\hat{β}}^{F M}

and

{\hat{β}}^{S M}

, we can define shrinkage estimator

{\hat{β}}_{S}

and the positive part of the shrinkage estimator to avoid the over-shrinking of the regression coefficients

{\hat{β}}_{P S}

as follows:

{\hat{β}}_{S} = {\hat{β}}^{F M} - \frac{(p_{2} - 2)}{\hat{T}} ({\hat{β}}_{F M} - {\hat{β}}_{S M})

(25)

{\hat{β}}_{P S} = {\hat{β}}^{F M} - {[\frac{(p_{2} - 2)}{\hat{T}}]}_{1} ({\hat{β}}_{F M} - {\hat{β}}_{S M})

where

{(η)}_{1} = \max [η, 1]

,

p_{2}

represents the number of sparse parameters that are detected by the penalty functions to estimate

{\hat{β}}^{S M}

,

\hat{T}

is the distance measure which can be defined as

\hat{T} = σ^{- 2} n {\hat{β}}_{2}^{'} ({\tilde{X}}_{2}^{'} \tilde{U} {\tilde{X}}_{2}) {\hat{β}}_{2}

with sparse subset of regression coefficients denoted by

{\hat{β}}_{2}

, model variance

σ^{2}

,

{\tilde{X}}_{2}

matrix of partial residuals of associated predictors with

{\hat{β}}_{2}

and the projection matrix

\tilde{U} = I_{n} - [{\tilde{X}}_{1} {({\tilde{X}}_{1}^{'} {\tilde{X}}_{1} + λ_{s} I_{p_{1}})}^{- 1} {\tilde{X}}_{1}^{'}]

where

{\tilde{X}}_{1}

is the partial residuals associated with the

p_{1}

significant coefficients. Finally,

λ_{s}

is again the shrinkage parameter. Some of the advantages of using shrinkage estimators are as follows. Introducing the bias through the shrinkage, these estimators reduce variance and leading in the smaller Mean Squared Error (MSE) values, which means better prediction accuracy. In addition, shrinkage techniques shrink the coefficients in a controlled manner, which preserves interpretability.

Building upon the Shrinkage Estimation framework introduced above, we integrate both standard and Positive Shrinkage Estimators into the penalty estimation strategies. These estimators provide an additional layer of regularization, enhancing the estimation accuracy for sparse models. The estimation procedure involves the following.

Parametric Component Estimation: We utilize the respective penalty functions (Ridge, Lasso, aLasso, SCAD, ElasticNet, MCP) to estimate the parameter vector $β$ .
Shrinkage Application: We apply the shrinkage techniques to obtain ordinary stein-type shrinkage ${\hat{β}}_{S}$ or positive shrinkage estimator ${\hat{β}}_{P S}$ to refine the parameter estimates.
Nonparametric Component Estimation: We estimate the smooth function $f (t)$ using the refined parameter estimates via kernel smoothing as described in Equation (25).

This comprehensive approach leverages the strengths of both penalty functions and shrinkage estimators, offering a robust estimation framework for PLRM. The incorporation of shrinkage estimators offers several advantages. By introducing a controlled bias, these estimators reduce the variance of parameter estimates, leading to more stable and reliable predictions. Additionally, they enhance model interpretability by focusing on significant predictors while mitigating the effects of multicollinearity. Moreover, the positive shrinkage estimator provides a practical safeguard against excessive regularization, striking a balance between bias and variance.

4.3. Estimation Procedure for the Nonparametric Component

For the parametric part of the semiparametric model in (2), modified kernel smoothing estimates based on various penalties are provided by Equations (18)–(24). Analogous to that of (16), the estimation of the nonparametric part of the same model can be constructed via the vector of estimated parametric coefficients

{\hat{β}}_{L K}

provided in (19). On the basis of the Lasso penalty of the unknown function, we obtain the transformed local estimates as follows:

{\hat{f}}_{L K} = W_{h} (y - X {\hat{β}}_{L K})

(26)

as described in the previous section. Importantly, local estimates of the nonparametric component based on the adaptive-Lasso penalty are derived and are symbolically given as

{\hat{f}}_{a L K}

in Equation (26), where

{\hat{β}}_{a L K}

, defined in (20), is written in place of

{\hat{β}}_{L K}

. Similarly, for the nonparametric part of the semiparametric Model (2), the modified local kernel estimators

{\hat{f}}_{S K}

,

{\hat{f}}_{E N K},

, and

{\hat{f}}_{M C K}

based on the SCAD, ElasticNet, and MCP penalties, respectively, are obtained by replacing

{\hat{β}}_{L K}

in (26) with

{\hat{β}}_{S K},

,

{\hat{β}}_{E N K}

, and

{\hat{β}}_{M C K}

.

To better understand the adaptation of the penalty functions to the estimation process of the partially linear Model (1), a generic algorithm is introduced. As mentioned previously, to obtain a specific estimator on the basis of a determined penalty function, the corresponding penalty term is used in Algorithm 1. Therefore, the output estimators are denoted as

(\hat{β}, \hat{f})

to generalize the algorithm. For example, if we obtain the Lasso estimator of the parametric components in Model (1), first,

{\hat{β}}_{L K}

is found to minimize (19), then

{\hat{f}}_{L K}

is obtained by using the vector of estimated parametric coefficients

{\hat{β}}_{L K}

. A similar path is followed for the remaining estimators. The algorithm describing their findings is as follows:

Algorithm 1. Computation of penalty estimators.

Input: Data matrix of parametric component

X \in R^{n \times k}

, data vector

t_{i} \in R^{n \times 1}

, and response vector

y \in R^{n \times 1}

.
Output: Pair of estimates

(\hat{β}, \hat{f})

based on a certain penalty function
1: Select an appropriate bandwidth

h

using a predetermined criterion and compute the smoother matrix

W_{h}

, as defined in (6).
2: Compute the partial residuals

\tilde{X} = (I_{n} - W_{h}) X

and

\tilde{y} = (I_{n} - W_{h}) y .

3: To minimize

P L S (β, f),

determine the shrinkage parameter

λ

by a predetermined criterion.
4: Partition the partial residuals of

X

in form

\tilde{X} = ({\tilde{X}}_{1}, {\tilde{X}}_{2})

, as defined in Section 4.2.
5: Apply shrinkage estimators

{\hat{β}}^{P S}

, and

{\hat{β}}^{S}

based on used penalty functions
6: Find the estimate of parametric component

\hat{β}

associated with the

\tilde{X}

contains and

\tilde{y} .

7: Estimate the nonparametric smooth function

f

as follows:

\hat{f} = W_{h} (y - X \hat{β})

8: Return

(\hat{β}, \hat{f})

.

5. Measuring the Quality of Estimators

This section describes several performance metrics that can be used to assess the performance of the modified semiparametric kernel smoothing estimators based on the penalty functions and shrinkage estimators defined in Section 4. Note that the estimators are indicated by the acronyms in parentheses. Each estimator’s performance for the parametric component, nonparametric component, and overall estimated model is examined. Consequently, the performance metrics are explained in the following section.

5.1. Evaluation of the Parametric Component

The performance of a method is related to its ability to estimate the data. Note that the bias and variance of a penalty estimator

{\hat{β}}_{S}

are measured simultaneously via the mean squared error

(M S E)

matrix. Some metrics are defined in the following section.

MSE: This value is calculated by adding the variance and squared bias, which is provided by

M S E ({\hat{β}}_{S}, β) = E [{(β - {\hat{β}}_{S})}^{'} (β - {\hat{β}}_{S})] = V a r ({\hat{β}}_{S}) + {(E ({\hat{β}}_{S}) - β)}^{2}

(27)

where

{\hat{β}}_{S}

is the vector of the estimated regression coefficients determined by either of the methods. The key information regarding the quality standards for estimation is provided by Equation (19). To evaluate the risk more accurately via the MSE, a scalar variant of the MSE (SMSE) is utilized, which yields an estimator’s average quality score:

S M S E ({\hat{β}}_{S}, β) = E [{(β - {\hat{β}}_{S})}^{'} (β - {\hat{β}}_{S})] = t r [(M S E ({\hat{β}}_{S}, β))]

(28)

Additionally, the relative efficiencies of the methods for estimating

β

can be easily obtained via (28). To do this, the relative efficiency (RE) is defined in Definition 1.

Definition 1.

Let

S_{1}

and

S_{2}

denote the two distinct estimation methods for

β

, and the estimators are shown as

{\hat{β}}_{S}_{1}

and

{\hat{β}}_{S}_{2}

, respectively. The formula for RE is as follows:

R E ({\hat{β}}_{S}_{1}, {\hat{β}}_{S}_{2}) = S M S E ({\hat{β}}_{S}_{1}, β) / S M S E ({\hat{β}}_{S}_{2}, β)

(29)

Here,

R E ({\hat{β}}_{S}_{1}, {\hat{β}}_{S}_{2}) > 1

means that

{\hat{β}}_{S}_{2}

is more efficient than

{\hat{β}}_{S}_{1}

.

Root mean squared error (

R M S E

): The RMSE is simply the square root of the MSE. The formula for calculating the estimated regression coefficients

(\hat{β})

is as follows:

R M S E (β, {\hat{β}}_{S}) = \sqrt{M S E ({\hat{β}}_{S}, β)}

(30)

where

{\hat{β}}_{S}

is the estimate of

β

obtained via one of the methods described in this paper, as defined in (28). To obtain the RMSE score for each estimator,

{\hat{β}}_{S}

is replaced with

{\hat{β}}_{R K}, {\hat{β}}_{L K}, {\hat{β}}_{a L K}, {\hat{β}}_{S K},

{\hat{β}}_{E N K}

and

{\hat{β}}_{M C K}

.

5.2. Evaluation of the Nonparametric Component

To evaluate the nonparametric part of Model (1),

M S E

can also be used. Suppose that

\hat{f}

is any of the estimators of the nonparametric component in Model (1). That is, let us assume that it is equivalent to one of the estimators

{\hat{f}}_{R K}, {\hat{f}}_{L K}, {\hat{f}}_{a L K}, {\hat{f}}_{S K},

{\hat{f}}_{E N K}

and

{\hat{f}}_{M C K}

defined in Section 3. Therefore, the MSE is calculated as follows:

M S E (f, \hat{f}) = n^{- 1} \sum_{i = 1}^{n} {(f (t_{i}) - \hat{f} (t_{i}))}^{2} = n^{- 1} {‖(f - \hat{f})‖}^{2}

(31)

To compare how well each of the six methods estimates the nonparametric component, the relative

M S E

(

R M S E

) is used. The formula for

R M S E

is computed as follows:

R M S E ({\hat{f}}_{i}) = n_{m}^{- 1} [\sum_{j = 1}^{(n_{m} - 1)} M S E ({\hat{f}}_{i}) / M S E ({\hat{f}}_{j})], i \neq j

(32)

6. Asymptotic Analysis

Suppose that

β_{1}^{*} \neq 0

and

β_{2}^{*} = 0

is the estimate of

β = (β_{1} \neq 0, β_{2} = 0)

, where

β_{1}^{*} \neq 0

is any of the proposed estimators, which are Ridge, Lasso, aLasso, SCAD, ElasticNet, and MCP. This section defines the asymptotic distributional risk (ADR) of our full model and selected model estimators. Our primary objective is the performance of these estimators when

β_{2} = 0

; for this reason, we consider a sequence

\{K_{n}\}

given by

K_{n} : β_{2} = β_{2 (n)} = \frac{w}{\sqrt{n}}, w = {(w_{1}, . . ., w_{p 2})}^{'} \in R^{p 2}

Now, using a positive definite matrix (p.d.m.) W, we define a quadratic loss function as follows:

L (β_{1}^{*}) = n {(β_{1}^{*} - β_{1})}^{'} W (β_{1}^{*} - β_{1}),

Now, under

K_{n}

, we can define the asymptotic distribution function of

β_{1}^{*}

as

F (x) = \lim_{n \to \infty} P (\sqrt{n} (β_{1}^{*} - β_{1}) \leq x | K_{n}),

where

F (x)

is nondegenerate. Hence, we can define the ADR of

β_{1}^{*}

as follows.

ADR (β_{1}^{*}) = t r (W \int_{R^{P 1}} \int {x x}^{'} d F (x)) = t r (W V)

where V is the dispersion matrix for the distribution

F (x)

.

Assumption 1.

We establish two regularity conditions as follows:

(i).: $\frac{1}{n} \max_{0 \leq x \leq 1} {\tilde{x}}_{i}^{'} {({\tilde{X}}_{i}^{'} \tilde{X})}^{- 1} {\tilde{x}}_{i} \to 0 a s n \to$ $\infty$ where ${\tilde{x}}_{i}^{'}$ is the ith row of $\tilde{X}$
(ii).: $\frac{1}{n} \sum_{i = 1}^{n} {\tilde{X}}_{i}^{'} \tilde{X} \to \tilde{Q,}$ where $\tilde{Q,}$ is a finite positive-definite matrix.

By Lemma 1, which is defined below, with assumed regularity conditions and local alternatives, the ADRs of the estimators under

\{K_{n}\}

are given as:

Theorem 1.

ADR (

{\hat{β}}_{1}^{F M}

) =

σ^{2} t r (W {\tilde{Q,}}_{11.2}^{- 1}) + η_{11.2}^{'} W η_{11.2},

ADR

({\hat{β}}_{1}^{S M}

) =

σ^{2} t r (W {\tilde{Q,}}_{11}^{- 1})

+

ξ' W ξ

.

Lemma 1 enables us to derive the result of Theorem 1 in this study.

Lemma 1.

i f k / \sqrt{n} \to λ_{0} \geq 0 a n d \tilde{Q,} i s n o n - s i n g u l a r, t h e n

\sqrt{n} ({\hat{β}}_{1}^{F M} - β) \overset{d}{\to} N (- λ_{0} {\tilde{Q}}^{- 1} β, σ^{2} {\tilde{Q}}^{- 1}) .

where “

\overset{d}{\to}

” denotes the convergence in the distribution. See the proof in [2].

7. Simulation Studies

In this section, we conduct a thorough simulation study to assess the finite-sample performance of the aforementioned six semiparametric estimators proposed for a partially linear model. These estimators are compared to each other to evaluate their performance. R-software [51] was used for all calculations. A description of the simulation design and data generation is as follows.

Design of the Simulation: The simulations are carried out as follows:

i.: Samples of size $n = 50, 100 a n d 200$
ii.: Two numbers of parametric covariates $k = 25$ and 40
iii.: Two correlation levels $ρ (r h o) = 0.5 a n d 0.90$
iv.: The number of replications is 1000.

In addition, we generated zero and nonzero coefficients of the model to enable our absolute penalty functions to select significant variables. As mentioned above, each possible simulation design was repeated 1000 times to detect significant findings. Finally, the evaluation metrics mentioned in Section 5 were used to assess the performance of the proposed estimators.

Data generation: We generated the explanatory variables

x_{i} = x_{i j} = (x_{i 1}, \dots, x_{i k})

via the following equation:

x_{i j} = {(1 - ρ^{2})}^{1 / 2} x_{i j} + ρ x_{i, k + 1}, i = 1,2, \dots, n, j = 1,2, \dots, k

where

x_{i j}

are independent standard normal pseudorandom numbers and

ρ

represents the level of collinearity among the explanatory variables. The two levels of collinearity assumed in this study are 0.50 and 0.90, as denoted above. These variables are standardized so that

X' X

and

X' y

are in correlation forms. The

n

observations for the response variable

y_{i}

are generated by

y_{i} = x_{i}^{'} β + f (t_{i}) + ε_{i}, 1 \leq i \leq n

where

β = {(β_{1}, β_{2}, β_{3}, β_{4}, β_{5}, {β_{6}, \dots, β}_{25})}^{T} = {(- 1, 2, 0.5, 3, - 2, 0, 0, \dots, 2)}^{T}

, the nonparametric function

f (t_{i}) = {- t}_{i} \sin (t_{i})

with

t_{i} ~ U (0,1)

, and

ε_{i} ~ N (0,1)

. It is understood from this model that when

k = 25

explanatory variables are considered, there are 6 nonzero

β_{j}

s to be estimated and

(k - 6) = 19

sparse coefficients. If

k = 40

explanatory variables are considered, there will be 6 nonzero

β_{j}

s to be estimated and

(k - 6) = 34

sparse coefficients.

The Monte Carlo simulation approach involves repeatedly generating datasets according to the specified Partially Linear Regression Model (PLRM) under different conditions (sample size, correlation). For each generated dataset, all estimation strategies are applied, and their performance metrics are calculated. This process is replicated 1000 times with different random errors to ensure stable results, allowing us to assess the average performance and variability of the estimators under the controlled simulation settings.

In this study, to obtain the optimal value of

λ

, we consider modified cross-validation (CV), which is generalized cross-validation (GCV). The main idea of the

G C V

criterion is to replace the factor

(1 - {(W_{h})}_{i i})

with the mean value, that is,

(1 - n^{- 1} t r (W_{h}))

. In this case, the

G C V

value is obtained as follows:

G C V (h) = \frac{n {‖(I - W_{h}) y‖}^{2}}{{[t r (I - W_{h})]}^{2}} = \frac{n {‖(I - W_{h}) y‖}^{2}}{{[n - t r (W_{h})]}^{2}}

(33)

where

t r (W_{h})

represents the sum of the diagonal values of the matrix

W_{h}

, that is, its trace. Finally, to obtain the optimum shrinkage parameter

λ

in the objective function given in (17), we use a five-fold cross-validation method.

In this simulation study, we analyze 12 different simulation configurations and attempt to display all of them. In this sense, the simulation study results are shown separately for the parametric and nonparametric components.

7.1. Analysis of the Parametric Component

This subsection examines the estimation of the parametric component of a semiparametric model, with the results shown in the following tables for all simulations. In addition to the estimation, the performance of the estimators is evaluated via the performance metrics mentioned in Section 5. The results are presented in the following tables and figures.

Table 1, Table 2 and Table 3 detail the performance scores (MSE, RMSE, RE, SMSE) for the parametric component across different simulation setups.

In terms of general observations that can be concluded from the tables, as sample size (n) increases, the MSE and RMSE values generally decrease for all estimators, indicating improved accuracy. Higher multicollinearity (

ρ

= 0.9) tends to worsen the performance of all estimators compared to lower multicollinearity (

ρ

= 0.5). The MCP and SCAD penalty functions, along with the positive shrinkage estimator (

{\hat{β}}_{P S}

) and shrinkage estimator (

{\hat{β}}_{S}

), often demonstrate superior performance in terms of lower MSE and RMSE, especially in scenarios with high multicollinearity or smaller sample sizes.

In detail, Ridge (

{\hat{β}}_{R K}

) performs relatively poorly compared to other methods, particularly when multicollinearity is high. The Lasso (

{\hat{β}}_{L K}

) and aLasso (

{\hat{β}}_{a L K}

) methods show moderate performance. aLasso tends to be slightly better than Lasso. ElasticNet (

{\hat{β}}_{E N K}

) performs similarly to Lasso and aLasso. MCP (

{\hat{β}}_{M C K}

) and SCAD (

{\hat{β}}_{S K}

) often exhibit the best performance among the penalty methods, with lower MSE and RMSE values, especially in challenging scenarios. Shrinkage estimators (

{\hat{β}}_{P S}

and

{\hat{β}}_{S}

) consistently perform well, often outperforming the other methods, especially when the sample size is small and multicollinearity is high.

Figure 1 illustrates how the Mean Squared Error (MSE) of the parametric component estimators changes with varying levels of multicollinearity (

ρ

). Notably, as multicollinearity increases, so does the MSE for all estimators, which aligns with the data presented in Table 1, Table 2 and Table 3. However,

{\hat{β}}_{M C K}

,

{\hat{β}}_{S K}

, and the shrinkage estimators, particularly the

{\hat{β}}_{P S}

, consistently exhibit lower MSE values compared to

{\hat{β}}_{R K}

,

{\hat{β}}_{L K}

,

{\hat{β}}_{a L K}

, and

{\hat{β}}_{E N K}

. This trend is particularly pronounced when multicollinearity is high (ρ = 0.9). The superior performance of MCP, SCAD, and shrinkage estimators under these challenging conditions is further emphasized by the relatively flat lines they exhibit in Figure A3 in Appendix A.3, suggesting robustness to an increasing number of predictors.

7.2. Analysis of the Nonparametric Component

This subsection examines the estimation of the nonparametric component of a semiparametric model, with the results shown in Table 4 for all simulations. Furthermore, the performance scores of the proposed methods are measured via the MSE and RMSE metrics. In addition, the estimated curves obtained from all methods are examined using all the distinct configurations in Figure 2 and Figure 3.

Table 4 presents the performance scores for the nonparametric component estimators in a similar format to Table 1, Table 2 and Table 3, which focuses on the parametric component. This table would quantify the accuracy and efficiency of each estimator in estimating the nonparametric function

f (t)

under various simulation scenarios, varying sample size (

n

), number of predictors (

p

), and multicollinearity levels (

ρ

).

Based on the trends observed in Figure 2, Figure 3, Figure A1 and Figure A4, and the consistent findings in Table 1, Table 2 and Table 3, it is highly probable that Table 4 would further demonstrate the superior performance of

{\hat{f}}_{M C K}

,

{\hat{f}}_{S K}

, and the shrinkage estimators (

{\hat{f}}_{S}

and

{\hat{f}}_{P S}

), particularly in scenarios with high multicollinearity and smaller sample sizes. These estimators would likely exhibit lower MSE and RMSE values compared to others, indicating greater accuracy in estimating the nonparametric function and potentially higher RE values, suggesting greater efficiency relative to a reference estimator like

{\hat{f}}_{R K}

. The data in Table 4 provide the numerical backbone to support the visual observations in Figure 2 and the trends shown in Figure 3,Figure A1 and Figure A4, solidifying the conclusion that

{\hat{f}}_{M C K}

,

{\hat{f}}_{S K}

,

{\hat{f}}_{S}

, and

{\hat{f}}_{P S}

offer significant advantages when estimating the nonparametric component of partially linear models.

Figure 2 provides a visual representation of how well each estimator captures the true nonparametric function,

f (t) .

When examining the fitted curves, especially under conditions of high multicollinearity (

ρ = 0.9

) and small sample size (

n = 50

), it becomes apparent that MCP (

{\hat{f}}_{M C K}

), SCAD

{\hat{f}}_{S K}

, and the shrinkage estimators (

{\hat{f}}_{P S} a n d {\hat{f}}_{S}

) produce curves that align more closely with the true function (represented by the black line) than those generated by Ridge (

{\hat{f}}_{R K}

), Lasso (

{\hat{f}}_{L K}

), aLasso (

{\hat{f}}_{a L K}

), and ElasticNet (

{\hat{f}}_{E N K}

). As the sample size increases, the differences become less visually striking, yet the advantage of

{\hat{f}}_{M C K}

,

{\hat{f}}_{S K}

, and shrinkage estimators persists, particularly when multicollinearity is present. This visual observation is consistent with the quantitative data shown in Figure 3, where these estimators consistently exhibit lower MSE values for the nonparametric component.

Figure 3 complements Figure 2 by quantifying the MSE of the nonparametric component against varying levels of multicollinearity. The trend observed mirrors that of Figure 1, i.e., MSE increases with multicollinearity for all estimators. However,

{\hat{f}}_{R K}

, consistently displays the highest MSE, while

{\hat{f}}_{M C K}

,

{\hat{f}}_{S K}

, and the shrinkage estimators maintain lower MSE values across the board. This quantitative evidence corroborates the visual findings of Figure 2, demonstrating that

{\hat{f}}_{M C K}

,

{\hat{f}}_{S K}

, and shrinkage estimators (

{\hat{f}}_{S}

and

{\hat{f}}_{P S}

,) provide more accurate estimates of the nonparametric function, especially when faced with correlated predictors. The relatively stable RMSE values of these better-performing estimators, even as the number of predictors increases, are shown in Figure A4 and further support their robustness in Appendix A.3.

8. Real Data

In this section, we implement our proposed six estimators, i.e., Ridge, Lasso, aLasso, SCAD, ElasticNet, and MCP, on real data. Our real data, Hitter’s dataset, can be obtained from the ISLR package in R. This dataset contains 322 rows and 20 variables. Three covariates, Division, League, and New League, which are not scalers, were deleted. The CN value obtained from this dataset is 5830, which indicates the presence of multicollinearity in the dataset. The remaining variables used for the estimation of a partially linear model are

S a l a r y

,

A t b a t, R u n s, R B I, W a l k s, C R B I

,

C w a l k s

,

C H i t s, C H m r u n

,

H i t s, H m r u n,

C a t B a t, C r u n s,

and

P u t s O u t s

. In addition, the performance of the estimators is compared, and the results are shown in Table 5 and Figure 4 and Figure 5. Note that the logarithm of Salary is used as a response variable. From the visual inspection of the relationships between covariate and the response variable, the variable “

Y e a r s

” has a significant nonlinear relationship with the response

l o g (s a l)

. The remaining 15 predictor variables are added to the parametric component of the model. Hence, the partially linear model can be written as follows:

l o g ({s a l}_{i}) = X_{i}^{T} β + f ({y e a r s}_{i}) + ε_{i}, 1 \leq i \leq 263

(34)

where

X_{i}^{T} = [\begin{matrix} {A t B a t}_{i}, {H i t s}_{i}, {H m R u n}_{i}, {R u n s}_{i}, {R B I}_{i}, {W a l k s}_{i}, {C A t B a t}_{i} {, C H i t s}_{i}, {C H m R u n}_{i} {C R u n s}_{i}, \\ {C R B I}_{i}, {C W a l k s}_{i}, {P u t O u t s}_{i}, {A s s i s t s}_{i}, {E r r o r s}_{i} \end{matrix}]

Therefore, there is an

(263 \times 15)

-dimensional covariate matrix for the parametric component of the model, and

β = {(β_{1}, \dots, β_{15})}^{T}

is the

(15 \times 1)

-dimensional vector of the regression coefficients to be estimated.

Table 5 quantifies the performance of each estimator on the Hitters dataset using RMSE, MSE, and RE, all based on the prediction of log(Salary). The Shrinkage and Positive Shrinkage estimators stand out with the lowest RMSE and MSE values of 0.413 and 0.171, and 0.405 and 0.164, respectively. This indicates that these two methods provide the most accurate predictions for log(Salary) on this dataset. Their superior performance is further underscored by their RE values, 1.950 and 2.025, respectively, which demonstrate that they are almost twice as efficient as the Ridge estimator, which has an RE of 1.000 and performs the worst on this metric.

aLasso also performs well, with an RMSE of 0.443, MSE of 0.196, and RE of 1.697, outperforming Lasso, SCAD, ElasticNet, and MCP. Lasso, SCAD, ElasticNet, and MCP exhibit moderate performance, with RMSE values ranging from 0.474 to 0.512 and RE values between 1.270 and 1.484. These results are consistent with the simulation study, which suggested that shrinkage estimators, particularly positive shrinkage, and aLasso are effective in handling multicollinearity and achieving good prediction accuracy.

Figure 4 visually depicts the estimated nonparametric function f

(Y e a r s)

for each estimator, revealing how they model the nonlinear relationship between a player’s standardized years of experience and their

l o g (S a l a r y)

. The curves generally capture an initial increase in log(Salary) as years of experience increase, followed by a plateau or a slight decrease for players with more years. Notably, the curves produced by Ridge, MCP, and the shrinkage estimators are relatively similar and smooth, suggesting a consistent estimation of the underlying nonlinear trend. Lasso, aLasso, and ElasticNet exhibit some deviations from this trend, particularly in the middle range of the “Years” variable, indicating potential sensitivity to noise or specific data points. SCAD produces a curve that is very different from the others. The black dots, representing the observed data points adjusted for the parametric component’s effect, provide a visual reference for assessing the fit of each curve. The variation in the shapes of the curves highlights the influence of the chosen estimation method on the estimated nonparametric relationship, emphasizing the importance of considering different estimators and their potential impact on the interpretation of the results.

Figure 5 provides a visual comparison of the predicted log(Salary) values against the actual

l o g (S a l a r y)

values for each estimator, offering a direct assessment of their predictive accuracy. The points for Ridge, MCP, and the shrinkage estimators are more closely clustered around the diagonal line (

y = x

), indicating that their predictions are generally closer to the actual values. This observation aligns with the lower RMSE and MSE values reported for these estimators in Table 5. Conversely, Lasso, aLasso, and ElasticNet exhibit a wider spread of points around the diagonal, suggesting greater prediction errors and confirming their higher RMSE and MSE values in the table. The plot for SCAD has a different pattern that does not align with the diagonal. Overall, Figure 5 visually reinforces the quantitative findings in Table 5, demonstrating that Ridge, MCP, and particularly the shrinkage estimators achieve better predictive performance on the Hitters dataset, while Lasso, aLasso, and ElasticNet show relatively weaker performance. This visualization underscores the practical implications of estimator selection for achieving accurate predictions in partially linear models.

9. Conclusions

This paper focuses on the realm of semiparametric regression, specifically focusing on Partially Linear Regression Models (PLRMs) that incorporate both linear and nonlinear components. It considers a comprehensive review of six penalty estimation strategies: Ridge, Lasso, aLasso, SCAD, ElasticNet, and MCP. Recognizing the challenges posed by multicollinearity and the need for robust estimation in sparse models, we further introduce Stein-type shrinkage estimation techniques. The core of the study lies in evaluating the performance of these methods through both theoretical analysis and empirical investigation. A kernel smoothing technique, grounded in penalized least squares, is employed to estimate the semiparametric regression models.

The theoretical contributions include a partly asymptotic analysis of the proposed estimators, providing insights into their long-term behavior and robustness. The empirical investigation encompasses a simulation study that examines the estimators’ performance under various conditions, including different sample sizes, numbers of predictors, and levels of multicollinearity. Finally, the practical applicability of these methods is demonstrated through a real data example using the Hitters dataset, where the estimators are used to model baseball players’ salaries based on their performance metrics. The paper concludes that aLasso and shrinkage estimators exhibit superior performance in terms of prediction accuracy and efficiency in the presence of multicollinearity, especially the positive shrinkage, performing even better than aLasso.

In detail, the following points can be emphasized in terms of theoretical inferences, simulation and real data studies:

The paper establishes the asymptotic properties of the proposed estimators, including Ridge, Lasso, aLasso, SCAD, ElasticNet, MCP, and the Stein-type shrinkage estimators, providing a theoretical foundation for their use in PLRMs.
The theoretical results highlight the advantages of aLasso and shrinkage estimation, particularly in scenarios with high multicollinearity and sparsity.
The simulation study demonstrates that aLasso and the shrinkage estimators, especially the positive shrinkage estimator, consistently outperform other methods in terms of lower Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) for both the parametric and nonparametric components of the PLRM.
The superior performance of aLasso and shrinkage estimators is more pronounced when the sample size is small and multicollinearity is high, confirming their robustness in challenging conditions.
MCP and SCAD also exhibit strong performance in the simulations, often outperforming Ridge, Lasso, and ElasticNet, particularly when multicollinearity is present.
The simulation results reveal that the choice of estimator can significantly impact the estimation of the nonparametric function, with aLasso and shrinkage estimators generally producing smoother and more accurate curves.
The analysis of the Hitters dataset confirms the practical advantages of aLasso and shrinkage estimation, particularly positive shrinkage, in a real-world scenario with multicollinearity, as indicated by the high condition number.
The shrinkage and aLasso estimators achieve the lowest RMSE and MSE values when predicting log(Salary), demonstrating their superior predictive accuracy compared to Ridge, Lasso, SCAD, ElasticNet, and MCP.
The fitted nonparametric curves for the “Years” variable reveal interesting differences in how each estimator captures the nonlinear relationship between experience and salary, with aLasso and shrinkage estimators providing a balance between flexibility and smoothness.
The real data results align with the findings of the simulation study, further supporting the use of aLasso and shrinkage estimation, especially positive shrinkage, in PLRMs when multicollinearity is a concern. Also, SCAD has unexpected results which needs more investigation.

While this study provides valuable theoretical and empirical insights into the performance of various penalty and shrinkage estimators for partially linear models, several limitations should be acknowledged. The simulation study uses a GCV for the bandwidth, which might not be the best choice. The analysis is also limited to a single real-world dataset (Hitters), and the simulation study, while comprehensive, does not cover all possible scenarios, such as different error distributions.

The computational cost of some estimators, particularly SCAD and MCP, is not explicitly addressed. Additionally, the choice of tuning parameters for the penalized estimators is obtained based on CV criterion, whereas in practice, these need to be carefully selected, potentially impacting the results. The paper’s theoretical results are based on regularity conditions that might not always hold in real-world applications. While shrinkage estimators are introduced and applied, a more in-depth investigation into their properties and performance would be beneficial. These limitations highlight the need for further research and careful consideration when applying these methods in practice.

Author Contributions

Conceptualization, S.E.A. and D.A.; methodology, S.E.A. and D.A.; software, A.J.A.; validation, A.J.A., D.A. and E.Y.; formal analysis, A.J.A.; investigation, A.J.A. and E.Y.; resources, S.E.A. and D.A.; data curation, A.J.A.; writing—original draft preparation, A.J.A. and E.Y.; writing—review and editing, A.J.A., E.Y., S.E.A. and D.A.; visualization, A.J.A.; supervision, S.E.A. and D.A.; project administration, S.E.A. and D.A.; funding acquisition, S.E.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available dataset has been used for the paper.

Acknowledgments

This paper is inspired by the Master’s Thesis of Ayuba Jack Alhassan. The research of S. Ejaz Ahmed was supported by the Natural Sciences and the Engineering Research Council (NSERC) of Canada.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Proof of Theorem 1

The most important step in obtaining the ridge penalty-based kernel smoothing estimator is to calculate the partial residuals and minimize (13). From that, as mentioned before,

\tilde{X} = (I - W) X

and

\tilde{y} = (I - W) y

are calculated, and the minimization of (13) is realized on the basis of response variable

\tilde{y}

as follows:

P L S_{R K} = {(\tilde{y} - \tilde{X} β)}^{'} (\tilde{y} - \tilde{X} β) + λ β^{'} β = {\tilde{y}}^{T} \tilde{y} - {\tilde{y}}^{T} \tilde{X} β - β^{T} {\tilde{X}}^{T} \tilde{y} + β^{T} {\tilde{X}}^{T} \tilde{X} β + λ β^{'} β I f, \frac{\partial L}{\partial β} = 0 - 2 {\tilde{y}}^{T} \tilde{X} + 2 {\tilde{X}}^{T} \tilde{X} β + 2 λ β = 0 {\tilde{X}}^{T} \tilde{X} β + λ β = {\tilde{y}}^{T} \tilde{X} {\hat{β}}_{R K} = {({\tilde{X}}^{T} \tilde{X})}^{- 1} {\tilde{X}}^{T} {\tilde{y}}^{T}

Appendix A.2. Proof of Lemma 1

V_{n} (u)

can be defined as follows:

\sum_{i = 1}^{n} [{({\tilde{ϵ}}_{i} - u' {\bar{x}}_{i} / \sqrt{n})}^{2} - {\tilde{ϵ}}_{i}^{2}] + k \sum_{j = 1}^{p} [{|β_{j} + u_{j} / \sqrt{n}|}^{2} - {|β_{j}|}^{2}],

where

u = {(u_{1}, . . ., u_{p})}^{'}

. According to [52], it can be shown that

\sum_{i = 1}^{n} [{({\tilde{ϵ}}_{i} - u' {\bar{x}}_{i} / \sqrt{n})}^{2} - {\tilde{ϵ}}_{i}^{2}] \overset{d}{\to} - 2 u^{'} D + u' \tilde{Q} u

where

D ~ N (0, σ^{2} I_{P})

, with finite-dimensional convergence holding trivially. Hence,

k \sum_{j = 1}^{p} [{|β_{j} + u_{j} / \sqrt{n}|}^{2} - {|β_{j}|}^{2}] \overset{d}{\to} λ_{0} \sum_{j = 1}^{p} u_{j} s g n (β_{j}) |β_{j}| .

Hence,

V_{n} (u) \overset{d}{\to} V (u)

., because

V_{n}

is convex and

V

has a unique minimum, it yields

\arg \min (V_{n}) = \sqrt{n} ({\hat{β}}_{1}^{F M} - β) \overset{d}{\to} \arg m i n (V) .

Hence,

\sqrt{n} ({\hat{β}}_{1}^{F M} - β) \overset{d}{\to} {\tilde{Q}}^{- 1} (D - λ_{0} β) ~ N (λ_{0} {\tilde{Q}}^{- 1} β, σ^{2} {\tilde{Q}}^{- 1})

Appendix A.3. Additional Figures for Simulation and Real Data Studies

The boxplots in Figure A1 provide a detailed distributional view of the MSE for the nonparametric component. Here, we see that MCP, SCAD, and the shrinkage estimators not only have lower median MSE values but also smaller interquartile ranges, indicating greater consistency and less susceptibility to extreme errors. This is in contrast to methods like Ridge, Lasso, aLasso, and ElasticNet, which exhibit more outliers, especially under challenging conditions. These observations align with the findings presented in Figure 3 and further solidify the superior performance of MCP, SCAD, and shrinkage estimators in terms of both accuracy and precision when estimating the nonparametric part of the model.

Figure A1. Boxplots of MSE values obtained for nonparametric component of the model for all simulation configurations and all introduced estimators.

Figure A2 shifts the focus to the relative efficiency (RE) of the estimators for the parametric component, comparing them to a reference estimator, likely Ridge. The plots reveal that MCP, SCAD, and the shrinkage estimators consistently demonstrate RE values greater than 1, often significantly so, particularly when the sample size is small and multicollinearity is high. This indicates that these methods are substantially more efficient than Ridge in estimating the parametric component under such conditions, a finding that resonates with the lower MSE values these estimators exhibit in Figure 1 and Table 1, Table 2 and Table 3.

Figure A2. Plots of RE values for estimated parametric components of the model including all simulation configurations.

Figure A3 examines the impact of the number of predictors on the RMSE of the parametric component. Notably, the lines for MCP, SCAD, and the shrinkage estimators remain relatively flat as the number of predictors increases. This suggests that these methods are less sensitive to model complexity compared to Ridge, Lasso, aLasso, and ElasticNet, which show a more pronounced upward trend in RMSE, especially when multicollinearity is present. This observation reinforces the robustness of MCP, SCAD, and shrinkage estimators, aligning with their overall superior performance highlighted in previous figures and tables.

Figure A3. RMSE values for estimated parametric component against the number of parameters (p) for all simulation configurations and introduced estimators.

Figure A4. RMSE values for estimated non-parametric component against the number of parameters (p) for all simulation configurations and introduced estimators.

Finally, Figure A4 investigates the relationship between RMSE and the number of predictors for the nonparametric component. Similar to Figure A3, the plots show that MCP, SCAD, and the shrinkage estimators exhibit more stable RMSE values as the number of predictors increases, particularly compared to Ridge, Lasso, aLasso, and ElasticNet. This demonstrates the robustness of these estimators when estimating the nonparametric part of the model, even as the complexity of the parametric component grows. The findings in this figure are consistent with the lower MSE values for the nonparametric component observed for MCP, SCAD, and shrinkage estimators in Figure 3 and visually confirmed in Figure 2.

References

Engle, R.F.; Granger, C.W.; Rice, J.; Weiss, A. Semiparametric Estimates of the Relation Between Weather and Electricity Sales. J. Am. Stat. Assoc. 1986, 81, 310–320. [Google Scholar] [CrossRef]
Green, P.J.; Silverman, B.W. Nonparametric Regression and Generalized Linear Models, 1st ed.; Chapman and Hall/CRC: London, UK, 1994. [Google Scholar]
Ruppert, D.; Wand, M.P.; Carroll, R.J. Semiparametric Regression, 1st ed.; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
Speckman, P. Kernel Smoothing in Partial Linear Models. J. R. Stat. Soc. B 1988, 50, 413–436. [Google Scholar] [CrossRef]
Heckman, N.E. Spline Smoothing in a Partly Linear Model. J. R. Stat. Soc. B 1986, 48, 244–248. [Google Scholar] [CrossRef]
Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
Zou, H.; Zhang, H.H. On the Adaptive Elastic-Net with a Diverging Number of Parameters. Ann. Stat. 2009, 37, 1733–1751. [Google Scholar] [CrossRef]
Ahmed, S.E. Penalty, Shrinkage and Pretest Strategies: Variable Selection and Estimation; Springer: New York, NY, USA, 2014. [Google Scholar]
Wu, J.; Asar, Y. On Almost Unbiased Ridge Logistic Estimator for the Logistic Regression Model. Hacet. J. Math. Stat. 2016, 45, 989–998. [Google Scholar] [CrossRef]
Ahmed, S.E.; Belaghi, R.A.; Hussein, A.; Safariyan, A. New and Efficient Estimators of Reliability Characteristics for a Family of Lifetime Distributions Under Progressive Censoring. Mathematics 2024, 12, 1599. [Google Scholar] [CrossRef]
Ahmed, S.E.; Aydın, D.; Yılmaz, E. Penalty and Shrinkage Strategies Based on Local Polynomials for Right-Censored Partially Linear Regression. Entropy 2022, 24, 1833. [Google Scholar] [CrossRef]
Yüzbaşı, B.; Arashi, M.; Ahmed, S.E. Big Data Analysis Using Shrinkage Strategies. arXiv 2017, arXiv:1704.05074. [Google Scholar]
Yüzbaşı, B.; Ahmed, S.E.; Aydın, D. Ridge-Type Pretest and Shrinkage Estimations in Partially Linear Models. Stat. Pap. 2020, 61, 869–898. [Google Scholar] [CrossRef]
Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
Fan, J.; Li, R. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
Zhang, C.H. Nearly Unbiased Variable Selection under Minimax Concave Penalty. arXiv 2010, arXiv:1002.4734. [Google Scholar] [CrossRef] [PubMed]
Aydın, D.; Ahmed, S.E.; Yılmaz, E. Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator. Entropy 2021, 23, 1586. [Google Scholar] [CrossRef]
Yilmaz, E.; Yuzbasi, B.; Aydin, D. Choice of Smoothing Parameter for Kernel-Type Ridge Estimators in Semiparametric Regression Models. REVSTAT-Stat. J. 2021, 19, 47–69. [Google Scholar]
Li, T.; Kang, X. Variable Selection of Higher-Order Partially Linear Spatial Autoregressive Model with a Diverging Number of Parameters. Stat. Pap. 2022, 63, 243–285. [Google Scholar] [CrossRef]
Sun, L.; Zhou, X.; Guo, S. Marginal Regression Models with Time-Varying Coefficients for Recurrent Event Data. Stat. Med. 2011, 30, 2265–2277. [Google Scholar] [CrossRef]
Sun, Z.; Cao, H.; Chen, L. Regression Analysis of Additive Hazards Model with Sparse Longitudinal Covariates. Lifetime Data Anal. 2022, 28, 263–281. [Google Scholar] [CrossRef]
Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
Yang, S.P.; Emura, T. A Bayesian Approach with Generalized Ridge Estimation for High-Dimensional Regression and Testing. Commun. Stat. Simul. Comput. 2017, 46, 6083–6105. [Google Scholar] [CrossRef]
Bühlmann, P.; van de Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications; Springer Science & Business Media: Heidelberg, Germany, 2011. [Google Scholar]
Breheny, P.; Huang, J. Coordinate Descent Algorithms for Nonconvex Penalized Regression, with Applications to Biological Feature Selection. Ann. Appl. Stat. 2011, 5, 232–253. [Google Scholar] [CrossRef] [PubMed]
Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. B 2005, 67, 301–320. [Google Scholar] [CrossRef]
Wang, L.; Liu, X.; Liang, H.; Carroll, R.J. Estimation and Variable Selection for Generalized Additive Partial Linear Models. Ann. Stat. 2013, 41, 712–741. [Google Scholar] [CrossRef]
Ahmed, S.E.; Doksum, K.A.; Hossain, S.; You, J. Shrinkage, Pretest and Absolute Penalty Estimators in Partially Linear Models. Aust. N. Z. J. Stat. 2007, 49, 435–454. [Google Scholar] [CrossRef]
Kubokawa, T. Highly-Efficient Quadratic Loss Estimation and Shrinkage to Minimize Mean Squared Error; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
Piladaeng, J.; Ahmed, S.E.; Lisawadi, S. Penalised, Post-Pretest, and Post-Shrinkage Strategies in Nonlinear Growth Models. Aust. N. Z. J. Stat. 2022, 64, 381–405. [Google Scholar] [CrossRef]
Piladaeng, J.; Lisawadi, S.; Ahmed, S.E. Improving the performance of least squares estimator in a nonlinear regression model. In Proceedings of the Fourteenth International Conference on Management Science and Engineering Management, Chisinau, Moldova, 30 July–2 August 2020; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; Volume 1, pp. 492–502. [Google Scholar]
Ahmed, S.E.; Ahmed, F.; Yüzbaşı, B. Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data, 1st ed.; Chapman and Hall/CRC: New York, NY, USA, 2023. [Google Scholar]
Phukongtong, S.; Lisawadi, S.; Ahmed, S.E. Linear Shrinkage and Shrinkage Pretest Strategies in Partially Linear Models. E3S Web Conf. 2023, 409, 02009. [Google Scholar] [CrossRef]
Li, D.; Racine, J. Nonparametric Econometrics: Theory and Practice. In Handbook of Research Methods and Applications in Empirical Macroeconomics; Edward Elgar Publishing: Northampton, MA, USA, 2013; pp. 3–28. [Google Scholar]
Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability 66; CRC Press: Boca Raton, FL, USA, 1996. [Google Scholar]
Zareamoghaddam, H.; Ahmed, S.E.; Provost, S.B. Shrinkage Estimation Applied to a Semi-Nonparametric Regression Model. Int. J. Biostat. 2020, 17, 23–38. [Google Scholar] [CrossRef]
Ma, W.; Feng, Y.; Chen, K.; Ying, Z. Functional and Parametric Estimation in a Semi- and Nonparametric Model with Application to Mass-Spectrometry Data. Int. J. Biostat. 2013, 11, 285–303. [Google Scholar] [CrossRef]
Asl, M.N.; Bevrani, H.; Belaghi, R.A.; Ahmed, S.E. Shrinkage and Sparse Estimation for High-Dimensional Linear Models. Adv. Intell. Syst. Comput. 2019, 1, 147–156. [Google Scholar]
Aldeni, M.; Wagaman, J.C.; Amezziane, M.; Ahmed, S.E. Pretest and Shrinkage Estimators for Log-Normal Means. Comput. Stat. 2022, 38, 1555–1578. [Google Scholar] [CrossRef]
Al-Momani, M.; Ahmed, S.E.; Hussein, A. Efficient Estimation Strategies for Spatial Moving Average Model. Adv. Intell. Syst. Comput. 2019, 1, 520–543. [Google Scholar]
Reangsephet, O.; Lisawadi, S.; Ahmed, S.E. Post selection estimation and prediction in Poisson regression model. Thail. Stat. 2020, 18, 176–195. [Google Scholar]
Reangsephet, O.; Lisawadi, S.; Ahmed, S.E. Weak signals in high-dimensional logistic regression models. In Proceedings of the Thirteenth International Conference on Management Science and Engineering Management, Ontario, ON, Canada, 5–8 August 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; Volume 1, pp. 121–133. [Google Scholar]
Reangsephet, O.; Lisawadi, S.; Ahmed, S.E. A Comparison of Pretest, Stein-Type and Penalty Estimators in Logistic Regression Model. In Proceedings of the Eleventh International Conference on Management Science and Engineering Management, Kanazawa, Japan, 28–31 July 2017; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 19–34. [Google Scholar]
Shah, M.K.; Zahra, N.; Ahmed, S.E. On the Simultaneous Estimation of Weibull Reliability Functions. In Proceedings of the Thirteenth International Conference on Management Science and Engineering Management, Ontario, ON, Canada, 5–8 August 2019. [Google Scholar]
Antoniadis, A.; Gijbels, I.; Verhasselt, A. Variable Selection in Additive Models Using P-Splines with Nonconcave Penalties. Ann. Inst. Stat. Math. 2012, 64, 5–27. [Google Scholar]
Nadaraya, E.A. On Estimating Regression. Theory Probab. Appl. 1964, 9, 141–142. [Google Scholar] [CrossRef]
Watson, G.S. Smooth Regression Analysis. Sankhyā A 1964, 26, 359–372. [Google Scholar]
Yüzbaşı, B.; Ahmed, S.E.; Arashi, M.; Norouzirad, M. LAD, LASSO and Related Strategies in Regression Models. In Advances in Intelligent Systems and Computing; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
Staniswalis, J.G. The Kernel Estimate of a Regression Function in Likelihood-Based Models. J. Am. Stat. Assoc. 1989, 84, 276–283. [Google Scholar] [CrossRef]
Frank, L.E.; Friedman, J.H. A Statistical View of Some Chemometrics Regression Tools. Technometrics 1993, 35, 109–135. [Google Scholar] [CrossRef]
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2015. [Google Scholar]
Fu, W.; Knight, K. Asymptotics for Lasso-type estimators. Ann. Stat. 2000, 28, 1356–1378. [Google Scholar] [CrossRef]

Figure 1. MSE values of parametric components against

ρ

(multicollinearity level) for the simulation configurations.

Figure 1. MSE values of parametric components against

ρ

(multicollinearity level) for the simulation configurations.

Figure 2. Fitted curves for all simulation configurations and for all introduced estimators.

Figure 3. MSE values of fitted curves against the multicollinearity level between covariates.

Figure 4. Fitted curves of Hitter’s data for the

f ({y e a r s}_{i})

component.

Figure 4. Fitted curves of Hitter’s data for the

f ({y e a r s}_{i})

component.