Article

Research Based on High-Dimensional Fused Lasso Partially Linear Model

School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang 471023, China
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(12), 2726; https://doi.org/10.3390/math11122726
Submission received: 17 May 2023 / Revised: 13 June 2023 / Accepted: 14 June 2023 / Published: 16 June 2023

Abstract

In this paper, a partially linear model based on the fused lasso method is proposed to address the problem of high correlation between adjacent variables, and the idea of two-stage estimation is used to fit this model. Firstly, the nonparametric part of the partially linear model is estimated using the kernel function method, which transforms the semiparametric model into a parametric model. Secondly, the fused lasso regularization term is introduced into the model to construct a penalized least squares estimate of the parameters. Because the non-smooth terms of the model mean that the subproblems may not have closed-form solutions, the linearized alternating direction method of multipliers (LADMM) is used to solve the model, and the convergence of the algorithm and the asymptotic properties of the model are analyzed. Finally, the applicability of this model is demonstrated on two types of simulated data and on the practical problem of predicting worker wages.
MSC:
62J05; 90C06; 90C25; 90C30

1. Introduction

With the advent of the era of big data, many fields generate massive and complex data, such as genetic representations, biomedical image processing, tumor classification, signal processing, and financial time series analysis, and statistical inference for high-dimensional data has become a hot research topic in recent years. Among these approaches, parametric regression modeling is a crucial method for studying the statistical relationships between a response variable and a set of covariates. However, in practical data analysis, we may not know the true model structure; nonparametric regression models can fit the nonlinear relationships among the variables, but when the dimensionality of the covariates is large, such models suffer from the "curse of dimensionality". Therefore, in the 1980s, the semiparametric regression model, which retains the simplicity of parametric regression while keeping the flexibility of nonparametric regression, was first proposed by Engle et al. [1] and has found wide application. As a common semiparametric regression model, the partially linear model includes three parts: an unknown regression coefficient vector, an unknown nonparametric function, and a Gaussian random error. It can be expressed as
$Y = X^T\beta + g(T) + \varepsilon,$
where $Y = (Y_1, \ldots, Y_n)^T$ is a real-valued response variable, $\beta \in \mathbb{R}^p$ is a $p$-dimensional unknown parameter vector, $T = (T_1, \ldots, T_n)^T$ is a random variable taking values in $[0,1]$, $g(\cdot)$ is a smooth unknown function defined on $[0,1]$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ is a random error.
This model combines the advantages of the multivariate linear model and the nonparametric model, so it has been widely welcomed by statisticians and has produced very fruitful research results in a short period of time. The nonparametric part of the partially linear model can be estimated using spline estimation, kernel functions, and so on. In [2], the nonparametric part was estimated with a Nadaraya–Watson kernel, and the fitted nonparametric regression values were inserted into an orthogonal projection to construct estimates of the parametric part. The idea of kernel smoothing was incorporated into partially linear model studies in [3], and least squares was used to obtain the parameter estimates. The estimation results of a two-stage spline smoothing method and of a biased regression method for the parametric components were investigated in [4], under the condition that the smoothing parameter is a sequence tending to zero. In [5], the nonparametric components were estimated with penalized regression splines, and robust S-estimators were then used to construct the parametric components. The parametric part of the partially linear model can be estimated with regression methods such as least squares. In [6], the authors studied a two-stage algorithm for the partially linear model under orthogonality constraints. The constrained profile least squares estimation of partially linear models with high-dimensional covariates was investigated in [7].
For high-dimensional data, variable selection is a common approach; here, "high-dimensional" means that the number of variables is much larger than the sample size. At present, there are many variable selection methods for high-dimensional linear models, such as the lasso [8,9], the adaptive lasso [10,11], SCAD [12], the fused lasso [13,14,15], and so on. There have also been many studies on variable selection in partially linear models. In [16], the authors studied the asymptotic properties of the lasso in high-dimensional partially linear models. A new variable selection method based on gradient learning was proposed in [17]. The study in [18] proposed variable selection for ultra-high-dimensional partially linear models based on the partial correlation between the partial residuals of the response and the predictors. Adaptive lasso penalized least squares estimates for partially linear models were constructed in [19,20]. However, the problem of partially linear models whose adjacent variables are highly correlated remains to be studied. The fused lasso is a method proposed by Tibshirani et al. [21] for feature selection in linear models whose features can be ordered in a meaningful way, and it has since been developed further for linear models [22,23]. In order to better estimate partially linear models with a high correlation between adjacent variables, the fused lasso partially linear model is proposed; fitting it is a convex optimization problem.
Traditional methods for solving convex optimization problems include the standard interior point method [24] and the gradient descent method [25], but these methods are less effective for high-dimensional data. The alternating direction method of multipliers (ADMM) is particularly suitable for solving the large-scale sparse optimization problems arising in statistics [26,27,28], but when the classical ADMM is applied to the fused lasso problem, one of the subproblems has no closed-form solution. Thus, the linearized ADMM (LADMM) was proposed in [14]: a linearization technique is introduced so that this subproblem admits an approximate closed-form solution. Therefore, LADMM is used to solve the fused lasso partially linear model in this paper.
The structure of this paper is as follows. In Section 2, the fused lasso partially linear model is constructed, and the least squares method is used to estimate the parameters, with the semiparametric model converted to a parametric model by the kernel function method. In Section 3, the LADMM algorithm for solving the high-dimensional partially linear model is designed. In Section 4, the convergence of the algorithm is established. In Section 5, the asymptotic property of the model is proved. In Section 6, two types of data are used for numerical simulation, and the applicability of the model is analyzed. In Section 7, the proposed method is applied to the practical problem of predicting worker wages, verifying its effectiveness.

2. Construction of the Fused Lasso Partially Linear Model

In this section, we construct the fused lasso partially linear model for parameter estimation. Assuming the parameters are known, the nonparametric part is estimated by the kernel function method, which converts the partially linear model into a parametric model; the fused lasso partially linear model is then constructed.
Some basic assumptions need to be made first [3].
1.
The support set of the kernel function $K(\cdot)$ is $[-1, 1]$, and there exist constants $0 < Q_1 \le Q_2$ satisfying $Q_1 \le K(u) \le Q_2$, $\int K(u)\,du = 1$, $\int u K(u)\,du = 0$, and $\int u^2 K(u)\,du \ne 0$.
2.
$\operatorname{tr}(K^T K) = \operatorname{tr}(K) = O_p(h^{-1})$.
3.
$\|g\|_2^2 = O_p(nh^4)$, where $g(\cdot)$ is the nonparametric term.
4.
$h = O_p(n^{-1/5})$, where the bandwidth $h$ is a constant sequence.
Assuming $(Y_1, X_1^T, T_1), \ldots, (Y_n, X_n^T, T_n)$ is an independent and identically distributed sample from the model, and denoting $X_i = (X_{i1}, \ldots, X_{ip})^T$ and $\beta = (\beta_1, \ldots, \beta_p)^T$, Equation (1) can be written as
$Y_i = X_i^T \beta + g(T_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n.$
Assuming $\beta$ is known, Equation (2) can be written as a nonparametric regression model:
$Y_i - X_i^T \beta = g(T_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n.$
The nonparametric term $g(\cdot)$ can be estimated using the kernel function method:
$g_n(t, \beta) = \sum_{j=1}^{n} W_{nj}(t)\,(Y_j - X_j^T \beta),$
where $W_{nj}(t) = K_h(T_j - t) \big/ \sum_{k=1}^{n} K_h(T_k - t)$ is a non-negative weight function satisfying $0 \le W_{nj}(t) \le 1$ and $\sum_{j=1}^{n} W_{nj}(t) = 1$, $K(\cdot)$ is a non-negative kernel function, and the bandwidth $h$ is a constant sequence that converges to zero. Thus, we have
$Y_i - X_i^T \beta = \sum_{j=1}^{n} W_{nj}(T_i)\,(Y_j - X_j^T \beta) + \varepsilon_i.$
Let
$\hat{X}_i = X_i - \sum_{j=1}^{n} W_{nj}(T_i)\,X_j, \qquad \hat{Y}_i = Y_i - \sum_{j=1}^{n} W_{nj}(T_i)\,Y_j,$
and substituting $g_n(t, \beta)$ for $g(T_i)$ in Equation (3) yields the parameter estimation model
$\hat{Y}_i = \hat{X}_i^T \beta + \hat{\varepsilon}_i, \quad i = 1, 2, \ldots, n.$
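To make this transformation concrete, the following is a minimal NumPy sketch (illustrative only, not the authors' MATLAB code) of the kernel-weight step, assuming the Gaussian kernel and the bandwidth rule $h = c\,n^{-1/5}$ with $c = 1/2$ used later in Section 6; the function names are hypothetical.

```python
import numpy as np

def kernel_weights(T, h):
    """Nadaraya-Watson weights: W[i, j] = K_h(T_j - T_i) / sum_k K_h(T_k - T_i)."""
    diff = (T[None, :] - T[:, None]) / h                 # pairwise (T_j - T_i) / h
    K = np.exp(-0.5 * diff ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel values
    return K / K.sum(axis=1, keepdims=True)              # each row sums to one

def partial_residuals(X, Y, T, c=0.5):
    """Return (X_hat, Y_hat) with the kernel-smoothed component removed."""
    n = len(Y)
    h = c * n ** (-1.0 / 5.0)                            # bandwidth rule h = c * n^(-1/5)
    W = kernel_weights(T, h)
    X_hat = X - W @ X                                    # X_i - sum_j W_nj(T_i) X_j
    Y_hat = Y - W @ Y                                    # Y_i - sum_j W_nj(T_i) Y_j
    return X_hat, Y_hat
```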
For variable selection in high-dimensional partially linear models, Ref. [19] defines the least squares estimation of the coefficient vector $\beta$ of the partially linear model based on the adaptive lasso method:
$\min_{\beta} \ \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j|.$
This model performs well in variable selection for high-dimensional data. In order to better address a high correlation between adjacent variables, based on the least squares idea and the fused lasso variable selection method, the penalized least squares estimate of the regression coefficient $\beta$ is defined as
$\min_{\beta} \ \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|,$
where $\lambda_1 > 0$ and $\lambda_2 > 0$ are tuning parameters.
For convenience, write $\sum_{j=2}^{p} |\beta_j - \beta_{j-1}|$ as $\|D\beta\|_1$, where $D \in \mathbb{R}^{(p-1)\times p}$ is the matrix of difference operators:
$D = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}.$
Therefore, Equation (7) can be written as
$\min_{\beta} \ \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|D\beta\|_1.$
Introducing an auxiliary variable $y$, Equation (8) can be written equivalently as
$\min_{\beta, y} \ \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|y\|_1, \quad \text{s.t.} \ D\beta = y;$
the solution of Equation (7) is thus transformed into the solution of the constrained optimization problem Equation (9), which we call the fused lasso partially linear model (FPLM).

3. The Solution and Algorithm Design of FPLM

In this section, LADMM is used to solve Equation (9), and the design of the LADMM algorithm for solving the fused lasso partially linear model is provided.

3.1. LADMM for Solving FPLM

For the optimization problem in Equation (9), the augmented Lagrange multiplier method is used to transform the constrained programming problem into an unconstrained programming problem, and then the augmented Lagrange function is
$L_\rho(\beta, y, \alpha) = \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|_2^2 + \lambda_1\|\beta\|_1 + \lambda_2\|y\|_1 - \alpha^T(D\beta - y) + \frac{\rho}{2}\|D\beta - y\|_2^2,$
where ρ > 0 is the penalty parameter.
Applying the augmented Lagrange method, the $(m+1)$-th iteration of Equation (9) is
$\beta^{m+1} = \arg\min_{\beta \in \mathbb{R}^p} L_\rho(\beta, y^m, \alpha^m),$
$y^{m+1} = \arg\min_{y \in \mathbb{R}^{p-1}} L_\rho(\beta^{m+1}, y, \alpha^m),$
$\alpha^{m+1} = \alpha^m - \rho\,(D\beta^{m+1} - y^{m+1}).$
A new iteration point $(\beta^{m+1}, y^{m+1}, \alpha^{m+1})$ can be obtained by solving the subproblems of Equation (11). The subproblems are solved separately as follows.
Consider the $\beta$-subproblem first:
$\beta^{m+1} = \arg\min_{\beta \in \mathbb{R}^p} \ \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|_2^2 + \lambda_1\|\beta\|_1 - (\alpha^m)^T(D\beta - y^m) + \frac{\rho}{2}\|D\beta - y^m\|_2^2$
$= \arg\min_{\beta \in \mathbb{R}^p} \ \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|_2^2 + \lambda_1\|\beta\|_1 + \frac{\rho}{2}\Big\|D\beta - y^m - \frac{\alpha^m}{\rho}\Big\|_2^2$
$= \arg\min_{\beta \in \mathbb{R}^p} \ \lambda_1\|\beta\|_1 + \frac{1}{2}\|\tilde{X}\beta - \tilde{Y}^m\|_2^2,$
where $\tilde{X} = (\hat{X}^T, \sqrt{\rho}\,D^T)^T \in \mathbb{R}^{(n+p-1)\times p}$ and $\tilde{Y}^m = \big(\hat{Y}^T, \sqrt{\rho}\,(y^m + \alpha^m/\rho)^T\big)^T \in \mathbb{R}^{n+p-1}$. Due to the presence of the non-smooth term $\lambda_1\|\beta\|_1$ and the non-identity matrix $\tilde{X}$, the $\beta$-subproblem does not have a closed-form solution. Therefore, an approximate solution for $\beta$ can be obtained by linearizing the quadratic term $\frac{1}{2}\|\tilde{X}\beta - \tilde{Y}^m\|_2^2$ in Equation (12), expanding this term at $\beta^m$ by a Taylor expansion; that is,
$\frac{1}{2}\|\tilde{X}\beta - \tilde{Y}^m\|_2^2 \approx \frac{1}{2}\|\tilde{X}\beta^m - \tilde{Y}^m\|_2^2 + \big(\tilde{X}^T(\tilde{X}\beta^m - \tilde{Y}^m)\big)^T(\beta - \beta^m) + \frac{\upsilon}{2}\|\beta - \beta^m\|_2^2,$
where $\upsilon > 0$ is a proximal parameter that controls the proximity to $\beta^m$. Thus, Equation (12) becomes
$\beta^{m+1} = \arg\min_{\beta \in \mathbb{R}^p} \ \lambda_1\|\beta\|_1 + \big(\tilde{X}^T(\tilde{X}\beta^m - \tilde{Y}^m)\big)^T(\beta - \beta^m) + \frac{\upsilon}{2}\|\beta - \beta^m\|_2^2$
$= \arg\min_{\beta \in \mathbb{R}^p} \ \lambda_1\|\beta\|_1 + \frac{\upsilon}{2}\Big\|\beta - \Big(\beta^m - \frac{\tilde{X}^T(\tilde{X}\beta^m - \tilde{Y}^m)}{\upsilon}\Big)\Big\|_2^2.$
To obtain a closed-form solution to this subproblem, the following operator [29] is considered:
$x^* = \arg\min_{x} \ \lambda\|x\|_1 + \frac{r}{2}\|x - \kappa\|_2^2,$
and the closed-form solution of this operator is
$x^* := S_{\lambda/r}(\kappa) := \operatorname{sign}(\kappa)\max\Big\{|\kappa| - \frac{\lambda}{r},\, 0\Big\},$
where $r$ is a positive scalar and the operations are applied componentwise.
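This soft-thresholding operator has a direct componentwise implementation; a one-function NumPy sketch (illustrative, not part of the paper) is:

```python
import numpy as np

def soft_threshold(kappa, tau):
    """Componentwise S_tau(kappa) = sign(kappa) * max(|kappa| - tau, 0)."""
    return np.sign(kappa) * np.maximum(np.abs(kappa) - tau, 0.0)
```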
Then, the solution of the $\beta$-subproblem can be given as
$\beta^{m+1} = S_{\lambda_1/\upsilon}\Big(\beta^m - \frac{\tilde{X}^T(\tilde{X}\beta^m - \tilde{Y}^m)}{\upsilon}\Big) = \operatorname{sign}(\gamma)\max\Big\{|\gamma| - \frac{\lambda_1}{\upsilon},\, 0\Big\},$
where $\gamma = \beta^m - \frac{\tilde{X}^T(\tilde{X}\beta^m - \tilde{Y}^m)}{\upsilon}$.
The solution of the $y$-subproblem can be obtained through direct calculation:
$y^{m+1} = \arg\min_{y \in \mathbb{R}^{p-1}} \ \lambda_2\|y\|_1 - (\alpha^m)^T(D\beta^{m+1} - y) + \frac{\rho}{2}\|D\beta^{m+1} - y\|_2^2$
$= \arg\min_{y \in \mathbb{R}^{p-1}} \ \lambda_2\|y\|_1 + \frac{\rho}{2}\Big\|y - D\beta^{m+1} + \frac{\alpha^m}{\rho}\Big\|_2^2$
$= S_{\lambda_2/\rho}\Big(D\beta^{m+1} - \frac{\alpha^m}{\rho}\Big) = \operatorname{sign}(\eta)\max\Big\{|\eta| - \frac{\lambda_2}{\rho},\, 0\Big\},$
where $\eta = D\beta^{m+1} - \frac{\alpha^m}{\rho}$.
Overall, applying LADMM, the solution of the $\beta$-subproblem of Equation (9) is given by Equation (14), and the solution of the $y$-subproblem is given by Equation (15). The $(m+1)$-th iteration of Equation (9) is then
$\beta^{m+1} = S_{\lambda_1/\upsilon}\Big(\beta^m - \frac{\tilde{X}^T(\tilde{X}\beta^m - \tilde{Y}^m)}{\upsilon}\Big),$
$y^{m+1} = S_{\lambda_2/\rho}\Big(D\beta^{m+1} - \frac{\alpha^m}{\rho}\Big),$
$\alpha^{m+1} = \alpha^m - \rho\,(D\beta^{m+1} - y^{m+1}).$

3.2. Algorithm Design of LADMM for Solving FPLM

In this section, we provide the algorithm design for solving the fused lasso partially linear model with LADMM. In summary, when solving a constrained optimization problem, the augmented Lagrangian function is constructed first, turning the problem into an unconstrained one; the solution of the model is then obtained by solving the iterative subproblems with LADMM. The resulting iterative scheme for solving the FPLM with LADMM is given in Algorithm 1.
Algorithm 1 Iterative scheme of LADMM for FPLM.
Input: $X$, $Y$, $W$, $t$, and tol. Choose $\lambda_1 > 0$, $\lambda_2 > 0$, $\rho > 0$, $\upsilon > \mu(\hat{X}^T\hat{X} + \rho D^T D)$,
            and the initial variables $(\beta^0, y^0, \alpha^0)$.
Output: $\beta^{m+1}$.
Repeat the following steps until a stopping criterion is met:
1: Compute $\beta^{m+1}$ by Equation (14);
2: Compute $y^{m+1}$ by Equation (15);
3: Compute $\alpha^{m+1}$ by $\alpha^{m+1} = \alpha^m - \rho(D\beta^{m+1} - y^{m+1})$;
return $\beta^{m+1}$.
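For readers who want to experiment with Algorithm 1, the following Python sketch assembles the updates in Equations (14)–(16). It reuses the soft_threshold and partial_residuals helpers sketched above; the function names, default values, and stopping rule are illustrative assumptions rather than the authors' MATLAB implementation.

```python
import numpy as np

def difference_matrix(p):
    """(p-1) x p first-difference operator D with rows (..., -1, 1, ...)."""
    D = np.zeros((p - 1, p))
    np.fill_diagonal(D, -1.0)
    np.fill_diagonal(D[:, 1:], 1.0)
    return D

def ladmm_fplm(X_hat, Y_hat, lam1, lam2, rho=1.0, max_iter=5000, tol=1e-6):
    """LADMM iteration for the FPLM on kernel-adjusted data (X_hat, Y_hat)."""
    n, p = X_hat.shape
    D = difference_matrix(p)
    # upsilon must exceed the spectral radius of X_hat'X_hat + rho*D'D (see Section 4)
    upsilon = 1.01 * np.linalg.eigvalsh(X_hat.T @ X_hat + rho * D.T @ D).max()
    X_tilde = np.vstack([X_hat, np.sqrt(rho) * D])          # stacked design matrix
    beta, y, alpha = np.zeros(p), np.zeros(p - 1), np.zeros(p - 1)
    for _ in range(max_iter):
        Y_tilde = np.concatenate([Y_hat, np.sqrt(rho) * (y + alpha / rho)])
        grad = X_tilde.T @ (X_tilde @ beta - Y_tilde)
        beta_new = soft_threshold(beta - grad / upsilon, lam1 / upsilon)   # Eq. (14)
        y_new = soft_threshold(D @ beta_new - alpha / rho, lam2 / rho)     # Eq. (15)
        alpha_new = alpha - rho * (D @ beta_new - y_new)                   # dual update
        converged = np.linalg.norm(beta_new - beta) <= tol * max(1.0, np.linalg.norm(beta))
        beta, y, alpha = beta_new, y_new, alpha_new
        if converged:
            break
    return beta
```

Equation (16) is exactly the loop body above; the factor 1.01 on the spectral radius is only a convenient way to satisfy the strict inequality required in Section 4.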

4. Convergence

In this section, we use a variational inequality to prove that the algorithm given above converges. The Lagrange function of Equation (9) is
$L(\beta, y, \alpha) = \frac{1}{2}\|\hat{Y} - \hat{X}\beta\|_2^2 + \lambda_1\|\beta\|_1 + \lambda_2\|y\|_1 - \alpha^T(D\beta - y),$
where $\alpha \in \mathbb{R}^{p-1}$ is the Lagrange multiplier.
Solving Equation (9) is equivalent to finding $(\beta^*, y^*, \alpha^*) \in Q := \mathbb{R}^p \times \mathbb{R}^{p-1} \times \mathbb{R}^{p-1}$, $f(\beta^*) \in \partial(\|\beta^*\|_1)$, $g(y^*) \in \partial(\|y^*\|_1)$, such that the first-order optimality conditions of Equation (17) hold:
$0 = \lambda_1 f(\beta^*) + \hat{X}^T(\hat{X}\beta^* - \hat{Y}) - D^T\alpha^*,$
$0 = \lambda_2 g(y^*) + \alpha^*,$
$0 = D\beta^* - y^*,$
where $\partial(\cdot)$ denotes the subdifferential operator of a non-smooth convex function, and $Q^*$ denotes the set of all elements of $Q$ satisfying Equation (18).
Defining $\theta^* = (\beta^*, y^*, \alpha^*) \in Q^*$, $f(\beta) \in \partial(\|\beta\|_1)$, and $g(y) \in \partial(\|y\|_1)$, Equation (18) can be written as a variational inequality:
$\mathrm{VI}(Q, F): \quad (\theta - \theta^*)^T F(\theta^*) \ge 0, \quad \forall\, \theta \in Q,$
where
$\theta = \begin{pmatrix} \beta \\ y \\ \alpha \end{pmatrix}, \qquad F(\theta) = \begin{pmatrix} \lambda_1 f(\beta) + \hat{X}^T(\hat{X}\beta - \hat{Y}) - D^T\alpha \\ \lambda_2 g(y) + \alpha \\ D\beta - y \end{pmatrix}.$
The following analysis requires a positive definite matrix $G$, given by
$G = \begin{pmatrix} \upsilon I_p - \tilde{X}^T\tilde{X} & 0 & 0 \\ 0 & \rho I_{p-1} & 0 \\ 0 & 0 & \frac{1}{\rho} I_{p-1} \end{pmatrix},$
where $\tilde{X} = (\hat{X}^T, \sqrt{\rho}\,D^T)^T$. The condition $\upsilon > \mu(\hat{X}^T\hat{X} + \rho D^T D)$ guarantees the positive definiteness of $G$, where $\mu(\cdot)$ denotes the spectral radius of a matrix.
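As a small numerical illustration, the admissibility condition on $\upsilon$ can be checked directly; the helper below (an assumed sketch, not from the paper) computes the smallest admissible value, which is also how $\upsilon$ was set in the LADMM sketch above.

```python
import numpy as np

def min_valid_upsilon(X_hat, D, rho):
    """Spectral radius of X_hat'X_hat + rho*D'D; any upsilon above it makes G positive definite."""
    M = X_hat.T @ X_hat + rho * D.T @ D
    return np.max(np.abs(np.linalg.eigvalsh(M)))   # M is symmetric, so this is its largest eigenvalue
```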
The following lemma expresses the $(m+1)$-th iteration of the algorithm in the form of the $\mathrm{VI}(Q, F)$ problem, which is used to analyze the convergence of the algorithm.
Lemma 1. 
Let $\{\theta^m\}$ be the sequence produced by the algorithm; then,
$(\theta - \theta^{m+1})^T \big(F(\theta^{m+1}) + R\,(y^m - y^{m+1}) - G\,(\theta^m - \theta^{m+1})\big) \ge 0, \quad \forall\, \theta \in Q,$
where
$R = \begin{pmatrix} -\rho D^T \\ \rho I_{p-1} \\ 0_{p-1} \end{pmatrix}.$
Proof. 
By $\tilde{X} = (\hat{X}^T, \sqrt{\rho}\,D^T)^T$ and $\alpha^{m+1} = \alpha^m - \rho(D\beta^{m+1} - y^{m+1})$, we have
$\tilde{X}^T(\tilde{X}\beta^{m+1} - \tilde{Y}^m) = (\hat{X}^T\hat{X} + \rho D^T D)\beta^{m+1} - (\hat{X}^T\hat{Y} + \rho D^T y^m + D^T\alpha^m) = \hat{X}^T(\hat{X}\beta^{m+1} - \hat{Y}) - D^T\alpha^{m+1} - \rho D^T(y^m - y^{m+1}).$
From Equations (13) and (15), it can be seen that the iterative scheme of Equation (11) is equivalent to finding $\theta^{m+1} = (\beta^{m+1}, y^{m+1}, \alpha^{m+1}) \in Q$, $f(\beta^{m+1}) \in \partial(\|\beta^{m+1}\|_1)$, $g(y^{m+1}) \in \partial(\|y^{m+1}\|_1)$, satisfying
$0 = \lambda_1 f(\beta^{m+1}) + \tilde{X}^T(\tilde{X}\beta^m - \tilde{Y}^m) + \upsilon(\beta^{m+1} - \beta^m),$
$0 = \lambda_2 g(y^{m+1}) + \rho\Big(y^{m+1} - D\beta^{m+1} + \frac{\alpha^m}{\rho}\Big),$
$0 = D\beta^{m+1} - y^{m+1} - \frac{\alpha^m - \alpha^{m+1}}{\rho}.$
Combining Equation (22) with the positive definite matrix $G$, Equation (23) can easily be rewritten as Equation (21). □
The following lemma can be easily derived from Lemma 1.
Lemma 2. 
Let $\{\theta^m\}$ be the sequence produced by the algorithm; then, for any $\theta^* \in Q^*$,
$(\theta^m - \theta^*)^T G(\theta^m - \theta^{m+1}) \ge (\theta^m - \theta^{m+1})^T G(\theta^m - \theta^{m+1}) - (\alpha^m - \alpha^{m+1})^T(y^m - y^{m+1}).$
From Lemmas 1 and 2, the following contraction property of the sequence with respect to the solution set $Q^*$ can be obtained.
Lemma 3. 
Let $\{\theta^m\}$ be the sequence produced by the algorithm; then, for any $\theta^* \in Q^*$,
$\|\theta^{m+1} - \theta^*\|_G^2 \le \|\theta^m - \theta^*\|_G^2 - \|\theta^m - \theta^{m+1}\|_G^2.$
Lemma 3 indicates that the sequence $\{\theta^m\}$ is contractive with respect to the solution set $Q^*$; the following corollary can be obtained from Lemma 3.
Corollary 1. 
Let $\{\theta^m\}$ be the sequence produced by the algorithm; then,
1.
$\lim_{m\to\infty} \|\theta^m - \theta^{m+1}\|_G = 0$.
2.
The sequence $\{\theta^m\}$ is bounded.
3.
For any $\theta^* \in Q^*$, the sequence $\{\|\theta^m - \theta^*\|_G\}$ is monotonically non-increasing.
Theorem 1. 
For any $\rho > 0$ and $\upsilon > \mu(\hat{X}^T\hat{X} + \rho D^T D)$, given an initial point $(\beta^0, y^0, \alpha^0) \in Q$, the sequence $\{\theta^m\} = \{(\beta^m, y^m, \alpha^m)\}$ generated by the algorithm converges to a point $\theta^{\infty} = (\beta^{\infty}, y^{\infty}, \alpha^{\infty})$, where $(\beta^{\infty}, y^{\infty})$ is a solution of Equation (9).
Proof. 
According to Corollary 1, we have
$\lim_{m\to\infty}\|\beta^m - \beta^{m+1}\|_G = 0, \quad \lim_{m\to\infty}\|y^m - y^{m+1}\|_G = 0, \quad \lim_{m\to\infty}\|\alpha^m - \alpha^{m+1}\|_G = 0.$
In addition, the sequence $\{\theta^m\}$ is bounded and therefore has at least one cluster point, denoted $\theta^{\infty} = (\beta^{\infty}, y^{\infty}, \alpha^{\infty})$. Let $\{\theta^{m_i}\}$ be a subsequence converging to $\theta^{\infty}$; then, we have
$\beta^{m_i} \to \beta^{\infty}, \quad y^{m_i} \to y^{\infty}, \quad \alpha^{m_i} \to \alpha^{\infty},$
$\lim_{i\to\infty}\|\beta^{m_i} - \beta^{m_i+1}\|_G = 0, \quad \lim_{i\to\infty}\|y^{m_i} - y^{m_i+1}\|_G = 0, \quad \lim_{i\to\infty}\|\alpha^{m_i} - \alpha^{m_i+1}\|_G = 0.$
It is shown below that the cluster point $\theta^{\infty}$ satisfies the optimality condition Equation (18). According to Lemma 1 and Equation (27), for any $\theta \in Q$,
$\lim_{i\to\infty} (\theta - \theta^{m_i})^T F(\theta^{m_i}) \ge 0.$
From Equation (26), the above inequality becomes
$(\theta - \theta^{\infty})^T F(\theta^{\infty}) \ge 0, \quad \forall\, \theta \in Q,$
so the cluster point $\theta^{\infty}$ satisfies Equation (18), that is, $\theta^{\infty} \in Q^*$. From Corollary 1, for any $m \ge 0$,
$\|\theta^{m+1} - \theta^{\infty}\|_G \le \|\theta^m - \theta^{\infty}\|_G.$
In summary, the sequence $\{\theta^m\}$ has a unique cluster point $\theta^{\infty}$, so the sequence $\{\theta^m\}$ converges to $\theta^{\infty}$. This completes the proof. □

5. Asymptotic Properties

In this section, we consider the asymptotic properties of the fused lasso estimator $\hat{\beta}$ of the partially linear model parameter $\beta$.
Theorem 2. 
If $\lambda_1(N)/\sqrt{N} \to \lambda_1(0) \ge 0$, $\lambda_2(N)/\sqrt{N} \to \lambda_2(0) \ge 0$, and $V = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\hat{X}_i\hat{X}_i^T$ is non-singular, then
$\sqrt{N}\,(\beta^* - \beta) \xrightarrow{d} \arg\min Z(u),$
where
$Z(u) = -2u^T M + u^T V u + \lambda_1(0)\sum_{j=1}^{p}\big[u_j\,\mathrm{sgn}(\beta_j)\,I(\beta_j \ne 0) + |u_j|\,I(\beta_j = 0)\big] + \lambda_2(0)\sum_{j=2}^{p}\big[(u_j - u_{j-1})\,\mathrm{sgn}(\beta_j - \beta_{j-1})\,I(\beta_j \ne \beta_{j-1}) + |u_j - u_{j-1}|\,I(\beta_j = \beta_{j-1})\big],$
with $M \sim N(0, \sigma^2 V)$.
Proof. 
Define $Z_N(u)$ by
$Z_N(u) = \sum_{i=1}^{N}\Big[\big(\varepsilon_i - u^T\hat{X}_i/\sqrt{N}\big)^2 - \varepsilon_i^2\Big] + \lambda_1(N)\sum_{j=1}^{p}\Big(\big|\beta_j + u_j/\sqrt{N}\big| - |\beta_j|\Big) + \lambda_2(N)\sum_{j=2}^{p}\Big(\big|\beta_j - \beta_{j-1} + (u_j - u_{j-1})/\sqrt{N}\big| - |\beta_j - \beta_{j-1}|\Big),$
where $u = (u_1, \ldots, u_p)^T$. We can determine that $Z_N$ is smallest at $\sqrt{N}(\beta^* - \beta)$. Note that
$\sum_{i=1}^{N}\Big[\big(\varepsilon_i - u^T\hat{X}_i/\sqrt{N}\big)^2 - \varepsilon_i^2\Big] \xrightarrow{d} -2u^T M + u^T V u,$
with finite-dimensional convergence holding trivially. In addition, we have
$\lambda_1(N)\sum_{j=1}^{p}\Big(\big|\beta_j + u_j/\sqrt{N}\big| - |\beta_j|\Big) \to \lambda_1(0)\sum_{j=1}^{p}\big[u_j\,\mathrm{sgn}(\beta_j)\,I(\beta_j \ne 0) + |u_j|\,I(\beta_j = 0)\big],$
$\lambda_2(N)\sum_{j=2}^{p}\Big(\big|\beta_j - \beta_{j-1} + (u_j - u_{j-1})/\sqrt{N}\big| - |\beta_j - \beta_{j-1}|\Big) \to \lambda_2(0)\sum_{j=2}^{p}\big[(u_j - u_{j-1})\,\mathrm{sgn}(\beta_j - \beta_{j-1})\,I(\beta_j \ne \beta_{j-1}) + |u_j - u_{j-1}|\,I(\beta_j = \beta_{j-1})\big].$
Thus, $Z_N(u) \xrightarrow{d} Z(u)$, and because of the convexity of $Z_N(u)$ and the fact that $Z(u)$ has a unique minimum,
$\arg\min Z_N(u) = \sqrt{N}\,(\beta^* - \beta) \xrightarrow{d} \arg\min Z(u).$
Theorem 2 shows that if $\lambda(N) = o(\sqrt{N})$, then $\sqrt{N}(\beta^* - \beta) \xrightarrow{d} V^{-1}M \sim N(0, \sigma^2 V^{-1})$. If $\lambda(N) = O(\sqrt{N})$, then the estimator of the non-zero coefficients of $\beta$ is not $\sqrt{N}$-consistent.

6. Numerical Experiments

In this section, two kinds of data are used to demonstrate that the model proposed in this paper performs better on high-dimensional data with a high correlation between adjacent variables. The experiments are divided into two groups: one uses general high-dimensional simulated data, and the other uses high-dimensional simulated data in which adjacent variables are highly correlated. The LADMM algorithm is used to solve the fused lasso partially linear model, the ADMM algorithm is used to solve the adaptive lasso partially linear model, and numerical comparisons are made to test the validity of the models. All calculations were performed in MATLAB R2016a on the Microsoft Windows 10 operating system.
When estimating the nonparametric part with a weight function, the choice of bandwidth $h$ directly determines the estimation quality of the nonparametric part, which in turn determines the estimation quality of the parametric part. The theoretically optimal bandwidth [30] is $h = c\,n^{-1/5}$, where $c$ is a constant; $c = 1/2$ is taken in this paper. The kernel function is the Gaussian kernel $K(u) = \frac{1}{\sqrt{2\pi}}e^{-u^2/2}$.

6.1. Numerical Experimental Results of General High-Dimensional Data

This section deals with general high-dimensional data, denoted as numerical experiment 1. A data set with sample size $n$ and dimension $p$ is generated. $X$ obeys a $p$-dimensional multivariate normal distribution, that is, $X \sim N(0, \Sigma)$ with $\Sigma_{ij} = 0.5^{|i-j|}$, where $i$ and $j$ index the components of the covariance matrix, and the parameters are given by $\beta = (1, 2, 0.5, 1, 0, \ldots, 0)^T$. The random error term is $\varepsilon \sim N(0, \sigma^2)$, where $\sigma$ takes the values 0.5, 1, and 2. The variable $t$ follows a uniform distribution on the interval $[0, 1]$, that is, $t \sim U(0, 1)$; the measurable function is $g(t) = \cos(2\pi t)$, and the response variable is generated by $Y = X\beta + g(t) + \varepsilon$. In addition, the parameters of the fused lasso model are $\lambda_1 = 1$, $\lambda_2 = 0.1$, $\rho = 1$; the parameters of the adaptive lasso model are $\lambda = 0.1$, $\rho = 0.01$; the sample size is set to $n = 100$; and the dimension $p$ takes different values.
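A sketch of this data-generating process (written in Python purely for illustration; the paper's experiments were run in MATLAB, and the seed and helper name are assumptions) is:

```python
import numpy as np

def make_experiment1(n=100, p=180, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # Sigma_ij = 0.5^|i-j|
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[:4] = [1.0, 2.0, 0.5, 1.0]                      # sparse true coefficients
    t = rng.uniform(0.0, 1.0, size=n)
    eps = rng.normal(0.0, sigma, size=n)
    Y = X @ beta + np.cos(2 * np.pi * t) + eps           # Y = X*beta + g(t) + eps
    return X, Y, t, beta
```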
Simulation experiments were conducted for the fused lasso partially linear model and the adaptive lasso partially linear model using the high-dimensional data and parameters generated by the above method, and the main results are shown in Table 1.
The indicator for evaluating the quality of a result is the mean squared error, $MSE = \|\beta^* - \beta\|^2$, where MSEF denotes the mean squared error of the fused lasso method and MSEA denotes that of the adaptive lasso method.
The results in Table 1 show that the mean squared error of the fused lasso method is slightly higher than that of the adaptive lasso method on general high-dimensional data, but the mean squared errors of both methods are small. The mean squared error increases as $\sigma$ increases but remains small overall, indicating that both methods handle general high-dimensional data well.

6.2. Numerical Experimental Results of High-Dimensional Data Highly Correlated to Adjacent Variables

In this section, simulated data in which adjacent variables are highly correlated are processed with the fused lasso method and the adaptive lasso method, respectively. Firstly, a Gaussian design matrix $X$ with unit column norms is randomly generated. In order to generate coefficient vectors that are sparse and in which adjacent variables have a linear relationship, the $p$ coordinates are divided into 80 groups, 10 of which are randomly chosen, and index sets of size $s = i$, $i = 3, 5, 7, 9, 11, 13, 15$, are then randomly selected. The parameter $\beta$ is generated as follows [14]:
$\beta_i = \begin{cases} \xi_i\,(1 + c_i), & \text{if } i \in S; \\ 0, & \text{otherwise}, \end{cases}$
where $c_i$ follows the $N(0, 1)$ distribution, $\xi_i$ is randomly selected from the set $\{-1, 1\}$, and the random error level is $\sigma = 0.01, 0.5, 1$. The settings for the variable $t$, the measurable function $g(t)$, and the response variable $Y$ are consistent with numerical experiment 1. To compare the two methods, we take $(n, p, s) = (100, 80i, i)$, $i = 3, 5, 7, 9, 11, 13, 15$. The parameters of the adaptive lasso model are $\lambda = 0.1$, $\rho = 0.1$, and the parameters of the fused lasso model are $\lambda_1 = 1$, $\lambda_2 = 0.01$, $\rho = 1$.
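One possible reading of this construction, sketched below in Python, is that a few groups of adjacent coefficients are activated and each active group shares a sign $\xi$, so that the coefficient vector is both sparse and highly correlated across neighbouring entries; the grouping details here are an assumption rather than a verbatim transcription of the authors' setup.

```python
import numpy as np

def make_beta_experiment2(p, n_groups=80, n_active=10, seed=0):
    """Blockwise-sparse beta: beta_i = xi * (1 + c_i) on the active index set S, 0 elsewhere."""
    rng = np.random.default_rng(seed)
    group_size = p // n_groups
    beta = np.zeros(p)
    for g in rng.choice(n_groups, size=n_active, replace=False):
        block = slice(g * group_size, (g + 1) * group_size)
        xi = rng.choice([-1.0, 1.0])                       # shared sign within the active group
        beta[block] = xi * (1.0 + rng.standard_normal(group_size))
    return beta
```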
Since the kernel function estimation of the nonparametric part is the same for both methods, only the nonparametric estimation at $p = 1200$, $\sigma = 1$ is shown here, in Figure 1.
From Figure 1, it can be seen that using kernel functions to estimate the nonparametric part works well.
Simulation experiments were conducted for the fused lasso partially linear model and the adaptive lasso partially linear model using the high-dimensional data and parameters generated by the above methods, respectively, and the main results are shown in Table 2.
As can be seen from Table 2, the mean squared error of the fused lasso is smaller than that of the adaptive lasso when neighboring variables are highly correlated, and as the error level increases, the difference between the fused lasso and the adaptive lasso becomes more significant. This shows that the fused lasso is more effective when dealing with data in which adjacent variables are highly correlated. Figure 2, Figure 3 and Figure 4 show the parameter estimates at $\sigma = 0.01, 0.5, 1$, respectively.
It can also be seen from Figure 2, Figure 3 and Figure 4 that the fused lasso method performs better than the adaptive lasso in high-dimensional data with high correlation between adjacent variables, which is consistent with the existing conclusions.
In conclusion, the numerical experiments show that the fused lasso partially linear model proposed in this paper has a good fitting effect in general high-dimensional data, and the fitting effect is better than the adaptive lasso in high-dimensional data where the adjacent variables are highly correlated, so it has a certain applicability.

7. Example Analysis

In this section, the method proposed in this paper is applied to real data on worker wages from the 1985 Current Population Survey (CPS85) [31]. The data include both quantitative and categorical variables, so they are representative. In addition, these data have been studied by other scholars, which facilitates comparison. The data contain 534 observations and 11 variables, providing information on worker wages and other attributes of the workers, such as sector, occupation, south, race, age, marital status, sex, number of years of education, union membership, and number of years of work experience. Among these, there is not necessarily a simple linear relationship between the number of years of work experience and wages, so this variable is selected as the nonparametric variable $T_i$, and the other worker attributes are used as the parametric variables $X_i$.
In the experiment, 70 percent of the sample was selected as the training set, and the remaining 30 percent was used as the prediction set for predicting worker wages. The prediction performance is evaluated using the median absolute error (MAE) and the standard error (SE) as indicators, defined as
$MAE = \operatorname{median}\big\{|y_1 - y_1^*|, |y_2 - y_2^*|, \ldots, |y_n - y_n^*|\big\},$
$SE = \sqrt{\frac{\sum_i (y_i - y_i^*)^2}{n}}.$
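These two error measures are straightforward to compute; a short Python sketch (illustrative only) is:

```python
import numpy as np

def median_absolute_error(y_true, y_pred):
    """MAE as defined above: the median of the absolute prediction errors."""
    return np.median(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def standard_error(y_true, y_pred):
    """SE as defined above: the root of the mean squared prediction error."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(err ** 2))
```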
The smaller the values of MAE and SE, the better the prediction performance. The results are shown in Table 3:
The first few rows of Table 3 display the SE and MAE for predicting worker wages from individual worker characteristics, while the last row shows the SE and MAE for predicting worker wages using all worker feature variables together. From the results in Table 3, it can be seen that the SE and MAE values obtained with a single worker characteristic are very small, and when all variables are combined to predict worker wages, despite the correlation between age and marital status, race, and south, the resulting SE and MAE values remain small. This indicates that the error between the predicted and actual wages is relatively small, so the method proposed in this article is effective.

8. Conclusions

In this paper, a fused lasso partially linear model is proposed to address the problem of adjacent variables being highly correlated; the LADMM algorithm is designed to estimate the parameters of the model, the algorithm framework is presented, the convergence of the algorithm is analyzed, and the asymptotic property of the model is proved. Numerical experiments were conducted using general high-dimensional data and high-dimensional data with highly correlated adjacent variables, and the proposed method was applied to the practical problem of worker wage prediction. The results show that the fused lasso partially linear model proposed in this paper has a certain applicability. This model is an extension of the partially linear model and expands the applicability of the semiparametric model.

Author Contributions

Conceptualization, A.F. and J.F.; methodology, A.F. and J.F.; software, J.F. and X.C.; validation, J.F. and M.Z.; writing—original draft preparation, J.F.; writing—review and editing, A.F., J.F., Z.J. and X.C.; visualization, J.F., M.Z. and X.C.; supervision, A.F.; project administration, A.F. and Z.J. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded in part by the National Natural Science Foundation of China (Grant No. 12101195, 12071112).

Data Availability Statement

Not applicable.

Acknowledgments

The research was funded in part by the Provincial first-class undergraduate curriculum project of mathematical models. We sincerely thank the anonymous reviewers for their insightful comments, which greatly improved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Engle, R.F.; Granger, C.W.; Rice, J.; Weiss, A. Semiparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc. 1986, 81, 310–320.
  2. Robinson, P.M. Root-N-consistent semiparametric regression. Econom. J. Econom. Soc. 1988, 56, 931–954.
  3. Speckman, P. Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B Methodol. 1988, 50, 413–436.
  4. Chen, H.; Shiau, J.J.H. A two-stage spline smoothing method for partially linear models. J. Stat. Plan. Inference 1991, 27, 187–201.
  5. Jiang, Y. S-estimator in partially linear regression models. J. Appl. Stat. 2017, 44, 968–977.
  6. Falck, T.; Signoretto, M.; Suykens, J.A.; De Moor, B. A Two Stage Algorithm for Kernel Based Partially Linear Modeling with Orthogonality Constraints; Technical Report, Internal Report 10-03 ESAT-SISTA; KU Leuven: Leuven, Belgium, 2011.
  7. Feng, A.; Chang, X.; Shang, Y.; Fan, J. Application of the ADMM Algorithm for a High-Dimensional Partially Linear Model. Mathematics 2022, 10, 4767.
  8. Chand, S.; Ahmad, S.; Batool, M. Solution path efficiency and oracle variable selection by Lasso-type methods. Chemom. Intell. Lab. Syst. 2018, 183, 140–146.
  9. Yazdi, M.; Golilarz, N.A.; Nedjati, A.; Adesina, K.A. An improved lasso regression model for evaluating the efficiency of intervention actions in a system reliability analysis. Neural Comput. Appl. 2021, 33, 7913–7928.
  10. Chen, S.B.; Zhang, Y.M.; Ding, C.H.; Zhang, J.; Luo, B. Extended adaptive Lasso for multi-class and multi-label feature selection. Knowl. Based Syst. 2019, 173, 28–36.
  11. Li, J.; Lu, Q.; Wen, Y. Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2020, 36, 1785–1794.
  12. Huang, J.; Jiao, Y.; Kang, L.; Liu, Y. Fitting sparse linear models under the sufficient and necessary condition for model identification. Stat. Probab. Lett. 2021, 168, 108925.
  13. Zhu, Y. An augmented ADMM algorithm with application to the generalized lasso problem. J. Comput. Graph. Stat. 2017, 26, 195–204.
  14. Li, X.; Mo, L.; Yuan, X.; Zhang, J. Linearized alternating direction method of multipliers for sparse group and fused LASSO models. Comput. Stat. Data Anal. 2014, 79, 203–221.
  15. Li, M.; Guo, Q.; Zhai, W.; Chen, B. The linearized alternating direction method of multipliers for low-rank and fused LASSO matrix regression model. J. Appl. Stat. 2020, 47, 2623–2640.
  16. Ma, C.; Huang, J. Asymptotic properties of lasso in high-dimensional partially linear models. Sci. China Math. 2016, 59, 769–788.
  17. Yang, L.; Fang, Y.; Wang, J.; Shao, Y. Variable selection for partially linear models via learning gradients. Electron. J. Stat. 2017, 11, 2907–2930.
  18. Liu, J.; Lou, L.; Li, R. Variable selection for partially linear models via partial correlation. J. Multivar. Anal. 2018, 167, 418–434.
  19. Li Feng, L.Y.; Gaorong, L. Variable selection of Adaptive lasso partial linear models. Chin. J. Appl. Probab. Stat. 2012, 28, 614–624.
  20. Yang, M.; Xiao, Y.; Li, P.; Zhu, H. Semismooth Newton Augmented Lagrangian Algorithm for Adaptive Lasso Penalized Least Squares in Semiparametric Regression. arXiv 2021, arXiv:2111.10766.
  21. Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 91–108.
  22. Petersen, A.; Witten, D.; Simon, N. Fused lasso additive model. J. Comput. Graph. Stat. 2016, 25, 1005–1025.
  23. Cui, L.; Bai, L.; Wang, Y.; Philip, S.Y.; Hancock, E.R. Fused lasso for feature selection using structural information. Pattern Recognit. 2021, 119, 108058.
  24. Liu, C.H.; Wu, D.; Shang, Y.L. A new infeasible-interior-point algorithm based on wide neighborhoods for symmetric cone programming. J. Oper. Res. Soc. China 2016, 4, 147–165.
  25. Tibshirani, R.; Wang, P. Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 2008, 9, 18–29.
  26. Parikh, N.; Boyd, S. Proximal algorithms. Found. Trends Optim. 2014, 1, 127–239.
  27. Li, P.; Xiao, Y. An efficient algorithm for sparse inverse covariance matrix estimation based on dual formulation. Comput. Stat. Data Anal. 2018, 128, 292–307.
  28. Jin, Z.F.; Wan, Z.; Jiao, Y.; Lu, X. An alternating direction method with continuation for nonconvex low rank minimization. J. Sci. Comput. 2016, 66, 849–869.
  29. Donoho, D.L. De-noising by soft-thresholding. IEEE Trans. Inf. Theory 1995, 41, 613–627.
  30. Silverman, B.W. Density Estimation for Statistics and Data Analysis; CRC Press: Boca Raton, FL, USA, 1986; Volume 26.
  31. Berndt, E.R. The Practice of Econometrics: Classic and Contemporary; Addison-Wesley Publishing Company: Reading, MA, USA; Don Mills, ON, Canada, 1991.
Figure 1. Non-parametric estimation results.
Figure 2. Parameter estimates for fused lasso (left) and adaptive lasso (right) at $\sigma = 0.01$, $p = 1200$.
Figure 3. Parameter estimates for fused lasso (left) and adaptive lasso (right) at $\sigma = 0.5$, $p = 1200$.
Figure 4. Parameter estimates for fused lasso (left) and adaptive lasso (right) at $\sigma = 1$, $p = 1200$.
Table 1. Mean squared error processed by the two methods.

n     p      σ = 0.5             σ = 1               σ = 2
             MSEF      MSEA      MSEF      MSEA      MSEF      MSEA
100   180    0.0025    0.0027    0.0050    0.0030    0.0101    0.0043
100   280    0.0029    0.0023    0.0064    0.0024    0.0134    0.0030
100   380    0.0049    0.0024    0.0104    0.0023    0.0215    0.0027
100   480    0.0067    0.0021    0.0143    0.0021    0.0296    0.0024
100   580    0.0069    0.0024    0.0143    0.0027    0.0294    0.0035
100   680    0.0084    0.0021    0.0177    0.0022    0.0365    0.0026
100   780    0.0088    0.0021    0.0191    0.0020    0.0399    0.0023
100   880    0.0149    0.0024    0.0326    0.0023    0.0683    0.0024
100   980    0.0216    0.0022    0.0446    0.0023    0.0913    0.0025
100   1080   0.0038    0.0023    0.0085    0.0022    0.0180    0.0025
Table 2. Mean squared error processed by the two methods.

n     p      σ = 0.01            σ = 0.5             σ = 1
             MSEF      MSEA      MSEF      MSEA      MSEF      MSEA
100   240    0.0047    0.0094    0.0047    0.0112    0.0047    0.0134
100   400    0.0047    0.0100    0.0047    0.0112    0.0046    0.0127
100   560    0.0044    0.0100    0.0044    0.0109    0.0045    0.0115
100   720    0.0045    0.0092    0.0045    0.0110    0.0046    0.0120
100   880    0.0045    0.0089    0.0045    0.0094    0.0045    0.0120
100   1040   0.0046    0.0085    0.0046    0.0086    0.0047    0.0103
100   1200   0.0046    0.0131    0.0046    0.0143    0.0047    0.0157
Table 3. The SE and MAE of worker wage prediction.

Variable     Description                                          SE        MAE
Sector       Manufacturing = 1, other = 0                         0.1909    0.4813
Occupation   Prof = 1, sales = 1, Service = 1,
             Management = 1, Other = 0                            0.2040    0.7595
South        South = 1, Other = 0                                 0.1910    0.4880
Race         White = 1, Other = 0                                 0.1903    0.4800
Age          Age                                                  0.1983    0.5651
Marry        Married = 1, Single = 0                              0.1902    0.4944
Sex          Female = 1, Male = 0                                 0.1901    0.5076
Education    Number of years of education                         0.2043    0.6357
Union        Union = 1, Not union = 0                             0.1899    0.4916
Total        All variables                                        0.2445    1.0818