Sparse Estimation Strategies in Linear Mixed Effect Models for High-Dimensional Data Application

In a host of business applications and biomedical and epidemiological studies, multicollinearity among predictor variables is a frequent issue in longitudinal data analysis with linear mixed models (LMMs). We consider an efficient estimation strategy for high-dimensional data applications, where the dimension of the parameter vector is larger than the number of observations. In this paper, we are interested in estimating the fixed effects parameters of the LMM when some prior information is assumed to be available in the form of linear restrictions on the parameters. We propose pretest and shrinkage estimation strategies using the ridge full model as the base estimator. We establish the asymptotic distributional biases and risks of the suggested estimators and investigate their relative performance with respect to the ridge full model estimator. Furthermore, we compare the numerical performance of the LASSO-type estimators with that of the pretest and shrinkage ridge estimators. The methodology is investigated using simulation studies and then demonstrated on an application exploring how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease.


Introduction
In many fields, such as bioinformatics, physical biology, and epidemiology, the response of interest consists of repeated measurements of some variable collected over a specified time period for different independent subjects or individuals. These types of data are commonly encountered in medical research, where the responses are subject to various time-dependent and time-constant effects such as pre- and post-treatment status, gender, and baseline measures, among others. A widely used statistical tool for the analysis and modeling of longitudinal and repeated measures data is the linear mixed effects model (LMM) [1,2]. This model provides an effective and flexible way to describe the mean and covariance structures of a response variable after accounting for within-subject correlation.
The rapid growth in the size and scope of longitudinal data has created a need for innovative statistical strategies in longitudinal data analysis. Classical methods are based on the assumption that the number of predictors is less than the number of observations. However, there is an increasing demand for efficient prediction strategies for the analysis of high-dimensional data, where the number of observed data elements (the sample size) is smaller than the number of predictors in a linear model context. Existing techniques for high-dimensional data mostly rely on various penalized estimators. Due to the trade-off between model complexity and prediction accuracy, statistical inference after model selection becomes an extremely important and challenging problem in high-dimensional data analysis.
Over the years, many penalized regularization approaches have been developed to perform variable selection and estimation simultaneously. Among them, the least absolute shrinkage and selection operator (LASSO) is commonly used [3]. It is a useful estimation technique in part due to its convexity and computational efficiency. The LASSO approach is based on an ℓ1 penalty for the regularization of regression parameters. Ref. [4] provides a comprehensive summary of the consistency properties of the LASSO approach. Related penalized likelihood methods have been extensively studied in the literature; see, for example, [5][6][7][8][9][10]. The penalized likelihood methods have a close connection to Bayesian procedures: the LASSO estimate corresponds to a Bayes estimate under a Laplacian (double-exponential) prior on the regression coefficients [11,12].
In this paper, our interest lies in estimating the fixed effect parameters of the LMM using a ridge estimation technique when it is assumed that some prior information is available in the form of potential linear restrictions on the parameters. One possible source of prior information is using a Bayesian approach. An alternative source of prior information may be obtained from previous studies or expert knowledge that search for or assume sparsity patterns.
We consider the problem of fixed effect parameter estimation for LMMs when there exist many predictors relative to the sample size. These predictors may be classified into two groups: sparse and non-sparse. Thus, there are two choices to be considered: a full model with all predictors, and a sub-model that contains only non-sparse predictors. When the sub-model based on available subspace information is true (i.e., the assumed restriction holds), it then provides more efficient statistical inferences than those based on a full model. In contrast, if the sub-model is not true, the estimates could become biased and inefficient. The consequences of incorporating subspace information therefore depend on the quality or reliability of the information being incorporated into the estimation procedure. One way to deal with uncertain subspace information is to use a pretest estimation strategy. The validity of the information is tested before incorporation into a final estimator. Another approach is shrinkage estimation, which shrinks the full model estimator to the sub-model estimator by utilizing subspace information. Besides these estimation strategies, there is a growing literature on simultaneous model selection and estimation. These approaches are known as penalty strategies. By shrinking some regression coefficients toward zero, the penalty methods simultaneously select a sub-model and estimate its regression parameters. Several authors have investigated the pretest, shrinkage, and penalty estimation strategies in partial linear model, Poisson regression model, and Weibull censored regression model [13][14][15].
To formulate the problem, we suppose that the vector of fixed effects parameters β in the LMM can be partitioned into two sub-vectors, β = (β_1ᵀ, β_2ᵀ)ᵀ, where β_1 is the coefficient vector of the non-sparse predictors and β_2 is the coefficient vector of the sparse predictors. Our interest lies in the estimation of β_1 when β_2 is close to zero. To deal with this problem in the context of low-dimensional data, ref. [16] proposes an improved estimation strategy using sub-model selection and post-estimation for the LMM. Within this framework, linear shrinkage and shrinkage pretest estimation strategies are developed, which combine the full model and sub-model estimators in an effective way as a trade-off between bias and variance. Ref. [17] extends this study by using a likelihood ratio test to develop James-Stein shrinkage and pretest estimation methods based on the LMM for longitudinal data. In addition, the non-penalty estimators are compared with several penalty estimators (LASSO, adaptive LASSO, and Elastic Net).
In most real data situations, there is also the problem of multicollinearity among predictor variables for high-dimensional data. Various biased estimation techniques such as shrinkage estimation, partial least squares estimation [18] and Liu estimators [19] have been implemented to deal with this problem, but the widely used technique is ridge estimation [20]. The ridge estimator overcomes the weakness of the least squares estimator with a smaller mean squared error. To overcome and combat multicollinearity, ref. [21] propose pretest and Stein-type ridge regression estimators for linear and partially linear models. Furthermore, ref. [22] also develop shrinkage estimation based on Liu regression to overcome multicollinearity in linear models.
Our primary focus is on the estimation and prediction problem for linear mixed effect models when there are many potential predictors that have a weak or no influence on the response of interest. The proposed method simultaneously controls overfitting by combining generalized least squares estimation with a ridge-type roughness penalty. We propose pretest and shrinkage estimation strategies using the ridge estimator as the base estimator and numerically compare their performance with the LASSO and adaptive LASSO estimators. Our proposed estimation strategy is applied to both high-dimensional and low-dimensional data.
The rest of this article is organized as follows. In Section 2, we present the linear mixed effect model and the proposed estimation techniques: we introduce the full model and sub-model estimators based on ridge estimation and then construct the pretest and shrinkage ridge estimators. Section 3 provides the asymptotic bias and risk of these estimators. A Monte Carlo simulation is used to evaluate the performance of the estimators, including a comparison with the LASSO-type estimators, and the results are reported in Section 4. Section 5 demonstrates the proposed methodology on high-dimensional resting-state effective brain connectivity and genetic data; we also illustrate the proposed estimation methods in an application to the low-dimensional Amsterdam Growth and Health Study. Section 6 presents a discussion with recommendations.

Model and Estimation Strategies
In this section, we present the linear mixed effect model and the proposed estimation strategies.

Linear Mixed Model
Suppose that we have a sample of n independent subjects. For the ith subject, we collect the response y_ij at the jth measurement occasion, where i = 1, …, n; j = 1, …, n_i; and N = Σ_{i=1}^n n_i is the total number of observations. Let Y_i = (y_i1, …, y_in_i)ᵀ denote the n_i × 1 vector of responses from the ith subject, and let X_i = (x_i1, …, x_in_i)ᵀ and Z_i = (z_i1, …, z_in_i)ᵀ be the n_i × p fixed-effects and n_i × q random-effects design matrices for the ith subject, of full rank p and q, respectively. The linear mixed effect model [1] for the vector of repeated responses Y_i on the ith subject is assumed to have the form

Y_i = X_i β + Z_i a_i + ε_i, i = 1, …, n, (1)

where β = (β_1, …, β_p)ᵀ is the p × 1 vector of unknown fixed-effect parameters (regression coefficients); a_i is the q × 1 vector of unobservable random effects for the ith subject, assumed to follow a multivariate normal distribution with zero mean and unknown q × q covariance matrix G; and ε_i denotes the n_i × 1 vector of error terms, assumed to be normally distributed with zero mean and covariance matrix σ² I_{n_i}. The ε_i are further assumed to be independent of the random effects a_i. The marginal distribution of Y_i is then normal with mean X_i β and covariance matrix Cov(Y_i) = V_i = Z_i G Z_iᵀ + σ² I_{n_i}. By stacking the subject-level vectors and matrices, the mixed model can be expressed as Y = Xβ + Za + ε, so that from Equation (1) the marginal distribution of the model is Y ∼ N(Xβ, V), with V block-diagonal in V_1, …, V_n.
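As a concrete illustration, data from model (1) can be generated subject by subject. This is a minimal NumPy sketch under stated assumptions: the function name simulate_lmm, the dimensions, and the coefficient values are ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lmm(n_subjects, n_i, p, q, beta, sigma2=1.0, G=None):
    """Simulate Y_i = X_i beta + Z_i a_i + eps_i for each subject and
    return the stacked response, fixed-effects design, and per-subject Z_i."""
    if G is None:
        G = 0.5 * np.eye(q)
    ys, Xs, Zs = [], [], []
    for _ in range(n_subjects):
        X_i = rng.normal(size=(n_i, p))
        Z_i = rng.normal(size=(n_i, q))
        a_i = rng.multivariate_normal(np.zeros(q), G)      # random effects
        eps = rng.normal(scale=np.sqrt(sigma2), size=n_i)  # error terms
        ys.append(X_i @ beta + Z_i @ a_i + eps)
        Xs.append(X_i)
        Zs.append(Z_i)
    return np.concatenate(ys), np.vstack(Xs), Zs

beta = np.array([1.0, 2.0, 0.0, 0.0])  # last two coefficients are sparse
y, X, Zs = simulate_lmm(n_subjects=50, n_i=4, p=4, q=2, beta=beta)
```

The stacked y and X correspond to the marginal form Y = Xβ + Za + ε used in the estimators below.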

Ridge Full Model and Sub-Model Estimator
The generalized least squares (GLS) estimator of β is defined as

β̂_GLS = (Xᵀ V⁻¹ X)⁻¹ Xᵀ V⁻¹ Y,

and the ridge full model estimator is obtained by adding a ridge penalty to the GLS criterion:

β̂^Ridge = (Xᵀ V⁻¹ X + k I_p)⁻¹ Xᵀ V⁻¹ Y,

where β̂^Ridge is the ridge full model estimator and k ∈ [0, ∞) is the tuning parameter. If k = 0, β̂^Ridge reduces to the GLS estimator, while β̂^Ridge → 0 as k becomes sufficiently large. We select the value of k using cross-validation.
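The two estimators above differ only in the ridge term. The following is an illustrative NumPy sketch, not the authors' code: ridge_gls and the toy data are our own names, and the marginal covariance V is treated as known, whereas in practice it is estimated.

```python
import numpy as np

def ridge_gls(X, y, V, k):
    """Ridge GLS estimator: (X' V^-1 X + k I)^-1 X' V^-1 y.
    With k = 0 this reduces to the plain GLS estimator."""
    p = X.shape[1]
    Vinv_X = np.linalg.solve(V, X)
    return np.linalg.solve(X.T @ Vinv_X + k * np.eye(p), Vinv_X.T @ y)

# Toy data: with V = I, the k = 0 fit coincides with ordinary least squares,
# and a very large k shrinks the estimate towards zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(size=40)
V = np.eye(40)  # in the LMM this would be Z G Z' + sigma^2 I
b_gls = ridge_gls(X, y, V, k=0.0)
b_big = ridge_gls(X, y, V, k=1e6)
```

In practice, k would be selected by evaluating this estimator over a grid of candidate values and keeping the one that minimizes the cross-validated prediction error.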
A sub-model is defined as Y = Xβ + Za + ε subject to βᵀβ ≤ φ and β_2 = 0, which corresponds to Y = X_1 β_1 + Za + ε subject to β_1ᵀ β_1 ≤ φ, where X = (X_1, X_2) is partitioned conformably with β = (β_1ᵀ, β_2ᵀ)ᵀ. The sub-model ridge estimator β̂_1^RSM of β_1 has the form

β̂_1^RSM = (X_1ᵀ V⁻¹ X_1 + k I_{p_1})⁻¹ X_1ᵀ V⁻¹ Y.

We denote by β̂_1^RFM the full model ridge estimator of β_1, given as the sub-vector of β̂^Ridge corresponding to β_1.

Pretest Ridge Estimation Strategy
Generally, the sub-model estimator will be more efficient than the full model estimator if the information embodied in the imposed linear restrictions is valid, that is, if β_2 is close to zero. However, if the information is not valid, the sub-model estimator is likely to be biased and may have a higher risk than the full model estimator. There is, therefore, some doubt as to whether or not to impose the restrictions on the model parameters. It is in response to this uncertainty that a statistical test may be used to determine the validity of the proposed restrictions. Accordingly, the procedure to follow in practice is to pretest the validity of the restrictions: if the outcome of the pretest suggests that they are correct, the model parameters are estimated incorporating the restrictions; if the pretest rejects the restrictions, the parameters are estimated from the sample information alone. This motivates the consideration of the pretest estimation strategy for the LMM.
The pretest estimator is a combination of the full model estimator β̂_1^RFM and the sub-model estimator β̂_1^RSM through an indicator function I(L_n ≤ d_{n,α}), where L_n is an appropriate test statistic for testing H_0: β_2 = 0 versus H_A: β_2 ≠ 0, and d_{n,α} is the α-level critical value based on the distribution of L_n under H_0. We define the test statistic L_n based on the log-likelihoods of the full and sub-models. Under H_0, the test statistic L_n follows an asymptotic chi-square distribution with p_2 degrees of freedom. The pretest ridge estimator β̂_1^RPT of β_1 is then defined by

β̂_1^RPT = β̂_1^RFM − (β̂_1^RFM − β̂_1^RSM) I(L_n ≤ d_{n,α}), p_2 ≥ 1.
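The pretest rule can be sketched in a few lines, assuming the full model and sub-model estimates of β_1 and the test statistic L_n have already been computed. The function and variable names are ours, and the critical value uses the asymptotic χ²_{p_2} distribution of L_n under H_0.

```python
import numpy as np
from scipy.stats import chi2

def pretest_estimator(beta_full, beta_sub, L_n, p2, alpha=0.05):
    """beta_RPT = beta_full - (beta_full - beta_sub) * I(L_n <= d_{n,alpha}),
    with d_{n,alpha} the upper-alpha quantile of chi-square(p2)."""
    d = chi2.ppf(1 - alpha, df=p2)
    # Fail to reject H0: beta_2 = 0 -> use the sub-model estimate;
    # reject -> keep the full model estimate.
    return beta_sub if L_n <= d else beta_full

beta_full = np.array([1.2, 0.8])  # hypothetical beta_1 estimates
beta_sub = np.array([1.0, 1.0])
```

The discontinuity of this rule in L_n is what motivates the smooth shrinkage alternative in the next subsection.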

Shrinkage Ridge Estimation Strategy
The pretest estimator is a discontinuous function of the sub-model estimator β̂_1^RSM and the full model estimator β̂_1^RFM, as it depends on the hard threshold d_{n,α} = χ²_{p_2,α}. We address this limitation by defining the shrinkage ridge estimator based on soft thresholding. The shrinkage ridge estimator (RSE) of β_1, denoted β̂_1^RSE, is defined as

β̂_1^RSE = β̂_1^RSM + (1 − (p_2 − 2) L_n⁻¹)(β̂_1^RFM − β̂_1^RSM), p_2 ≥ 3.

Here, β̂_1^RSE is a linear combination of the full model estimate β̂_1^RFM and the sub-model estimate β̂_1^RSM. A drawback is that the shrinkage factor can change sign, causing over-shrinking; this happens when (p_2 − 2) L_n⁻¹ is larger than one. To counter this, we use the positive-part shrinkage ridge estimator (RPS), defined as

β̂_1^RPS = β̂_1^RSM + (1 − (p_2 − 2) L_n⁻¹)⁺ (β̂_1^RFM − β̂_1^RSM),

where z⁺ = max(0, z). The RPS estimator controls the possible over-shrinking in the RSE estimator.
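The RSE and RPS estimators differ only in whether the shrinkage factor is truncated at zero, which a single function can illustrate. This is a sketch with our own names, operating on the β_1 components of the full and sub-model estimates.

```python
import numpy as np

def shrinkage_ridge(beta_full, beta_sub, L_n, p2, positive_part=False):
    """RSE: beta_sub + (1 - (p2 - 2)/L_n) * (beta_full - beta_sub).
    RPS truncates the shrinkage factor at zero to prevent over-shrinking."""
    factor = 1.0 - (p2 - 2) / L_n
    if positive_part:
        factor = max(factor, 0.0)
    return beta_sub + factor * (beta_full - beta_sub)

beta_full = np.array([1.0, 2.0])
beta_sub = np.array([0.0, 0.0])

# Small L_n (strong evidence for the sub-model): the RSE factor turns
# negative and over-shoots, while RPS collapses to the sub-model estimate.
b_rse = shrinkage_ridge(beta_full, beta_sub, L_n=0.5, p2=5)
b_rps = shrinkage_ridge(beta_full, beta_sub, L_n=0.5, p2=5, positive_part=True)
```

For large L_n the factor approaches one and both estimators approach the full model estimate, as the theory in Section 3 predicts.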

Asymptotic Results
In this section, we derive the asymptotic distributional bias and risk of the estimators considered in Section 2. We examine the properties of the estimators as n increases and as β_2 approaches the null vector under the sequence of local alternatives

K_n: β_2 = κ/√n,

where κ = (κ_1, κ_2, …, κ_{p_2})ᵀ ∈ R^{p_2} is a fixed vector. The vector κ/√n measures how far the local alternatives K_n depart from the subspace information β_2 = 0. In order to evaluate the performance of an estimator β̂_1^*, we define its asymptotic distributional bias (ADB) as

ADB(β̂_1^*) = lim_{n→∞} E[√n(β̂_1^* − β_1)].

In order to compute the risk functions, we first compute the asymptotic covariance of the estimators. The asymptotic covariance of an estimator β̂_1^* is expressed as

Cov(β̂_1^*) = lim_{n→∞} E[n(β̂_1^* − β_1)(β̂_1^* − β_1)ᵀ].

Following the asymptotic covariance matrix, we define the asymptotic risk of an estimator β̂_1^* as R(β̂_1^*) = tr(Q Cov(β̂_1^*)), where Q is a positive definite weighting matrix.

Assumption 1. Two regularity conditions are imposed to establish the asymptotic properties of the estimators.

Proposition 1.
Assuming Assumption 1 together with Theorem 1 holds, under the local alternatives K_n, the joint asymptotic distribution of the full model and sub-model estimators is multivariate normal.

Theorem 2. Under the conditions of Theorem 1 and the local alternatives K_n, the ADBs of the proposed estimators are expressed in terms of H_v(·; ∆), the cumulative distribution function of the non-central chi-squared distribution with v degrees of freedom and non-centrality parameter ∆, and E(χ_v^{−2j}(∆)), the expected value of the jth power of the inverse of a non-central χ² variable with v degrees of freedom and non-centrality parameter ∆.

Since the ADBs of the estimators are in non-scalar form, we define the asymptotic quadratic distributional bias (AQDB) of β̂_1^* by

AQDB(β̂_1^*) = ADB(β̂_1^*)ᵀ B_{11.2} ADB(β̂_1^*).

When B_{11.2} = 0, the AQDBs of all estimators are equivalent, and the estimators are therefore asymptotically unbiased. If instead B_{11.2} ≠ 0, the results for the bias of the estimators can be summarized as follows:

1. The AQDB of β̂_1^RSM is an unbounded function of γᵀ B_{11.2} γ.

2. The AQDB of β̂_1^RPT starts from μ_{11.2}ᵀ B_{11.2} μ_{11.2} at ∆ = 0; as ∆ increases, it rises to a maximum and then decreases to zero.

3. The AQDB of β̂_1^RSE similarly starts from μ_{11.2}ᵀ B_{11.2} μ_{11.2} at ∆ = 0, increases to a point, and then decreases towards zero, since E(χ_{p_2+2}^{−2}(∆)) is a non-increasing function of ∆.
Theorem 3. Suppose Theorem 1 holds. Then, under the local alternatives K_n, the asymptotic covariance matrices of the estimators can be derived. Proof. See Appendix B.2.

Corollary 2.
Under the local alternatives K_n and from Theorem 3, the risks of the estimators are obtained. From Theorem 2, when B_12 = 0, the risks of the estimators β̂_1^RFM and β̂_1^RSM coincide. If B_12 ≠ 0, the results can be summarized as follows:

1. The risk of β̂_1^RSM is smaller than that of β̂_1^RFM at and near ∆ = 0, but it grows without bound as ∆ increases.

2. The risk of β̂_1^RPT increases as ∆ moves away from zero, achieves its maximum, and then decreases towards the risk of the full model estimator.

3. The risk of β̂_1^RFM is smaller than the risk of β̂_1^RPT for values of ∆ in an intermediate neighborhood, while β̂_1^RPT performs at least as well as β̂_1^RFM over the rest of the parameter space, for all ∆ ≥ 0.

Simulation Studies
In this section, we conduct a simulation study to assess the performance of the suggested estimators in finite samples. The criterion for comparing the performance of the estimators is the mean squared error. We simulate the response from the LMM in Equation (1), where ε_i ∼ N(0, σ² I_{n_i}) with σ² = 1. We generate the random effects a_i from a multivariate normal distribution with zero mean and covariance matrix G = 0.5 I_{2×2}, where I_{2×2} is the 2 × 2 identity matrix. The rows of the design matrix X_i = (x_i1, …, x_in_i)ᵀ are generated from a multivariate normal distribution with a specified mean vector and covariance matrix Σ_x. Furthermore, we assume that the off-diagonal elements of Σ_x are equal to ρ, the coefficient of correlation between any two predictors, with ρ = 0.3, 0.7, 0.9. The ratio of the largest to the smallest eigenvalue of the matrix Xᵀ V⁻¹ X is calculated as a condition number index (CNI) [24], which assesses the existence of multicollinearity in the design matrix; if the CNI is larger than 30, the model has significant multicollinearity. Our simulations are based on the linear mixed effects model in Equation (3) with n = 60 and 100 subjects. We consider a situation in which the model is assumed to be sparse. In this study, our interest lies in testing the hypothesis H_0: β_2 = 0, and our goal is to estimate the fixed effects coefficient vector β_1. We partition the fixed effects coefficients as β = (β_1ᵀ, β_2ᵀ)ᵀ = (β_1ᵀ, 0_{p_2}ᵀ)ᵀ, where β_1 and β_2 are p_1- and p_2-dimensional vectors, respectively, with p = p_1 + p_2.
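The CNI check described above is straightforward to compute. This is a sketch with our own function name; V is taken as the identity so that the check reduces to the eigenvalue spread of XᵀX.

```python
import numpy as np

def condition_number_index(X, V):
    """Ratio of the largest to the smallest eigenvalue of X' V^-1 X;
    a value above 30 signals serious multicollinearity."""
    M = X.T @ np.linalg.solve(V, X)
    eig = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return eig[-1] / eig[0]

# Highly correlated predictors (rho = 0.9) inflate the CNI.
rng = np.random.default_rng(0)
p, n, rho = 5, 60, 0.9
Sigma_x = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)  # unit diagonal
X = rng.multivariate_normal(np.zeros(p), Sigma_x, size=n)
cni = condition_number_index(X, np.eye(n))
```

An orthogonal design gives CNI = 1; as ρ grows toward one, the smallest eigenvalue collapses and the CNI blows up.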

Simulation Results
In this subsection, we present the results from our simulation study. The results for n = 60, 100 and p_1 = 5 with different values of the correlation coefficient ρ are shown in Table 1. Furthermore, we plot the RMSEs against ∆* in Figures 1 and 2. The findings can be summarized as follows:

1. When ∆* = 0, the sub-model estimator RSM outperforms all other estimators. As ∆* moves away from zero, the RMSE of the sub-model estimator decreases and goes to zero.

2. The pretest ridge estimator RPT outperforms the shrinkage ridge and positive-part Stein ridge estimators when ∆* = 0. However, for a large number of sparse predictors p_2 (with p_1 and n fixed), RPT is less efficient than RPS and RSE. When ∆* is larger than zero, the RMSE of RPT first decreases and remains below 1 for intermediate values of ∆*, after which it increases and approaches one for larger values of ∆*.

3. RPS performs better than RSE over the entire parameter space induced by ∆*, as presented in Tables 1 and 2. Similarly, both shrinkage estimators RPS and RSE outperform the full model ridge estimator regardless of whether the selected sub-model is correct. This is consistent with the asymptotic theory presented in Section 3.

4. Since ∆* measures the degree of deviation from Assumption 1 on the parameter space, it is clear that one cannot go far wrong with the shrinkage estimators even if the selected sub-model is misspecified. As evident from Tables 1 and 2 and Figures 1 and 2, if the selected sub-model is correct, that is, ∆* = 0, the shrinkage estimators are relatively efficient compared with the full model ridge estimator. If the sub-model is misspecified, the gain slowly diminishes; however, in terms of risk, the shrinkage estimators are at least as good as the full model ridge estimator. The use of shrinkage estimators therefore makes sense in applications where a sub-model cannot be correctly specified.

5. The RMSE of the ridge-type estimators is an increasing function of the amount of multicollinearity. This indicates that the ridge-type estimators perform better than the classical estimator in the presence of multicollinearity among predictor variables.

Comparison with LASSO-Type Estimators
We compare our listed estimators with the LASSO and adaptive LASSO (aLASSO) estimators. Ten-fold cross-validation is used to select the optimal values of the penalty parameters that minimize the mean squared error for the LASSO-type estimators. The results for ρ = 0.3, 0.7, 0.9, n = 60, 100, p_1 = 10, and p_2 = 50, 500, 1000, 2000 are presented in Table 3. We observe the following from Table 3.

1. The sub-model estimator performs best among all the estimators.

2. The pretest ridge estimator performs better than the other estimators. However, for larger numbers of sparse predictors p_2, the shrinkage estimators outperform the pretest estimator.

3. The performance of the LASSO and aLASSO estimators is comparable when ρ is small. The pretest and shrinkage estimators remain stable for a given value of ρ.

4. For a large number of sparse predictors p_2, the shrinkage and pretest ridge estimators outperform the LASSO-type estimators. This indicates the superiority of the shrinkage estimators over the LASSO-type estimators; shrinkage estimators are therefore preferable when there is multicollinearity among the predictor variables.
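For the LASSO baseline, a standard implementation with 10-fold cross-validation is available in scikit-learn. The sketch below uses our own synthetic data and, for illustration only, fits a marginal LASSO that ignores the random-effects structure of the LMM.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p1, p2 = 100, 10, 50
beta = np.concatenate([np.ones(p1), np.zeros(p2)])  # sparse truth
X = rng.normal(size=(n, p1 + p2))
y = X @ beta + rng.normal(size=n)

# 10-fold cross-validation over an automatic grid of penalty values.
lasso = LassoCV(cv=10).fit(X, y)
n_selected = int(np.sum(lasso.coef_ != 0))
```

The adaptive LASSO variant reweights the penalty on each coefficient by an initial estimate (e.g., ridge), which can be emulated by rescaling the columns of X before calling the same routine.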

Real Data Application
We consider two real data analyses using Amsterdam Growth and Health Data and a genetic and brain network connectivity edge weight data to illustrate the performance of the proposed estimators.

Amsterdam Growth and Health Data (AGHD)
The AGHD data are obtained from the Amsterdam Growth and Health Study [25]. The goal of this study is to investigate the relationship between lifestyle and health from adolescence into young adulthood. The response variable Y is the total serum cholesterol measured over six time points. There are five covariates: X_1 is the baseline fitness level measured as the maximum oxygen uptake on a treadmill, X_2 is the amount of body fat estimated by the sum of the thicknesses of four skinfolds, X_3 is a smoking indicator (0 = no, 1 = yes), X_4 is gender (1 = female, 2 = male), and X_5 is the time of measurement. Subject-specific random effects are also included.
A total of 147 subjects participated in the study, with all variables measured at n_i = 6 time occasions. In order to apply the proposed methods, we first apply an AIC-based variable selection procedure to choose the sub-model. For the AGHD data, we fit a linear mixed model with all five covariates as fixed effects together with subject-specific random effects, using a two-stage selection procedure to choose both the random and fixed effects. The analysis found X_2 and X_5 to be significant covariates for predicting the response variable serum cholesterol, and the remaining variables were dropped as not significant. Based on this information, the sub-model contains X_2 and X_5, while the full model includes all the covariates. We construct the shrinkage estimators from the full model and the sub-model. In terms of the null hypothesis, the restriction can be written as β_2 = (β_1, β_3, β_4)ᵀ = (0, 0, 0)ᵀ with p = 5, p_1 = 2, and p_2 = 3.
To evaluate the performance of the estimators, we obtain the mean squared prediction error (MSPE) using bootstrap samples. We draw 1000 bootstrap samples of the 147 subjects from the data matrix {(Y_ij, X_ij), i = 1, 2, …, 147; j = 1, 2, …, 6}. We then calculate the relative prediction error (RPE) of β̂_1^* with respect to β̂_1^RFM, the full model estimator, defined as

RPE(β̂_1^*) = MSPE(β̂_1^*) / MSPE(β̂_1^RFM),

where β̂_1^* is one of the listed estimators. If RPE < 1, then β̂_1^* outperforms β̂_1^RFM. Table 4 reports the estimates and standard errors of the non-sparse predictors and the RPEs of the estimators with respect to the full model. As expected, the sub-model ridge estimator β̂_1^RSM has the minimum RPE because it is computed under the assumption that the sub-model is correct, that is, ∆* = 0. It is evident from the RPE values in Table 4 that the shrinkage estimators are superior to the LASSO-type estimators. Furthermore, the positive-part shrinkage estimator is more efficient than the shrinkage ridge estimator.
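The bootstrap MSPE/RPE computation can be sketched as follows. The helper names are ours, and for brevity the sketch resamples individual rows, whereas the study resamples whole subjects to preserve within-subject correlation.

```python
import numpy as np

def relative_prediction_error(mspe_candidate, mspe_full):
    """RPE = MSPE(candidate) / MSPE(full model); RPE < 1 favors the candidate."""
    return mspe_candidate / mspe_full

def bootstrap_mspe(y, X, fit, predict, n_boot=1000, seed=0):
    """Resample rows with replacement, refit the estimator, and average the
    squared prediction error over the original sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        beta_b = fit(X[idx], y[idx])
        errs.append(np.mean((y - predict(X, beta_b)) ** 2))
    return float(np.mean(errs))

# Toy demonstration with an OLS fit standing in for one of the estimators.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=50)
ols_fit = lambda Xb, yb: np.linalg.lstsq(Xb, yb, rcond=None)[0]
ols_pred = lambda Xn, b: Xn @ b
mspe = bootstrap_mspe(y, X, ols_fit, ols_pred, n_boot=50)
```

Running bootstrap_mspe once per estimator and dividing by the full model's value reproduces the RPE column of Table 4 in structure.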

Resting-State Effective Brain Connectivity and Genetic Data
These data comprise a longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) effective brain connectivity network and genetic study [26] based on a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The 111 subjects comprise 36 cognitively normal (CN), 63 mild cognitive impairment (MCI), and 12 Alzheimer's disease (AD) subjects. The response is a network connection between regions of interest estimated from an rs-fMRI scan within the Default Mode Network (DMN), and we observe a longitudinal sequence of such connections for each subject, with the number of repeated measurements varying across subjects. The DMN consists of a set of brain regions that tend to be active in the resting state, when a subject is mind-wandering with no intended task. For this data analysis, we consider the network edge weight from the left intraparietal cortex to the posterior cingulate cortex (LIPC → PCC) as our response. The genetic data are single nucleotide polymorphisms (SNPs) from the non-sex chromosomes, i.e., chromosomes 1 to 22. SNPs with a minor allele frequency less than 5% are removed, as are SNPs with a Hardy-Weinberg equilibrium p-value lower than 10⁻⁶ or a missing rate greater than 5%. After preprocessing, we are left with 1,220,955 SNPs and the longitudinal rs-fMRI effective connectivity network for the 111 subjects. The SNPs enter the model as fixed effects, together with subject-specific random effects.
In order to apply the proposed methods, we use a genome-wide association study (GWAS) to screen the genetic data down to 100 SNPs. We implement a second screening by applying multinomial logistic regression to identify a smaller subset of the 100 SNPs that are potentially associated with disease status (CN/MCI/AD). This yields a subset of the top 10 SNPs, which are treated as the most important predictors, while the other 90 SNPs are regarded as sparse. We now have two models: the full model with all 100 SNPs and the sub-model with the 10 selected SNPs. Finally, we construct the pretest and shrinkage estimators from the full model and sub-model. We draw 1000 bootstrap samples with replacement from the corresponding data matrix {(Y_ij, X_ij), i = 1, …, 111; j = 1, …, n_i}. We report the RPEs of the estimators, based on the bootstrap simulation, with respect to the full model ridge estimator in Table 5. We observe that the sub-model, pretest, shrinkage, and positive-part shrinkage ridge estimators all outperform the full model estimator. The sub-model ridge estimator has the smallest RPE since it is computed when the candidate sub-model is correct, i.e., ∆ = 0. Both shrinkage ridge estimators outperform the pretest ridge estimator; in particular, the positive-part shrinkage estimator performs better than the shrinkage estimator. Both the shrinkage and pretest ridge estimators also perform better than the LASSO-type estimators. Thus, the data analysis is in line with our simulation and theoretical findings.
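The two-stage screening can be sketched with scikit-learn on synthetic data of toy size. All names and dimensions below are ours; the real analysis screens 1,220,955 SNPs via a GWAS rather than the univariate F-test used here for illustration.

```python
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 111, 1000  # subjects x SNPs (toy scale)
snps = rng.integers(0, 3, size=(n, p)).astype(float)  # minor-allele counts
edge_weight = snps[:, :5].sum(axis=1) * 0.3 + rng.normal(size=n)
diagnosis = rng.integers(0, 3, size=n)  # CN / MCI / AD labels

# Stage 1: univariate association screen on the connectivity response;
# keep the 100 SNPs with the largest F-statistics.
F, _ = f_regression(snps, edge_weight)
top100 = np.argsort(F)[::-1][:100]

# Stage 2: multinomial logistic regression on diagnosis; rank the screened
# SNPs by the magnitude of their fitted coefficients and keep the top 10.
clf = LogisticRegression(max_iter=1000).fit(snps[:, top100], diagnosis)
scores = np.abs(clf.coef_).sum(axis=0)
top10 = top100[np.argsort(scores)[::-1][:10]]
```

The top 10 SNPs define the sub-model and the 100 screened SNPs define the full model, from which the pretest and shrinkage estimators are then built.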

Conclusions
In this paper, we present efficient estimation strategies for the linear mixed effect model when there exists multicollinearity among predictor variables for high-dimensional data application. We considered the estimation of fixed effects parameters in the linear mixed model when some of the predictors may have a very weak influence on the response of interest. We introduced pretest and shrinkage estimation in our model using the ridge estimation as the reference estimator. In addition, we established the asymptotic properties of the pretest and shrinkage ridge estimators. Our theoretical findings demonstrate that the shrinkage ridge estimators outperform the full model ridge estimator and perform relatively better than the sub-model estimator in a wide range of the parameter space.
Additionally, a Monte Carlo simulation was conducted to investigate and assess the finite sample behavior of the proposed estimators when the model is sparse (the restrictions on the parameters hold). As expected, the sub-model ridge estimator outshines all other estimators when the restrictions hold. However, when this assumption is violated, the shrinkage and pretest ridge estimators outperform the sub-model estimator. Furthermore, when the number of sparse predictors is extremely large relative to the sample size, the shrinkage estimators outperform the pretest ridge estimator. These numerical results are consistent with our asymptotic results. We also assessed the relative performance of the LASSO-type estimators against our ridge-type estimators. We observe that the pretest and shrinkage ridge estimators are superior to the LASSO-type estimators when predictors are highly correlated. In our real data applications, the shrinkage ridge estimators are superior, with the smallest relative prediction errors compared to the LASSO-type estimators.
In summary, the results of the data analyses strongly confirm the findings of the simulation study and suggest the use of the shrinkage ridge estimation strategy when no prior information about the parameter subspace is available. The results of our simulation study and real data application are consistent with available results in [27][28][29].
In our future work, we will focus on other penalty estimators, such as the Elastic Net, the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD) method, as estimation strategies in the LMM for high-dimensional data. These estimators will be assessed and compared with the proposed ridge-type estimators. Another interesting extension will be integrating two sub-models by incorporating ridge-type estimation strategies in linear mixed effect models. The goal is to improve the estimation accuracy of the non-sparse set of fixed effects parameters by combining an over-fitted model estimator with an under-fitted one [27,29]. This approach will include combining two sub-models produced by two different variable selection techniques from the LMM [28].

Appendix A

Using this expression and under the local alternatives {K_n}, we obtain the corresponding asymptotic expressions. Since ϕ_2 and ϕ_3 are linear functions of ϕ_1, as n → ∞ they are also asymptotically normally distributed, and their mean vectors and covariance matrices follow accordingly. Therefore, the asymptotic distributions of the vectors ϕ_2 and ϕ_3 are obtained.

Appendix B
We next introduce the lemmas given in [30] to aid with the proof of the bias and covariance of the estimators.
Lemma A1. Let V = (V_1, V_2, …, V_p)ᵀ be a p-dimensional normal vector distributed as N_p(μ_v, Σ_p). Then, for a measurable function Ψ, the moment identities of [30] express E[V Ψ(VᵀV)] and E[V Vᵀ Ψ(VᵀV)] in terms of expectations of functions of non-central chi-square variables, where χ²_k(∆) denotes a non-central chi-square distribution with k degrees of freedom and non-centrality parameter ∆.

Appendix B.2
In order to compute the risk functions, we first compute the asymptotic covariance of the estimators, where the asymptotic covariance of an estimator β̂_1^* is defined as in Section 3. Proof of Theorem 3. We begin with the asymptotic covariance of the estimator β̂_1^RFM, whose limiting second moment decomposes into its asymptotic variance component involving B_{11.2} and the outer product term μ_{11.2} μ_{11.2}ᵀ.
The asymptotic covariance of the estimator β̂_1^RSE can be obtained by expanding its definition; to do so, we need to compute E[ϕ_3 ϕ_3ᵀ L_n⁻²] and E[ϕ_3 ϕ_1ᵀ L_n⁻¹]. By using Lemma A1, the first term is obtained directly, and the second term is computed from normal theory. From the above, we find E[ϕ_3 δᵀ L_n⁻¹] = δδᵀ E(χ⁻²_{p_2+2}(∆)) and E[ϕ_3 L_n⁻¹] = δ E(χ⁻²_{p_2+2}(∆)). Putting these terms together and simplifying, we obtain the stated covariance expression.