Abstract
The identification of model parameters is a central challenge in the analysis of nonignorable nonresponse data. In this paper, we propose a novel penalized semiparametric likelihood method to obtain sparse estimators for a parametric nonresponse mechanism model. Based on these sparse estimators, an instrumental variable is introduced, enabling the identification of the observed likelihood. Two classes of estimating equations for the nonlinear regression model are constructed, and the empirical likelihood approach is employed to make inferences about the model parameters. The oracle properties of the sparse estimators in the nonresponse mechanism model are systematically established. Furthermore, the asymptotic normality of the maximum empirical likelihood estimators is derived. It is also shown that the empirical log-likelihood ratio functions are asymptotically weighted chi-squared distributed. Simulation studies are conducted to validate the effectiveness of the proposed estimation procedure. Finally, the practical utility of our approach is demonstrated through the analysis of ACTG 175 data.
MSC:
62J02
1. Introduction
Consider a dataset comprising n independent observations , where each observation includes a covariate vector and a scalar response variable . We consider a family of nonlinear regression models given by
where is a known nonlinear function with an unknown vector of parameters . The error term consists of two components: (1) , a variance function that modulates the error scale as a function of the covariates, and (2) , a sequence of i.i.d. random variables with and . Model (1) has been extensively studied in the statistical literature, including seminal contributions by Jennrich [1] and Wu [2]. A key example of such models is the Gompertz growth process, which is widely used in biology, epidemiology, and economics. The corresponding function is given by , where is the upper asymptote, controls displacement, and represents the growth rate (Fekedulegn et al. [3]). Similarly, the logistic growth function, frequently employed in population dynamics and epidemic modeling, is , where is the carrying capacity, is the growth rate, and is the inflection point. For a fully observed dataset , the parameter in model (1) is traditionally estimated using the weighted least squares (WLS) criterion. This estimator is obtained by minimizing the weighted residual sum of squares: . As a fundamental method in nonlinear regression, the WLS estimator achieves asymptotic efficiency under heteroscedasticity and serves as the theoretical foundation for various inferential procedures. For further details, see Ivanov [4].
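To make the two growth curves and the WLS criterion described above concrete, a minimal Python sketch follows. The formulas are not reproduced in this text, so the parameterizations used here are common textbook forms and should be read as assumptions: a·exp(−b·exp(−c·x)) for the Gompertz curve and K/(1 + exp(−r(x − x0))) for the logistic curve. All function and argument names are hypothetical.

```python
import math

def gompertz(x, a, b, c):
    """Gompertz curve (assumed form): a = upper asymptote, b = displacement, c = growth rate."""
    return a * math.exp(-b * math.exp(-c * x))

def logistic_growth(x, K, r, x0):
    """Logistic curve (assumed form): K = carrying capacity, r = growth rate, x0 = inflection point."""
    return K / (1.0 + math.exp(-r * (x - x0)))

def unit_var(x):
    """Homoscedastic variance function used only for this illustration."""
    return 1.0

def wls_objective(theta, xs, ys, mean_fn, var_fn):
    """Weighted residual sum of squares: sum_i (y_i - f(x_i; theta))^2 / v(x_i)."""
    return sum((y - mean_fn(x, *theta)) ** 2 / var_fn(x) for x, y in zip(xs, ys))

# Noise-free toy data generated from a Gompertz curve.
xs = [0.0, 1.0, 2.0, 4.0, 8.0]
true_theta = (10.0, 2.0, 0.5)
ys = [gompertz(x, *true_theta) for x in xs]

at_truth = wls_objective(true_theta, xs, ys, gompertz, unit_var)
off_truth = wls_objective((9.0, 2.0, 0.5), xs, ys, gompertz, unit_var)
print(at_truth, off_truth)
```

With noise-free data the criterion vanishes at the generating parameter and is strictly positive elsewhere; in practice the minimizer would be found by a numerical optimizer.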
Missing data frequently arise in practical applications due to factors such as reluctance to answer sensitive survey questions. In such cases, directly applying conventional least squares procedures to estimate the parameter vector may lead to biased estimates and invalid conclusions (see, for example, Little and Rubin [5]). The inverse probability weighting (IPW) method, introduced by Horvitz and Thompson [6], remains a fundamental approach for addressing missing data challenges. To improve efficiency, Robins et al. [7] developed the augmented inverse probability weighting (AIPW) method, which builds upon a corrected version of complete-case analysis. Subsequent extensions of this methodology across various domains include significant contributions by Han [8], Xue and Xie [9], Sharghi et al. [10], and Li et al. [11], among others. For missing at random (MAR) scenarios, Tang and Zhao [12] developed IPW and AIPW estimating equations for empirical likelihood (EL) inference on , extending the foundational methodology of Owen [13]. In more challenging not missing at random (NMAR) settings, where nonresponse depends on the unobserved values, Yang and Tang [14] proposed an EL approach for inference in this modeling framework.
The identification challenge remains a fundamental issue in the analysis of nonignorable missing data. The observed likelihood is identifiable if two distinct populations do not produce identical observed likelihood functions. Crucially, identifiability can fail even when both the outcome model and the nonresponse mechanism model are parametrically defined, as demonstrated by Wang et al. [15]. Significant methodological advancements have been made in recent decades to address this issue. For parametric logistic nonresponse mechanisms, Yang and Tang [14] established identifiability conditions within the EL framework. In broader parametric settings, Wang et al. [15] introduced an instrumental variable (IV) approach to resolve the identifiability issue. More recently, Wang et al. [16] investigated an optimal subset selection method for identifying the IV from a set of candidate models. In addition, Chen et al. [17] suggested an IV selection technique based on pseudo-likelihood principles. Further advancements include the work of Du et al. [18] and Beppu and Morikawa [19]. Current estimation strategies for nonresponse mechanisms typically involve a two-stage process: an appropriate IV is identified first, and the parameters of the nonresponse mechanism model are then estimated. However, these methods face significant computational challenges as the candidate model space expands.
A novel penalized semiparametric likelihood method for IV selection is proposed under the parametric assumption of the missingness mechanism. By leveraging the sparse structure of the observed likelihood, we develop a regularized approach to obtain the sparse estimators for the nonresponse mechanism model. To achieve this, we integrate the semiparametric likelihood framework with the SCAD penalty function (Fan and Li [20]). This shrinkage technique enables the simultaneous identification of IV and estimation of a sparse nonresponse mechanism model. Subsequently, the unbiased estimating equations based on IPW and AIPW methods are constructed, and the profile empirical log-likelihood ratio functions (ELLRFs) are rigorously formulated.
Our primary contributions are threefold. First, we propose a penalized semiparametric likelihood framework that effectively combines the SCAD penalty and the sparse likelihood structure. This approach facilitates simultaneous IV selection and parameter estimation for the nonresponse mechanism model. The resulting sparse estimators exhibit the oracle properties, ensuring both selection consistency and asymptotic efficiency. Second, the flexibility of the EL method enables the proposed estimation procedure to produce confidence regions with natural shapes and orientations. Third, under some regularity conditions, we show that the ELLRFs are asymptotically weighted chi-squared distributed, while the maximum empirical likelihood estimators (MELEs) are asymptotically normally distributed, providing a valid foundation for regression parameter inference.
The rest of this article is organized as follows. In Section 2, we present the penalized semiparametric likelihood methodology and construct two types of unbiased estimating equations. The MELEs and ELLRFs are also introduced. In Section 3, we investigate the oracle properties of the sparse estimators for the nonresponse mechanism model, as well as the asymptotic normality of the MELEs and the asymptotic properties of the proposed ELLRFs. Simulation studies and a real data analysis are conducted to evaluate the finite sample performance of the proposed estimates in Section 4 and Section 5, respectively. The concluding discussions are included in Section 6. Proofs of the asymptotic results are relegated to Appendix A.
2. Methods
2.1. Penalized Semiparametric Likelihood Estimation
Let be the unconditional joint distribution of and Y. Suppose that out of the n individuals respond on both Y and , which results in data . For the rest of the individuals, their Y values are not observed, but their values are always observed. Let represent the missingness indicator of Y, which equals 1 if Y is observed and 0 otherwise. Suppose that the covariate has two components, such that the nonresponse mechanism can be modeled as
where is an unknown parameter to be estimated, and is a known, strictly monotonic, twice-differentiable function from to . Since model (2) depends explicitly on the potentially unobserved Y when , it describes a nonignorable missingness mechanism, often referred to as NMAR. In this context, the missingness indicator is typically assumed to follow a conditional Bernoulli distribution with probability . Notably, when , the missingness mechanism simplifies to MAR, as the dependence on the unobserved Y is eliminated.
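The distinction between NMAR and MAR in model (2) can be illustrated with a short Python sketch. The text only requires a known, strictly monotonic link function; the logistic link used here is an assumption made for illustration, and the names `response_prob`, `alpha0`, `alpha_u`, and `gamma` are hypothetical.

```python
import math

def expit(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def response_prob(u, y, alpha0, alpha_u, gamma):
    """P(delta = 1 | u, y) under an assumed logistic link.

    u, alpha_u: covariates entering the mechanism and their coefficients.
    gamma: coefficient on the response y; gamma = 0 corresponds to MAR.
    """
    linear = alpha0 + sum(a * ui for a, ui in zip(alpha_u, u)) + gamma * y
    return expit(linear)

u = [0.3, -1.0]
coef = [1.0, 0.2]

# NMAR: the response probability changes with the (possibly unobserved) y.
p_nmar_low = response_prob(u, 0.0, 0.5, coef, -0.8)
p_nmar_high = response_prob(u, 2.0, 0.5, coef, -0.8)

# MAR: with gamma = 0 the dependence on y disappears.
p_mar_low = response_prob(u, 0.0, 0.5, coef, 0.0)
p_mar_high = response_prob(u, 2.0, 0.5, coef, 0.0)
print(p_nmar_low, p_nmar_high, p_mar_low, p_mar_high)
```

A negative `gamma` makes larger responses less likely to be observed, which is the kind of outcome-dependent nonresponse the NMAR setting is meant to capture.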
Following Qin et al. [21], the likelihood of based on the complete observations is
which can be rewritten as
where is the unconditional respondent rate. The first term in Equation (3) is the likelihood conditioning on , and the term is the binomial likelihood of . The direct maximization of in Equation (3) may lose some information contained in . To address this limitation, we assume that the auxiliary information on can be characterized as , where is a known r-vector (or scalar) function. To illustrate the rationale underlying the construction of the auxiliary function , consider the case where the population mean of , denoted by , is known. In this setting, one may define to serve as auxiliary information. When the population mean is unavailable, it can be replaced by the estimated mean . Thus, part of the information contained in is recovered through or , thereby enhancing the efficiency of estimation under incomplete data.
By the auxiliary information on and without assuming any specific form of , we can maximize the semiparametric likelihood (3) subject to the constraints
where is the jump of at .
By introducing Lagrange multipliers and profiling for all of the values of , we obtain
where and are Lagrange multipliers as described in Qin and Lawless [22].
Substituting all of the values of into Equation (3), the log-likelihood with respect to and W becomes
The identifiability of the observed likelihood as established by Wang et al. [16] relies on the existence of an IV that satisfies two conditions: (i) can be excluded from the nonresponse mechanism model, i.e., , and (ii) must be related to the study variable Y. Specifically, if the true parameter subvector corresponding to satisfies , then qualifies as a valid IV by design. This critical insight motivates the development of a penalized semiparametric likelihood framework to achieve the sparse estimation of in the nonresponse mechanism model. The penalized likelihood estimator of can be obtained by maximizing the following objective function:
where represents the SCAD penalty function. The first derivative of the penalty term is specified as
for , where is a fixed constant, is a tuning parameter, and . Following Fan and Li [20], we set throughout this study. As demonstrated in Theorem 1, the sparse estimator achieves the oracle properties, ensuring that as . This guarantees the consistent identification of as the IV.
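For reference, the SCAD penalty of Fan and Li [20] and its first derivative can be written in a few lines of Python. The sketch below uses the standard published form of the penalty and the value a = 3.7 suggested by Fan and Li; that the present study uses exactly this constant is an assumption, and the function names are hypothetical.

```python
def scad_derivative(t, lam, a=3.7):
    """First derivative of the SCAD penalty at t >= 0 (standard form, a > 2)."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty at t >= 0, obtained by integrating scad_derivative from 0 to t."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0

# Small coefficients are penalized at the constant rate lam (like the lasso),
# moderate ones at a decreasing rate, and large ones not at all.
for t in (0.5, 2.0, 5.0):
    print(t, scad_derivative(t, lam=1.0), scad_penalty(t, lam=1.0))
```

The vanishing derivative for large arguments is what keeps large coefficients nearly unbiased, while the constant rate near zero produces sparsity.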
Implementing the optimization procedure for (4) presents a notable challenge due to the involvement of the non-concave penalty function . To enhance numerical stability, we adopt the local quadratic approximation method introduced by Fan and Li [20]. Given the m-th iteration estimate , the penalty function can be approximated quadratically as follows:
This approximation simplifies the non-concave penalty function, thereby improving both the computational tractability and convergence properties of the optimization procedure. In addition to the approximation strategy, selecting an appropriate penalty parameter is crucial for optimizing model performance. To achieve this, we employ the following Bayesian information criterion:
where is the number of nonzero elements in . By minimizing over , the resulting optimal tuning parameter can be obtained.
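The local quadratic approximation mentioned above can be sketched as follows. The code uses the standard form from Fan and Li [20], p(|b|) ≈ p(|b0|) + ½{p'(|b0|)/|b0|}(b² − b0²), applied to the SCAD penalty with a = 3.7; the helper names are hypothetical, and the BIC-based tuning step is not shown because its exact form is not reproduced in this text.

```python
def scad_derivative(t, lam, a=3.7):
    """First derivative of the SCAD penalty at t >= 0."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty at t >= 0."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0

def lqa(beta, beta0, lam, a=3.7):
    """Local quadratic approximation of the penalty around a nonzero current value beta0."""
    t0 = abs(beta0)
    if t0 == 0.0:
        raise ValueError("beta0 must be nonzero; zero coefficients are dropped beforehand")
    curvature = scad_derivative(t0, lam, a) / t0
    return scad_penalty(t0, lam, a) + 0.5 * curvature * (beta * beta - beta0 * beta0)

beta0, lam = 1.5, 1.0
print(lqa(beta0, beta0, lam), scad_penalty(abs(beta0), lam))  # values agree at beta0
print(lqa(1.2, beta0, lam), scad_penalty(1.2, lam))           # close nearby
```

Because the approximation is quadratic in the coefficient, each iteration reduces to a smooth problem that Newton-type updates can handle; a common practical safeguard is to set coefficients that become very close to zero exactly to zero and remove them from subsequent iterations.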
2.2. Construction of Estimating Equations
For complete data , the WLS estimator can be obtained by solving the following equations:
where .
When Y is subject to NMAR, we introduce the following estimating function based on the IPW approach for the ith individual:
where and is the consistent estimate of .
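The structure of an IPW-type estimating function can be sketched in Python for a toy one-parameter model. The sketch assumes the usual form, in which the complete-data WLS score of each respondent is divided by its estimated response probability; the mean function exp(θx), the record layout `(x, y, delta, pi_hat)`, and all names are hypothetical choices made for illustration.

```python
import math

def mean_fn(x, theta):
    """Toy nonlinear mean function f(x; theta) = exp(theta * x)."""
    return math.exp(theta * x)

def mean_grad(x, theta):
    """Derivative of the toy mean function with respect to theta."""
    return x * math.exp(theta * x)

def unit_var(x):
    """Homoscedastic variance function used only for this illustration."""
    return 1.0

def ipw_score(theta, records, var_fn=unit_var):
    """Average IPW estimating function over records (x, y, delta, pi_hat).

    Nonrespondents (delta = 0) contribute zero, so their y may be None.
    """
    total, n = 0.0, 0
    for x, y, delta, pi_hat in records:
        n += 1
        if delta == 1:
            resid = y - mean_fn(x, theta)
            total += mean_grad(x, theta) * resid / (var_fn(x) * pi_hat)
    return total / n

theta_true = 0.7
records = [
    (0.5, mean_fn(0.5, theta_true), 1, 0.8),
    (1.0, mean_fn(1.0, theta_true), 1, 0.5),
    (1.5, None, 0, None),                      # nonrespondent: y unobserved
    (2.0, mean_fn(2.0, theta_true), 1, 0.4),
]
print(ipw_score(0.5, records), ipw_score(theta_true, records), ipw_score(0.9, records))
```

With noise-free toy data the averaged score changes sign at the generating value, which is the root an equation solver would locate.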
To improve efficiency, we develop the AIPW-type estimating function with imputation
where . Following Tang et al. [23], the conditional density satisfies
where is the conditional density of given and , and is the conditional odds of nonresponse. Simple algebraic manipulations show that
A nonparametric kernel estimator of can be obtained by
where the weight , and with being a -dimensional kernel function and h representing a bandwidth sequence. Given , a kernel-assisted estimating function for the ith observation is given by
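A kernel estimator of this kind can be sketched in Python for a scalar covariate. The code assumes the form commonly used in the exponential-tilting literature, in which the conditional mean of a function φ(Y) among nonrespondents is estimated from respondents weighted by the estimated odds of nonresponse (1/π − 1) times a kernel weight; whether this matches the estimator above term by term cannot be verified from this text, and all names are hypothetical.

```python
import math

def gaussian_kernel(d):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio."""
    return math.exp(-0.5 * d * d)

def impute_nonrespondent_mean(x0, xs, ys, deltas, pis, phi, h):
    """Kernel estimate of E[phi(Y) | X = x0, delta = 0].

    Only respondents (delta = 1) enter the sums; each is weighted by
    odds_j * K((x0 - x_j) / h) with odds_j = 1 / pi_j - 1.
    """
    num = 0.0
    den = 0.0
    for x, y, d, p in zip(xs, ys, deltas, pis):
        if d != 1:
            continue
        odds = 1.0 / p - 1.0
        w = odds * gaussian_kernel((x0 - x) / h)
        num += w * phi(y)
        den += w
    if den == 0.0:
        raise ValueError("no respondent carries positive weight at x0")
    return num / den

def identity(y):
    return y

xs = [-1.0, 1.0, 0.2]
ys = [0.0, 10.0, None]          # third unit is a nonrespondent
deltas = [1, 1, 0]
pis = [0.5, 0.25, None]         # estimated response probabilities of respondents
estimate = impute_nonrespondent_mean(0.0, xs, ys, deltas, pis, identity, h=1.0)
print(estimate)
```

In this toy example the two respondents are equidistant from x0, so the estimate is driven entirely by the odds weights: the respondent who was less likely to respond counts more.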
2.3. MELEs of Model Parameters
To fix the notation, we temporarily assume that is known. Let be non-negative weight allocated to with a total mass of 1. Under moment condition , the profile EL ratio for is defined as
By introducing Lagrange multiplier , we have
where satisfies
Therefore, the ELLRF for can be shown to be
Maximizing leads to the MELE of , denoted by . Under some smoothness conditions, can be obtained by simultaneously solving
where denotes the partial derivative with respect to .
In practical applications, since the parameter is typically unknown, the ELLRF in Equation (5) cannot be used directly for inference about . To address this, given , the estimated ELLRF based on the IPW method is
where is the Lagrange multiplier and satisfies
Thus, the IPW-based MELE of , denoted by , can be obtained by maximizing . Similarly, the AIPW-based MELE of , denoted by , can be obtained by maximizing , where with solving the corresponding Lagrange multiplier equations.
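The Lagrange multiplier step in the EL construction can be illustrated for a scalar estimating function. The sketch below assumes the standard empirical likelihood form, in which the multiplier λ solves Σ g_i / (1 + λ g_i) = 0 and the log-likelihood ratio statistic is 2 Σ log(1 + λ g_i); it uses bisection because the left-hand side is strictly decreasing in λ. The function name and the toy data are hypothetical.

```python
import math

def el_log_ratio(g, tol=1e-12, max_iter=200):
    """Return (2 * sum log(1 + lam * g_i), lam) for scalar estimating-function values g.

    If zero lies outside the open interval (min g, max g), the statistic is infinite.
    """
    g_min, g_max = min(g), max(g)
    if not (g_min < 0.0 < g_max):
        return math.inf, None
    shrink = 1.0 - 1e-10
    lo = -shrink / g_max   # keeps 1 + lam * g_i > 0 for the largest g_i
    hi = -shrink / g_min   # keeps 1 + lam * g_i > 0 for the smallest g_i
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        score = sum(gi / (1.0 + mid * gi) for gi in g)
        if score > 0.0:
            lo = mid       # the score is decreasing in lam, so the root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g), lam

# Toy example: estimating function g_i = x_i - mu for a mean.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
ratio_at_mean, _ = el_log_ratio([xi - 3.0 for xi in x])
ratio_off, lam_off = el_log_ratio([xi - 2.5 for xi in x])
print(ratio_at_mean, ratio_off, lam_off)
```

For vector-valued estimating functions the same equation is usually solved by a damped Newton method; the scalar case is shown only to keep the example short.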
Remark 1.
The proposed method is developed under the assumption that the response variable is subject to NMAR. This assumption is commonly adopted in practical applications, particularly in contexts such as longitudinal studies or clinical trials with outcome-dependent dropout. Notably, as demonstrated by the sensitivity analyses by Ding and Tang [24] and Yang and Tang [14], estimation methods based on the NMAR assumption can still perform well when the true missingness mechanism is MAR, suggesting their robustness to MAR data. However, when the data exhibit a mixture of MAR and NMAR mechanisms, such as when different individuals follow distinct missingness patterns, the validity of NMAR-based methods may be compromised unless a hierarchical structure missingness framework is explicitly incorporated as discussed by Morikawa and Kano [25]. Consequently, in real-world data applications, it is crucial to assess the plausibility of the NMAR assumption on a case-by-case basis, as model performance and identifiability may be sensitive to deviations from the assumed missingness mechanism.
3. Main Results
3.1. Asymptotic Properties
The asymptotic properties of the MELEs and ELLRFs require the following assumptions:
- (A1) The nonresponse mechanism almost surely and almost surely; in a neighborhood of , , and exists and is bounded by an integrable function.
- (A2) The probability density function is bounded away from ∞ in the support of ; the first and second derivatives of are continuous, smooth and bounded; and and are finite.
- (A3) is twice continuously differentiable in the neighborhood of .
- (A4) The function is continuous with respect to , where lies in a compact set; and exist; has full column rank.
- (A5) has full column rank.
- (A6) The kernel function is a probability density function such that (a) it is bounded and has a compact support; (b) it is symmetric with ; (c) for some in some closed interval centered at zero; and (d) the bandwidth h satisfies and as .
- (A7) As , , and the tuning parameter satisfies as and .
- (A8) The penalty function satisfies and , where .
- (A9) The moment conditions and hold for , , and , where with ℵ being the compact set, and is defined in (A1). The notation denotes the k-th component of .
Condition (A1) is similar to that used by Qin et al. [21] and is necessary to establish the asymptotic normality of . Condition (A2) is a standard condition in probability theory. Assumptions (A3)–(A5) are typical in empirical likelihood-based inference with estimating equations. Condition (A6) is a common assumption in the nonparametric literature. Assumptions (A7)–(A9) are required to establish the oracle properties of penalized semiparametric likelihood estimators.
Let denote the true value of , where . As discussed in Fan and Li [20], the SCAD penalty function possesses the oracle properties. These properties ensure that the SCAD penalty not only promotes a sparse model structure but also yields an estimator that is nearly unbiased for large parameter values. We establish the oracle properties and the consistency of in Theorem 1.
Theorem 1.
Under Assumptions (A1) and (A7)–(A9), as , we have
;
;
, where I represents the identity matrix, and is defined in Appendix A.
From Theorem 1, we establish the stochastic expansion
where the influence function is defined in (A2) of the Appendix. The first part of Theorem 1 demonstrates that by appropriately selecting the tuning parameter , a root-n consistent penalized likelihood estimator can be obtained. Furthermore, Theorem 1 (ii) establishes the sparsity property, ensuring that with probability approaching one. This result confirms that the penalized estimator effectively identifies and selects the IV with high probability. Finally, Theorem 1 (iii) establishes the asymptotic normality of , suggesting that the penalized likelihood method can yield efficient estimates of the nonzero components by effectively reducing the dimensionality of .
Within the framework of the penalized semiparametric likelihood, the asymptotic properties of and are established below. For any vector , let , and convergence in distribution is denoted by the symbol . We first define the key quantities:
Theorem 2.
Suppose that Conditions (A1)–(A9) hold, and are positive definite matrices, is the unique true parameter value of θ, and α is estimated by . Define
Then, as , we have
- (1) Asymptotic normality:
- (2) Likelihood ratio convergence: where are independent chi-squared variates with 1 degree of freedom, and () are eigenvalues of .
Theorem 2 (1) establishes the asymptotic normality of and , enabling normal approximation (NA)-based inference. Specifically, the -level NA confidence region for is
where is a consistent plug-in estimator of , and denotes the quantile of the chi-squared distribution with p degrees of freedom. Theorem 2 (2) characterizes the ELLRFs, yielding the EL confidence region
where is the quantile of the distribution , and are the eigenvalues of .
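Because the limiting distribution is a weighted sum of independent chi-squared variates with 1 degree of freedom, its quantiles generally have no closed form. One simple option, sketched below, is to approximate the quantile by Monte Carlo once the weights have been estimated. This is an illustrative approach rather than the procedure used in the paper, and the function name, default number of draws, and seed are hypothetical.

```python
import math
import random

def weighted_chisq_quantile(weights, level=0.95, n_draws=20000, seed=0):
    """Monte Carlo quantile of sum_j w_j * Z_j^2 with Z_j independent standard normals."""
    rng = random.Random(seed)
    draws = sorted(
        sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights) for _ in range(n_draws)
    )
    idx = min(n_draws - 1, int(math.ceil(level * n_draws)) - 1)
    return draws[idx]

# With unit weights the distribution reduces to an ordinary chi-squared
# distribution, whose 95% quantile with 2 degrees of freedom is about 5.99.
q_unit = weighted_chisq_quantile([1.0, 1.0])
q_weighted = weighted_chisq_quantile([1.4, 0.6])
print(q_unit, q_weighted)
```

In an application the weights would be the estimated eigenvalues from Theorem 2 (2), and the resulting quantile would serve as the cutoff of the EL confidence region.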
3.2. Double Robustness
From Theorem 2, we know that if the model is correctly specified, the proposed estimators are unbiased and consistent under certain regularity conditions. However, verifying these specifications is a challenging task, and the misspecification of may result in biased estimates and reduced efficiency unless additional data assumptions are imposed. To address these limitations, a doubly robust estimation procedure has been developed for NMAR settings. Specifically, following Miao and Tchetgen [26] and Liu and Yuan [27], the conditional density function of can be factorized as
where , is the baseline propensity score, is the baseline outcome density, and
is the log of the conditional odds ratio function relating Y and given .
When focusing on the estimation of the response mean, , we have . As demonstrated by Liu and Yuan [27], if is correctly specified, the estimator is unbiased and consistent if either or is correctly specified. Therefore, we recommend applying the proposed estimation procedure within the EL framework for nonlinear regression after selecting an appropriate model for the log odds ratio from a set of candidate models. This approach helps mitigate potential biases arising from the misspecification of the missingness mechanism.
Moreover, if both and the moment functions are correctly specified, the proposed estimator remains unbiased and consistent under certain regularity conditions. Following Zhao et al. [28], we show that the moment functions possess the double robustness property when the missingness mechanism, as specified in model (2), is modeled parametrically. The double robustness property is summarized in the following Proposition 1.
Proposition 1.
(1) Regardless of the choice of , has mean zero, provided that the model for is correctly specified. (2) If the true missingness mechanism is a parametric logistic model , where is a smooth function with an unknown parameter vector , and is an arbitrary user-specified function that measures the deviation from the ignorable missing-data mechanism assumption, then the AIPW moment functions still have mean zero, even if the model for is misspecified.
3.3. Dimension Reduction
In many practical applications, the covariate dimension can be large, making it challenging to obtain an appropriate estimator for using kernel-smoothing imputation methods. To address this issue, let S be a continuous function from to such that with . Under this assumption, we have
where . Consequently, the kernel-assisted estimating equations can be constructed as
where is structurally identical to except that is replaced by S. Given , one can obtain a semiparametric dimension reduction EL estimator based on . It is common to assume that the working index involves an unknown parameter vector . If an estimator is available, following the arguments of Hu et al. [29], we can show that the resulting EL estimator based on is asymptotically equivalent to when .
3.4. Asymptotic Variance Estimation
In order to construct confidence regions for the proposed estimators, it is essential to estimate their asymptotic variances and consistently from the sample . To achieve this, we employ the plug-in method in the simulation studies. For instance, the sample-based estimate of is
Other estimates for , , , , and can be obtained in a similar manner. We omit the details here for brevity.
While the plug-in method is effective in NMAR settings, it can be computationally intensive due to the complexity of the asymptotic variances involved. As an alternative, particularly when dealing with large datasets, the bootstrap procedure provides an effective approach for approximating these asymptotic variances. This method, which has been explored in studies such as those of Zhao et al. [30] and Jiang et al. [31] for NMAR data, can help alleviate computational challenges and provide more practical estimations in large-scale applications.
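A nonparametric bootstrap for a standard error can be sketched generically as follows. The sketch resamples observations with replacement and recomputes an arbitrary estimator; in the NMAR setting the estimator passed in would have to rerun the whole procedure (mechanism estimation followed by the EL step) on each resample, which is not shown. The function name and defaults are hypothetical.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Bootstrap standard error of estimator(data) from n_boot resamples."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement and re-estimate.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)

# Toy check: the bootstrap SE of a sample mean should be close to s / sqrt(n).
data = [float(i) for i in range(1, 101)]
se_boot = bootstrap_se(data, statistics.mean, n_boot=500, seed=1)
se_formula = statistics.pstdev(data) / len(data) ** 0.5
print(se_boot, se_formula)
```

Each observation here would be a full record (covariates, response, and missingness indicator), so that the missingness pattern is resampled together with the data.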
4. Simulation Study
Finite-sample performance of the proposed methods is evaluated through Monte Carlo experiments. For bandwidth selection, we implement the data-driven approaches of Zhou et al. [32] and Tang et al. [23], adopting the rule-of-thumb bandwidth: , where denotes the sample standard deviation of the observed covariate X. This practical bandwidth selector balances asymptotic optimality and computational simplicity.
4.1. Simulation 1
Let , and . The true parameter is set as , and the error term . The covariates are generated under two scenarios: In Model A, and are independently sampled from ; in Model B, while (), allowing us to examine the impact of covariate dependence. We implement a sample size of , with response variables generated following the specification in model (1).
The missing indicator follows the nonignorable mechanism
where . The covariates and serve as instrumental variables. The parameter is estimated using the penalized semiparametric likelihood method, incorporating the following auxiliary information matrix:
where and . We adopt the product Gaussian kernel and set the bandwidth as following Tang et al. [23].
Based on 500 independent replications, the proposed penalized method achieves an average IV selection rate of 92.8% for and , demonstrating its effectiveness. For Model A and Model B, the 95% confidence regions for parameter and their coverage probabilities are calculated based on the EL methods and , as well as the normal approximation approaches and . The simulation results for the confidence regions are displayed in Figure 1.
Figure 1.
Simulation 1 results comparing EL and NA methods. Line specifications: (red solid), (green dotted), (black dash-dot), (blue thick solid).
The left panel of Figure 1 presents the simulated confidence regions for Model A based on the four aforementioned approaches, whereas the right panel displays the corresponding results for Model B. Several notable findings emerge from Figure 1. First, the confidence regions constructed using EL approaches are smaller than those based on NA methods, indicating the superior efficiency of EL-based inference. Second, the EL-based confidence region for is smaller than that for , highlighting the enhanced efficiency of the AIPW estimator relative to the IPW estimator. Third, even in the presence of correlation between covariates in Model B, the EL and NA approaches yield confidence regions similar to those in Model A, implying the stability of these methods. The coverage probabilities for all four approaches are comparable in Models A and B, closely aligning with the nominal 95% level. Overall, the EL-based approaches demonstrate superior efficiency relative to the NA-based methods, and the AIPW-based estimation is shown to be more efficient than the IPW-based estimation.
4.2. Simulation 2
We consider the regression model with nonlinear components
where the true parameter vector is . The covariates follow the multivariate normal distribution with covariance matrix . The error terms are independently distributed as .
The nonresponse mechanism follows the nonignorable logistic model
where . The IV is identified through the proposed penalized estimation process. To address high-dimensional challenges, we employ MAR-based propensity score estimation
where denotes the maximum likelihood estimates. This enables the construction of a semiparametric estimator
where represents a univariate kernel density function. For each data generating mechanism, we generate 500 Monte Carlo random samples of sizes 150 and 250.
Table 1 summarizes the finite-sample performance of the proposed method, presenting three key metrics for nonzero components in : empirical bias (Bias), root mean square error (RMS), and standard deviation (SD). The variable selection outcomes are quantified through two measures: “T” (mean count of correctly excluded irrelevant variables) and “F” (mean count of erroneously excluded significant variables). Table 2 compares the estimation accuracy of regression coefficients between IPW and AIPW approaches, reporting their respective bias, RMS, and SD values.
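The three accuracy metrics can be computed from Monte Carlo replicates as in the following Python sketch. The divisor used for the SD is an assumption: the sketch divides by the number of replicates so that RMS² = Bias² + SD² holds exactly, whereas the paper may use the n − 1 convention. The function name and the toy replicates are hypothetical.

```python
import math

def mc_summary(estimates, truth):
    """Return (bias, sd, rms) of Monte Carlo estimates of a scalar parameter."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - truth
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / n)
    rms = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    return bias, sd, rms

bias, sd, rms = mc_summary([1.0, 2.0, 3.0], truth=1.5)
print(bias, sd, rms)
```

The decomposition explains why closely matched SD and RMS values indicate a small bias: the two can only agree when the squared bias is negligible relative to the variance.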
Table 1.
Simulation results on the estimation performance of in Simulation 2.
Table 2.
Simulation results on the estimation performance of in Simulation 2.
The principal findings emerge as follows:
(1) Variable Selection Efficacy: The penalized semiparametric likelihood method demonstrates robust variable selection capability in the nonresponse mechanism model, effectively distinguishing between relevant and irrelevant covariates.
(2) Estimation Precision: For active components in , the observed minimal bias with closely matched SD and RMS values confirms the estimator’s statistical efficiency.
(3) MELE Performance Consistency: For both and , the SD and RMS are nearly identical, suggesting that the proposed method for MELEs effectively estimates parameters through the penalized semiparametric likelihood approach.
(4) Sample Size Effects: Enhanced estimation efficiency emerges with larger samples for both the missing-data model and the regression model.
5. Application to the ACTG 175 Data
We demonstrate the proposed methodology using data from the AIDS Clinical Trials Group Protocol 175 (ACTG 175) involving 2139 HIV-infected participants (Hammer et al. [33]). Following the established analytical approaches of Davidian et al. [34], Tsiatis et al. [35], and Han [8], we classify treatments into two groups: zidovudine (ZDV) monotherapy (532 subjects) versus combined therapies (1607 subjects). The analysis focuses on CD4 counts at weeks post-baseline () as the primary endpoint, with the following covariates:
- Treatment assignment (: 0 = ZDV monotherapy)
- Baseline CD4 count (: )
- Demographic covariates: age (), weight (), race (: 0 = White), gender (: 0 = Female)
- Clinical covariates: antiretroviral history (: 0 = naive), early treatment termination (: 0 = completed)
The binary indicator variable r encodes the missingness status of the response Y, where indicates an observed outcome, and denotes a missing value. Previous studies of Davidian et al. [34], Tsiatis et al. [35], and Han [8] assumed that the missingness mechanism depends solely on covariates through a MAR framework. Our penalized semiparametric likelihood approach enhances robustness by incorporating shrinkage estimation within the nonresponse mechanism model. Specifically, shrinkage of the response variable coefficient toward zero provides formal evidence supporting the MAR assumption, while a nonzero estimate suggests NMAR.
To facilitate direct comparison with Han [8], we specialize the general model (1) to a linear regression framework
where – represent the baseline covariates defined previously. The nonresponse mechanism is parameterized via logistic regression
with parameter vector . To address dimensionality challenges, we implement the regularization strategy detailed in Section 4.2, constructing consistent estimators for through MAR-based nonresponse mechanism weighting.
The penalized semiparametric likelihood estimates are presented in Table 3, with p-values calculated using 200 bootstrap replications (Efron and Tibshirani [36]). The weight () and age () show nonsignificant contributions to the nonresponse mechanism, as their coefficients are shrunk to zero with p-values exceeding 0.1. The significant coefficient for CD4 counts at weeks () indicates an NMAR mechanism in this dataset.
Table 3.
Estimation of response model parameters .
Table 4 presents the analysis results for model (6), with standard errors estimated through 200 bootstrap replications. The comparative results from the complete-case analysis and Han’s multiply robust method, as described by Han [8], are also included. The nonsignificant predictors include age, weight, and gender. The analysis reveals five critical clinical insights:
Table 4.
Results of the analysis on the ACTG 175 data.
(1) Treatment Superiority: Combination antiretroviral therapies (Trt = 1) demonstrate significantly higher CD4 counts at weeks compared to ZDV monotherapy, establishing the enhanced therapeutic effectiveness of newer regimens.
(2) Baseline Predictive Power: Baseline CD4 counts (CD40) show a significant positive association with follow-up counts, confirming their prognostic value in HIV management.
(3) Racial Disparity: White patients maintain a clinically significant CD4 count advantage over nonwhite counterparts, suggesting differential disease progression trajectories.
(4) Treatment History Impact: Antiretroviral-experienced patients exhibit substantially reduced CD4 counts compared to naive patients, indicating potential cumulative treatment effects.
(5) Adherence Consequences: Early treatment discontinuation is associated with a marked CD4 count reduction, underscoring the critical importance of sustained therapeutic engagement.
6. Conclusions and Future Work
We developed a penalized semiparametric likelihood approach that resolves the identification challenges in nonignorable missing data analysis. The proposed estimator achieves the oracle properties under appropriate tuning parameter selection as established in our theoretical framework. The construction of profile EL ratio functions incorporated IPW and AIPW estimating equations. Our analysis demonstrated that when consistently estimated nonresponse mechanism parameters are used, the ELLRFs asymptotically follow a weighted chi-squared distribution. Furthermore, we systematically established the asymptotic normality of regression parameter estimators. Simulation studies and real-data applications confirmed the method’s practical effectiveness in both parameter estimation and variable selection. Comparative analyses revealed superior performance over existing approaches in handling nonignorable nonresponse data.
In practical applications, nonlinear regression models often involve high-dimensional covariates, which can lead to sparsity within the model. The direct application of the proposed estimation procedure in such contexts may lead to biased estimates. One potential approach to address this challenge is the application of the penalized EL method, as studied by Ren and Zhang [37], for model selection. It could effectively balance the model complexity and goodness of fit, thereby reducing the bias induced by high-dimensional covariates. This extension requires a systematic and separate investigation within the NMAR framework. A detailed exploration of this important issue will be undertaken in future research.
Author Contributions
Conceptualization, X.D. and X.L.; methodology, X.D. and X.L.; validation, X.D.; formal analysis, X.L.; investigation, X.D.; writing— original draft, X.D.; writing—review and editing, X.D. and X.L.; supervision, X.L.; project administration, X.D.; funding acquisition, X.D. and X.L. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Numbers 12426666 and 12426668), Zhongwu Young Teachers Program for the Innovative Talents of Jiangsu University of Technology, and Doctoral Research Project of Yuncheng University (YQ-2023074).
Data Availability Statement
The real data that are used to illustrate the proposed methods are available at https://github.com/dingxianwen-dxw/ACTG175 (accessed on 22 March 2025).
Acknowledgments
The authors wish to thank the Editor-in-Chief, the Associate Editor and two reviewers for their many helpful and insightful comments and suggestions that greatly improved the paper.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
To establish the proofs for Theorem 1, we first introduce some essential notations and supporting lemmas.
The log-likelihood (after profiling ) can be rewritten as
where with and . The components of the log-likelihood function are defined as follows:
where the function is given by
Following the arguments presented by Qin et al. [21], it can be shown that
It is worthwhile to note that .
Lemma A1.
Assume that , where for some constant . Then, for any and , we have
Proof of Lemma A1.
By Markov’s inequality, we obtain
Then, applying the Cauchy–Schwarz inequality, we have
This completes the proof. □
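For reference, the two inequalities invoked in the proof of Lemma A1 are stated below in generic notation. The symbols $Z$, $X$, $Y$, $s$, and $t$ are placeholders chosen here and are not the notation of the lemma.

```latex
% Markov's inequality (moment form): for any s > 0 and t > 0,
P(|Z| \ge t) \le \frac{E|Z|^{s}}{t^{s}},
% Cauchy--Schwarz inequality: for square-integrable X and Y,
|E(XY)| \le \{E(X^{2})\}^{1/2}\,\{E(Y^{2})\}^{1/2}.
```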
Lemma A2.
Under condition A(1), let denote the parameter vector, and let its true value be . Define and assume is a positive definite matrix. Then, for all , we have
Proof of Lemma A2.
We begin by considering the first part of Lemma A2. By Lemma A1, applying the Taylor series expansion to Equation (A1) yields , which establishes the desired result. The second part of Lemma A2 follows directly from Owen [13]. □
Proof of Theorem 1.
We begin by considering the first part of Theorem 1. By noting that , we have . Let . Following the arguments of Fan and Li [20], it is necessary to show that for any given , there exists a sufficiently large constant C such that
This result implies the existence of a local maximum of in the ball .
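For orientation, the corresponding condition in Fan and Li [20] is recalled below in their generic notation, which is an assumption here and need not match the symbols of this paper: $Q$ denotes the penalized likelihood, $\beta_0$ the true parameter, and $\alpha_n$ the convergence rate.

```latex
% For any given \varepsilon > 0, there exists a large constant C such that
P\Big\{ \sup_{\|u\| = C} Q(\beta_0 + \alpha_n u) < Q(\beta_0) \Big\} \ge 1 - \varepsilon ,
% which implies, with probability at least 1 - \varepsilon, a local maximum in the ball
\{ \beta_0 + \alpha_n u : \|u\| \le C \},
% and hence a local maximizer \hat{\beta} with
\|\hat{\beta} - \beta_0\| = O_P(\alpha_n).
```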
From the condition , we have
where k is the number of components of . Taking the Taylor series expansion of around yields
Following the results of Lemma A2, we have
where , and . Because is the log binomial likelihood, we have
It follows from the Taylor expansion that
Note that
When is chosen to be large enough such that dominates the other terms , and , and taking into account the negative term , we conclude that may be negative. Thus, the first part of Theorem 1 holds.
Now, we proceed to prove Theorem 1 (ii).
By Lemma A1, we have . Taking the Taylor series expansion of the first partial derivative of at yields
Let denote the column vector of matrix . It follows from Assumptions A(7)–A(9) and Lemma A2 that
Furthermore, by Assumption A(9), we have
So we obtain , which implies that the sign of is dominated by the sign of . Thus, for any and , we have when and when with probability tending to one. This result implies that with probability tending to one. Therefore, Theorem 1 (ii) holds.
Now we proceed to prove Theorem 1 (iii).
For simplicity of notation, we temporarily denote . Let denote the d × d identity matrix, where and with being the cardinality of . Let
where is another Lagrange multiplier vector. The penalized likelihood can be rewritten as . Let
where , , and . Thus, and satisfy for . Let and , we obtain
Let and . Taking the Taylor expansion of () at yields
Define the matrix Q as follows:
where , , , and
Additionally, let . Then, we have
Let . By applying the block matrix inversion formula, we obtain . Define as the appropriate submatrix of the matrix . Then, we have
By invoking the Lindeberg–Feller conditions, we conclude that , where . □
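The block matrix inversion formula used in the proof of Theorem 1 (iii) can be checked numerically with the short sketch below. The partition into blocks `A`, `B`, `C`, `D` and the test matrix are generic placeholders, not the matrix Q of the proof; the formula shown is the standard one based on the Schur complement of the upper-left block.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Invert [[A, B], [C, D]] via the Schur complement of A.

    Requires A and S = D - C A^{-1} B to be invertible.
    """
    A_inv = np.linalg.inv(A)
    S = D - C @ A_inv @ B                  # Schur complement of A
    S_inv = np.linalg.inv(S)
    top_left = A_inv + A_inv @ B @ S_inv @ C @ A_inv
    top_right = -A_inv @ B @ S_inv
    bottom_left = -S_inv @ C @ A_inv
    return np.block([[top_left, top_right], [bottom_left, S_inv]])

# Generic symmetric positive definite test matrix (hypothetical)
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
M = M @ M.T + 5.0 * np.eye(5)
A, B, C, D = M[:3, :3], M[:3, 3:], M[3:, :3], M[3:, 3:]
M_inv = block_inverse(A, B, C, D)
```

The lower-right block of the result is the inverse of the Schur complement, which is the kind of submatrix that typically carries the asymptotic covariance in arguments of this form.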
Lemma A3.
Suppose Conditions A(1)–A(9) hold; if α is estimated by the penalized likelihood method, , then as , we have
Proof of Lemma A3.
We begin by proving part (1). Expanding at using a Taylor series gives
We observe that
Thus, we obtain
By direct calculation, we obtain
We note that
where is defined in Assumption A(4). This completes the proof of Lemma A3 (1).
Now, we proceed to prove the second part of Lemma A3.
We observe that
where .
Through direct computation, we obtain
By leveraging the consistency property of the kernel regression estimator, we establish that . Define . Additionally, let
Using the kernel regression method, we obtain the following estimators:
Consequently, we have
Let and . For notational simplicity, we temporarily denote , and . By performing a further decomposition of , we obtain
For , we have
By applying standard arguments, we obtain for . For , we have
Standard arguments can also be employed to conclude that for . Combining the above results, we obtain and .
Next, we consider . A straightforward calculation yields
On the other hand,
The third equality holds because
which results in
Then, for , we have Furthermore, we have
which is equivalent to
The second and third parts of Lemma A3 (2) can be proved using similar arguments as those in the proof of the corresponding parts in Lemma A3 (1). Thus, the proof of Lemma A3 is complete. □
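The kernel regression estimators that appear in the proof of Lemma A3 are of Nadaraya–Watson type. A minimal sketch of such an estimator is given below; the Gaussian kernel, the bandwidth, the scalar covariate, and the toy regression function are assumptions made for illustration and are not taken from the paper, whose estimators additionally involve the response indicator and the estimated propensity.

```python
import numpy as np

def nadaraya_watson(x_eval, x_obs, y_obs, bandwidth):
    """Nadaraya-Watson estimate of E(Y | X = x) at each point of x_eval."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    u = (x_eval[:, None] - x_obs[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)              # Gaussian kernel; constants cancel
    return (k @ y_obs) / k.sum(axis=1)     # locally weighted average of y

# Toy usage (hypothetical data): recover a smooth regression function
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=2000)
y_obs = np.sin(2.0 * np.pi * x_obs) + 0.1 * rng.normal(size=2000)
m_hat = nadaraya_watson([0.25, 0.75], x_obs, y_obs, bandwidth=0.05)
```

In this toy design the true regression function equals 1 at 0.25 and -1 at 0.75, so `m_hat` should be close to these values up to smoothing bias and sampling noise.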
Proof of Theorem 2.
We begin by considering the first part of Theorem 2. Let and be the solutions to the following equations:
Taking the Taylor series expansion of and at , we obtain
where .
Through direct calculation, we obtain
Then, we have
where
From Lemma A3, we obtain the following convergence result for :
Additionally, from Lemma A3, we have , which implies that . Thus, we obtain
Therefore, we have . Following the same procedure as outlined above, we can also establish that
We now consider the second part of Theorem 2. Using the same argument as in Tang et al. [23], we obtain
where . Applying Lemma A3, we obtain the desired result. The asymptotic distribution of can be derived by following the same reasoning as in the proof of . This completes the proof of Theorem 2. □
References
1. Jennrich, R.I. Asymptotic properties of non-linear least squares estimators. Ann. Math. Stat. 1969, 40, 633–643.
2. Wu, C.F. Asymptotic theory of nonlinear least squares estimation. Ann. Stat. 1981, 9, 501–513.
3. Fekedulegn, D.; Mac Siurtain, M.P.; Colbert, J.J. Parameter estimation of nonlinear growth models in forestry. Silva Fenn. 1999, 33, 327–336.
4. Ivanov, A.V. Asymptotic Theory of Nonlinear Regression; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997.
5. Little, R.J.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: New York, NY, USA, 2019.
6. Horvitz, D.G.; Thompson, D.J. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 1952, 47, 663–685.
7. Robins, J.M.; Rotnitzky, A.; Zhao, L. Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 1994, 89, 846–866.
8. Han, P. Multiply robust estimation in regression analysis with missing data. J. Am. Stat. Assoc. 2014, 109, 1159–1173.
9. Xue, L.; Xie, J. Efficient robust estimation for single-index mixed effects models with missing observations. Stat. Pap. 2024, 65, 827–864.
10. Sharghi, S.; Stoll, K.; Ning, W. Statistical inferences for missing response problems based on modified empirical likelihood. Stat. Pap. 2024, 65, 4079–4120.
11. Li, W.; Luo, S.; Xu, W. Calibrated regression estimation using empirical likelihood under data fusion. Comput. Stat. Data Anal. 2024, 190, 107871.
12. Tang, N.; Zhao, P. Empirical likelihood-based inference in nonlinear regression models with missing responses at random. Statistics 2013, 47, 1141–1159.
13. Owen, A.B. Empirical likelihood ratio confidence regions. Ann. Stat. 1990, 18, 90–120.
14. Yang, Z.; Tang, N. Empirical likelihood for nonlinear regression models with nonignorable missing responses. Can. J. Stat. 2020, 48, 386–416.
15. Wang, S.; Shao, J.; Kim, J.K. An instrumental variable approach for identification and estimation with nonignorable nonresponse. Stat. Sin. 2014, 24, 1097–1116.
16. Wang, L.; Shao, J.; Fang, F. Propensity model selection with nonignorable nonresponse and instrument variable. Stat. Sin. 2021, 31, 647–672.
17. Chen, J.; Shao, J.; Fang, F. Instrument search in pseudo-likelihood approach for nonignorable nonresponse. Ann. Inst. Stat. Math. 2021, 73, 519–533.
18. Du, J.; Li, Y.; Cui, X. Identification and estimation of generalized additive partial linear models with nonignorable missing response. Commun. Math. Stat. 2024, 12, 113–156.
19. Beppu, K.; Morikawa, K. Verifiable identification condition for nonignorable nonresponse data with categorical instrumental variables. Stat. Theory Relat. Fields 2024, 8, 40–50.
20. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
21. Qin, J.; Leung, D.; Shao, J. Estimation with survey data under nonignorable nonresponse or informative sampling. J. Am. Stat. Assoc. 2002, 97, 193–200.
22. Qin, J.; Lawless, J.F. Empirical likelihood and general estimating equations. Ann. Stat. 1994, 22, 300–325.
23. Tang, N.; Zhao, P.; Zhu, H. Empirical likelihood for estimating equations with nonignorably missing data. Stat. Sin. 2014, 24, 723–747.
24. Ding, X.; Tang, N. Adjusted empirical likelihood estimation of distribution function and quantile with nonignorable missing data. J. Syst. Sci. Complex. 2018, 31, 820–840.
25. Morikawa, K.; Kano, Y. Statistical inference with different missing-data mechanisms. arXiv 2014, arXiv:1407.4971.
26. Miao, W.; Tchetgen, E.J. On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika 2016, 103, 475–482.
27. Liu, T.; Yuan, X. Doubly robust augmented-estimating-equations estimation with nonignorable nonresponse data. Stat. Pap. 2020, 61, 2241–2270.
28. Zhao, P.; Tang, N.; Zhu, H. Generalized empirical likelihood inferences for nonsmooth moment functions with nonignorable missing values. Stat. Sin. 2020, 30, 217–249.
29. Hu, Z.; Follmann, D.A.; Qin, J. Semiparametric dimension reduction estimation for mean response with missing data. Biometrika 2010, 97, 305–319.
30. Zhao, P.; Tang, N.; Qu, A.; Jiang, D. Semiparametric estimating equations inference with nonignorable missing data. Stat. Sin. 2017, 27, 89–113.
31. Jiang, D.; Zhao, P.; Tang, N. A propensity score adjustment method for regression models with nonignorable missing covariates. Comput. Stat. Data Anal. 2016, 94, 98–119.
32. Zhou, Y.; Wan, A.T.K.; Wang, X. Estimating equations inference with missing data. J. Am. Stat. Assoc. 2008, 103, 1187–1199.
33. Hammer, S.M.; Katzenstein, D.A.; Hughes, M.D.; Gundacker, H.; Schooley, R.T.; Haubrich, R.H.; Henry, W.K.; Lederman, M.M.; Phair, J.P.; Niu, M.; et al. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. N. Engl. J. Med. 1996, 335, 1081–1090.
34. Davidian, M.; Tsiatis, A.A.; Leon, S. Semiparametric estimation of treatment effect in a pretest–posttest study with missing data. Stat. Sci. 2005, 20, 261–301.
35. Tsiatis, A.A.; Davidian, M.; Zhang, M.; Lu, X. Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Stat. Med. 2008, 27, 4658–4677.
36. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Chapman & Hall: New York, NY, USA, 1993.
37. Ren, Y.; Zhang, X. Variable selection using penalized empirical likelihood. Sci. China Math. 2011, 54, 1829–1845.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).