Estimating the Complier Average Causal Effect with Non-Ignorable Missing Outcomes Using Likelihood Analysis

.


Introduction
Non-compliance and missing data are phenomena which usually occur in studies of economics, medicine, and public health.Non-compliance arises when certain participants do not adhere to their prescribed treatments, while the problem of missing data arises when the study's researchers are unable to gather information for some participants [1].These unmeasured confounding variables may complicate the inference of causal effects if the missing mechanism is non-ignorable [2,3].The missingness is named ignorable if it depends on the observed data only; otherwise, it is named non-ignorable [4,5].Identifying the complier average causal effect becomes challenging in the presence of both non-compliance and non-ignorable missing values, as it is impossible to identify the full data distribution or the causal effect without additional assumptions.
To address non-compliance, one can assume that the response in the dataset follows a specific distribution, such as exponential families, and use maximum likelihood estimation.The authors in [6][7][8] presented the estimation of the complier average causal effect (CACE) with the maximum likelihood estimation method with the EM algorithm [9].The maximum likelihood method provides the advantage of relaxing the exclusion restrictions, which can often be unrealistic, particularly in natural experiment scenarios [10].Another approach to addressing non-compliance is the instrumental variable method [11].Utilizing instrumental variables is a viable method when stringent assumptions about the response distribution could result in mis-specification.
The shadow variable strategy is a mainstream solution for establishing identifiability in cases of non-ignorable missingness [12].Analogous approaches also employ instrumental variables, as suggested by [13].Nonetheless, the selection of appropriate instrumental or shadow variables poses challenges, particularly among a multitude of covariates [14].The authors of [15] illustrate that stronger assumptions regarding the response mechanism enable the establishment of identifiability derived from the distribution of the observed data.When the response adheres to exponential families, as noted by [16,17], the identifiability under non-ignorable missingness is readily attainable without the use of instrumental variables.
It is critically important to integrate causal inference with research on non-ignorable missing data [18,19].The authors of [20][21][22] introduced estimators for the complier average causal effect parameters, assuming that the missing data are ignorable.However, if the missingness is non-ignorable, these methods may yield biased estimators.The recent literature has introduced strategies for addressing missing covariates or outcomes in the context of non-ignorable missingness, which includes but is not limited to the following papers.The study in [1] initially explored the identifiability of parameters in randomized clinical trials characterized by non-compliance and non-ignorable missing binary outcome variables.The study in [2] examined semi-parametric identifiability and developed an estimation method for the complier average causal effect in randomized clinical trials, addressing the challenge of non-ignorable missingness in continuous outcomes.The study in [23] formulated a method using a shadow variable to identify and estimate the complier average causal effect parameters arising from outcomes that represent non-ignorable missingness.The study in [19] tackled the challenge of non-ignorable missingness confounders in causal analysis from observational data, demonstrating that causal effects are ascertainable in scenarios wherein the missingness mechanism is outcome-independent, conditional on the treatment and potential missing confounders.The study in [24] developed semi-parametric estimators to determine the average causal effect, accounting for non-ignorable missing confounders, based on the premise that the missingness of data is independent of the outcome.The study in [25] addressed the problem of identifying the treatment benefit rate and treatment harm rate in scenarios wherein treatment, endpoints, or covariates are missing.The study in [26] investigates a treatment-independent missingness assumption, which facilitates the identification of causal effects in situations wherein confounders represent non-ignorable missingness.
In the existing literature, limited attention has been given to situations wherein a general missing mechanism model is selected to address non-ignorable missingness.The study in [2] utilized an outcome-dependent non-ignorable missingness model without incorporating an auxiliary variable.The study in [3] proposed using an auxiliary variable that serves as both a shadow variable for non-ignorable missingness and an instrumental variable for causal effects.However, the models used in the above approach assume the absence of covariates, despite their common presence in practical applied research.In this paper, we examine the complier average causal effect parameters under non-ignorable missingness by including covariates to mitigate the sensitivity to the violation of specific identification assumptions.The missing data mechanism is assumed to follow a logistic model, wherein the absence of the outcome is explained by the outcome itself, the treatment received, and the covariates.We establish the identifiability of the models under mild conditions by assuming that the outcome follows a normal distribution.We develop a computational method to estimate model parameters through a two-step likelihood estimation approach, employing subgroup analysis.The bootstrap method is employed for variance estimation, and the effectiveness of our approach is confirmed through simulation.The proposed method is applied to analyze the household income dataset from the Chinese Household Income Project Survey 2013.
The rest of this article is organized as follows.Section 2 introduces the general modeling framework, with notation and assumptions.In Section 3, we give the theoretical results on the identifiability of the parameters and the estimation approach.The performance of the proposed method is evaluated through simulation studies in Section 4. The application to CHIP data is presented in Section 5. Concluding remarks are provided in Section 6.

Notation and Assumptions
Let Y i denote the individual outcome and let X i represent the individual covariate vector, where Y i ∈ R, X ≡ (1, X 1,i , . . ., X p,i ) ⊤ ∈ R p+1 .Z i denotes the randomized treatment assignment for the ith unit in the study, where i ranges from 1 to n.We assign Z i = 1 if the ith individual is allocated to the treatment group and Z i = 0 if it is assigned to the control group.Moreover, D i represents the treatment received by the ith individual, with D i = 1 indicating treatment and D i = 0 indicating no treatment.D(z) and Y(z) denote the potential treatment received and the potential outcome under the assigned treatment Z = z, respectively.We define R(z) as the binary response indicator for Y(z), where R(z) = 1 if Y(z) is observed and R(z) = 0 if Y(z) is missing.Additionally, we define Y(z, d) as representing the potential outcome under the assigned treatment Z = z and actual treatment D = d.
Following [6], we define U i as the compliance status of the ith patient, which is determined as follows: where the potential intermediate outcomes c, n, a, and d stand for complier, never-taker, always-taker, and defier, respectively.The complier average causal effect gives the predictors X = x, which equals Next, we provide the necessary assumptions to ensure the identifiability of CACE(x) under the non-ignorable missingness.These are formalized by the following assumptions [1,11]: Assumption 2 (Randomization).The treatment assignment Z is randomization.

Assumption 3 (Exclusion restrictions)
Assumption 4 (First-stage).Given X = x, there is a non-zero average causal effect of Z on D, i.e., E{D Assumption 8 (Subgroup).U is independent of variable X.
Assumption 9 (Normal).The conditional density of the outcome variable Y has the following normal form: zu where u ∈ {c, n, a} is the compliance status and 0), indicating that the observed outcome equals the potential outcome evaluated at the observed treatment value, and the observed treatment equals the potential treatment evaluated at the assigned treatment [27].This assumption is typically reasonable when dealing with randomly sampled units.Assumption 2 [1] means that Z is as good as randomly assigned.Assumption 3 asserts that, given X, Y(d) = Y(1, d) = Y(0, d).This concept encapsulates the fundamental principle of instrumental variable procedures, indicating that any influence of Z on Y must be channeled through an impact of Z on D [28].Under Assumption 3, with X fixed, the conditional density of Y remains independent of Z for never-takers and always-takers.Assumption 4 stipulates that, for every stratum defined by X = x, ω c is strictly positive.Assumption 5 asserts the absence of defiers within the population.Assumptions 1-5 are standard assumptions in causal effect models.
Assumption 6 [2] is credible in a double-blinded clinical trial, as patients are unaware of the treatment assigned to them, and the treatment assignment is performed through randomization.Assumption 7 suggests that the absence of an outcome variable is accounted for by the outcome itself, the treatment received, and the covariates.It implies that the expectation of R(z) given Y(z) = y, D(z) = d, U = u, and X = x is equal to the expectation of observing R given Y = y, D = d, and X = x.The missing data mechanism is assumed to follow a logistic model, expressed as Assumption 8 incorporates the independence between X and U into the computational method for estimating model parameters.This inclusion enhances the robustness and validity of the estimation procedure, particularly through a two-step maximum likelihood estimation approach using subgroup analysis.The purpose of Assumption 9 is to establish the identifiability of the models under mild conditions, facilitating the creation of a computational method to estimate model parameters.

Identifiability and Estimation
In this section, we initially construct a model for non-compliant data and missing data, outlining the steps to solve the model.Subsequently, we theoretically establish the identifiability of the model involved in each step.Given that the second step entails a system of non-linear equations and integral operations, which can be challenging to handle analytically, we present a numerical calculation method to address this complexity.
We introduce a convenient two-stage estimation process based on the random sample (R i , D i , Z i , X i , Y i ), i = 1, . . ., N. Given X = x, the joint density function with respect to variables (y, z, u) can be expressed as follows: Assumptions 2 and 8 ensure the validity of the equation above.
denoted as f (y|x; θ a ), and f (y|x; θ 1n ) = f (y|x; θ 0n ), and denoted as f (y|x; [2], the full likelihood for (η, α, ϕ, θ) can be expressed as where It is evident from (2) that the maximum likelihood estimator for η is ( ξ, ω a , ω n ) = (n 1 /n, n 01 /n 1 , n 10 /n 0 ), which is equivalent to the moment estimator, thus ensuring its identifiability.Next, we propose a two-stage estimation process (α d , ϕ d , θ zu ) and establish the identifiability of the models involved in these two steps.
Non-ignorable missingness in Y presents challenges to the identifiability of the observed likelihood function ( 4) and ( 5), as emphasized by [12].The next theorem shows that the parameters are all identifiable under mild conditions.Theorem 1. Suppose there exists one continuous covariate; if the sign of some element of (α d , ϕ ⊤ d ) ⊤ is known, then the vector of parameters (α d , ϕ d , θ zu ) is identifiable, where d ∈ {0, 1} and zu ∈ {1c, 0c, n, a}.
The sign of some element of (α d , ϕ ⊤ d ) ⊤ is easy to verify.According to [30], factors such as respondents' cognitive level, motivation, and social status influence non-response probability.Leveraging this insight, we can speculate on the trend of non-response probability and infer the sign of the parameters in the missing mechanism model.For example, in a household income survey, low-income individuals may be less inclined to disclose their true income, implying α > 0.
Given the similarity between the methods for estimating (α 0 , ϕ 0 , θ n ) and θ 0c and those for estimating α 1 , ϕ 1 , θ a and θ 1c , we will delineate the estimation methods for estimating (α 1 , ϕ 1 , θ a ) and θ 1c separately in two steps.The likelihood function computed on alwaystakers with (Z i , D i ) = (1, 0) is (4).Thus, the maximization problem in (4) with respect to parameters (α 1 , ϕ 1 , θ a ) is equivalent to finding α 1 , ϕ 1 , and θ a to maximize with where θ a = (β ⊤ a , σ 2 a ) ⊤ .For convenience, we denote π(y, X i ; α 1 , ϕ 1 ) by π 1,i .The score equations can be derived by means of taking the derivatives of ( 16) with respect to α 1 , ϕ 1 and θ a , yielding where However, computing the integral involved in ( 17) is challenging.To circumvent this difficulty, we propose utilizing Monte Carlo approximation.Given θ a , we can straightforwardly sample Y directly from the conditional distribution with density function p(y|X i ; θ a ).Let Y i1 , Y i2 , . . ., Y im denote a sample of size m.For convenience, we denote As stated in [17], the introduced additional variability becomes ignorable when the resample size m is sufficiently large; for example, m = O(n 1+ϵ ), where ϵ is a small positive constant, so we use ≈ here.We record the parameters of t iterations as (α t 1 , ϕ t 1 , θ t a ) and then obtain the estimated value of (α t+1 1 , ϕ t+1 1 , θ t+1 a ) times as follows: The above formula is iterated through until the following is satisfied: where tol is a given constant that can be arbitrarily small.Ultimately, we obtain ( α 1 , ϕ 1 , θ a ) for the parameters (α 1 , ϕ 1 , θ a ).In the same manner, we can obtain ( α 0 , ϕ 0 , θ n ), where θ a = In practice, the above computation procedure is straightforward to implement, as it relies on the use of the empirical distribution.Moreover, it is attractive because the introduced additional variability becomes ignorable when the resample size m is sufficiently large; for instance, m = O(n 1+ϵ ), where ϵ is a small positive constant.
According to the assumption of Condition 7, the absence of Y is unrelated to Z.Then, we can focus on the individuals having (Z i , D i ) = (1, 1) and, using the inverse probability weighting method, the estimated equation is as follows: We can readily obtain that the estimators for β M and σ 2 M are β M and σ 2 M from the above estimation equation, and considering the numerical characteristics of the mixed normal distribution, we have By solving the two equations above, we can derive estimators for β 1c and σ 2 1c , denoted as β 1c and σ 2 1c .In the same manner, we can obtain θ 0c .Here, , we arrive at the complier average causal effect given the predictors X = x as Considering the variance of the mixed normal distribution, the estimator for the corresponding variance In practice, we can utilize the bootstrap method to approximate the sampling variance of the estimator of CACE.
The Monte Carlo samples have a size of n = 3000.Tables 1 and 2 report the empirical bias, standard deviation obtained by means of non-parametric bootstrap with B = 1000 replications, and coverage of 95% Wald-type confidence intervals.The numerical results of Table 1 demonstrate a highly accurate estimation of η, the missing mechanism models parameters, the always-taker parameters, and the never-taker parameters, which means that the proposed method can still accurately estimate the parameters in the causal model under the non-ignorable missingness.The numerical results in Table 2 demonstrate a highly accurate estimation of the complier parameters and CACE components.However, there is some under-coverage in the Wald confidence intervals for β 1c,0 and β c,0 .The estimators for η exhibit significantly smaller bias and variance compared to other parameters, attributed to the larger sample size used for their estimation.Conversely, the estimators for the always-taker and never-taker parameters show similar bias and variance due to their comparable missing rates.The subpar estimation results for the compliers parameters arise from various factors, including the numerical characteristics of the normal distribution, which we aim to optimize in future iterations.

Real Data Analysis
The CHIP study monitors income distribution and economic factors among rural, rural-to-urban migrant, and urban households in China [31].The purpose of this study is to investigate whether there is a significant change in the return rate of education when the male population transitions from rural to migrant status.The survey captures details such as employment status, education level, age, income, and other relevant information.Data collection involves systematic random sampling from various regions to ensure geographic representation.The survey includes data from cities and towns in fifteen provinces, representing different regions across the country.These provinces include Liaoning, Shanxi, Jiangsu, Shandong, Guangdong, Anhui, Henan, Sichuan, Hunan, Hubei, Gansu, Xinjiang, Yunnan, Beijing, and Chongqing, covering the north, eastern coastal areas, interior regions, and western regions of China.Each respondent's actual annual income serves as a proxy for the individual's earnings E, utilized in the response of the Mincer earnings function [32].We compute the number of years since leaving school as age − years of schooling − 6, as Chinese children typically start school at 7 years old.The years since the onset of labor market experience are calculated as age − 16, as Chinese individuals who have reached the age of 16 can legally participate in the labor market.
The rural population's migration for work primarily depends on two factors: the economic development status of the region and the distance between their home and work.If the region's economic development is favorable, migration may be less likely.Conversely, shorter distances between the home and work place increase the likelihood of migration.Hence, leveraging Chinese administrative division data, we assign Z i = 1 if the individual's household head's registered residence is Beijing, Shanxi, Guangdong, Hubei Chongqing, or Liaoning , and Z i = 0 otherwise.The sample sizes of the observed data are shown in Table 3.In the 2013 wave of the study, the rural and migrant sub-samples accounted for 62.97% and 4.54% of the dataset, respectively, including urban individuals.However, in China in 2013, the percentages of rural and migrant households were of 45.77% and 13.30%, respectively [33].To align with this distribution, we adjust the sample weights.Specifically, each migrant respondent is weighted four times as much as a rural one, since (62.97/4.54)/(45.77/13.30)≈ 4.  The Mincer earnings function is a single-equation model that explains wage income as a function of education and work experience, which can be expressed as follows: In the specification, E represents earnings (in CNY), S indicates years of schooling, and Exper stands for potential work experience.ε is an unobserved normal random error with mean 0 and variance σ 2 > 0. Without considering the cost of schooling, the returns to education are given by ∂ log (1 + E)/∂S = β 1 .The missing data mechanism is modeled using the following model: Using the Assumption 9 described in Section 2 in this setting, model (19) implies a normal distribution of the outcome Y = log(1 + E): where x = (s, exper) ⊤ is the realization of the covariates vector X = (S, Exper) ⊤ .Thus, the complier average causal effect which captures the causal effect of migrant work on the earnings of compliers is estimated by CACE s, exper; Specifically, the coefficient β c,1 gives the estimated causal effect of migrant work on the returns to education of a complier.The analysis of the 2013 CHIP data conducted in R version 4.3.2,which invented in August 1993 by statisticians Ross Ihaka and Robert Jetman of the University of Auckland, New Zealand.Utilizing the estimation methods outlined in Section 3 of this paper, yielded results presented in Tables 4 and 5.The results from Table 4 show that log-income has a significant positive effect on the probability of missingness, as evidenced by α 1 = 0.4130 and α 0 = 0.5284.Table 5 illustrates that the estimated returns to education are β 1c,1 = 2.81% for migrant compliers and β 0c,1 = −0.85%for rural compliers.The difference, β c,1 = 3.66%, indicates that migrant work enhances the returns to education for compliers.Additionally, based on the observation that β 0 > 0, we infer that the initial wages of migrant compliers are notably higher than those of rural compliers.This suggests that migrant work also contributes to higher initial wages for individuals.In conclusion, the estimated returns to education vary significantly among different target groups.However, it is noteworthy that the estimated returns to education of migrant workers and rural residents are consistently lower than those of urban residents across the board, with an average difference of 8.4%, as documented by [34].This finding scientifically validates the social and economic significance of migrant work from a human capital perspective, offering a basis for decision-making aimed at enhancing the condition of Chinese rural labor.

Conclusions
In this study, the challenge of identifying and estimating complier average causal effect parameters under non-ignorable missingness is tackled by increasing covariates to mitigate the sensitivity to the violation of specific identification assumptions.The missing data mechanism is assumed to follow a logistic model, wherein the absence of the outcome is explained by the outcome itself, the treatment received and the covariates, giving it an advantage over the assumptions proposed by [2,3].The identifiability of the models is established under mild conditions by assuming that the outcome follows a normal distribution.A computational method is developed to estimate model parameters through a two-step likelihood estimation approach, utilizing subgroup analysis.
Some studies in the literature discuss the consistency of parameter estimation in the presence of non-ignorable missing and non-compliant data.The authors of [23] demonstrated that parameter estimators are consistent under the assumption of a correct missing mechanism model.The authors of [26] obtained the consistency of parameters of interest even when confounders are missing not at random.The authors of [3] established the asymptotic results of the estimators when the missing outcome depends only on itself.Estimating the parameters of interest based on subgroup analysis poses significant challenges when the absence of the outcome is explained by the outcome itself, the treatment received, and the covariates.We plan to address this aspect in future research endeavors to enhance our methodologies.
There are many directions worthy of further research.A possible extension in this research area involves utilizing instrumental variables to transform the identifiability of the observation likelihood into the identifiability of the parameters of interest.Indeed, we propose a relaxation of the exclusion restriction based on likelihood analysis, resulting in a parametric model characterized by mixtures of distributions.Furthermore, we can adopt a semi-parametric model for theoretical modeling, incorporating more sophisticated structures into the missing mechanism models and regression models.
Author Contributions: J.D.: methodology, software, writing-original draft preparation; G.W.: data curation, software, validation; X.L.: methodology, writing-review and editing.All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Social Science Foundation of Guangxi Province in China, grant number 23BTJ001.

Table 1 .
Empirical bias (Bias), bootstrap standard deviation (Std.dev.) and 95% Wald coverage probability (95% Cover) of the estimators for η, the missing mechanism models parameters, the always-taker parameters, and the never-taker parameters.

Table 2 .
Empirical bias (Bias), bootstrap standard deviation (Std.dev.) and 95% Wald coverage probability (95% Cover) of the estimators for the compliers parameters and the CACE components.

Table 3 .
Sample sizes of different groups.The initials in brackets identify never-takers (n), alwaystakers (a), and compliers (c).

Table 4 .
Parameter estimators, bootstrap standard deviation (Std.dev.), and empirical 95% Wald confidence intervals (95% CI) along with the estimators for η, the missing mechanism model parameters, the always-taker parameters, and the never-taker parameters.

Table 5 .
Parameter estimators, bootstrap standard deviation (Std.dev.), and empirical 95% Wald confidence intervals (95% CI) along with the estimators for the complier parameters and the CACE components.