Article

Estimating Network Causal Effects with Misclassified Outcomes: Evidence from Karnataka

by Yaqin Liao 1,* and Ming Lin 1,2

1 Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen 361005, China
2 Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen 361005, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(8), 1241; https://doi.org/10.3390/math14081241
Submission received: 10 March 2026 / Revised: 5 April 2026 / Accepted: 6 April 2026 / Published: 8 April 2026

Abstract

Misclassification of binary outcomes in network settings may bias estimates of causal effects, including spillover effects arising from social interactions, and may even generate spurious causal effects. To address this issue, we develop a parametric framework that jointly estimates misclassification probabilities and causal effect parameters within a binary choice model with neighborhood exposure mappings. Monte Carlo simulations show that ignoring outcome misclassification or network-related variables leads to substantial bias, whereas the proposed method achieves smaller bias and RMSE. Applying the method to microfinance and social network data from Karnataka, we find that under binary exposure, ignoring outcome misclassification yields statistically significant spillover and overall effects, whereas these effects become statistically insignificant once outcome misclassification is corrected for. Furthermore, omitting network-related variables overstates the direct effect. These results underscore the importance of jointly correcting for outcome misclassification and accounting for network-related variables to obtain credible causal inference.

1. Introduction

Accumulating evidence suggests that social, spatial, and organizational linkages give rise to non-negligible interactions between individuals. These linkages violate the Stable Unit Treatment Value Assumption and induce network spillovers, whereby one unit's treatment affects others' potential outcomes [1]. For example, Tchetgen Tchetgen et al. [2] studied spillover effects of prior incarceration on HIV, STIs, and hepatitis C, using network ties based on recent sexual or injection partnerships. Credible policy evaluation therefore requires rigorous identification of such spillovers.
In the presence of spillovers, defining each individual’s potential outcome as a function of the entire treatment vector leads to severe dimensionality and identifiability problems. Hence, the literature commonly assumes that an individual’s potential outcome depends only on their own treatment and the treatments of others under a given interference structure. Two canonical frameworks arise: clustered interference, where spillovers are confined within pre-specified groups, as in cluster-randomized trials [3,4,5], and network neighborhood interference, where ties are encoded by an adjacency matrix and exposure mappings (e.g., the number or share of treated neighbors) summarize neighborhood treatments into a scalar, enabling the definition and estimation of causal effects [6,7,8,9,10,11].
Building on these frameworks, a large body of research has studied causal inference under interference, addressing distinct challenges through various modeling approaches. These include approximate neighborhood interference [10], heterogeneous spillover effects across network-driven subpopulations [12], and nonparametric estimation of stochastic policy effects under clustered interference [13]. Although these studies address different problems, they share a common focus on identifying and estimating direct, spillover, and overall effects, and they typically assume that binary outcomes are measured without error. This assumption may be unrealistic in many empirical contexts. Binary outcomes, such as microfinance loan take-up [12] and self-reported injection risk behaviors [14], are often self-reported or constructed from administrative records, so they are prone to misclassification. This concern is particularly relevant in our empirical application, which uses microfinance and social network data from [15], drawn from a sample of villages in Karnataka, India. The data combine a household census with a partial individual census, providing detailed information on household characteristics, economic behavior, and social interactions. Because the data rely on self-reported borrowing and combine multiple sources, the informal borrowing indicator is likely subject to measurement error from reporting errors or recall bias. Measurement errors of this kind are pervasive in applied research and have long been recognized as an important source of bias in econometric analysis. The literature on causal inference without interference likewise stresses that such errors must be corrected, because neglecting them can bias estimates. For example, Zeng et al. [16] considered outcome-dependent sampling, Shu and Yi [17] studied joint covariate–outcome correction in inverse probability weighting, and Wei et al. [18] developed validation-based efficient estimation.
Against this background, identifying causal effects in the presence of both outcome misclassification and network interference remains largely unexplored. To address this gap, we develop a framework for estimating average causal effects under network interference when binary outcomes are misclassified. We specify a parametric misclassification mechanism and jointly identify misclassification probabilities and causal effect parameters within a binary choice model with neighborhood exposure mappings. Monte Carlo simulations evaluate the finite-sample performance of the proposed method and highlight the bias that arises from ignoring outcome misclassification or network-related variables. An application to data on rural microfinance and social networks from Karnataka further demonstrates the empirical relevance of these issues: under binary exposure, failure to correct for outcome misclassification can generate spurious spillover and overall effects. In particular, once outcome misclassification is corrected for, we find no evidence of positive spillover or overall effects, whereas an estimator that ignores outcome misclassification suggests statistically significant positive effects. Moreover, omitting network-related variables distorts the decomposition of causal effects and overstates the direct effect. Finally, we assess the robustness of our results to the choice of exposure mapping and to the network unconfoundedness assumption.
The remainder of this paper is organized as follows. Section 2 introduces the framework for network causal effects; Section 3 presents identification and estimation; Section 4 reports the simulations; Section 5 provides an empirical application; and Section 6 concludes the paper. Additional proofs and simulation results are provided in Appendix A.

2. Model Framework

We consider a population of size $N$, indexed by $i \in \mathcal{N} = \{1, \ldots, N\}$. For each individual $i$, we observe a binary treatment $D_i \in \mathcal{D}_i := \{0, 1\}$, covariates $X_i^{\mathrm{ind}} \in \mathbb{R}^p$, and a binary outcome $Y_i \in \{0, 1\}$. Let $Y_i^*$ denote the latent (true) outcome; the observed $Y_i$ is a measurement of $Y_i^*$ that is subject to error. Let $d_{1:N} = [d_1, \ldots, d_N] \in \{0,1\}^N$ be the treatment assignment vector for the population, and denote the potential outcome of $i$ under assignment $d_{1:N}$ by $Y_i^*(d_{1:N})$.
To make the potential outcomes framework tractable, we adopt the neighborhood interference structure proposed by Forastiere et al. [8]. Consider a population connected through a known network with adjacency matrix $A = \{A_{ij}\}_{i,j=1}^{N}$. For each unit $i$, let $\mathcal{N}_i = \{j : A_{ij} = 1\}$ denote its neighbor set and $n_i := |\mathcal{N}_i|$ its cardinality. Let $d_{\mathcal{N}_i}$ and $d_{-\mathcal{N}_i}$ denote the treatment vectors of the neighbors and of all other units, respectively, and define $g_i : \{0,1\}^{n_i} \to \mathcal{Z}_i$ as the associated exposure mapping. We assume that interference operates only through this exposure mapping.
Assumption 1 (Neighborhood interference).
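If the exposure mapping yields the same value for two neighbor-treatment assignments, i.e., $g_i(d_{\mathcal{N}_i}) = g_i(d'_{\mathcal{N}_i})$, then the corresponding potential outcomes are equal for any assignments $d_{-\mathcal{N}_i}$, $d'_{-\mathcal{N}_i}$ of the remaining units:
$$Y_i^*\big(d_i, d_{\mathcal{N}_i}, d_{-\mathcal{N}_i}\big) = Y_i^*\big(d_i, d'_{\mathcal{N}_i}, d'_{-\mathcal{N}_i}\big).$$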
We define the exposure received by unit $i$ as $Z_i = g_i(D_{\mathcal{N}_i})$, where $D_{\mathcal{N}_i} := (D_j)_{j \in \mathcal{N}_i}$ denotes the vector of neighbor treatments. Specifically, we define
$$Z_i = \mathbb{1}\Big\{\sum_{j \in \mathcal{N}_i} D_j > 0\Big\},$$
where $Z_i$ is a binary indicator equal to one if individual $i$ has at least one treated neighbor and zero otherwise.
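As a minimal illustration of the binary exposure mapping, the sketch below computes $Z_i$ from a dense 0/1 adjacency matrix with zero diagonal (so that a unit is not its own neighbor) and a treatment vector. The function name `binary_exposure` and the four-unit path graph are hypothetical and only show the computation. Note that a treated unit with no treated neighbors is still unexposed.

```python
import numpy as np


def binary_exposure(A: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Z_i = 1{sum_{j in N_i} D_j > 0} for a 0/1 adjacency matrix A with zero diagonal."""
    treated_neighbors = A @ D                 # number of treated neighbours of each unit
    return (treated_neighbors > 0).astype(int)


# Hypothetical path graph 0-1-2-3 in which only unit 1 is treated.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
D = np.array([0, 1, 0, 0])
Z = binary_exposure(A, D)
print(Z)  # [1 0 1 0]
```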
Under Assumption 1, potential outcomes depend only on an individual's own treatment and the induced neighborhood exposure:
$$Y_i^*(d_{1:N}) = Y_i^*(d, z), \qquad z = g_i(d_{\mathcal{N}_i}),$$
where $d \in \{0,1\}$ denotes the individual's own treatment and $z \in \{0,1\}$ denotes the exposure level induced by neighbors' treatment assignments.
In addition to network exposure, we allow the covariate vector to include both individual-level characteristics and those of an individual's neighbors. Specifically, for each unit $i$, we observe
$$X_i = \big[X_i^{\mathrm{ind}}, X_i^{\mathrm{neigh}}\big],$$
where $X_i^{\mathrm{ind}}$ denotes individual-level characteristics and $X_i^{\mathrm{neigh}}$ summarizes the characteristics of $i$'s neighbors. We construct the neighborhood covariates as the average of the neighbors' individual covariates,
$$X_i^{\mathrm{neigh}} = \begin{cases} \dfrac{1}{n_i}\displaystyle\sum_{j \in \mathcal{N}_i} X_j^{\mathrm{ind}}, & n_i > 0,\\[1ex] 0, & n_i = 0, \end{cases}$$
so that neighbors' characteristics enter the model as observed controls.
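A minimal NumPy sketch of the neighborhood-average covariates follows. It assumes a dense 0/1 adjacency matrix, and the helper `neighbor_means` and the three-unit toy data are hypothetical. Isolated units receive a zero vector, as in the definition above.

```python
import numpy as np


def neighbor_means(A, X_ind):
    """Row i is the mean of X_ind over i's neighbours (a zero row if i is isolated)."""
    deg = A.sum(axis=1)                                # n_i
    X_neigh = np.zeros(X_ind.shape, dtype=float)
    has_nb = deg > 0
    X_neigh[has_nb] = (A @ X_ind)[has_nb] / deg[has_nb, None]
    return X_neigh


# Hypothetical toy data: units 0 and 1 are linked, unit 2 is isolated.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
X_ind = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
X_neigh = neighbor_means(A, X_ind)
X = np.hstack([X_ind, X_neigh])                        # X_i = [X_i^ind, X_i^neigh]
print(X_neigh)
```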
Given the model for potential outcomes and the observed data, we now introduce the identifying assumptions, establish the identification results, and develop estimation procedures for the causal parameters of interest.

3. Identification and Estimation

The parameter of interest is the average dose–response function (ADRF)
$$\mu_0(d,z) = E\big[Y_i^*(d,z)\big],$$
which characterizes the average potential outcome under each combination of own treatment $d$ and neighborhood exposure $z$. We first define the average direct effect (ADE) as
$$\mathrm{ADE} = \sum_{z \in \mathcal{Z}_i} \big\{\mu_0(1,z) - \mu_0(0,z)\big\}\, P(Z_i = z),$$
which captures the average effect of an individual's own treatment, averaged over the distribution of neighborhood exposure. Next, the average spillover effect (ASE) at exposure level $z$ is defined as
$$\mathrm{ASE}(z) = \sum_{d \in \mathcal{D}_i} \big\{\mu_0(d,z) - \mu_0(d,0)\big\}\, P(D_i = d),$$
which quantifies the average change in potential outcomes induced solely by shifting neighborhood exposure from zero to $z$, marginalizing over the distribution of the individual's own treatment. Finally, the average overall effect (AOE) is defined as
$$\mathrm{AOE} = \sum_{z \in \mathcal{Z}_i} \big\{\mu_0(1,z) - \mu_0(0,0)\big\}\, P(Z_i = z),$$
which represents the total effect of moving from no treatment and no exposure to treatment at exposure level $z$, averaged over the distribution of neighborhood exposure.
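The following sketch makes the three estimands concrete for binary exposure. It evaluates them from a hypothetical $2 \times 2$ table of ADRF values $\mu_0(d,z)$ and hypothetical marginal probabilities, none of which come from the paper. With these inputs, ADE = 0.20(0.4) + 0.25(0.6) = 0.23, ASE(1) = 0.10(0.7) + 0.15(0.3) = 0.115, and AOE = 0.20(0.4) + 0.35(0.6) = 0.29.

```python
import numpy as np


def causal_effects(mu, p_z, p_d):
    """ADE, ASE(1) and AOE for binary exposure; mu[d, z] holds mu_0(d, z)."""
    ade = sum((mu[1, z] - mu[0, z]) * p_z[z] for z in (0, 1))
    ase_1 = sum((mu[d, 1] - mu[d, 0]) * p_d[d] for d in (0, 1))   # exposure 0 -> 1
    aoe = sum((mu[1, z] - mu[0, 0]) * p_z[z] for z in (0, 1))
    return ade, ase_1, aoe


# Hypothetical inputs: rows index own treatment d, columns index exposure z.
mu = np.array([[0.30, 0.40],
               [0.50, 0.65]])
p_z = np.array([0.4, 0.6])   # P(Z = 0), P(Z = 1)
p_d = np.array([0.7, 0.3])   # P(D = 0), P(D = 1)
ade, ase_1, aoe = causal_effects(mu, p_z, p_d)
print(ade, ase_1, aoe)
```

The AOE equals the ADE plus the exposure effect among the untreated, $\sum_z \{\mu_0(0,z) - \mu_0(0,0)\} P(Z_i = z)$, here $0.23 + 0.06$.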
Since potential outcomes are unobserved, we next introduce identifying assumptions, drawing on the framework of Forastiere et al. [8].
Assumption 2 (Causal framework assumptions).
(A1) Consistency: $Y_i^* = Y_i^*(D_i, Z_i)$.
(A2) Network unconfoundedness: $Y_i^*(d,z) \perp (D_i, Z_i) \mid X_i$ for all $d \in \{0,1\}$ and $z \in \mathcal{Z}_i$.
(A3) Positivity: $0 < P(D_i = d, Z_i = z \mid X_i) < 1$ for all $d \in \{0,1\}$ and $z \in \mathcal{Z}_i$.
Together, Assumption 2 (A1–A3) ensures that the ADRF, and hence the direct, spillover, and overall effects, are identified in our setting. Assumption 2 (A1) links the realized (latent) outcomes to the relevant potential outcomes. Assumption 2 (A2) requires that, conditional on the covariates, neither an individual's treatment nor the treatment status of their neighbors is driven by unobserved factors that also affect the potential outcomes. Assumption 2 (A3) guarantees sufficient overlap, so that all treatment–exposure combinations occur with positive probability in the data.
Under Assumption 2 (A1–A3), letting $p(d,z,x) = E[Y_i^* \mid D_i = d, Z_i = z, X_i = x]$, the population ADRF can be written as
$$\mu_0(d,z) = \int p(d,z,x)\, f_X(x)\, dx.$$
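The representation above follows from a standard argument. A sketch under Assumption 2: apply iterated expectations, then (A2), then (A1). Assumption (A3) ensures that the conditioning event has positive probability.

$$
\begin{aligned}
\mu_0(d,z) &= E\big[\,E\{Y_i^*(d,z) \mid X_i\}\,\big] \\
&= E\big[\,E\{Y_i^*(d,z) \mid D_i = d, Z_i = z, X_i\}\,\big] && \text{by (A2), with (A3)} \\
&= E\big[\,E\{Y_i^* \mid D_i = d, Z_i = z, X_i\}\,\big] && \text{by (A1)} \\
&= \int p(d,z,x)\, f_X(x)\, dx.
\end{aligned}
$$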
Identification of the ADRF relies on accurate measurement of the outcome variable. When $Y_i$ is subject to misclassification, conditional expectations based on the observed outcome may be systematically biased, thereby affecting the identification and estimation of $\mu_0(d,z)$. Let $\tilde X_i = [D_i, Z_i, X_i] \in \mathbb{R}^{2p+2}$. We assume that the latent outcome $Y_i^*$ follows a binary choice model:
$$P\big(Y_i^* = 1 \mid \tilde X_i\big) = G\big(\tilde X_i^\top \beta\big),$$
where $G(\cdot) : \mathbb{R} \to (0,1)$ is a known, strictly increasing cumulative distribution function (CDF), such as $G(t) = \Phi(t)$ in the probit model, where $\Phi$ denotes the standard normal CDF.
Let $\pi_{01} = P(Y_i = 1 \mid Y_i^* = 0)$ and $\pi_{10} = P(Y_i = 0 \mid Y_i^* = 1)$ denote the false positive rate and false negative rate, respectively. The parameter $\pi_{01}$ is the probability that a positive outcome is recorded when the true outcome is zero, while $\pi_{10}$ is the probability that a true positive outcome is recorded as zero. To characterize the dependence structure of the latent outcomes and the measurement process, we further impose the following assumptions.
Assumption 3 (Conditional independence).
Conditional on $(\tilde X_i)_{i=1}^{N}$, the latent outcomes $(Y_i^*)_{i=1}^{N}$ are independent across individuals. In addition, conditional on $(Y_i^*)_{i=1}^{N}$, the misclassification process generating $Y_i$ from $Y_i^*$ is independent across individuals.
Assumption 4 (Nondifferential misclassification).
Conditional on $Y_i^*$, the observed outcome is independent of the treatment, exposure, and covariates:
$$Y_i \perp (D_i, Z_i, X_i) \mid Y_i^*.$$
Under Assumption 4, the conditional probability of the observed outcome can be written as follows (the derivation is provided in Appendix A.1):
$$P\big(Y_i = 1 \mid \tilde X_i\big) = \pi_{01} + \big(1 - \pi_{10} - \pi_{01}\big)\, P\big(Y_i^* = 1 \mid \tilde X_i\big).$$
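The expression follows from the law of total probability together with Assumption 4. A short sketch is given below; the full derivation is in Appendix A.1.

$$
\begin{aligned}
P(Y_i = 1 \mid \tilde X_i)
&= \sum_{y \in \{0,1\}} P(Y_i = 1 \mid Y_i^* = y, \tilde X_i)\, P(Y_i^* = y \mid \tilde X_i) \\
&= (1 - \pi_{10})\, P(Y_i^* = 1 \mid \tilde X_i) + \pi_{01}\,\{1 - P(Y_i^* = 1 \mid \tilde X_i)\} && \text{by Assumption 4} \\
&= \pi_{01} + (1 - \pi_{10} - \pi_{01})\, P(Y_i^* = 1 \mid \tilde X_i).
\end{aligned}
$$

Under Assumption 5, the factor $1 - \pi_{10} - \pi_{01}$ is positive. The observed-outcome probability is therefore a strictly increasing function of $P(Y_i^* = 1 \mid \tilde X_i)$ that ranges over $(\pi_{01}, 1 - \pi_{10})$.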
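Let
$$f\big(Y_i \mid \tilde X_i; \theta\big) = P\big(Y_i = 1 \mid \tilde X_i; \theta\big)^{Y_i}\,\big\{1 - P\big(Y_i = 1 \mid \tilde X_i; \theta\big)\big\}^{1 - Y_i}.$$
Under Assumption 3, the likelihood function is given by
$$L(\theta) = \prod_{i=1}^{N} f\big(Y_i \mid \tilde X_i; \theta\big),$$
where $\theta = (\beta^\top, \pi_{10}, \pi_{01})^\top \in \mathbb{R}^{2p+4}$ collects all model parameters.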
To ensure identification, we impose the following condition on the misclassification probabilities. Together with standard regularity conditions for the covariates, this condition yields identification of the parameter vector, as formalized in Theorem 1.
Assumption 5 (Monotonicity condition).
$$\pi_{10} + \pi_{01} < 1.$$
Theorem 1.
Suppose that $G = \Phi$ is the standard normal CDF. Assume that $E\big[\|\tilde X_i\|^2\big] < \infty$, the support of $\tilde X_i$ contains a nonempty open subset of $\mathbb{R}^{2p+2}$, and Assumptions 1–5 hold. Then the parameter vector $\theta$ is identified, and the expected log-likelihood $E\big[\log f(Y_i \mid \tilde X_i; \theta)\big]$ is uniquely maximized at the true parameter $\theta_0$.
The result builds on arguments similar to those in Hausman et al. [19]. For completeness, we provide a proof of Theorem 1 in Appendix A.2, explicitly stating the support conditions required for identification. In particular, we assume that the support of the covariates contains a nonempty open set, which ensures sufficient variation for the conditional mean function to be identified. Although Theorem 1 is stated for the probit model, the identification argument extends to other strictly increasing and continuously differentiable link functions (such as the logistic link) under additional regularity conditions.
We restrict the parameter space to
$$\Theta = \big\{\theta : 0 \le \pi_{10} < 1,\ 0 \le \pi_{01} < 1,\ \pi_{10} + \pi_{01} < 1\big\},$$
and estimate $\theta$ by maximizing the log-likelihood:
$$\hat\theta = \arg\max_{\theta \in \Theta} \log L(\theta).$$
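Below is a minimal sketch of the maximum likelihood step with the probit link, written in Python with NumPy and SciPy; it is not the authors' implementation. To keep the optimizer unconstrained, it assumes a softmax reparameterization of $(\pi_{01}, \pi_{10}, 1 - \pi_{01} - \pi_{10})$. This covers the interior of $\Theta$, where both rates are strictly positive, rather than its boundary. The helper names and the synthetic design (two covariates, $\pi_{01} = 0.05$, $\pi_{10} = 0.15$) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def unpack(theta, k):
    """Map unconstrained theta to (beta, pi01, pi10) with pi01, pi10 > 0 and pi01 + pi10 < 1."""
    beta = theta[:k]
    ea, eb = np.exp(theta[k]), np.exp(theta[k + 1])
    denom = 1.0 + ea + eb              # softmax over (pi01, pi10, 1 - pi01 - pi10)
    return beta, ea / denom, eb / denom


def neg_mean_loglik(theta, Xt, y):
    beta, pi01, pi10 = unpack(theta, Xt.shape[1])
    p_star = norm.cdf(Xt @ beta)                          # P(Y* = 1 | X~), probit link
    p_obs = pi01 + (1.0 - pi10 - pi01) * p_star           # P(Y = 1 | X~)
    p_obs = np.clip(p_obs, 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p_obs) + (1 - y) * np.log1p(-p_obs))


def fit_misclassified_probit(Xt, y):
    k = Xt.shape[1]
    start = np.concatenate([np.zeros(k), [-3.0, -3.0]])  # error rates start near 0.05
    res = minimize(neg_mean_loglik, start, args=(Xt, y), method="BFGS")
    return unpack(res.x, k)


# Synthetic check with hypothetical parameters (not the paper's design).
rng = np.random.default_rng(0)
N = 50_000
Xt = np.column_stack([
    rng.integers(0, 2, N),             # D_i
    rng.integers(0, 2, N),             # Z_i
    rng.normal(size=(N, 2)),           # two continuous covariates
])
beta_true = np.array([0.5, 0.5, 1.5, -1.5])
y_star = (Xt @ beta_true + rng.normal(size=N) >= 0).astype(int)
flip = rng.random(N) < np.where(y_star == 1, 0.15, 0.05)   # pi10 = 0.15, pi01 = 0.05
y = np.where(flip, 1 - y_star, y_star)

beta_hat, pi01_hat, pi10_hat = fit_misclassified_probit(Xt, y)
print(np.round(beta_hat, 2), round(pi01_hat, 3), round(pi10_hat, 3))
```

The reparameterization also matters for identification. The likelihood is unchanged if $\beta$ is replaced by $-\beta$, $\pi_{01}$ by $1 - \pi_{10}$, and $\pi_{10}$ by $1 - \pi_{01}$. That mirror image violates $\pi_{10} + \pi_{01} < 1$ and is therefore excluded here, which is the role Assumption 5 plays.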
We refer to this estimator as the network-based misclassification-corrected (Net-MC) estimator. Based on the estimated parameters, the estimated average dose–response function (ADRF) is constructed as
$$\hat\mu_0(d,z) = \frac{1}{N}\sum_{i=1}^{N} G\big(\tilde X_i(d,z)^\top \hat\beta\big),$$
where $\tilde X_i(d,z) = [d, z, X_i]$. The average direct effect is estimated as
$$\widehat{\mathrm{ADE}} = \sum_{z \in \mathcal{Z}} \big\{\hat\mu_0(1,z) - \hat\mu_0(0,z)\big\}\, \hat P_N(Z = z),$$
where $\hat P_N(Z = z) = N^{-1}\sum_{i=1}^{N} \mathbb{1}(Z_i = z)$ denotes the empirical distribution of the exposure level. The average spillover effect is then computed as
$$\widehat{\mathrm{ASE}}(z) = \sum_{d \in \mathcal{D}} \big\{\hat\mu_0(d,z) - \hat\mu_0(d,0)\big\}\, \hat P_N(D = d),$$
where $\hat P_N(D = d) = N^{-1}\sum_{i=1}^{N} \mathbb{1}(D_i = d)$ denotes the empirical distribution of the treatment. Finally, we estimate the average overall effect as
$$\widehat{\mathrm{AOE}} = \sum_{z \in \mathcal{Z}} \big\{\hat\mu_0(1,z) - \hat\mu_0(0,0)\big\}\, \hat P_N(Z = z).$$
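The plug-in estimators can be sketched as follows, assuming the probit link and a column order $[d, z, X_i]$ that matches $\tilde X_i(d,z)$. The coefficient vector `beta_hat` below is hypothetical and sets the exposure coefficient to zero. The sketch should therefore return an ASE of zero and an AOE equal to the ADE, which gives a quick sanity check.

```python
import numpy as np
from scipy.stats import norm


def adrf_hat(beta_hat, X, d, z):
    """mu_hat_0(d, z) = N^{-1} sum_i Phi([d, z, X_i]' beta_hat)."""
    N = X.shape[0]
    Xt_dz = np.column_stack([np.full(N, d), np.full(N, z), X])   # set D = d, Z = z for everyone
    return norm.cdf(Xt_dz @ beta_hat).mean()


def plug_in_effects(beta_hat, D, Z, X):
    mu = np.array([[adrf_hat(beta_hat, X, d, z) for z in (0, 1)] for d in (0, 1)])
    p_z = np.array([np.mean(Z == 0), np.mean(Z == 1)])   # empirical P_N(Z = z)
    p_d = np.array([np.mean(D == 0), np.mean(D == 1)])   # empirical P_N(D = d)
    ade = np.sum((mu[1] - mu[0]) * p_z)
    ase_1 = np.sum((mu[:, 1] - mu[:, 0]) * p_d)           # ASE at exposure level z = 1
    aoe = np.sum((mu[1] - mu[0, 0]) * p_z)
    return mu, ade, ase_1, aoe


rng = np.random.default_rng(1)
N = 1000
X = rng.normal(size=(N, 1))
D = rng.integers(0, 2, N)
Z = rng.integers(0, 2, N)
beta_hat = np.array([0.4, 0.0, 1.0])   # hypothetical [D, Z, X] coefficients, no Z effect
mu, ade, ase_1, aoe = plug_in_effects(beta_hat, D, Z, X)
print(np.round(mu, 3), ade, ase_1, aoe)
```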
Given the dependence across individuals induced by the network structure, we conduct inference using bootstrap methods tailored to the data configuration. When the network can be partitioned into independent connected components (e.g., villages or schools), we implement a cluster bootstrap that resamples components with replacement and recomputes the estimator in each replication (see Forastiere et al. [8]). When dependence is primarily local rather than cluster-based, a block bootstrap can be used instead to accommodate network dependence (see Kojevnikov [20]). Standard errors and confidence intervals are obtained from the bootstrap distribution.
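The following is a generic sketch of the cluster bootstrap with percentile intervals. Whole components (here, villages) are resampled with replacement, and the estimator is recomputed on each resample. The `estimator` argument stands in for refitting the Net-MC model and recomputing an effect. The pooled mean and the simulated binary data (43 clusters of 75 to 305 units) are hypothetical placeholders.

```python
import numpy as np


def cluster_bootstrap(clusters, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Resample whole clusters with replacement; return (bootstrap SE, percentile CI)."""
    rng = np.random.default_rng(seed)
    G = len(clusters)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, G, size=G)                 # G clusters drawn with replacement
        draws[b] = estimator([clusters[g] for g in idx])
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return draws.std(ddof=1), (lo, hi)


def pooled_mean(clusters):
    """Placeholder estimator; in practice this would refit Net-MC and return an effect."""
    return np.concatenate(clusters).mean()


# Hypothetical data: 43 clusters with 75 to 305 binary outcomes each.
rng = np.random.default_rng(2)
villages = [rng.binomial(1, 0.3, size=rng.integers(75, 306)) for _ in range(43)]
point = pooled_mean(villages)
se, (lo, hi) = cluster_bootstrap(villages, pooled_mean, n_boot=500)
print(round(point, 4), round(se, 4), (round(lo, 4), round(hi, 4)))
```

Resampling whole villages keeps every within-village link intact, so exposures and neighborhood covariates computed inside a village remain valid in each replication.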

4. Monte Carlo Simulations

We evaluated the finite-sample performance of the proposed method through a series of Monte Carlo simulations. The experiments examined performance across varying network structures and investigated the consequences of ignoring outcome misclassification in the presence of network interference.
We considered undirected networks of size $N$ with adjacency matrix $A$. To examine how network topology affected the performance of the estimators, we generated networks from two canonical models that differ in their link formation mechanisms and in the degree distributions they induce. These designs allowed us to compare settings with relatively homogeneous degree distributions to those with substantial degree heterogeneity:
  • ER networks: As a benchmark design, the network was generated from an Erdős–Rényi random graph [21], in which each pair of distinct nodes $i \neq j$ was independently connected with probability $q = 6/N$. This yielded a sparse network with an expected average degree of approximately 6. The Erdős–Rényi model produces relatively homogeneous connectivity, with node degrees concentrated around their mean and limited variation across individuals.
  • BA networks: To introduce substantial degree heterogeneity, we generated networks using the Barabási–Albert model [22]. Starting from a small connected core, each new node attached to $m = 3$ existing nodes with probability proportional to their degrees, yielding an average degree of approximately 6. Unlike the Erdős–Rényi model, the Barabási–Albert process generates a heavy-tailed degree distribution, characterized by a few highly connected hub nodes and many sparsely connected ones.
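The two network designs can be sketched with NumPy alone. The ER generator draws each pair $i < j$ once with probability $6/N$. The BA generator assumes a complete core on $m + 1$ nodes and samples attachment targets from a list in which each node appears once per incident edge, which makes selection proportional to degree. The function names, the seed, and the choice of core are assumptions, not the paper's code.

```python
import numpy as np


def er_adjacency(n, avg_degree, rng):
    """Erdős–Rényi graph: each pair i < j linked independently with q = avg_degree / n."""
    upper = np.triu(rng.random((n, n)) < avg_degree / n, k=1)
    return (upper | upper.T).astype(np.int8)


def ba_adjacency(n, m, rng):
    """Barabási–Albert-style growth from an assumed complete core on m + 1 nodes."""
    A = np.zeros((n, n), dtype=np.int8)
    core = m + 1
    A[:core, :core] = 1
    np.fill_diagonal(A, 0)
    # Each node appears in `pool` once per incident edge, so a uniform draw from
    # `pool` picks an existing node with probability proportional to its degree.
    pool = [v for v in range(core) for _ in range(m)]
    for new in range(core, n):
        targets = set()
        while len(targets) < m:                  # m distinct targets per new node
            targets.add(pool[rng.integers(len(pool))])
        for t in targets:
            A[new, t] = A[t, new] = 1
            pool.extend([new, t])
    return A


rng = np.random.default_rng(4)
A_er = er_adjacency(2000, 6, rng)
A_ba = ba_adjacency(2000, 3, rng)
for name, A in (("ER", A_er), ("BA", A_ba)):
    deg = A.sum(axis=1)
    print(name, round(deg.mean(), 2), round(deg.std(), 2), deg.max())
```

Both designs give a mean degree close to 6. The BA graph, however, shows a far larger degree standard deviation and maximum, which is the contrast the simulation design relies on.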
Given the realized network, individual-level covariates were generated as $X_i^{\mathrm{ind}} \overset{\text{i.i.d.}}{\sim} N(0, I_p)$. Let $n_i = |\mathcal{N}_i|$ denote the number of neighbors of unit $i$. We defined the neighborhood covariate as the average of the neighbors' covariates,
$$X_i^{\mathrm{neigh}} = \begin{cases} \dfrac{1}{n_i}\displaystyle\sum_{j \in \mathcal{N}_i} X_j^{\mathrm{ind}}, & n_i > 0,\\[1ex] 0, & n_i = 0, \end{cases}$$
and defined the full covariate vector as $X_i = \big[X_i^{\mathrm{ind}}, X_i^{\mathrm{neigh}}\big]$.
We next specify the treatment assignment mechanism. Treatment was generated according to the logistic model
$$\operatorname{logit} P(D_i = 1 \mid X_i) = 1.5 + \sum_{j=1}^{2p} X_{ij},$$
so that treatment assignment depended on the observed covariates. Neighborhood exposure is defined as
$$Z_i = \mathbb{1}\Big\{\sum_{j \in \mathcal{N}_i} D_j > 0\Big\},$$
indicating whether at least one neighbor is treated. Conditional on treatment and exposure, the latent outcome was generated as
$$Y_i^* = \mathbb{1}\Big\{D_i + Z_i + \sum_{j=1}^{2p} X_{ij} + \varepsilon_i \ge 0\Big\}, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0,1).$$
Finally, we introduce outcome misclassification. Let
$$\pi_{01} = P(Y_i = 1 \mid Y_i^* = 0), \qquad \pi_{10} = P(Y_i = 0 \mid Y_i^* = 1)$$
denote the false positive rate and false negative rate, respectively. These parameters represent the probabilities that the observed outcome differs from the true outcome due to misclassification. Conditional on the true outcome $Y_i^*$, the misclassification indicator satisfies, independently across $i$,
$$R_i \mid Y_i^* \sim \begin{cases} \mathrm{Bernoulli}(\pi_{10}), & Y_i^* = 1,\\ \mathrm{Bernoulli}(\pi_{01}), & Y_i^* = 0. \end{cases}$$
The observed outcome is then constructed as
$$Y_i = (1 - R_i)\, Y_i^* + R_i\, (1 - Y_i^*).$$
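A minimal sketch of one simulated dataset combines the covariate, treatment, exposure, outcome, and misclassification steps. The misclassification rates ($\pi_{01} = 0.1$, $\pi_{10} = 0.2$), the seed, and the helper `simulate_outcomes` are hypothetical. The network is an ER graph with $q = 6/N$, and the treatment intercept of 1.5 is taken as stated above.

```python
import numpy as np


def simulate_outcomes(A, p, pi01, pi10, rng):
    """One draw of (D, Z, X, Y*, Y) on a fixed network A (dense 0/1, zero diagonal)."""
    n = A.shape[0]
    X_ind = rng.normal(size=(n, p))                              # X_i^ind ~ N(0, I_p)
    deg = A.sum(axis=1)
    X_neigh = np.divide(A @ X_ind, deg[:, None], out=np.zeros((n, p)),
                        where=deg[:, None] > 0)                  # zeros for isolated units
    X = np.hstack([X_ind, X_neigh])                              # 2p columns
    s = X.sum(axis=1)                                            # sum_{j=1}^{2p} X_ij
    D = (rng.random(n) < 1.0 / (1.0 + np.exp(-(1.5 + s)))).astype(int)
    Z = (A @ D > 0).astype(int)                                  # at least one treated neighbour
    Y_star = (D + Z + s + rng.normal(size=n) >= 0).astype(int)
    R = (rng.random(n) < np.where(Y_star == 1, pi10, pi01)).astype(int)
    Y = (1 - R) * Y_star + R * (1 - Y_star)                      # flip when R_i = 1
    return D, Z, X, Y_star, Y


# Hypothetical run: ER network with q = 6/N, p = 20, pi01 = 0.1, pi10 = 0.2.
rng = np.random.default_rng(3)
n = 2000
upper = np.triu(rng.random((n, n)) < 6 / n, k=1)
A = (upper | upper.T).astype(int)
D, Z, X, Y_star, Y = simulate_outcomes(A, p=20, pi01=0.1, pi10=0.2, rng=rng)
print(Y_star.mean(), Y.mean())
```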
We conducted 1000 Monte Carlo replications with sample sizes $N \in \{2000, 4000, 8000\}$ and covariate dimension $p = 20$. Although the two network designs were calibrated to have a similar average degree, they differed in their higher-order structural properties.
Table 1 reports summary statistics of the simulated networks used in the Monte Carlo experiments. For the ER networks, the mean degree was approximately six for all sample sizes, with relatively small dispersion and moderate maximum degrees. In contrast, the BA networks exhibited substantially greater heterogeneity in the degree distribution, as reflected in much larger standard deviations and maximum degrees. This difference reflects the presence of highly connected hub nodes, which is characteristic of scale-free networks.
Table 2 reports the median absolute error (MAE) of the estimated misclassification probabilities $(\pi_{01}, \pi_{10})$ for the proposed Net-MC estimator across different misclassification rates and sample sizes. For both the ER and BA networks, the MAE decreased steadily as $N$ increased from 2000 to 8000, indicating improved estimation accuracy with larger samples. Estimation performance was also similar across the two network designs, suggesting that the proposed method is robust to differences in network structure. As expected, the MAE tended to be slightly larger when the underlying misclassification probabilities were higher; however, the estimation error remained small and declined consistently with the sample size. Overall, these results indicate that the proposed method yields accurate and stable estimates of the misclassification probabilities.
Table 3 reports the median absolute error (MAE) of the estimates of $\beta$ and compares the finite-sample performance of three estimators:
  • Oracle: the infeasible probit estimator based on the latent outcome $Y_i^*$, which served as a benchmark and was invariant to the misclassification probabilities.
  • Naive: the standard probit estimator applied to the observed outcome $Y_i$, ignoring outcome misclassification.
  • Net-MC: the proposed misclassification-corrected probit estimator that corrects for outcome misclassification and incorporates network-related variables.
As expected, the Oracle estimator achieved the smallest MAE across all settings. In contrast, the Naive estimator exhibited substantially larger errors in the presence of outcome misclassification. The proposed Net-MC estimator substantially outperformed the Naive estimator at all misclassification probabilities, with markedly smaller MAEs that approached those of the Oracle estimator as the sample size increased. Overall, these results demonstrate that explicitly correcting for outcome misclassification, together with incorporating network-related variables, leads to more accurate estimation of the coefficient vector $\beta$.
Table 4 reports the simulation results for the ADE estimates under ER networks, evaluated in terms of bias and root mean squared error (RMSE). In addition to the three estimators considered above, we considered a benchmark specification, the non-network misclassification-corrected (Non-Net-MC) estimator, which applies the misclassification-corrected probit estimator while omitting neighbor exposure and network-related covariates. This benchmark exhibited substantial bias, underscoring the importance of accounting for network spillovers in causal inference. We report results for the Non-Net-MC estimator only for the ADE. By construction, it does not incorporate variation in neighbors' variables and therefore precludes identification of network-related effects; consequently, neither the average spillover effect (ASE) nor the average overall effect (AOE) is well defined under this specification. Among the feasible estimators, the proposed Net-MC estimator yielded ADE estimates with the smallest bias and RMSE, indicating that jointly accounting for outcome misclassification and network spillovers is crucial for accurate estimation of the ADE.
The results for the ASE and AOE under ER networks were qualitatively similar to those under BA networks; they are reported in Table A1, Table A2, Table A3, Table A4, Table A5 and Table A6 in Appendix A.3. Overall, the results were consistent across all settings. The proposed Net-MC estimator yielded ASE and AOE estimates with uniformly smaller bias and RMSE than the Naive estimator, and its performance approached that of the Oracle estimator as the sample size increased. In contrast, estimators that failed to correct for outcome misclassification or to account for network-related variables produced substantially biased estimates. These findings underscore the importance of jointly correcting for outcome misclassification and accounting for network-related variables to obtain credible and reliable causal inference.

5. Empirical Analysis

5.1. Data Description

Our empirical analysis used village-level microfinance and social network data compiled by Banerjee et al. [15]. The dataset covers 75 villages in Karnataka, India, of which the microfinance institution Bharatha Swamukti Samsthe (BSS) operated in 43 villages. It combines a household census with a partial individual census, providing detailed information on household characteristics, economic behavior, and social interactions. This dataset is particularly well suited for our analysis, as our research question required data that jointly capture treatment, network relationships, covariates, and binary outcomes, variables that are rarely jointly observed in empirical settings. It provides detailed information on both the social network structure and household-level borrowing, thereby meeting the key data requirements for studying network spillover effects in the presence of outcome misclassification. The dataset has also been widely used in the literature on social networks and economic behavior (e.g., Breza et al. [23], Lubold et al. [24], Lambotte [25]), supporting its reliability and empirical relevance. In addition, microfinance participation and informal borrowing behavior are inherently shaped by social interactions, as information diffusion and peer effects play an important role in household financial decision making.
The network adjacency matrix $A = \{A_{ij}\}$ was constructed following Banerjee et al. [15]. In particular, an undirected link between two households was defined by the union of reported visits: a connection was formed if either household reported visiting the other. This definition is intended to capture economically meaningful relationships through which financial assistance and information may flow. Let $D_i$ denote whether household $i$ participates in the BSS program, and let $Y_i$ indicate whether the household engages in informal borrowing, including loans from moneylenders, relatives, or friends. Following Banerjee et al. [15], we controlled for 11 baseline household characteristics that may affect both microfinance participation and borrowing behavior. These variables included the presence of an eligible female household member (age 18–57), general or OBC caste status, whether any household member served as a BSS leader, household size, the number of rooms and beds, access to a latrine and electricity, housing tenure status, and indicators for an RCC roof and a thatched roof. Together, these covariates capture key dimensions of household demographics, socioeconomic status, and housing quality that may influence both program participation and borrowing behavior.
The sample consisted of 8080 households across 43 BSS villages. Village-level coverage varied considerably: complete-case sample sizes ranged from 75 to 305 households per village, with a mean of 187.91 and a median of 184.00. The village social networks exhibited substantial heterogeneity in connectivity. Household degrees ranged from 0 to 49, with an average of approximately nine social connections. This structural diversity implies substantial variation in potential exposure to treated neighbors, which is essential for identifying spillover effects. Figure 1 presents the social network in a representative village, which exhibits both densely connected clusters and households occupying more central positions. For visual clarity, the figure displays only the largest connected component of the village network.

5.2. Empirical Results

Because the dataset combines multiple sources and relies on self-reported borrowing, the informal credit variable may be subject to measurement error due to reporting mistakes or recall bias. To address this issue, we allowed for potential misclassification of the binary outcome in estimating causal effects. We primarily report the estimates and 95% percentile confidence intervals (CIs) for the misclassification probabilities of informal borrowing, as well as the key causal effect parameters, under binary exposure. These confidence intervals were obtained from 1000 cluster bootstrap replications, in which villages were resampled with replacement.
Table 5 reports the estimated misclassification probabilities for the informal borrowing outcome obtained using the proposed Net-MC estimator. False positives ($\pi_{01}$), that is, recording borrowing when none occurred, were extremely rare (0.0085). In contrast, false negatives ($\pi_{10}$), that is, failing to record borrowing when it did occur, were substantially more common (0.1771). This asymmetry suggests that informal borrowing is more likely to be underreported than overreported in survey data.
Table 6 reports the ADE, ASE, and AOE estimates based on alternative estimators. For the ADE, we report estimates from three estimators: Naive, Non-Net-MC, and Net-MC. The Non-Net-MC estimator does not incorporate variation in network-related variables and therefore does not identify spillover effects; accordingly, the ASE and AOE are reported only for the Naive and Net-MC estimators.
For the ADE, the Non-Net-MC estimator yielded an estimate of 0.0003, which is slightly larger than the corresponding estimate from the Net-MC estimator. Once network-related variables were accounted for, the estimated ADE was substantially reduced, taking values of 0.0153 under the Naive estimator and −0.0005 under the Net-MC estimator. Although all three estimates were statistically insignificant, the comparison remains informative. The Non-Net-MC estimator tended to overstate the ADE when network-related variables were excluded. A similar pattern emerged for the spillover-related effects. Under the Naive estimator, the ASE and AOE were 0.0812 and 0.0968, respectively, both of which were statistically significant. In contrast, once outcome misclassification was corrected for, the spillover and overall effects became negligible and statistically insignificant, with the estimated ASE and AOE equal to −0.0085 and −0.0090, respectively, under the Net-MC estimator.
Taken together, these results point to two distinct sources of bias in empirical settings with network interactions. First, comparing the Non-Net-MC estimator with those that incorporate network-related variables shows that omitting neighbor exposure and covariates can distort the decomposition of causal effects, attributing part of the spillover effect to the direct effect and thereby inflating the estimated ADE. Second, comparing the Naive and Net-MC estimators indicates that failure to correct for outcome misclassification can generate spurious evidence of spillover and overall effects, substantially inflating both their magnitude and statistical significance. Overall, these findings underscore the importance of jointly accounting for network-related variables and correcting for outcome misclassification to obtain credible and unbiased estimates of causal effects.

5.3. Robustness Checks

In this subsection, we examine the robustness of our results to the modeling choices and identifying assumptions underlying the baseline analysis. We begin by replacing the binary exposure with a proportion-based exposure, defined as the fraction of treated neighbors, and examine whether our main conclusions continue to hold. Formally, the proportion-based exposure is defined as follows:
$$Z_i = \frac{1}{n_i} \sum_{j \in \mathcal{N}_i} D_j.$$
For isolated individuals with no social ties ($n_i = 0$), we set $Z_i = 0$.
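A minimal sketch of the proportion-based exposure, assuming the network is stored as adjacency lists keyed by household index (the data layout and the function name are hypothetical):

```python
from typing import Dict, List, Sequence


def proportion_exposure(neighbors: Dict[int, List[int]],
                        treatment: Sequence[int]) -> List[float]:
    """Fraction of treated neighbors, Z_i = (1 / n_i) * sum_{j in N_i} D_j.

    `neighbors` maps household index i to the list of its neighbors N_i;
    households missing from the map (or with an empty list) are isolated,
    and their exposure is set to zero.
    """
    exposure = []
    for i in range(len(treatment)):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            exposure.append(0.0)  # n_i = 0: isolated household
        else:
            exposure.append(sum(treatment[j] for j in nbrs) / len(nbrs))
    return exposure


# Households 0-2 form a star centered on household 0; household 3 is isolated.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
treatment = [1, 0, 1, 0]
print(proportion_exposure(neighbors, treatment))  # [0.5, 1.0, 1.0, 0.0]
```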
In the analysis based on proportion-based exposure, we selected $0.1$, $0.3$, $0.5$, and $0.7$ as representative exposure levels, guided by several quantiles of the neighboring treatment proportion $Z$, and estimated the corresponding causal effects at each level. The results are reported in Table 7. Compared with the results under the binary exposure specification (Table 6), the estimates were not fully identical across the two exposure definitions. In particular, the Naive estimates of the spillover effect differed substantially: under binary exposure, the Naive estimate of the ASE was significantly positive, whereas under proportion-based exposure, the corresponding Naive estimates at all exposure levels were close to zero, slightly negative, and statistically insignificant. This suggests that the Naive estimator is sensitive to the definition of exposure.
It is worth noting that the Naive estimator is based directly on the observed outcome and does not correct for outcome misclassification. Therefore, the discrepancy between the Naive and corrected estimates should not be attributed solely to the change in the exposure definition. By contrast, the Net-MC estimator exhibited a more stable pattern across the two exposure specifications, with the estimates of the ADE, ASE, and AOE all remaining statistically insignificant. Moreover, under proportion-based exposure, the Net-MC estimates of the ASE declined gradually from −0.0004 to −0.0027 as the exposure level increased from $0.1$ to $0.7$, indicating a mildly stronger negative spillover effect at higher exposure levels. However, the magnitude of this pattern remained small, and the statistical evidence was limited, with the upper bounds of the 95% confidence intervals at or just above zero.
Overall, this robustness check indicates that although the Naive estimator varied across exposure definitions, the main conclusion based on the Net-MC estimator remained largely unchanged. The estimated direct, spillover, and overall effects were generally small and statistically insignificant. Therefore, our core empirical findings appear reasonably robust to alternative exposure definitions, although this robustness is primarily reflected in the corrected estimates rather than in the uncorrected estimates.
Beyond examining the robustness of our results to the exposure definition, we next turn to another potential concern. Because identification in our observational network setting relies on a network unconfoundedness assumption, our empirical findings may still be affected by omitted factors, both observed and unobserved. In particular, household treatment status and neighborhood exposure may be correlated with characteristics that are not fully captured by the available baseline covariates. To assess the sensitivity of our results to this concern, we conducted an additional robustness check. Although this concern cannot be fully ruled out empirically, the check supports a more cautious and credible interpretation of the empirical findings.
As a diagnostic, we augmented the baseline specification with village-level fixed effects. Specifically, we replaced the latent index model with
$$P(Y_i^* = 1 \mid \tilde{X}_i, V_i) = G\Big(\tilde{X}_i^{\top}\beta + \sum_{k=2}^{K} \delta_k\, \mathbf{1}\{V_i = k\}\Big),$$
where $\mathbf{1}\{V_i = k\}$ denotes an indicator equal to one if household $i$ belongs to village $k$, and the first village is omitted as the reference category. With village-level fixed effects, comparisons are made among households within the same village, so the estimates are driven not by average differences across villages but by differences among households living in the same village.
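The fixed-effect index can be sketched as follows, assuming villages are coded $1, \dots, K$ and taking $G$ to be the probit link $\Phi$ used in Appendix A.2; the helper names are hypothetical. Village 1 receives no dummy, so its level is absorbed into the intercept in $\tilde{X}_i^{\top}\beta$.

```python
import math
from typing import List, Sequence


def village_dummies(village: Sequence[int], num_villages: int) -> List[List[int]]:
    """Indicators 1{V_i = k} for k = 2, ..., K; village 1 is the omitted reference."""
    return [[1 if v == k else 0 for k in range(2, num_villages + 1)] for v in village]


def fe_probit_prob(x_beta: float, dummies: Sequence[int], delta: Sequence[float]) -> float:
    """G(x'beta + sum_k delta_k * 1{V_i = k}), with G taken to be the probit link Phi."""
    index = x_beta + sum(dk * ind for dk, ind in zip(delta, dummies))
    return 0.5 * math.erfc(-index / math.sqrt(2.0))


D = village_dummies([1, 2, 3, 2], num_villages=3)
print(D)  # [[0, 0], [1, 0], [0, 1], [1, 0]]
print(fe_probit_prob(0.1, D[1], delta=[0.3, -0.2]))  # household in village 2
```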
This analysis is particularly relevant in our context because both treatment participation and network exposure may be influenced by village-level characteristics. The results are reported in Table 8. Compared with the baseline Net-MC results under binary exposure, the estimates obtained after including the village-level fixed effects remained broadly similar in both sign and magnitude. Specifically, the estimated ADE, ASE, and AOE were all statistically insignificant. These findings suggest that our main results were not driven primarily by omitted village-level factors, and they provide further support for the conclusion that the direct, spillover, and overall effects are all statistically insignificant after correcting for outcome misclassification.

6. Conclusions

This paper studied how outcome misclassification affects the estimation of causal effects in network settings. We modeled interference through exposure mappings and adopted a parametric binary choice framework that allowed the misclassification probabilities and causal effect parameters to be jointly identified.
Monte Carlo simulations showed that estimators that ignore outcome misclassification or network-related variables can suffer from substantial bias, whereas the proposed method performed well in finite samples. Under binary exposure, the empirical analysis based on data from Karnataka further highlights two distinct sources of bias in network settings. First, omitting network-related variables can distort the decomposition of causal effects and lead to an overstatement of the direct effect. Second, failure to correct for outcome misclassification can generate misleading evidence of spillover and overall effects. Once outcome misclassification was corrected for, we found no evidence of spillover or overall effects of the neighbors’ microcredit participation on households’ informal borrowing. Taken together, these results underscore that credible estimation of causal effects in network settings requires jointly accounting for network-related variables and correcting for outcome misclassification.
Finally, we assessed the robustness of our results to the choice of exposure mapping and the network unconfoundedness assumption. Although our empirical analysis delivered a clear and robust message, its causal interpretation remains conditional on the plausibility of several core assumptions, including the parametric link specification, the exposure mapping, and the network unconfoundedness assumption. In addition, our analysis assumed nondifferential misclassification, under which the misclassification error was independent of the observed covariates. Relaxing this assumption by allowing misclassification probabilities to vary with the observed characteristics would introduce additional identification challenges. Addressing these challenges is an important direction for future research and would further broaden the scope of causal inference with misclassified outcomes in network settings.
While our empirical application was based on a specific regional context, the framework we developed is not restricted to that setting. More generally, in empirical applications where network relationships, treatment, and outcome measures are available, the variables required for our approach can typically be constructed. Our framework therefore provides a general tool for evaluating causal effects in network environments, particularly when outcome misclassification is present.

Author Contributions

Conceptualization, Y.L. and M.L.; methodology, Y.L. and M.L.; software, Y.L.; formal analysis, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and M.L.; visualization, Y.L.; supervision, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available from Zenodo at https://doi.org/10.5281/zenodo.7706650 (Banerjee et al. [15]).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Derivation of Equation (13)

Proof. 
Let
$$p(d, z, x) = P(Y_i = 1 \mid D_i = d, Z_i = z, X_i = x), \qquad p^*(d, z, x) = P(Y_i^* = 1 \mid D_i = d, Z_i = z, X_i = x).$$
By the law of total probability, we have
$$\begin{aligned} p(d, z, x) ={}& P(Y_i = 1 \mid Y_i^* = 1, D_i = d, Z_i = z, X_i = x)\, P(Y_i^* = 1 \mid D_i = d, Z_i = z, X_i = x) \\ &+ P(Y_i = 1 \mid Y_i^* = 0, D_i = d, Z_i = z, X_i = x)\, P(Y_i^* = 0 \mid D_i = d, Z_i = z, X_i = x). \end{aligned}$$
Under Assumption 4, the misclassification probabilities do not depend on $(D_i, Z_i, X_i)$. Hence, we have
$$P(Y_i = 1 \mid Y_i^* = 1, D_i = d, Z_i = z, X_i = x) = P(Y_i = 1 \mid Y_i^* = 1) = 1 - \pi_{10},$$
$$P(Y_i = 1 \mid Y_i^* = 0, D_i = d, Z_i = z, X_i = x) = P(Y_i = 1 \mid Y_i^* = 0) = \pi_{01}.$$
Substituting these expressions into the previous equation yields
$$p(d, z, x) = (1 - \pi_{10})\, p^*(d, z, x) + \pi_{01}\{1 - p^*(d, z, x)\}.$$
Rearranging the terms gives
$$p(d, z, x) = \pi_{01} + (1 - \pi_{10} - \pi_{01})\, p^*(d, z, x).$$
This completes the proof. □
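A quick Monte Carlo check of the identity $p = \pi_{01} + (1 - \pi_{10} - \pi_{01})\,p^*$ is sketched below for a single covariate cell with a fixed latent probability $p^*$. The values of $p^*$, $\pi_{01}$, and $\pi_{10}$ are arbitrary illustrative choices, and the simulation applies nondifferential misclassification in the spirit of Assumption 4 (the flip probability depends only on the latent outcome).

```python
import random


def implied_observed_prob(p_star: float, pi01: float, pi10: float) -> float:
    """Observed success probability p = pi01 + (1 - pi10 - pi01) * p*."""
    return pi01 + (1.0 - pi10 - pi01) * p_star


def simulated_observed_prob(p_star: float, pi01: float, pi10: float,
                            n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo frequency of Y = 1 when Y* ~ Bernoulli(p*) is misclassified with
    P(Y = 0 | Y* = 1) = pi10 and P(Y = 1 | Y* = 0) = pi01."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n):
        y_star = 1 if rng.random() < p_star else 0
        flip_prob = pi10 if y_star == 1 else pi01
        y = 1 - y_star if rng.random() < flip_prob else y_star
        ones += y
    return ones / n


print(implied_observed_prob(0.3, 0.05, 0.10))  # approximately 0.305
print(simulated_observed_prob(0.3, 0.05, 0.10))
```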

Appendix A.2. Proof of Theorem 1

Proof. 
Note that
$$P(Y_i = 1 \mid \tilde{X}_i = x; \theta) = \pi_{01} + (1 - \pi_{10} - \pi_{01})\,\Phi(x^{\top}\beta),$$
where $\theta = (\beta, \pi_{10}, \pi_{01}) \in \Theta$.
First, we explain why Assumption 5 is required for identification. Consider the probit link $\Phi$, which satisfies the symmetry property
$$\Phi(-t) = 1 - \Phi(t).$$
For any
$$\theta = (\beta, \pi_{10}, \pi_{01}) \in \Theta,$$
we define
$$\tilde{\theta} := (-\beta,\; 1 - \pi_{01},\; 1 - \pi_{10}).$$
Then, we have
$$\begin{aligned} P(Y_i = 1 \mid \tilde{X}_i = x; \tilde{\theta}) &= (1 - \pi_{10}) + \{1 - (1 - \pi_{01}) - (1 - \pi_{10})\}\,\Phi(-x^{\top}\beta) \\ &= (1 - \pi_{10}) - (1 - \pi_{10} - \pi_{01})\,\Phi(-x^{\top}\beta) \\ &= (1 - \pi_{10}) - (1 - \pi_{10} - \pi_{01})\{1 - \Phi(x^{\top}\beta)\} \\ &= \pi_{01} + (1 - \pi_{10} - \pi_{01})\,\Phi(x^{\top}\beta) = P(Y_i = 1 \mid \tilde{X}_i = x; \theta). \end{aligned}$$
Thus, $\theta$ and $\tilde{\theta}$ generate the same conditional probability function. However, we have
$$(1 - \pi_{01}) + (1 - \pi_{10}) = 2 - (\pi_{10} + \pi_{01}) > 1,$$
and thus $\tilde{\theta}$ does not belong to the parameter space $\Theta$. Therefore, without the restriction $\pi_{10} + \pi_{01} < 1$, the model would not be identified.
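The observational equivalence of $\theta$ and $\tilde{\theta}$ can be checked numerically. The sketch below, with arbitrary illustrative values $\pi_{10} = 0.10$ and $\pi_{01} = 0.05$, evaluates both parameterizations at a few index values $u = x^{\top}\beta$ (the reflected model uses $-u$) and confirms that they give the same probability, while the reflected misclassification rates sum to more than one. The function names are hypothetical.

```python
import math


def norm_cdf(t: float) -> float:
    """Standard normal CDF, Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def observed_prob(index: float, pi10: float, pi01: float) -> float:
    """P(Y_i = 1 | X_i = x; theta) = pi01 + (1 - pi10 - pi01) * Phi(index), index = x'beta."""
    return pi01 + (1.0 - pi10 - pi01) * norm_cdf(index)


def reflect(pi10: float, pi01: float):
    """Misclassification part of theta-tilde, (1 - pi01, 1 - pi10); beta is negated separately."""
    return 1.0 - pi01, 1.0 - pi10


pi10, pi01 = 0.10, 0.05
t10, t01 = reflect(pi10, pi01)
for u in (-2.0, -0.5, 0.0, 1.3):
    # theta uses the index u = x'beta; theta-tilde uses -u = x'(-beta)
    assert abs(observed_prob(u, pi10, pi01) - observed_prob(-u, t10, t01)) < 1e-12
print(t10 + t01)  # about 1.85 > 1, so theta-tilde violates pi10 + pi01 < 1
```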
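We next prove identification. Let $\theta = (\beta, \pi_{10}, \pi_{01}) \in \Theta$ and $\theta' = (\beta', \pi_{10}', \pi_{01}') \in \Theta$, and define
$$a := \pi_{01}, \qquad b := 1 - \pi_{10} - \pi_{01}.$$
Similarly, we define
$$a' := \pi_{01}', \qquad b' := 1 - \pi_{10}' - \pi_{01}'.$$
Under Assumption 5, $b > 0$ and $b' > 0$.
Suppose that
$$a + b\,\Phi(\tilde{X}_i^{\top}\beta) = a' + b'\,\Phi(\tilde{X}_i^{\top}\beta') \quad \text{a.s.}$$
Since both sides are continuously differentiable in $x$, and the support of $\tilde{X}_i$ contains a nonempty open subset, it follows that
$$a + b\,\Phi(x^{\top}\beta) = a' + b'\,\Phi(x^{\top}\beta') \tag{A1}$$
for all $x$ in some nonempty open subset $U$.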
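Differentiating both sides of Equation (A1) once with respect to $x$ yields
$$b\,\phi(x^{\top}\beta)\,\beta = b'\,\phi(x^{\top}\beta')\,\beta' \quad \forall x \in U.$$
Because $\phi(\cdot) > 0$ and $b, b' > 0$, the two sides are nonzero scalar multiples of $\beta$ and $\beta'$, respectively. Hence, $\beta$ and $\beta'$ must be collinear: there exists $\lambda \neq 0$ such that
$$\beta' = \lambda\beta.$$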
Substituting this into the gradient identity gives
$$b\,\phi(x^{\top}\beta) = b'\lambda\,\phi(\lambda x^{\top}\beta) \quad \forall x \in U.$$
Since $U$ is open and $\beta \neq 0$, the scalar $x^{\top}\beta$ varies over a nondegenerate interval. Hence, we have
$$b\,\phi(t) = b'\lambda\,\phi(\lambda t) \quad \forall t \in I$$
for some nondegenerate interval $I$.
By using
$$\phi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2},$$
this implies
$$e^{-t^2/2} = \frac{b'\lambda}{b}\, e^{-\lambda^2 t^2/2} \quad \forall t \in I.$$
Therefore, $\lambda^2 = 1$. Since $b > 0$, $b' > 0$, and $\phi > 0$, we must have $\lambda > 0$, and hence $\lambda = 1$. Thus, we have
$$\beta' = \beta.$$
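By substituting $\beta' = \beta$ back into Equation (A1), we get
$$(a - a') + (b - b')\,\Phi(x^{\top}\beta) = 0 \quad \forall x \in U.$$
Because $U$ is open and $\beta \neq 0$, the function $x \mapsto \Phi(x^{\top}\beta)$ is nonconstant on $U$. Hence, we have
$$a = a', \qquad b = b'.$$
Therefore, we have
$$\pi_{01} = \pi_{01}', \qquad \pi_{10} = \pi_{10}',$$
and thus $\theta = \theta'$.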
Note that
$$f(Y_i \mid \tilde{X}_i; \theta) = P(Y_i = 1 \mid \tilde{X}_i; \theta)^{Y_i}\,\{1 - P(Y_i = 1 \mid \tilde{X}_i; \theta)\}^{1 - Y_i}.$$
We define
$$Q(\theta) := E[\log f(Y_i \mid \tilde{X}_i; \theta)].$$
Finally, we verify that the population log-likelihood $Q(\theta)$ is uniquely maximized at $\theta_0$. Let $\theta_0$ denote the true parameter value. By identification, if $\theta \neq \theta_0$, then
$$f(Y_i \mid \tilde{X}_i; \theta) \neq f(Y_i \mid \tilde{X}_i; \theta_0) \quad \text{with positive probability}.$$
If, in addition,
$$E\,|\log f(Y_i \mid \tilde{X}_i; \theta)| < \infty \quad \forall \theta \in \Theta, \tag{A2}$$
then, by Lemma 2.2 of Newey and McFadden [26], $Q(\theta)$ is uniquely maximized at $\theta_0$.
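To illustrate (not prove) the identification argument, the sketch below simulates data from a hypothetical design with $\tilde{X}_i = (1, x_i)$, $x_i \sim N(0, 1)$, $\beta_0 = (0.2, 0.8)$, $\pi_{10} = 0.10$, and $\pi_{01} = 0.05$, and evaluates the sample average log-likelihood $Q_N(\theta) = N^{-1}\sum_i \log f(Y_i \mid \tilde{X}_i; \theta)$. With a sample this large, one should find $Q_N$ larger at $\theta_0$ than at the two perturbed parameter values inside $\Theta$. The design, the perturbations, and the function names are our own assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Theta = Tuple[Sequence[float], float, float]  # (beta, pi10, pi01)


def norm_cdf(t: float) -> float:
    """Standard normal CDF, Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def simulate(n: int, theta: Theta, seed: int = 1) -> List[Tuple[Tuple[float, float], int]]:
    """Draw (X_i, Y_i) with X_i = (1, x_i), a latent probit Y_i*, and
    nondifferential misclassification of Y_i* into the observed Y_i."""
    beta, pi10, pi01 = theta
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (1.0, rng.gauss(0.0, 1.0))
        y_star = 1 if rng.random() < norm_cdf(beta[0] * x[0] + beta[1] * x[1]) else 0
        flip_prob = pi10 if y_star == 1 else pi01
        y = 1 - y_star if rng.random() < flip_prob else y_star
        data.append((x, y))
    return data


def avg_log_lik(theta: Theta, data: List[Tuple[Tuple[float, float], int]]) -> float:
    """Sample analogue of Q(theta) = E[log f(Y_i | X_i; theta)]."""
    beta, pi10, pi01 = theta
    total = 0.0
    for x, y in data:
        p = pi01 + (1.0 - pi10 - pi01) * norm_cdf(beta[0] * x[0] + beta[1] * x[1])
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)


theta0 = ((0.2, 0.8), 0.10, 0.05)
data = simulate(20_000, theta0)
for theta in [theta0, ((0.2, 0.4), 0.10, 0.05), ((0.2, 0.8), 0.30, 0.05)]:
    print(theta, round(avg_log_lik(theta, data), 4))
```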
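It remains to verify Equation (A2). Since
$$P(Y_i = 1 \mid \tilde{X}_i; \theta) = \pi_{01} + (1 - \pi_{10} - \pi_{01})\,\Phi(\tilde{X}_i^{\top}\beta),$$
and $1 - \pi_{10} - \pi_{01} > 0$, there exists a constant $c > 0$ such that
$$P(Y_i = 1 \mid \tilde{X}_i; \theta) \geq c\,\Phi(\tilde{X}_i^{\top}\beta), \qquad 1 - P(Y_i = 1 \mid \tilde{X}_i; \theta) \geq c\,\Phi(-\tilde{X}_i^{\top}\beta)$$
(for instance, $c = 1 - \pi_{10} - \pi_{01}$ works, because $1 - P(Y_i = 1 \mid \tilde{X}_i; \theta) = \pi_{10} + (1 - \pi_{10} - \pi_{01})\,\Phi(-\tilde{X}_i^{\top}\beta)$). Hence, for some constant $C > 0$, we have
$$|\log f(Y_i \mid \tilde{X}_i; \theta)| \leq C + |\log\Phi(\tilde{X}_i^{\top}\beta)| + |\log\Phi(-\tilde{X}_i^{\top}\beta)|.$$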
Let $u := \tilde{X}_i^{\top}\beta$. It is well known that
$$\frac{d}{du}\log\Phi(u) = \frac{\phi(u)}{\Phi(u)} =: \lambda(u),$$
where $\phi(u)$ is the standard normal density. The function $\lambda(u)$ is continuous, convex, and satisfies
$$\frac{\lambda(u)}{|u|} \to 1 \text{ as } u \to -\infty, \qquad \lambda(u) \to 0 \text{ as } u \to \infty.$$
Hence, there exists a constant $C > 0$ such that
$$|\lambda(u)| \leq C(1 + |u|) \quad \forall u \in \mathbb{R}.$$
By the mean value theorem, for some $\tilde{u}$ between $0$ and $u$, we have
$$|\log\Phi(u)| = |\log\Phi(0) + \lambda(\tilde{u})\,u| \leq |\log\Phi(0)| + |\lambda(\tilde{u})|\,|u|.$$
Using $|\tilde{u}| \leq |u|$ and the above bound on $\lambda(\cdot)$, we obtain
$$|\log\Phi(u)| \leq C\{1 + (1 + |u|)|u|\} \leq C(1 + u^2),$$
for some constant $C > 0$, which may vary from line to line. The same bound applies to $|\log\Phi(-u)|$. Therefore, we have
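$$|\log f(Y_i \mid \tilde{X}_i; \theta)| \leq C(1 + u^2).$$
Since $u = \tilde{X}_i^{\top}\beta$, the Cauchy–Schwarz inequality gives
$$u^2 \leq \|\beta\|^2\,\|\tilde{X}_i\|^2,$$
and hence
$$|\log f(Y_i \mid \tilde{X}_i; \theta)| \leq C(1 + \|\tilde{X}_i\|^2).$$
Thus, under $E[\|\tilde{X}_i\|^2] < \infty$, we obtain
$$E\,|\log f(Y_i \mid \tilde{X}_i; \theta)| < \infty,$$
which verifies Equation (A2).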
This completes the proof. □
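As a numerical sanity check on the quadratic bound for $|\log\Phi(u)|$ (not a substitute for the proof), the sketch below scans a grid over $[-30, 30]$ and reports the largest value of $|\log\Phi(u)|/(1 + u^2)$. A value below one suggests that $C = 1$ suffices on that range; because the grid is symmetric about zero, the same scan also covers $|\log\Phi(-u)|$. The grid limits, step size, and function names are arbitrary choices.

```python
import math


def log_norm_cdf(u: float) -> float:
    """log Phi(u), with Phi computed via erfc to retain precision in the left tail."""
    return math.log(0.5 * math.erfc(-u / math.sqrt(2.0)))


def worst_ratio(lo: float = -30.0, hi: float = 30.0, step: float = 0.01) -> float:
    """Maximum over a grid of |log Phi(u)| / (1 + u^2)."""
    n_steps = int(round((hi - lo) / step))
    worst = 0.0
    for k in range(n_steps + 1):
        u = lo + k * step
        worst = max(worst, abs(log_norm_cdf(u)) / (1.0 + u * u))
    return worst


print(worst_ratio())
```

The ratio approaches $1/2$ in the far left tail, since $\log\Phi(u) \approx -u^2/2$ there, so the maximum is attained at a moderate negative value of $u$.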

Appendix A.3. Additional Simulation Results

This subsection reports additional simulation results that complement the main analysis. Specifically, we present the results for the ASE and AOE estimates under the ER network design, as well as the results for the coefficient estimates and for the ADE, ASE, and AOE estimates under the BA network design.
Table A1. Simulation results for ASE estimates under ER networks.
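
| $(\pi_{01}, \pi_{10})$ | Metric | N | Oracle | Naive | Net-MC |
|---|---|---|---|---|---|
| (0.01, 0.01) | Bias | 2000 | 0.0016 | −0.0011 | 0.0016 |
| | | 4000 | −0.0002 | −0.0018 | −0.0001 |
| | | 8000 | 0.0009 | −0.0007 | 0.0008 |
| | RMSE | 2000 | 0.0204 | 0.0263 | 0.0217 |
| | | 4000 | 0.0135 | 0.0181 | 0.0144 |
| | | 8000 | 0.0097 | 0.0131 | 0.0101 |
| (0.01, 0.10) | Bias | 2000 | 0.0016 | 0.0203 | 0.0041 |
| | | 4000 | −0.0002 | 0.0205 | 0.0010 |
| | | 8000 | 0.0009 | 0.0206 | 0.0012 |
| | RMSE | 2000 | 0.0204 | 0.0363 | 0.0265 |
| | | 4000 | 0.0135 | 0.0295 | 0.0166 |
| | | 8000 | 0.0097 | 0.0258 | 0.0117 |
| (0.05, 0.10) | Bias | 2000 | 0.0016 | 0.0035 | 0.0030 |
| | | 4000 | −0.0002 | 0.0035 | 0.0008 |
| | | 8000 | 0.0009 | 0.0042 | 0.0009 |
| | RMSE | 2000 | 0.0204 | 0.0335 | 0.0306 |
| | | 4000 | 0.0135 | 0.0238 | 0.0191 |
| | | 8000 | 0.0097 | 0.0178 | 0.0129 |
| (0.10, 0.01) | Bias | 2000 | 0.0016 | −0.0308 | 0.0006 |
| | | 4000 | −0.0002 | −0.0316 | −0.0008 |
| | | 8000 | 0.0009 | −0.0295 | 0.0004 |
| | RMSE | 2000 | 0.0204 | 0.0445 | 0.0269 |
| | | 4000 | 0.0135 | 0.0386 | 0.0172 |
| | | 8000 | 0.0097 | 0.0336 | 0.0118 |
| (0.10, 0.05) | Bias | 2000 | 0.0016 | −0.0180 | 0.0032 |
| | | 4000 | −0.0002 | −0.0187 | −0.0004 |
| | | 8000 | 0.0009 | −0.0172 | 0.0004 |
| | RMSE | 2000 | 0.0204 | 0.0381 | 0.0313 |
| | | 4000 | 0.0135 | 0.0296 | 0.0190 |
| | | 8000 | 0.0097 | 0.0240 | 0.0129 |
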
Notes: The table reports the bias and root mean squared error (RMSE) of the ASE estimates across the Monte Carlo replications. Definitions of the estimation methods are provided in the notes to Table 4.
Table A2. Simulation results for AOE estimates under ER networks.
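
| $(\pi_{01}, \pi_{10})$ | Metric | N | Oracle | Naive | Net-MC |
|---|---|---|---|---|---|
| (0.01, 0.01) | Bias | 2000 | 0.0047 | −0.0129 | 0.0073 |
| | | 4000 | 0.0008 | −0.0138 | 0.0013 |
| | | 8000 | 0.0016 | −0.0130 | 0.0015 |
| | RMSE | 2000 | 0.0339 | 0.0381 | 0.0415 |
| | | 4000 | 0.0214 | 0.0292 | 0.0236 |
| | | 8000 | 0.0151 | 0.0223 | 0.0162 |
| (0.01, 0.10) | Bias | 2000 | 0.0047 | −0.0069 | 0.0145 |
| | | 4000 | 0.0008 | −0.0064 | 0.0022 |
| | | 8000 | 0.0016 | −0.0061 | 0.0024 |
| | RMSE | 2000 | 0.0339 | 0.0428 | 0.0576 |
| | | 4000 | 0.0214 | 0.0307 | 0.0288 |
| | | 8000 | 0.0151 | 0.0229 | 0.0197 |
| (0.05, 0.10) | Bias | 2000 | 0.0047 | 0.0074 | 0.0130 |
| | | 4000 | 0.0008 | 0.0079 | 0.0018 |
| | | 8000 | 0.0016 | 0.0088 | 0.0024 |
| | RMSE | 2000 | 0.0339 | 0.0458 | 0.0601 |
| | | 4000 | 0.0214 | 0.0332 | 0.0314 |
| | | 8000 | 0.0151 | 0.0253 | 0.0212 |
| (0.10, 0.01) | Bias | 2000 | 0.0047 | 0.0679 | 0.0087 |
| | | 4000 | 0.0008 | 0.0669 | 0.0010 |
| | | 8000 | 0.0016 | 0.0682 | 0.0012 |
| | RMSE | 2000 | 0.0339 | 0.0817 | 0.0495 |
| | | 4000 | 0.0214 | 0.0740 | 0.0262 |
| | | 8000 | 0.0151 | 0.0720 | 0.0184 |
| (0.10, 0.05) | Bias | 2000 | 0.0047 | 0.0299 | 0.0133 |
| | | 4000 | 0.0008 | 0.0291 | 0.0014 |
| | | 8000 | 0.0016 | 0.0300 | 0.0013 |
| | RMSE | 2000 | 0.0339 | 0.0541 | 0.0590 |
| | | 4000 | 0.0214 | 0.0430 | 0.0302 |
| | | 8000 | 0.0151 | 0.0381 | 0.0204 |
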
Notes: The table reports the bias and root mean squared error (RMSE) of the AOE estimates across the Monte Carlo replications. Definitions of the estimation methods are provided in the notes to Table 4.
Table A3. MAEs of β estimates under different misclassification probabilities in BA networks.
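
| $(\pi_{01}, \pi_{10})$ | N | Oracle | Naive | Net-MC |
|---|---|---|---|---|
| (0.01, 0.01) | 2000 | 0.1359 | 0.4442 | 0.1536 |
| | 4000 | 0.0783 | 0.4614 | 0.0859 |
| | 8000 | 0.0503 | 0.4701 | 0.0537 |
| (0.01, 0.10) | 2000 | 0.1359 | 0.7192 | 0.2315 |
| | 4000 | 0.0783 | 0.7243 | 0.1057 |
| | 8000 | 0.0503 | 0.7260 | 0.0659 |
| (0.05, 0.10) | 2000 | 0.1359 | 0.7573 | 0.3593 |
| | 4000 | 0.0783 | 0.7610 | 0.1264 |
| | 8000 | 0.0503 | 0.7628 | 0.0749 |
| (0.10, 0.01) | 2000 | 0.1359 | 0.6783 | 0.2395 |
| | 4000 | 0.0783 | 0.6847 | 0.1062 |
| | 8000 | 0.0503 | 0.6877 | 0.0657 |
| (0.10, 0.05) | 2000 | 0.1359 | 0.7452 | 0.3771 |
| | 4000 | 0.0783 | 0.7488 | 0.1215 |
| | 8000 | 0.0503 | 0.7499 | 0.0753 |
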
Notes: The table reports the MAE of the estimates of the coefficient vector β across the Monte Carlo replications. Definitions of the estimation methods are provided in the notes to Table 3.
Table A4. Simulation results for ADE estimates under BA networks.
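
| $(\pi_{01}, \pi_{10})$ | Metric | N | Oracle | Naive | Non-Net-MC | Net-MC |
|---|---|---|---|---|---|---|
| (0.01, 0.01) | Bias | 2000 | 0.0034 | −0.0078 | 0.3185 | 0.0044 |
| | | 4000 | 0.0016 | −0.0081 | 0.3074 | 0.0015 |
| | | 8000 | 0.0003 | −0.0082 | 0.3039 | 0.0005 |
| | RMSE | 2000 | 0.0220 | 0.0269 | 0.3338 | 0.0296 |
| | | 4000 | 0.0156 | 0.0194 | 0.3142 | 0.0168 |
| | | 8000 | 0.0104 | 0.0147 | 0.3066 | 0.0114 |
| (0.01, 0.10) | Bias | 2000 | 0.0034 | −0.0200 | 0.3254 | 0.0055 |
| | | 4000 | 0.0016 | −0.0189 | 0.3115 | 0.0019 |
| | | 8000 | 0.0003 | −0.0196 | 0.3067 | 0.0007 |
| | RMSE | 2000 | 0.0220 | 0.0346 | 0.3325 | 0.0393 |
| | | 4000 | 0.0156 | 0.0271 | 0.3159 | 0.0211 |
| | | 8000 | 0.0104 | 0.0237 | 0.3116 | 0.0150 |
| (0.05, 0.10) | Bias | 2000 | 0.0034 | 0.0116 | 0.3248 | 0.0072 |
| | | 4000 | 0.0016 | 0.0131 | 0.3130 | 0.0022 |
| | | 8000 | 0.0003 | 0.0124 | 0.3117 | 0.0007 |
| | RMSE | 2000 | 0.0220 | 0.0335 | 0.3320 | 0.0461 |
| | | 4000 | 0.0156 | 0.0248 | 0.3162 | 0.0225 |
| | | 8000 | 0.0104 | 0.0191 | 0.3137 | 0.0158 |
| (0.10, 0.01) | Bias | 2000 | 0.0034 | 0.1034 | 0.3164 | 0.0070 |
| | | 4000 | 0.0016 | 0.1037 | 0.3065 | 0.0021 |
| | | 8000 | 0.0003 | 0.1033 | 0.3081 | 0.0005 |
| | RMSE | 2000 | 0.0220 | 0.1089 | 0.3216 | 0.0385 |
| | | 4000 | 0.0156 | 0.1064 | 0.3093 | 0.0186 |
| | | 8000 | 0.0104 | 0.1046 | 0.3111 | 0.0122 |
| (0.10, 0.05) | Bias | 2000 | 0.0034 | 0.0544 | 0.3224 | 0.0061 |
| | | 4000 | 0.0016 | 0.0557 | 0.3131 | 0.0028 |
| | | 8000 | 0.0003 | 0.0551 | 0.3084 | 0.0005 |
| | RMSE | 2000 | 0.0220 | 0.0627 | 0.3281 | 0.0413 |
| | | 4000 | 0.0156 | 0.0598 | 0.3160 | 0.0221 |
| | | 8000 | 0.0104 | 0.0572 | 0.3134 | 0.0142 |
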
Notes: The table reports the bias and root mean squared error (RMSE) of the ADE estimates across the Monte Carlo replications. Definitions of the estimation methods are provided in the notes to Table 4.
Table A5. Simulation results for ASE estimates under BA networks.
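
| $(\pi_{01}, \pi_{10})$ | Metric | N | Oracle | Naive | Net-MC |
|---|---|---|---|---|---|
| (0.01, 0.01) | Bias | 2000 | −0.0001 | −0.0008 | −0.0001 |
| | | 4000 | 0.0006 | −0.0001 | 0.0005 |
| | | 8000 | 0.0002 | 0.0000 | 0.0002 |
| | RMSE | 2000 | 0.0176 | 0.0233 | 0.0189 |
| | | 4000 | 0.0126 | 0.0170 | 0.0134 |
| | | 8000 | 0.0081 | 0.0116 | 0.0087 |
| (0.01, 0.10) | Bias | 2000 | −0.0001 | 0.0189 | 0.0008 |
| | | 4000 | 0.0006 | 0.0199 | 0.0007 |
| | | 8000 | 0.0002 | 0.0196 | 0.0004 |
| | RMSE | 2000 | 0.0176 | 0.0334 | 0.0228 |
| | | 4000 | 0.0126 | 0.0283 | 0.0150 |
| | | 8000 | 0.0081 | 0.0241 | 0.0100 |
| (0.05, 0.10) | Bias | 2000 | −0.0001 | 0.0054 | 0.0012 |
| | | 4000 | 0.0006 | 0.0072 | 0.0005 |
| | | 8000 | 0.0002 | 0.0061 | 0.0003 |
| | RMSE | 2000 | 0.0176 | 0.0298 | 0.0258 |
| | | 4000 | 0.0126 | 0.0219 | 0.0167 |
| | | 8000 | 0.0081 | 0.0166 | 0.0108 |
| (0.10, 0.01) | Bias | 2000 | −0.0001 | −0.0239 | 0.0003 |
| | | 4000 | 0.0006 | −0.0228 | 0.0005 |
| | | 8000 | 0.0002 | −0.0233 | 0.0001 |
| | RMSE | 2000 | 0.0176 | 0.0370 | 0.0234 |
| | | 4000 | 0.0126 | 0.0304 | 0.0157 |
| | | 8000 | 0.0081 | 0.0273 | 0.0100 |
| (0.10, 0.05) | Bias | 2000 | −0.0001 | −0.0124 | 0.0018 |
| | | 4000 | 0.0006 | −0.0110 | 0.0009 |
| | | 8000 | 0.0002 | −0.0119 | 0.0003 |
| | RMSE | 2000 | 0.0176 | 0.0323 | 0.0274 |
| | | 4000 | 0.0126 | 0.0239 | 0.0169 |
| | | 8000 | 0.0081 | 0.0195 | 0.0110 |
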
Notes: The table reports the bias and root mean squared error (RMSE) of the ASE estimates across the Monte Carlo replications. Definitions of the estimation methods are provided in the notes to Table 4.
Table A6. Simulation results for AOE estimates under BA networks.
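
| $(\pi_{01}, \pi_{10})$ | Metric | N | Oracle | Naive | Net-MC |
|---|---|---|---|---|---|
| (0.01, 0.01) | Bias | 2000 | 0.0034 | −0.0087 | 0.0044 |
| | | 4000 | 0.0022 | −0.0083 | 0.0021 |
| | | 8000 | 0.0005 | −0.0084 | 0.0007 |
| | RMSE | 2000 | 0.0295 | 0.0364 | 0.0368 |
| | | 4000 | 0.0215 | 0.0267 | 0.0229 |
| | | 8000 | 0.0141 | 0.0194 | 0.0154 |
| (0.01, 0.10) | Bias | 2000 | 0.0034 | −0.0019 | 0.0066 |
| | | 4000 | 0.0022 | 0.0002 | 0.0026 |
| | | 8000 | 0.0005 | −0.0009 | 0.0011 |
| | RMSE | 2000 | 0.0295 | 0.0422 | 0.0479 |
| | | 4000 | 0.0215 | 0.0300 | 0.0272 |
| | | 8000 | 0.0141 | 0.0204 | 0.0187 |
| (0.05, 0.10) | Bias | 2000 | 0.0034 | 0.0165 | 0.0088 |
| | | 4000 | 0.0022 | 0.0198 | 0.0028 |
| | | 8000 | 0.0005 | 0.0180 | 0.0010 |
| | RMSE | 2000 | 0.0295 | 0.0480 | 0.0551 |
| | | 4000 | 0.0215 | 0.0370 | 0.0294 |
| | | 8000 | 0.0141 | 0.0281 | 0.0199 |
| (0.10, 0.01) | Bias | 2000 | 0.0034 | 0.0813 | 0.0076 |
| | | 4000 | 0.0022 | 0.0829 | 0.0026 |
| | | 8000 | 0.0005 | 0.0820 | 0.0006 |
| | RMSE | 2000 | 0.0295 | 0.0932 | 0.0475 |
| | | 4000 | 0.0215 | 0.0888 | 0.0258 |
| | | 8000 | 0.0141 | 0.0850 | 0.0168 |
| (0.10, 0.05) | Bias | 2000 | 0.0034 | 0.0426 | 0.0082 |
| | | 4000 | 0.0022 | 0.0454 | 0.0037 |
| | | 8000 | 0.0005 | 0.0439 | 0.0008 |
| | RMSE | 2000 | 0.0295 | 0.0613 | 0.0517 |
| | | 4000 | 0.0215 | 0.0550 | 0.0295 |
| | | 8000 | 0.0141 | 0.0491 | 0.0189 |
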
Notes: The table reports the bias and root mean squared error (RMSE) of the AOE estimates across the Monte Carlo replications. Definitions of the estimation methods are provided in the notes to Table 4.

References

1. Halloran, M.E.; Struchiner, C.J. Causal inference in infectious diseases. Epidemiology 1995, 6, 142–151.
2. Tchetgen Tchetgen, E.J.; Fulcher, I.R.; Shpitser, I. Auto-g-computation of causal effects on a network. J. Am. Stat. Assoc. 2021, 116, 833–844.
3. Sobel, M.E. What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference. J. Am. Stat. Assoc. 2006, 101, 1398–1407.
4. Tchetgen Tchetgen, E.J.; VanderWeele, T.J. On causal inference in the presence of interference. Stat. Methods Med. Res. 2012, 21, 55–75.
5. Qu, Z.; Xiong, R.; Liu, J.; Imbens, G. Semiparametric estimation of treatment effects in observational studies with heterogeneous partial interference. arXiv 2021, arXiv:2107.12420.
6. Manski, C.F. Identification of treatment response with social interactions. Econom. J. 2013, 16, S1–S23.
7. Aronow, P.M.; Samii, C. Estimating average causal effects under general interference, with application to a social network experiment. Ann. Appl. Stat. 2017, 11, 1912–1947.
8. Forastiere, L.; Airoldi, E.M.; Mealli, F. Identification and estimation of treatment and interference effects in observational studies on networks. J. Am. Stat. Assoc. 2021, 116, 901–918.
9. Leung, M.P. Treatment and spillover effects under network interference. Rev. Econ. Stat. 2020, 102, 368–380.
10. Leung, M.P. Causal inference under approximate neighborhood interference. Econometrica 2022, 90, 267–293.
11. Bargagli-Stoffi, F.J.; Tortu, C.; Forastiere, L. Heterogeneous treatment and spillover effects under clustered network interference. Ann. Appl. Stat. 2025, 19, 28–55.
12. Bong, H.; Fogarty, C.B.; Levina, E.; Zhu, J. Heterogeneous treatment effects under network interference: A nonparametric approach based on node connectivity. arXiv 2024, arXiv:2410.11797.
13. Lee, C.; Zeng, D.; Hudgens, M.G. Efficient nonparametric estimation of stochastic policy effects with clustered interference. J. Am. Stat. Assoc. 2025, 120, 382–394.
14. Buchanan, A.L.; Hernández-Ramírez, R.U.; Lok, J.J.; Vermund, S.H.; Friedman, S.R.; Forastiere, L.; Spiegelman, D. Assessing direct and spillover effects of intervention packages in network-randomized studies. Epidemiology 2024, 35, 481–488.
15. Banerjee, A.; Breza, E.; Chandrasekhar, A.G.; Duflo, E.; Jackson, M.O.; Kinnan, C. Changes in social network structure in response to exposure to formal credit markets. Rev. Econ. Stud. 2024, 91, 1331–1372.
16. Zeng, M.; Jia, Z.; Sui, Z.; Xu, J.; Zhang, H. Causal inference with outcome dependent sampling and mismeasured outcome. arXiv 2023, arXiv:2309.11764.
17. Shu, D.; Yi, G.Y. Weighted causal inference methods with mismeasured covariates and misclassified outcomes. Stat. Med. 2019, 38, 1835–1854.
18. Wei, S.; Zhang, C.; Geng, Z.; Luo, S. Identifiability and estimation for potential-outcome means with misclassified outcomes. Mathematics 2024, 12, 2801.
19. Hausman, J.A.; Abrevaya, J.; Scott-Morton, F.M. Misclassification of the dependent variable in a discrete-response setting. J. Econom. 1998, 87, 239–269.
20. Kojevnikov, D. The bootstrap for network dependent processes. arXiv 2021, arXiv:2101.12312.
21. Erdős, P.; Rényi, A. On random graphs I. Publ. Math. Debr. 1959, 6, 290–297.
22. Barabási, A.L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
23. Breza, E.; Chandrasekhar, A.G.; McCormick, T.H.; Pan, M. Using aggregated relational data to feasibly identify network structure without network data. Am. Econ. Rev. 2020, 110, 2454–2484.
24. Lubold, S.; Chandrasekhar, A.G.; McCormick, T.H. Identifying the latent space geometry of network models through analysis of curvature. J. R. Stat. Soc. Ser. B–Stat. Methodol. 2023, 85, 240–292.
25. Lambotte, M. Peer effects in binary outcomes: Strategic complementarity and taste for conformity with endogenous networks. J. Appl. Econom. 2025, 40, 608–626.
26. Newey, W.K.; McFadden, D. Large sample estimation and hypothesis testing. Handb. Econom. 1994, 4, 2111–2245.
Figure 1. An illustrative social network from a representative village.
Table 1. Summary statistics of simulated networks.
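
| Networks | N | Mean | SD | Max |
|---|---|---|---|---|
| ER | 2000 | 5.9987 | 2.4441 | 16.0030 |
| | 4000 | 5.9991 | 2.4464 | 16.6360 |
| | 8000 | 5.9985 | 2.4473 | 17.2880 |
| BA | 2000 | 5.9910 | 7.5321 | 138.0380 |
| | 4000 | 5.9955 | 8.0531 | 195.3080 |
| | 8000 | 5.9977 | 8.5512 | 276.6300 |
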
Notes: The table reports summary statistics of the node degrees in the simulated networks. The columns report the mean, standard deviation (SD), and maximum degree.
Table 2. MAE of the estimated misclassification probabilities across network designs.
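
| Networks | $(\pi_{01}, \pi_{10})$ | N = 2000 | N = 4000 | N = 8000 |
|---|---|---|---|---|
| ER | (0.01, 0.01) | 0.0032 | 0.0022 | 0.0015 |
| | (0.01, 0.10) | 0.0059 | 0.0037 | 0.0027 |
| | (0.05, 0.10) | 0.0089 | 0.0052 | 0.0038 |
| | (0.10, 0.01) | 0.0068 | 0.0044 | 0.0030 |
| | (0.10, 0.05) | 0.0095 | 0.0056 | 0.0039 |
| BA | (0.01, 0.01) | 0.0031 | 0.0021 | 0.0015 |
| | (0.01, 0.10) | 0.0060 | 0.0038 | 0.0027 |
| | (0.05, 0.10) | 0.0096 | 0.0052 | 0.0038 |
| | (0.10, 0.01) | 0.0068 | 0.0043 | 0.0028 |
| | (0.10, 0.05) | 0.0100 | 0.0055 | 0.0037 |
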
Notes: The table reports the median across Monte Carlo replications of $\frac{1}{2}\left(|\hat{\pi}_{01} - \pi_{01}| + |\hat{\pi}_{10} - \pi_{10}|\right)$, which corresponds to the median absolute error (MAE) of the estimated misclassification probabilities. Columns correspond to different sample sizes $N$.
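The error metric in Table 2 can be written as a small helper; the function name and the replication estimates in the usage lines are hypothetical.

```python
import statistics
from typing import Iterable, Tuple


def misclassification_mae(estimates: Iterable[Tuple[float, float]],
                          pi01: float, pi10: float) -> float:
    """Median over replications of 0.5 * (|pi01_hat - pi01| + |pi10_hat - pi10|).

    `estimates` holds one (pi01_hat, pi10_hat) pair per Monte Carlo replication.
    """
    errors = [0.5 * (abs(a - pi01) + abs(b - pi10)) for a, b in estimates]
    return statistics.median(errors)


# Hypothetical estimates from three replications with true (pi01, pi10) = (0.01, 0.10)
reps = [(0.012, 0.098), (0.009, 0.105), (0.011, 0.100)]
print(misclassification_mae(reps, pi01=0.01, pi10=0.10))
```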
Table 3. MAE of β estimates under different misclassification probabilities in ER networks.
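
| $(\pi_{01}, \pi_{10})$ | N | Oracle | Naive | Net-MC |
|---|---|---|---|---|
| (0.01, 0.01) | 2000 | 0.1359 | 0.4394 | 0.1561 |
| | 4000 | 0.0801 | 0.4538 | 0.0888 |
| | 8000 | 0.0514 | 0.4616 | 0.0570 |
| (0.01, 0.10) | 2000 | 0.1359 | 0.7139 | 0.2254 |
| | 4000 | 0.0801 | 0.7194 | 0.1105 |
| | 8000 | 0.0514 | 0.7216 | 0.0698 |
| (0.05, 0.10) | 2000 | 0.1359 | 0.7530 | 0.3246 |
| | 4000 | 0.0801 | 0.7575 | 0.1305 |
| | 8000 | 0.0514 | 0.7587 | 0.0790 |
| (0.10, 0.01) | 2000 | 0.1359 | 0.6743 | 0.2272 |
| | 4000 | 0.0801 | 0.6792 | 0.1085 |
| | 8000 | 0.0514 | 0.6826 | 0.0681 |
| (0.10, 0.05) | 2000 | 0.1359 | 0.7405 | 0.3363 |
| | 4000 | 0.0801 | 0.7446 | 0.1296 |
| | 8000 | 0.0514 | 0.7456 | 0.0778 |
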
Notes: The table reports the median absolute error (MAE) of the estimates of the coefficient vector $\beta$ across the Monte Carlo replications. Oracle denotes the infeasible probit estimator based on the latent outcome $Y_i^*$, which is invariant to misclassification probabilities. Naive denotes the standard probit estimator applied to the observed outcome $Y_i$, ignoring misclassification. Net-MC denotes the proposed misclassification-corrected probit estimator that corrects for outcome misclassification and incorporates network-related variables.
Table 4. Simulation results for ADE estimates under ER networks.
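
| $(\pi_{01}, \pi_{10})$ | Metric | N | Oracle | Naive | Non-Net-MC | Net-MC |
|---|---|---|---|---|---|---|
| (0.01, 0.01) | Bias | 2000 | 0.0030 | −0.0115 | 0.2954 | 0.0055 |
| | | 4000 | 0.0010 | −0.0117 | 0.2770 | 0.0013 |
| | | 8000 | 0.0006 | −0.0121 | 0.2686 | 0.0007 |
| | RMSE | 2000 | 0.0251 | 0.0277 | 0.3054 | 0.0331 |
| | | 4000 | 0.0152 | 0.0212 | 0.2828 | 0.0171 |
| | | 8000 | 0.0107 | 0.0173 | 0.2721 | 0.0117 |
| (0.01, 0.10) | Bias | 2000 | 0.0030 | −0.0262 | 0.2994 | 0.0100 |
| | | 4000 | 0.0010 | −0.0259 | 0.2784 | 0.0011 |
| | | 8000 | 0.0006 | −0.0256 | 0.2743 | 0.0012 |
| | RMSE | 2000 | 0.0251 | 0.0379 | 0.3079 | 0.0499 |
| | | 4000 | 0.0152 | 0.0319 | 0.2825 | 0.0216 |
| | | 8000 | 0.0107 | 0.0292 | 0.2774 | 0.0149 |
| (0.05, 0.10) | Bias | 2000 | 0.0030 | 0.0046 | 0.2991 | 0.0097 |
| | | 4000 | 0.0010 | 0.0051 | 0.2797 | 0.0010 |
| | | 8000 | 0.0006 | 0.0053 | 0.2769 | 0.0014 |
| | RMSE | 2000 | 0.0251 | 0.0303 | 0.3086 | 0.0516 |
| | | 4000 | 0.0152 | 0.0209 | 0.2839 | 0.0226 |
| | | 8000 | 0.0107 | 0.0162 | 0.2794 | 0.0156 |
| (0.10, 0.01) | Bias | 2000 | 0.0030 | 0.0971 | 0.2837 | 0.0078 |
| | | 4000 | 0.0010 | 0.0968 | 0.2730 | 0.0018 |
| | | 8000 | 0.0006 | 0.0958 | 0.2703 | 0.0008 |
| | RMSE | 2000 | 0.0251 | 0.1026 | 0.2906 | 0.0396 |
| | | 4000 | 0.0152 | 0.0994 | 0.2745 | 0.0186 |
| | | 8000 | 0.0107 | 0.0971 | 0.2709 | 0.0128 |
| (0.10, 0.05) | Bias | 2000 | 0.0030 | 0.0475 | 0.2917 | 0.0097 |
| | | 4000 | 0.0010 | 0.0473 | 0.2844 | 0.0017 |
| | | 8000 | 0.0006 | 0.0467 | 0.2744 | 0.0009 |
| | RMSE | 2000 | 0.0251 | 0.0569 | 0.2995 | 0.0470 |
| | | 4000 | 0.0152 | 0.0516 | 0.2891 | 0.0219 |
| | | 8000 | 0.0107 | 0.0491 | 0.2802 | 0.0149 |
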
Notes: The table reports the bias and root mean squared error (RMSE) of the average direct effect (ADE) estimates. Oracle denotes the infeasible probit estimator based on the latent outcome $Y_i^*$ and is thus invariant to misclassification probabilities. Naive denotes the standard probit estimator applied to the observed outcome $Y_i$, ignoring misclassification. Non-Net-MC refers to a benchmark specification that employs the misclassification-corrected probit estimator but omits network-related covariates. In contrast, Net-MC denotes the proposed misclassification-corrected probit estimator that corrects for outcome misclassification and incorporates network-related variables.
Table 5. Estimated misclassification probabilities for informal borrowing.

| Parameter | Estimate | 95% Percentile CI |
|---|---|---|
| $\pi_{01}$ | 0.0085 | [0.0052, 0.2499] |
| $\pi_{10}$ | 0.1771 | [0.0001, 0.2002] |

Notes: This table reports the estimated probabilities of outcome misclassification under binary exposure. $\pi_{01}$ is the probability of reporting borrowing when none occurred, and $\pi_{10}$ is the probability of failing to report actual borrowing. The 95% CI refers to the percentile bootstrap confidence interval, based on 1000 village-level bootstrap replications.
Table 6. Estimates of ADE, ASE, and AOE using the Karnataka data under binary exposure.

| Parameter | Estimator | Estimate | 95% CI |
|---|---|---|---|
| ADE | Naive | 0.0153 | [−0.0103, 0.0448] |
| | Non-Net-MC | 0.0003 | [−0.0058, 0.0262] |
| | Net-MC | −0.0005 | [−0.0025, 0.0013] |
| ASE | Naive | 0.0812 | [0.0501, 0.1147] |
| | Net-MC | −0.0085 | [−0.0101, 0.0023] |
| AOE | Naive | 0.0968 | [0.0516, 0.1454] |
| | Net-MC | −0.0090 | [−0.0116, 0.0037] |

Notes: This table reports point estimates and 95% confidence intervals for the ADE, ASE, and AOE under binary exposure. The 95% CI refers to the percentile bootstrap confidence interval, based on 1000 village-level bootstrap replications.
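The percentile bootstrap described in the notes resamples villages rather than households, which keeps all households of a drawn village together in every replication. A minimal sketch of that resampling scheme follows; `statistic` stands in for the full re-estimation (for example, refitting the Net-MC model and recomputing an effect), and the data and function names are hypothetical.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def village_bootstrap_ci(villages: Dict[int, Sequence[float]],
                         statistic: Callable[[List[float]], float],
                         n_boot: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """95% percentile CI from a village-level (cluster) bootstrap.

    Each replication draws as many villages as observed, with replacement, keeps all
    households of every drawn village, and recomputes `statistic` on the pooled sample.
    """
    rng = random.Random(seed)
    ids = list(villages)
    draws = []
    for _ in range(n_boot):
        sample: List[float] = []
        for _ in range(len(ids)):
            sample.extend(villages[rng.choice(ids)])
        draws.append(statistic(sample))
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5.0%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical household-level outcomes grouped by village
villages = {1: [0.0, 1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 0.0, 0.0, 1.0]}
print(village_bootstrap_ci(villages, statistics.fmean))
```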
Table 7. Estimates of ADE, ASE, and AOE using the Karnataka data under proportion-based exposure.

| Parameter | Estimator | Estimate | 95% CI |
|---|---|---|---|
| ADE | Naive | 0.0343 | [0.0105, 0.0588] |
| | Non-Net-MC | 0.0003 | [−0.0054, 0.0530] |
| | Net-MC | −0.0008 | [−0.0026, 0.0020] |
| ASE(0.1) | Naive | −0.0020 | [−0.0126, 0.0076] |
| | Net-MC | −0.0004 | [−0.0006, 0.0000] |
| ASE(0.3) | Naive | −0.0061 | [−0.0370, 0.0231] |
| | Net-MC | −0.0011 | [−0.0018, 0.0000] |
| ASE(0.5) | Naive | −0.0101 | [−0.0600, 0.0391] |
| | Net-MC | −0.0019 | [−0.0030, 0.0001] |
| ASE(0.7) | Naive | −0.0141 | [−0.0817, 0.0554] |
| | Net-MC | −0.0027 | [−0.0040, 0.0003] |
| AOE | Naive | 0.0298 | [−0.0047, 0.0655] |
| | Net-MC | −0.0016 | [−0.0036, 0.0018] |

Notes: This table reports the point estimates and 95% percentile confidence intervals for the ADE, ASE, and AOE under proportion-based exposure. The 95% CI refers to the percentile bootstrap confidence interval, based on 1000 village-level bootstrap replications.
Table 8. Estimated ADE, ASE, and AOE with village-level fixed effects under binary exposure.

| Parameter | Estimate | 95% CI |
|---|---|---|
| ADE | −0.0021 | [−0.0057, 0.0042] |
| ASE | −0.0059 | [−0.0090, 0.0074] |
| AOE | −0.0080 | [−0.0127, 0.0098] |

Notes: This table reports estimates from the Net-MC estimator with village-level fixed effects. The 95% CI refers to the percentile bootstrap confidence interval, based on 1000 village-level bootstrap replications.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liao, Y.; Lin, M. Estimating Network Causal Effects with Misclassified Outcomes: Evidence from Karnataka. Mathematics 2026, 14, 1241. https://doi.org/10.3390/math14081241
