1. Introduction
Accumulating evidence suggests that social, spatial, and organizational linkages give rise to non-negligible inter-individual interactions. These linkages violate the Stable Unit Treatment Value Assumption and induce network spillovers, whereby one unit’s treatment affects others’ potential outcomes [1]. For example, Tchetgen Tchetgen et al. [2] studied the spillovers of prior incarceration for HIV, STIs, and hepatitis C using network ties based on recent sexual or injection partnerships. Hence, credible policy evaluation requires rigorous identification of such spillovers.
In the presence of spillovers, defining each individual’s potential outcome as a function of the entire treatment vector leads to severe dimensionality and identifiability problems. Hence, the literature commonly assumes that an individual’s potential outcome depends only on their own treatment and the treatments of others under a given interference structure. Two canonical frameworks arise: clustered interference, where spillovers are confined within pre-specified groups, as in cluster-randomized trials [3,4,5], and network neighborhood interference, where ties are encoded by an adjacency matrix and exposure mappings (e.g., the number or share of treated neighbors) summarize neighborhood treatments into a scalar, enabling the definition and estimation of causal effects [6,7,8,9,10,11].
Building on these frameworks, a large body of research has studied causal inference under interference, addressing a range of distinct challenges through various modeling approaches, including approximate neighborhood interference [10], heterogeneous spillover effects across network-driven subpopulations [12], and nonparametric estimation of stochastic policy effects under clustered interference [13]. Despite addressing different problems, this literature largely shares a common focus on identifying and estimating direct, spillover, and overall effects, typically assuming that binary outcomes are measured without error. However, this assumption may be unrealistic in many empirical contexts. Binary outcomes, such as microfinance loan take-up [12] and self-reported injection risk behaviors [14], are often self-reported or constructed from administrative records and are therefore prone to misclassification. This concern is particularly relevant in our empirical application, which uses microfinance and social network data from Banerjee et al. [15], drawn from a sample of villages in Karnataka, India. The data combine a household census with a partial individual census, providing detailed information on household characteristics, economic behavior, and social interactions. As the data rely on self-reported borrowing and multiple sources, informal borrowing is likely subject to measurement error due to reporting errors or recall bias. Measurement errors of this kind are pervasive in applied research and have long been recognized as an important source of bias in econometric analysis. The literature on causal inference without interference likewise emphasizes the importance of correcting for such measurement errors, as neglecting them can lead to biased estimates. For example, Zeng et al. [16] considered outcome-dependent sampling, Shu and Yi [17] studied joint covariate–outcome correction in inverse probability weighting, and Wei et al. [18] developed validation-based efficient estimation.
In this context, identifying causal effects in the presence of both outcome misclassification and network interference remains largely unexplored. To address this gap, we develop a framework for estimating average causal effects under network interference when binary outcomes are misclassified. We specify a parametric misclassification mechanism and jointly identify the misclassification probabilities and causal effect parameters within a binary choice model with neighborhood exposure mappings. Monte Carlo simulations evaluate the finite-sample performance of the proposed method and highlight the bias arising from ignoring outcome misclassification or network-related variables. An application to data on rural microfinance and social networks from Karnataka further demonstrates the empirical relevance of these issues, showing that under binary exposure, failure to correct for outcome misclassification can generate spurious spillover and overall effects. In particular, once outcome misclassification is corrected for, we find no evidence of positive spillover or overall effects, whereas an estimator that ignores outcome misclassification suggests statistically significant positive effects. Moreover, omitting network-related variables distorts the decomposition of causal effects and overstates the direct effect. Finally, we assess the robustness of our results to the choice of exposure mapping and to the network unconfoundedness assumption.
The remainder of this paper is organized as follows. Section 2 introduces the framework for network causal effects; Section 3 presents identification and estimation; Section 4 reports the simulations; Section 5 provides an empirical application; and Section 6 concludes the paper. Additional proofs and simulation results are provided in Appendix A.
2. Model Framework
We considered a population of size $N$, indexed by $i = 1, \dots, N$. For each individual $i$, we observed a binary treatment $D_i \in \{0, 1\}$, covariates $X_i$, and a binary outcome $Y_i$. Let $Y_i^{*}$ denote the latent (true) outcome, with $Y_i$ observed while subject to measurement error. Let $\mathbf{D} = (D_1, \dots, D_N)$ be the treatment assignment vector for the population, and let us denote the potential outcome of $i$ under assignment $\mathbf{d}$ by $Y_i^{*}(\mathbf{d})$.
To make the potential outcomes framework tractable, we adopted the neighborhood interference structure proposed in Forastiere et al. [8]. Consider a population connected through a known network with an adjacency matrix $A$. For each unit $i$, let $\mathcal{N}_i$ denote its neighbor set and $N_i = |\mathcal{N}_i|$ denote its cardinality. Let $\mathbf{D}_{\mathcal{N}_i}$ and $\mathbf{D}_{-\mathcal{N}_i}$ denote the treatment vectors for the neighbors and all other units, respectively, and let us define $g(\cdot)$ as the associated exposure mapping. We assumed that interference operated only through this exposure mapping.
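Assumption 1 (Neighborhood interference). If the exposure mapping yields the same value for two neighbor-treatment assignments, i.e., $g(\mathbf{d}_{\mathcal{N}_i}) = g(\mathbf{d}'_{\mathcal{N}_i})$, then the corresponding potential outcomes are equal:
$$Y_i^{*}(d_i, \mathbf{d}_{\mathcal{N}_i}, \mathbf{d}_{-\mathcal{N}_i}) = Y_i^{*}(d_i, \mathbf{d}'_{\mathcal{N}_i}, \mathbf{d}'_{-\mathcal{N}_i}).$$
We define the exposure received by unit $i$ as $Z_i = g(\mathbf{D}_{\mathcal{N}_i})$, where $\mathbf{D}_{\mathcal{N}_i}$ denotes the vector of neighbor treatments. Specifically, we define
$$Z_i = \mathbb{1}\Big\{\sum_{j \in \mathcal{N}_i} D_j \geq 1\Big\},$$
where $Z_i$ is a binary indicator equal to one if individual $i$ has at least one treated neighbor and is zero otherwise.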
Under Assumption 1, potential outcomes depend only on an individual’s own treatment and the corresponding induced neighborhood exposure:
$$Y_i^{*}(\mathbf{d}) = Y_i^{*}(d_i, z_i),$$
where $d_i$ denotes the individual’s own treatment and $z_i = g(\mathbf{d}_{\mathcal{N}_i})$ denotes the exposure level induced by neighbors’ treatment assignments.
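The binary exposure mapping described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the network is stored as a 0/1 adjacency matrix; the function name `binary_exposure` and the four-unit toy network are hypothetical and not taken from the paper.

```python
from typing import List


def binary_exposure(adj: List[List[int]], treatment: List[int]) -> List[int]:
    """Return Z_i = 1 if unit i has at least one treated neighbor, else 0."""
    n = len(adj)
    return [
        int(any(adj[i][j] and treatment[j] for j in range(n) if j != i))
        for i in range(n)
    ]


# Hypothetical 4-unit network: edges 0-1 and 1-2; unit 3 is isolated.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
treatment = [1, 0, 0, 1]
exposure = binary_exposure(adj, treatment)
```

In this toy example only unit 1 is exposed, because its neighbor 0 is treated. Unit 3 is treated but isolated, so it neither receives nor transmits exposure.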
In addition to network exposure, we allowed the covariate vector to include both individual-level characteristics and those of an individual’s neighbors. Specifically, for each unit $i$, we observe
$$X_i = (X_i^{\mathrm{ind}}, X_i^{\mathrm{neigh}}),$$
where $X_i^{\mathrm{ind}}$ denotes individual-level characteristics and $X_i^{\mathrm{neigh}}$ summarizes the characteristics of $i$’s neighbors. We constructed the neighborhood covariates as the average of the neighbors’ individual covariates,
$$X_i^{\mathrm{neigh}} = \frac{1}{N_i} \sum_{j \in \mathcal{N}_i} X_j^{\mathrm{ind}},$$
so that neighboring characteristics entered the model as observed controls.
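The neighbor-average construction can be sketched as follows. The names are hypothetical, and the treatment of isolated units (their neighbor averages are set to zero) is an assumption made only for this illustration, since the text does not state how such units are handled for covariates.

```python
from typing import List


def neighbor_means(adj: List[List[int]], x_ind: List[List[float]]) -> List[List[float]]:
    """Average the individual covariates of each unit's neighbors."""
    n, k = len(adj), len(x_ind[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and adj[i][j]]
        if nbrs:
            out.append([sum(x_ind[j][c] for j in nbrs) / len(nbrs) for c in range(k)])
        else:
            # Assumption for this sketch: isolated units get zero neighbor averages.
            out.append([0.0] * k)
    return out


# Hypothetical 3-unit star network centered on unit 0, two covariates per unit.
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
x_ind = [[1.0, 0.0], [2.0, 4.0], [4.0, 0.0]]
x_neigh = neighbor_means(adj, x_ind)

# Full covariate vector: own characteristics followed by neighbor averages.
x_full = [own + nb for own, nb in zip(x_ind, x_neigh)]
```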
Given the model for potential outcomes and the observed data, we now introduce the identifying assumptions, establish the identification results, and develop estimation procedures for the causal parameters of interest.
3. Identification and Estimation
The parameter of interest is the average dose–response function (ADRF)
$$\mu(d, z) = E[Y_i^{*}(d, z)],$$
which characterizes the average potential outcome under each combination of own treatment $d$ and neighborhood exposure $z$. We first define the average direct effect (ADE) as
$$\tau_{\mathrm{ADE}} = \sum_{z} \{\mu(1, z) - \mu(0, z)\} P(Z_i = z),$$
which captures the average effect of own treatment, averaged over the distribution of neighborhood exposure. Next, the average spillover effect (ASE) for the exposure level $z$ is defined as
$$\tau_{\mathrm{ASE}}(z) = \sum_{d \in \{0, 1\}} \{\mu(d, z) - \mu(d, 0)\} P(D_i = d),$$
which quantifies the average change in potential outcomes induced solely by shifting the neighborhood exposure from zero to $z$ while marginalizing over the distribution of an individual’s own treatment status. Finally, the average overall effect (AOE) is defined as
$$\tau_{\mathrm{AOE}}(z) = \mu(1, z) - \mu(0, 0),$$
which represents the total effect of moving from no treatment and no exposure to treatment with exposure.
Since potential outcomes are unobserved, we next introduce identifying assumptions, drawing on the framework of Forastiere et al. [8].
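Assumption 2 (Causal framework assumptions).
- (A1) Consistency: $Y_i^{*} = Y_i^{*}(D_i, Z_i)$.
- (A2) Network unconfoundedness: $Y_i^{*}(d, z) \perp (D_i, Z_i) \mid X_i$ for all $d$ and $z$.
- (A3) Positivity: $P(D_i = d, Z_i = z \mid X_i = x) > 0$ for all $(d, z)$ and $x$.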
Together, Assumption 2 (A1–A3) ensures identification of a well-defined spillover effect in our setting. Assumption 2 (A1) links the observed outcomes to the relevant potential outcomes. Assumption 2 (A2) requires that, conditional on the covariates, neither an individual’s treatment nor the treatment status of their neighbors is driven by unobserved factors that also affect the potential outcomes. Assumption 2 (A3) guarantees sufficient overlap so that all treatment–exposure combinations occur with positive probability in the data.
Under Assumption 2 (A1–A3), by letting $m(d, z, x) = E[Y_i^{*} \mid D_i = d, Z_i = z, X_i = x]$, the population ADRF can be written as follows:
$$\mu(d, z) = E[m(d, z, X_i)].$$
Identification of the ADRF relies on accurate measurement of the outcome variable $Y_i^{*}$. When $Y_i^{*}$ is subject to misclassification, conditional expectations based on the observed outcome may be systematically biased, thereby affecting the identification and estimation of $\mu(d, z)$. Let $W_i = (1, D_i, Z_i, X_i^{\top})^{\top}$. We assume that the latent outcome $Y_i^{*}$ follows a binary choice model:
$$P(Y_i^{*} = 1 \mid D_i, Z_i, X_i) = F(W_i^{\top} \beta),$$
where $F$ is a known and strictly increasing cumulative distribution function (CDF), such as $F = \Phi$ in the probit model, where $\Phi$ denotes the standard normal CDF.
Let $\alpha_0 = P(Y_i = 1 \mid Y_i^{*} = 0)$ and $\alpha_1 = P(Y_i = 0 \mid Y_i^{*} = 1)$ denote the false positive rate and false negative rate, respectively. The parameter $\alpha_0$ represents the probability that a positive outcome is recorded when the true outcome is zero, while $\alpha_1$ represents the probability that a true positive outcome is recorded as zero. To characterize the dependence structure of the latent outcomes and the measurement process, we further imposed the following assumptions.
Assumption 3 (Conditional independence). Conditional on $\{(D_i, Z_i, X_i)\}_{i=1}^{N}$, the latent outcomes $Y_i^{*}$ are independent across individuals. In addition, conditional on $\{Y_i^{*}\}_{i=1}^{N}$, the misclassification process generating $Y_i$ from $Y_i^{*}$ is independent across individuals.
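Assumption 4 (Nondifferential misclassification). Conditional on $Y_i^{*}$, the observed outcome $Y_i$ is independent of the treatment, exposure, and covariates:
$$P(Y_i = 1 \mid Y_i^{*}, D_i, Z_i, X_i) = P(Y_i = 1 \mid Y_i^{*}).$$
Under Assumption 4, the conditional probability of the observed outcome can be written as follows (the derivation is provided in Appendix A.1):
$$P(Y_i = 1 \mid D_i, Z_i, X_i) = \alpha_0 + (1 - \alpha_0 - \alpha_1) F(W_i^{\top} \beta).$$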
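As a brief sketch of the reasoning behind the observed-outcome probability (the full argument is in Appendix A.1), one can apply the law of total probability over the latent outcome and use nondifferential misclassification to drop the conditioning variables from the error rates. The symbols are assumed here for illustration: $\alpha_0$ and $\alpha_1$ stand for the false positive and false negative rates, $F$ for the link CDF, and $W_i^{\top}\beta$ for the latent index.

```latex
\begin{align*}
P(Y_i = 1 \mid W_i)
  &= P(Y_i = 1 \mid Y_i^{*} = 1)\, P(Y_i^{*} = 1 \mid W_i)
   + P(Y_i = 1 \mid Y_i^{*} = 0)\, P(Y_i^{*} = 0 \mid W_i) \\
  &= (1 - \alpha_1)\, F(W_i^{\top}\beta) + \alpha_0\, \{1 - F(W_i^{\top}\beta)\} \\
  &= \alpha_0 + (1 - \alpha_0 - \alpha_1)\, F(W_i^{\top}\beta).
\end{align*}
```

With $\alpha_0 = \alpha_1 = 0$, the expression collapses to the usual binary choice probability, so the standard probit is nested as a special case.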
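Let
$$p_i(\theta) = \alpha_0 + (1 - \alpha_0 - \alpha_1) F(W_i^{\top} \beta).$$
Under Assumption 3, the likelihood function is given by
$$L(\theta) = \prod_{i=1}^{N} p_i(\theta)^{Y_i} \{1 - p_i(\theta)\}^{1 - Y_i},$$
where $\theta = (\alpha_0, \alpha_1, \beta^{\top})^{\top}$ collects all the model parameters.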
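To make the structure of the likelihood concrete, the sketch below evaluates a misclassification-adjusted probit log-likelihood in plain Python. It assumes the observed-outcome probability takes the form alpha0 + (1 - alpha0 - alpha1) * Phi(index), which follows from the definitions of the false positive and false negative rates under nondifferential misclassification. The function names, the parameter ordering in `theta`, and the clipping constant are illustrative choices, not the authors' implementation.

```python
import math
from statistics import NormalDist
from typing import List, Sequence

_PHI = NormalDist().cdf  # standard normal CDF (probit link)


def observed_prob(w: Sequence[float], beta: Sequence[float],
                  alpha0: float, alpha1: float) -> float:
    """P(observed outcome = 1 | regressors) under misclassification."""
    index = sum(wk * bk for wk, bk in zip(w, beta))
    return alpha0 + (1.0 - alpha0 - alpha1) * _PHI(index)


def log_likelihood(theta: Sequence[float], W: List[Sequence[float]],
                   y: Sequence[int]) -> float:
    """Log-likelihood with theta = (alpha0, alpha1, beta_1, ..., beta_k)."""
    alpha0, alpha1, beta = theta[0], theta[1], theta[2:]
    # Enforce the monotonicity restriction alpha0 + alpha1 < 1.
    if alpha0 < 0 or alpha1 < 0 or alpha0 + alpha1 >= 1:
        return -math.inf
    eps = 1e-12  # illustrative clipping to keep the logs finite
    total = 0.0
    for w_i, y_i in zip(W, y):
        p = min(max(observed_prob(w_i, beta, alpha0, alpha1), eps), 1.0 - eps)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total
```

In practice, this function would be passed (with a sign flip) to a numerical optimizer. Returning negative infinity outside the admissible region is one simple way to keep the search inside the restricted parameter space.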
To ensure identification, we imposed the following condition on the misclassification probabilities. Together with standard regularity conditions for the covariates, this condition yields identification of the parameter vector, as formalized in Theorem 1.
Assumption 5 (Monotonicity condition). $\alpha_0 + \alpha_1 < 1$.
Theorem 1. Suppose that is the standard normal CDF. Assume that , the support of contains a nonempty open subset of , and Assumptions 1–5 hold. Then, the parameter vector θ is identified, and the expected log-likelihood is uniquely maximized at the true parameter .
The result builds on arguments similar to those in Hausman et al. [19]. For completeness, we provide a proof of Theorem 1 in Appendix A.2, explicitly stating the support conditions required for identification. In particular, we assumed that the support of the covariates contained a nonempty open set, which ensured sufficient variation so that the conditional mean function was identified. Although Theorem 1 is stated under the probit model, the identification argument extends to other strictly increasing and continuously differentiable link functions (such as the logistic link) under additional regularity conditions.
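We restricted the parameter space to $\Theta = \{\theta : \alpha_0 \geq 0,\ \alpha_1 \geq 0,\ \alpha_0 + \alpha_1 < 1\}$ and estimated $\theta$ by maximizing the log-likelihood:
$$\hat{\theta} = \arg\max_{\theta \in \Theta} \sum_{i=1}^{N} \big[ Y_i \log p_i(\theta) + (1 - Y_i) \log\{1 - p_i(\theta)\} \big].$$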
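We refer to this estimator as the network-based misclassification-corrected (Net-MC) estimator. Based on the estimated parameters, the estimated average dose–response function (ADRF) is constructed as follows:
$$\hat{\mu}(d, z) = \frac{1}{N} \sum_{i=1}^{N} F\big(W_i(d, z)^{\top} \hat{\beta}\big),$$
where $W_i(d, z) = (1, d, z, X_i^{\top})^{\top}$. The average direct effect is estimated as follows:
$$\hat{\tau}_{\mathrm{ADE}} = \sum_{z} \{\hat{\mu}(1, z) - \hat{\mu}(0, z)\} \hat{P}(Z_i = z),$$
where $\hat{P}(Z_i = z)$ denotes the empirical distribution of the exposure level. The average spillover effect is then computed as follows:
$$\hat{\tau}_{\mathrm{ASE}}(z) = \sum_{d \in \{0, 1\}} \{\hat{\mu}(d, z) - \hat{\mu}(d, 0)\} \hat{P}(D_i = d),$$
where $\hat{P}(D_i = d)$ denotes the empirical distribution of the treatment. Finally, we estimated the average overall effect as follows:
$$\hat{\tau}_{\mathrm{AOE}}(z) = \hat{\mu}(1, z) - \hat{\mu}(0, 0).$$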
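The plug-in step can be sketched as below for the binary-exposure case. Several elements are assumptions made for illustration rather than details confirmed by the text: the latent index is taken to be additive in own treatment, exposure, and covariates (no interaction term), the coefficient ordering is (intercept, treatment, exposure, covariates), and the overall effect is read as the contrast between (treated, exposed) and (untreated, unexposed).

```python
from statistics import NormalDist
from typing import Dict, List, Sequence

PHI = NormalDist().cdf  # probit link


def adrf(d: int, z: int, X: List[Sequence[float]], beta: Sequence[float]) -> float:
    """Estimated dose-response: average fitted latent probability at (d, z)."""
    b0, bd, bz, bx = beta[0], beta[1], beta[2], beta[3:]
    fitted = [
        PHI(b0 + bd * d + bz * z + sum(x * b for x, b in zip(x_i, bx)))
        for x_i in X
    ]
    return sum(fitted) / len(fitted)


def effects(D: Sequence[int], Z: Sequence[int], X: List[Sequence[float]],
            beta: Sequence[float]) -> Dict[str, float]:
    """Plug-in ADE, ASE, and AOE under binary treatment and binary exposure."""
    n = len(D)
    p_z1 = sum(Z) / n  # empirical share of exposed units
    p_d1 = sum(D) / n  # empirical share of treated units
    mu = {(d, z): adrf(d, z, X, beta) for d in (0, 1) for z in (0, 1)}
    ade = (1 - p_z1) * (mu[1, 0] - mu[0, 0]) + p_z1 * (mu[1, 1] - mu[0, 1])
    ase = (1 - p_d1) * (mu[0, 1] - mu[0, 0]) + p_d1 * (mu[1, 1] - mu[1, 0])
    aoe = mu[1, 1] - mu[0, 0]
    return {"ADE": ade, "ASE": ase, "AOE": aoe}


# Hypothetical inputs: three units, one covariate, made-up coefficients.
D, Z = [1, 0, 0], [0, 1, 1]
X = [[0.2], [-0.1], [0.5]]
beta_hat = [0.0, 1.0, 0.0, 0.0]
result = effects(D, Z, X, beta_hat)
```

Because the exposure coefficient is zero in this toy input, the spillover effect is zero and the overall effect equals the direct effect. The fitted probabilities use the estimated coefficients only, so the correction for misclassification enters through the coefficient estimates rather than through the averaging step.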
Given the dependence across individuals induced by the network structure, we conducted inference using bootstrap methods tailored to the data configuration. When the network could be partitioned into independent connected components (e.g., villages or schools), we implemented a cluster bootstrap that resampled components with replacement and recomputed the estimator in each replication (see Forastiere et al. [8]). When dependence is primarily local rather than cluster-based, a block bootstrap can instead be used to accommodate network dependence (see Kojevnikov [20]). Standard errors and confidence intervals were obtained from the bootstrap.
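A cluster bootstrap of the kind described can be sketched as follows. The function and argument names are hypothetical, the percentile rule is a deliberately crude index-based quantile, and the estimator is left as a generic callable. In an actual application it would refit the model on the resampled components, each carrying its own within-component network.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple


def cluster_bootstrap(clusters: Dict[Hashable, list],
                      estimator: Callable[[list], float],
                      n_boot: int = 200,
                      seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval from resampling whole clusters with replacement."""
    rng = random.Random(seed)
    ids: List[Hashable] = list(clusters)
    draws = []
    for _ in range(n_boot):
        sample: list = []
        # Draw as many clusters as in the data, keeping each cluster intact.
        for cid in rng.choices(ids, k=len(ids)):
            sample.extend(clusters[cid])
        draws.append(estimator(sample))
    draws.sort()
    lower = draws[int(0.025 * (n_boot - 1))]
    upper = draws[int(0.975 * (n_boot - 1))]
    return lower, upper


def sample_mean(sample: list) -> float:
    return sum(sample) / len(sample)


# Hypothetical data: binary outcomes grouped by village.
villages = {"v1": [1, 0, 1], "v2": [0, 0, 1, 1], "v3": [1, 1, 0]}
ci_low, ci_high = cluster_bootstrap(villages, sample_mean, n_boot=500, seed=1)
```

Resampling at the component level preserves all within-component dependence, which is why it suits settings where components such as villages can be treated as independent.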
4. Monte Carlo Simulations
We evaluated the finite-sample performance of the proposed method through a series of Monte Carlo simulations. The experiments examined performance across varying network structures and investigated the consequences of ignoring outcome misclassification in the presence of network interference.
We considered undirected networks of size $N$ with an adjacency matrix $A$. To examine how network topology affected the performance of the estimators, we generated networks from two canonical models that differed in their link formation mechanisms and the degree distributions they induced. These designs allowed us to compare settings with relatively homogeneous degree distributions to those with substantial degree heterogeneity:
ER networks: As a benchmark design, the network was generated from an Erdős–Rényi random graph [21], in which each pair of distinct nodes $(i, j)$ was independently connected with a probability of $p \approx 6/N$. This yielded a sparse network with an expected average degree approximately equal to 6. The Erdős–Rényi model produces relatively homogeneous connectivity, with node degrees concentrated around their mean and limited variation across individuals.
BA networks: To introduce substantial degree heterogeneity, we generated networks using the Barabási–Albert model [22]. Starting from a small connected core, each new node attached to $m = 3$ existing nodes with a probability proportional to their degrees. This produced networks with an average degree approximately equal to 6. Unlike the Erdős–Rényi model, the Barabási–Albert process generates a heavy-tailed degree distribution characterized by a few highly connected hub nodes and many sparsely connected ones.
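The two network designs can be sketched with the standard library alone. The specific choices below are assumptions for illustration: the Erdős–Rényi link probability is set to 6/(n - 1) to target an expected degree of 6, the Barabási–Albert process starts from a complete core on m + 1 nodes with m = 3 (which gives an average degree close to 2m = 6), and n = 500 is used only to keep the example fast.

```python
import random
from typing import List, Set


def erdos_renyi(n: int, p: float, rng: random.Random) -> List[Set[int]]:
    """Connect each pair of distinct nodes independently with probability p."""
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def barabasi_albert(n: int, m: int, rng: random.Random) -> List[Set[int]]:
    """Preferential attachment: each new node links to m distinct existing nodes."""
    adj: List[Set[int]] = [set() for _ in range(n)]
    stubs: List[int] = []  # each node appears once per incident edge
    for i in range(m + 1):  # assumed initial core: complete graph on m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    for new in range(m + 1, n):
        targets: Set[int] = set()
        while len(targets) < m:
            # Sampling a stub picks a node with probability proportional to degree.
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj


def mean_degree(adj: List[Set[int]]) -> float:
    return sum(len(nbrs) for nbrs in adj) / len(adj)


rng = random.Random(42)
n = 500
er = erdos_renyi(n, 6 / (n - 1), rng)
ba = barabasi_albert(n, 3, rng)
```

Both graphs have a similar average degree, but the degree-proportional sampling in the second generator concentrates links on early nodes, which is the source of the hub structure discussed above.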
Given the realized network, individual-level covariates were generated as
. Let
denote the number of neighbors of unit
i. We defined the neighborhood covariate as the average of the neighbors’ covariates:
and defined the full covariate vector as
.
We next specify the treatment assignment mechanism. Treatment was generated according to a logistic model
so that treatment assignment depended on the observed covariates. Neighborhood exposure is defined as
indicating whether at least one neighbor is treated. Conditional on the treatment and exposure, the latent outcome was generated as follows:
Finally, we introduce outcome misclassification. Let
denote the false positive rate and false negative rate, respectively. These parameters represent the probabilities that the observed outcome differs from the true outcome due to misclassification. Conditional on the true outcome
, the misclassification indicator satisfies
The observed outcome is then constructed as follows:
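The misclassification step of the design can be sketched as follows: each true outcome is flipped with the false positive rate when it equals zero and with the false negative rate when it equals one. The function name and the rates used in the example are hypothetical and are not the values used in the simulations.

```python
import random
from typing import List, Sequence


def misclassify(y_true: Sequence[int], alpha0: float, alpha1: float,
                rng: random.Random) -> List[int]:
    """Flip true zeros with probability alpha0 and true ones with probability alpha1."""
    y_obs = []
    for y in y_true:
        flip_prob = alpha1 if y == 1 else alpha0
        flip = rng.random() < flip_prob  # misclassification indicator
        y_obs.append(1 - y if flip else y)
    return y_obs


rng = random.Random(0)
y_true = [0] * 20000
y_obs = misclassify(y_true, alpha0=0.10, alpha1=0.20, rng=rng)
false_positive_share = sum(y_obs) / len(y_obs)
```

Since every true outcome in this example is zero, the share of observed ones estimates the false positive rate directly and should be close to 0.10.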
We conducted 1000 Monte Carlo replications with sample sizes and the covariate dimension . Although the two network designs were calibrated to have a similar average degree, they differed in their higher-order structural properties.
Table 1 reports a summary of the statistics of the simulated networks used in the Monte Carlo experiments. For the ER networks, the mean degree was approximately six for all sample sizes, with relatively small dispersion and moderate maximum degrees. In contrast, the BA networks exhibited substantially greater heterogeneity in the degree distribution, as reflected in the much larger standard deviations and maximum degrees. This difference reflects the presence of highly connected hub nodes characteristic of scale-free networks.
Table 2 reports the median absolute error (MAE) of the estimated misclassification probabilities $(\hat{\alpha}_0, \hat{\alpha}_1)$ for the proposed Net-MC estimator across different misclassification rates and sample sizes. For both the ER and BA networks, the MAE decreased steadily as
N increased from 2000 to 8000, indicating improved estimation accuracy with larger samples. The estimation performance was also quite similar across the two network designs, suggesting that the proposed method is robust to differences in network structure. As expected, the MAE tended to be slightly larger when the underlying misclassification probabilities were higher; however, the estimation error remained small and declined consistently with the sample size. Overall, these results indicate that the proposed method yields accurate and stable estimation of the misclassification probabilities.
Table 3 reports the median absolute error (MAE) of the estimates of $\beta$ and compares the finite-sample performance of three estimators corresponding to different estimation methods:
Oracle: the infeasible probit estimator based on the latent outcome $Y_i^{*}$, which served as a benchmark and was invariant to misclassification probabilities.
Naive: the standard probit estimator applied to the observed outcome $Y_i$, ignoring outcome misclassification.
Net-MC: the proposed misclassification-corrected probit estimator that corrects for outcome misclassification and incorporates network-related variables.
As expected, the Oracle estimator achieved the smallest MAE across all settings. In contrast, the Naive estimator exhibited substantially larger errors in the presence of outcome misclassification, indicating that failure to correct for misclassification leads to substantial estimation errors. The proposed Net-MC estimator substantially outperformed the Naive estimator across all misclassification probabilities, with MAEs that were markedly smaller and approached those of the Oracle estimator as the sample size increased. Overall, these results demonstrate that explicitly correcting for outcome misclassification, together with incorporating network-related variables, led to more accurate estimation of the coefficient vector .
Table 4 reports the simulation results for the ADE estimates under ER networks, evaluated in terms of bias and root mean squared error (RMSE). In addition to the three estimators considered above, we considered a benchmark specification, namely the non-network misclassification-corrected (Non-Net-MC) estimator, which applies the misclassification-corrected probit estimator while omitting neighbor exposure and network-related covariates. The results show that this benchmark exhibited substantial bias, underscoring the importance of accounting for network spillovers in causal inference. We report the results for the Non-Net-MC estimator only for the ADE analysis. By construction, it does not incorporate variation in neighbors’ variables and therefore precludes the identification of network-related effects. Consequently, neither the average spillover effect (ASE) nor the average overall effect (AOE) was well defined under this specification. Among the feasible estimators, the proposed Net-MC estimator yielded ADE estimates with the smallest bias and RMSE, indicating that jointly accounting for outcome misclassification and network spillovers is crucial for accurate estimation of the ADE.
The results for the ASE and AOE under ER networks were qualitatively similar to those under BA networks, and they are reported in Tables A1–A6 in Appendix A.3. Overall, the results were consistent across all settings. The proposed Net-MC estimator yielded ASE and AOE estimates with a uniformly smaller bias and RMSE than the Naive estimator, and its performance approached that of the Oracle estimator as the sample size increased. In contrast, estimators that failed to correct for outcome misclassification or to account for network-related variables produced substantially biased estimates. These findings underscore the importance of jointly correcting for outcome misclassification and accounting for network-related variables to obtain credible and reliable causal inference.
5. Empirical Analysis
5.1. Data Description
Our empirical analysis used village-level microfinance and social network data compiled by Banerjee et al. [15]. The dataset covers 75 villages in Karnataka, India, of which the microfinance institution Bharatha Swamukti Samsthe (BSS) operated in 43 villages. It combines a household census with a partial individual census, providing detailed information on household characteristics, economic behavior, and social interactions. This dataset is particularly well suited for our analysis, as our research question required data that jointly capture treatment, network relationships, covariates, and binary outcomes, variables that are rarely jointly observed in empirical settings. It provides detailed information on both the social network structure and household-level borrowing, thereby meeting the key data requirements for studying network spillover effects in the presence of outcome misclassification. The dataset has also been widely used in the literature on social networks and economic behavior (e.g., Breza et al. [23], Lubold et al. [24], Lambotte [25]), supporting its reliability and empirical relevance. In addition, microfinance participation and informal borrowing behavior are inherently shaped by social interactions, as information diffusion and peer effects play an important role in household financial decision making.
The network adjacency matrix $A$ was constructed by following Banerjee et al. [15]. In particular, an undirected link between two households was defined as the union of reported visits, where a connection was formed if either household reported visiting the other. This definition is intended to capture economically meaningful relationships through which financial assistance and information may flow. Let $D_i$ denote whether household $i$ participates in the BSS program, and let $Y_i$ indicate whether the household engages in informal borrowing, including loans from moneylenders, relatives, or friends. Following Banerjee et al. [15], we controlled for 11 baseline household characteristics that may affect both microfinance participation and borrowing behavior. These variables included the presence of an eligible female household member (age 18–57), general or OBC caste status, whether any household member served as a BSS leader, household size, the number of rooms and beds, access to a latrine and electricity, housing tenure status, and indicators for an RCC roof and a thatched roof. Together, these covariates capture key dimensions of household demographics, socioeconomic status, and housing quality that may influence both program participation and borrowing behavior.
The sample consisted of 8080 households across 43 BSS villages. There was considerable variation in village-level coverage, with complete-case sample sizes ranging from a minimum of 75 to a maximum of 305 households per village, with a mean of 187.91 and a median of 184.00. The village social networks exhibited substantial heterogeneity in connectivity. Household degrees ranged from 0 to 49, with an average of approximately nine social connections. This structural diversity indicates significant variation in the potential exposure to treated neighbors, which is essential for identifying spillover effects.
Figure 1 presents the social network in a representative village. The network exhibited both densely connected clusters and households occupying more central positions. For visual clarity, the figure displays only the largest connected component of the village network.
5.2. Empirical Results
Because the dataset combines multiple sources and relies on self-reported borrowing, the informal credit variable may be subject to measurement error due to reporting mistakes or recall bias. To address this issue, we allowed for potential misclassification of the binary outcome in estimating causal effects. We primarily report the estimates and 95% percentile confidence intervals (CIs) for the misclassification probabilities of informal borrowing, as well as the key causal effect parameters, under binary exposure. These confidence intervals were obtained from 1000 cluster bootstrap replications, in which villages were resampled with replacement.
Table 5 reports the estimated misclassification probabilities for the informal borrowing outcome obtained using the proposed Net-MC estimator. False positives ($\hat{\alpha}_0$), defined as recording borrowing when none occurred, were extremely rare (0.0085). In contrast, false negatives ($\hat{\alpha}_1$), defined as failing to report borrowing when it did occur, were substantially more common (0.1771). This asymmetry suggests that informal borrowing is more likely to be underreported than overreported in survey data.
Table 6 reports the ADE, ASE, and AOE estimates based on alternative estimators. For the ADE, we reported estimates from three estimators: Naive, Non-Net-MC, and Net-MC. The Non-Net-MC estimator does not incorporate variation in network-related variables and therefore does not identify spillover effects; accordingly, the ASE and AOE were reported only for the Naive and Net-MC estimators.
For the ADE, the Non-Net-MC estimator yielded an estimate of 0.0003, which is slightly larger than the corresponding estimate from the Net-MC estimator. Once network-related variables were accounted for, the estimated ADE was substantially reduced, taking values of 0.0153 under the Naive estimator and −0.0005 under the Net-MC estimator. Although all three estimates were statistically insignificant, the comparison remains informative. The Non-Net-MC estimator tended to overstate the ADE when network-related variables were excluded. A similar pattern emerged for the spillover-related effects. Under the Naive estimator, the ASE and AOE were 0.0812 and 0.0968, respectively, both of which were statistically significant. In contrast, once outcome misclassification was corrected for, the spillover and overall effects became negligible and statistically insignificant, with the estimated ASE and AOE equal to −0.0085 and −0.0090, respectively, under the Net-MC estimator.
Taken together, these results point to two distinct sources of bias in the empirical settings with network interactions. First, comparing the Non-Net-MC estimator with those that incorporate network-related variables shows that the omission of neighbor exposure and covariates can distort the decomposition of causal effects, attributing part of the spillover effect to the direct effect and thereby inflating the estimated ADE. Second, comparing the Naive and Net-MC estimators indicates that failure to correct for outcome misclassification can generate spurious evidence of spillover and overall effects, substantially inflating both their magnitude and statistical significance. Overall, these findings underscore the importance of jointly accounting for network-related variables and correcting for outcome misclassification in order to obtain credible and unbiased estimates of causal effects.
5.3. Robustness Checks
In this subsection, we examine the robustness of our results to the modeling choices and identifying assumptions underlying the baseline analysis. We begin by replacing the binary exposure with a proportion-based exposure, defined as the fraction of treated neighbors, and examine whether our main conclusions continue to hold. Formally, the proportion-based exposure is defined as follows:
$$Z_i = \frac{1}{N_i} \sum_{j \in \mathcal{N}_i} D_j.$$
For isolated individuals with no social ties ($N_i = 0$), $Z_i$ was set to zero.
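The proportion-based exposure can be computed in the same way as the binary one, replacing the "at least one treated neighbor" indicator with the treated share. The sketch below is a minimal illustration with hypothetical names; it follows the convention stated above of assigning zero exposure to isolated units.

```python
from typing import List


def proportion_exposure(adj: List[List[int]], treatment: List[int]) -> List[float]:
    """Fraction of treated neighbors; isolated units are assigned zero."""
    n = len(adj)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and adj[i][j]]
        out.append(sum(treatment[j] for j in nbrs) / len(nbrs) if nbrs else 0.0)
    return out


# Hypothetical 4-unit network: edges 0-1 and 1-2; unit 3 is isolated.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
treatment = [1, 0, 0, 1]
z_prop = proportion_exposure(adj, treatment)
```

Here unit 1 has one treated neighbor out of two, so its exposure is 0.5, whereas the binary mapping would record it simply as exposed.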
In the analysis based on proportion-based exposure, we selected
,
,
, and
as representative exposure levels according to several quantiles of the neighboring treatment proportion
Z and estimated the corresponding causal effects at each level. The results are reported in Table 7. Compared with the results under the binary exposure specification (Table 6), the estimates were not fully identical across the two exposure definitions. In particular, under the Naive estimator, the spillover effect exhibited substantial differences; under binary exposure, the Naive estimate of the ASE was significantly positive, whereas under proportion-based exposure, the corresponding Naive estimates at different exposure levels were all close to zero and slightly negative. This suggests that the Naive estimator is sensitive to the definition of exposure.
It is worth noting that the Naive estimator is based directly on the observed outcome and does not correct for outcome misclassification. Therefore, the difference between the Naive and corrected estimates should not be interpreted solely as arising from the change in the exposure definition. By contrast, the Net-MC estimator exhibited a more stable pattern across the two exposure specifications, with the estimates of the ADE, ASE, and AOE all remaining statistically insignificant. Moreover, under proportion-based exposure, the Net-MC estimates of the ASE declined gradually from to as the exposure level increased from to , indicating a mildly stronger negative spillover effect at higher exposure levels. However, the magnitude of this pattern remained small, and the statistical evidence was limited.
Overall, this robustness check indicates that although the Naive estimator varied across exposure definitions, the main conclusion based on the Net-MC estimator remained largely unchanged. The estimated direct, spillover, and overall effects were generally small and statistically insignificant. Therefore, our core empirical findings appear reasonably robust to alternative exposure definitions, although this robustness is primarily reflected in the corrected estimates rather than in the uncorrected estimates.
Beyond examining the robustness of our results to the exposure definition, we next turn to another potential concern. Because identification in our observational network setting relies on a network unconfoundedness assumption, our empirical findings may still be affected by omitted factors both observed and unobserved. In particular, the household treatment status and neighborhood exposure may be correlated with characteristics that are not fully accounted for by the available baseline covariates. To assess the sensitivity of our results to this concern, we conducted an additional robustness check. Although this issue cannot be fully ruled out empirically, this analysis allowed for a more cautious and ultimately more persuasive interpretation of the empirical findings.
As a diagnostic, we augmented the baseline specification with village-level fixed effects. Specifically, we replaced the latent index model with
$$P(Y_i^{*} = 1 \mid D_i, Z_i, X_i) = F\Big(W_i^{\top} \beta + \sum_{k=2}^{K} \gamma_k V_{ik}\Big),$$
where $V_{ik}$ denotes an indicator equal to one if household $i$ belongs to village $k$ and the first village is omitted as the reference category. With village-level fixed effects, the comparison was made among households within the same village. This means that the estimates were not driven by average differences across villages but by differences among households living in the same village.
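The village indicators entering the augmented index can be built as sketched below. The function name is hypothetical, and treating the first village in sorted order as the omitted reference category is an assumption made only for this illustration.

```python
from typing import Hashable, List, Sequence, Tuple


def village_dummies(village_ids: Sequence[Hashable]) -> Tuple[List[List[int]], list]:
    """One indicator column per village, omitting the first as the reference."""
    levels = sorted(set(village_ids))
    kept = levels[1:]  # first village (in sorted order) is the reference category
    rows = [[1 if v == k else 0 for k in kept] for v in village_ids]
    return rows, kept


# Hypothetical village labels for four households.
dummies, kept_levels = village_dummies(["a", "b", "c", "a"])
```

Each row would be appended to the household's regressor vector, so households in the reference village have all indicators equal to zero.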
This analysis is particularly relevant in our context because both treatment participation and network exposure may be influenced by village-level characteristics. The results are reported in Table 8. Compared with the baseline Net-MC results under binary exposure, the estimates obtained after including the village-level fixed effects remained broadly similar in both sign and magnitude. Specifically, the estimated ADE, ASE, and AOE were all statistically insignificant. These findings suggest that our main results were not driven primarily by omitted village-level factors, and they provide further support for the conclusion that the direct, spillover, and overall effects are all statistically insignificant after correcting for outcome misclassification.
6. Conclusions
This paper studied how outcome misclassification affects the estimation of causal effects in network settings. We modeled interference through exposure mappings and adopted a parametric binary choice framework that allowed the misclassification probabilities and causal effect parameters to be jointly identified.
Monte Carlo simulations showed that estimators that ignore outcome misclassification or network-related variables can suffer from substantial bias, whereas the proposed method performed well in finite samples. The empirical analysis based on data from Karnataka further highlights, under binary exposure, two distinct sources of bias in network settings that help explain these differences. First, omitting network-related variables can distort the decomposition of causal effects and lead to an overstatement of the direct effect. Second, failure to correct for outcome misclassification can generate misleading evidence of spillover and overall effects. Once outcome misclassification was corrected for, we found no evidence of spillover or overall effects of the neighbors’ microcredit participation on households’ informal borrowing. Taken together, these results underscore that credible estimation of causal effects in network settings requires jointly accounting for network-related variables and correcting for outcome misclassification.
Finally, we assessed the robustness of our results to the choice of exposure mapping and the network unconfoundedness assumption. Although our empirical analysis delivered a clear and robust message, its causal interpretation remains conditional on the plausibility of several core assumptions, including the parametric link specification, the exposure mapping, and the network unconfoundedness assumption. In addition, our analysis assumed nondifferential misclassification, under which the misclassification error was independent of the observed covariates. Relaxing this assumption by allowing misclassification probabilities to vary with the observed characteristics would introduce additional identification challenges. Addressing these challenges is an important direction for future research and would further broaden the scope of causal inference with misclassified outcomes in network settings.
While our empirical application was based on a specific regional context, the framework we developed is not restricted to that setting. More generally, in empirical applications where network relationships, treatment, and outcome measures are available, the variables required for our approach can typically be constructed. Our framework therefore provides a general tool for evaluating causal effects in network environments, particularly when outcome misclassification is present.