Figure 1.
Conceptual framework mapping Sen’s Capability Approach to rural water access. Level 1 resources and infrastructure (left) are mediated through four capability dimensions, namely physical accessibility (C1), good health (C2), time and energy (C3), and living conditions (C4), to produce the observed functioning of drinking water access (right). The theoretical framework encompasses eight candidate indicators; the final operationalization retains five after empirical screening (see Section 3.3.1).
Figure 2.
Summary of core analytical results. (a) Capability-to-functioning relationship: aggregate capability score versus observed water access, with smoothed trend line. (b) Variance explained by three approaches: OLS prediction (), best cross-validated model (), and capability-tier clustering (). (c) Three-way variance decomposition: infrastructure (27.4%), provincial context (34.1%), and commune-level unmeasured factors (38.5%).
Figure 3.
Correlation matrix of infrastructure features and water access. Waste collection and secure housing show the strongest positive associations with water access (). Feature inter-correlations are moderate, confirming that the five variables capture complementary dimensions.
Figure 4.
Optimal k selection. (a) Internal validation metrics across k = 2 to 9. (b) Composite score for k = 3 to 9. (c) Elbow criterion (WCSS vs. k). The k = 3 solution achieves the highest composite score (0.867). The k = 2 solution is excluded to avoid trivial binary partitioning inconsistent with capability theory.
Figure 5.
Capability radar profiles for three clusters. The High-Capability cluster dominates across all four dimensions. The Moderate–High and Transitional clusters differ sharply on the Good Health dimension (wastewater and waste collection), yet achieve similar water outcomes, demonstrating that clusters capture infrastructure profiles rather than functioning outcomes directly.
Figure 6.
Water access distributions by cluster. (a) Mean access with standard deviation bars. (b) Box plots revealing extensive within-cluster variance, particularly for Moderate–High and Transitional clusters. The overlap between C2 and C3 (Cohen’s ) reflects the capability framework’s prediction that similar infrastructure can produce widely different outcomes.
Figure 7.
Cluster centroids heatmap combining Level 1 resource variables (normalized) and Level 2 capability dimensions. The High-Capability cluster scores highest across all dimensions, while the Transitional cluster shows a distinctive deficit in wastewater-related variables.
Figure 8.
Principal component analysis visualization of clusters. Stars indicate centroids. Adjacent clusters show overlap, consistent with the continuous nature of infrastructure variation. The first two principal components jointly capture the majority of variance in the five infrastructure features.
Figure 9.
Silhouette analysis for the clustering solution. Mean silhouette coefficient . The High-Capability cluster shows the strongest profile. Intermediate clusters exhibit moderate boundary ambiguity, reflecting the continuous nature of infrastructure variation across rural communes.
Figure 10.
Infrastructure-to-water prediction. (a) Observed vs. predicted water access (out-of-fold), colored by cluster. The scatter around the identity line reflects the 72.6% of variance not explained by infrastructure features. (b) Residual distribution across clusters.
Figure 11.
Permutation feature importance from the OLS model (30 repeats). Waste collection and its proximity interaction dominate, followed by housing and electricity interactions. Error bars show standard deviation across permutations.
Figure 12.
Partial dependence profiles (Random Forest on five raw features, used for non-linearity detection). Wastewater evacuation (linearity ) and secure housing (linearity ) exhibit non-linear relationships with water access, suggesting threshold effects.
Figure 13.
Three-way variance decomposition. Infrastructure explains 27.4%, provincial context accounts for 34.1%, and commune-level unmeasured factors contribute 38.5%. The decomposition is non-circular (no information leakage between components), although the components are not strictly orthogonal because infrastructure varies systematically across provinces.
Figure 14.
Spatial structure analysis. Province-level clustering of water access () and prediction residuals () remain nearly identical, indicating that infrastructure variables explain variance roughly uniformly across provinces rather than differentially capturing spatial structure. Note: Province membership defines the weight matrix; these statistics quantify province-block clustering rather than geographic-distance-based spatial autocorrelation. *** denotes .
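The province-block weighting described in this caption can be illustrated with a short sketch. It assumes the reported statistic is a Moran's I-type index and that the weight matrix is binary and unstandardized (1 for two distinct communes in the same province, 0 otherwise); neither detail is confirmed by the caption, and the function name `block_morans_i` is hypothetical.

```python
from collections import defaultdict


def block_morans_i(values, provinces):
    """Moran's I with binary province-block weights (assumed form).

    w_ij = 1 when communes i and j (i != j) share a province, else 0.
    Every province needs at least two communes somewhere in the sample,
    otherwise the total weight S0 is zero.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]

    sums = defaultdict(float)      # sum of deviations per province
    squares = defaultdict(float)   # sum of squared deviations per province
    counts = defaultdict(int)      # communes per province
    for zi, prov in zip(z, provinces):
        sums[prov] += zi
        squares[prov] += zi * zi
        counts[prov] += 1

    # Sum over i != j within a province of z_i * z_j = (sum z)^2 - sum z^2.
    cross = sum(sums[p] ** 2 - squares[p] for p in sums)
    s0 = sum(c * (c - 1) for c in counts.values())  # total weight
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Toy data: access is identical within each province and differs across them.
clustered = block_morans_i([1, 1, 3, 3], ["A", "A", "B", "B"])
# Toy data: the same values alternate inside every province.
dispersed = block_morans_i([1, 3, 1, 3], ["A", "A", "B", "B"])
```

Applying the same function to observed access and to prediction residuals would give the two statistics compared in the figure, under the stated assumptions.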
Figure 15.
Theil inequality decomposition. (a) Between-cluster (10.1%) vs. within-cluster (89.9%) shares. (b) Lorenz curves by cluster. (c) Cluster-specific inequality metrics (Theil-T, Gini, CV) by capability tier. The Transitional cluster’s Lorenz curve deviates most from the equality line, consistent with conversion efficiency, rather than infrastructure endowment, driving the bulk of inequality.
Figure 16.
Bottleneck matrix by cluster. (a) Primary bottleneck distribution (%). High-Capability communes have diversified bottlenecks (waste 30%, electricity 27%, distance 24%), while lower-capability communes are dominated by waste collection (>89%). (b) Binding frequency (gain of max), revealing co-occurring constraints. The chi-square test confirms significant association between cluster and bottleneck type (, ).
Figure 17.
Policy prioritization matrix. Each commune is positioned by capability level (horizontal) and conversion efficiency (vertical). The six archetypes correspond to distinct policy responses. Transform communes (19.9%, water ) require the most intensive intervention, while Diagnose communes (10.0%, water ) have adequate infrastructure but poor conversion, signaling governance rather than investment problems.
Table 1.
Variable selection and mapping to capability dimensions.
| Variable | Capability Dimension | Theoretical Justification | r with Water | Retained? |
|---|---|---|---|---|
| : Distance to road (log) | C1: Physical access | Service delivery costs | *** | Yes |
| : Wastewater evac. | C2: Good health | Environmental sanitation | *** | Yes |
| : Waste collection | C2: Good health | Municipal service capacity | *** | Yes |
| : Electricity access | C3: Time and energy | Pumping infrastructure | *** | Yes |
| : Secure housing | C4: Living conditions | Dwelling quality | *** | Yes |
| Household size | C3: Time and energy | Collection time burden | | No |
| Toilets | C2: Good health | Collinear with | | No |
| Occupancy status | C4: Living conditions | Weak discriminant power | | No |
Table 2.
Policy archetype definitions.
| Archetype | Tier | Residual | Policy Logic |
|---|---|---|---|
| Sustain | High | | High infrastructure, effective conversion. Monitor and maintain existing systems; identify success factors for replication. |
| Diagnose | High | | High infrastructure, poor conversion. Investigate institutional barriers and governance failures; additional infrastructure investment would be wasteful. |
| Scale | Mod–High | | Moderate infrastructure, effective conversion. Replicate and expand what works; these communes demonstrate that modest infrastructure can deliver results with good governance. |
| Consolidate | Mod–High | | Moderate infrastructure, poor conversion. Strengthen institutional capacity alongside targeted infrastructure upgrades. |
| Target | Transitional | | Low infrastructure, effective conversion. Prioritize infrastructure investment where conversion capacity already exists. |
| Transform | Transitional | | Low infrastructure, poor conversion. Comprehensive multi-sector intervention combining infrastructure, governance reform, and community capacity building. |
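The archetype definitions above amount to a lookup on capability tier and the sign of the prediction residual. The following is a minimal sketch of that lookup. The residual cut-offs are missing from this copy of Table 2, so the threshold of 0 (observed access at or above the predicted value counts as effective conversion) is an assumption, and `assign_archetype` is a hypothetical helper name.

```python
# Tier labels follow Table 2; "Mod-High" is written with an ASCII hyphen here.
ARCHETYPES = {
    ("High", True): "Sustain",
    ("High", False): "Diagnose",
    ("Mod-High", True): "Scale",
    ("Mod-High", False): "Consolidate",
    ("Transitional", True): "Target",
    ("Transitional", False): "Transform",
}


def assign_archetype(tier: str, residual: float, threshold: float = 0.0) -> str:
    """Map a capability tier and a prediction residual to a policy archetype.

    The default threshold of 0.0 is an assumed cut-off, not a value taken
    from the table.
    """
    effective_conversion = residual >= threshold
    return ARCHETYPES[(tier, effective_conversion)]


# A high-tier commune whose observed access falls below its prediction.
example = assign_archetype("High", -12.5)
```

Keeping the threshold as a parameter makes it easy to substitute the actual cut-off once it is known.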
Table 3.
Descriptive statistics and water access correlations (n = 1324 communes).
| Variable | Mean | Std | Pearson r | Spearman ρ | p-Value |
|---|---|---|---|---|---|
| Distance to road (log) | — | — | | | < |
| Wastewater evac. (%) | 14.4 | 11.2 | | | < |
| Waste collection (%) | 16.2 | 12.8 | | | < |
| Electricity (%) | 93.9 | 4.2 | | | |
| Secure housing (%) | 38.4 | 17.8 | | | < |
| Water access (%) | 55.3 | 36.3 | — | — | — |
Table 4.
Cluster validation metrics across candidate k values.
| k | Silhouette | DBI | CH | WCSS | Composite |
|---|---|---|---|---|---|
| 2 | 0.432 | 0.960 | 1044 | 203 | (excluded) |
| 3 | 0.324 | 1.123 | 951 | 149 | 0.867 |
| 4 | 0.304 | 1.125 | 839 | 125 | 0.606 |
| 5 | 0.295 | 1.112 | 775 | 108 | 0.522 |
| 6 | 0.308 | 1.083 | 763 | 93 | 0.691 |
| 7 | 0.297 | 1.128 | 723 | 85 | 0.426 |
| 8 | 0.269 | 1.166 | 694 | 77 | 0.079 |
| 9 | 0.266 | 1.173 | 660 | 72 | 0.000 |
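The formula behind the Composite column is not stated in this table. One plausible construction, offered only as an assumption, is an equal-weight average of min-max normalized silhouette, inverted DBI (lower is better), and CH over the candidates k = 3 to 9. The sketch below applies that construction to the rounded values in Table 4.

```python
# (silhouette, DBI, CH) for k = 3..9, copied from Table 4; k = 2 is excluded there.
METRICS = {
    3: (0.324, 1.123, 951),
    4: (0.304, 1.125, 839),
    5: (0.295, 1.112, 775),
    6: (0.308, 1.083, 763),
    7: (0.297, 1.128, 723),
    8: (0.269, 1.166, 694),
    9: (0.266, 1.173, 660),
}


def min_max(values, higher_is_better=True):
    """Rescale values to [0, 1]; flip the scale when lower values are better."""
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def composite_scores(metrics):
    """Equal-weight mean of the three normalized metrics (assumed weighting)."""
    ks = sorted(metrics)
    sil = min_max([metrics[k][0] for k in ks])
    dbi = min_max([metrics[k][1] for k in ks], higher_is_better=False)
    ch = min_max([metrics[k][2] for k in ks])
    return {k: (s + d + c) / 3 for k, s, d, c in zip(ks, sil, dbi, ch)}


scores = composite_scores(METRICS)
best_k = max(scores, key=scores.get)
```

Under this assumption the ranking matches the table (k = 3 first, k = 6 second, k = 9 last), and the scores are close to the reported ones without matching exactly, for example roughly 0.85 against the reported 0.867 for k = 3. The authors' actual weighting or inputs may therefore differ.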
Table 5.
Cluster characteristics (, ).
| Cluster | n | % | Water (%) | Std (%) | Avg Cap. | Wastewater |
|---|---|---|---|---|---|---|
| High Capability | 297 | 22.4 | 80.5 | 19.9 | 0.79 | 0.656 |
| Moderate–High | 524 | 39.6 | 50.9 | 35.3 | 0.57 | 0.656 |
| Transitional | 503 | 38.0 | 45.1 | 37.9 | 0.48 | 0.198 |
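The overlap between the Moderate–High and Transitional clusters can be quantified from the summary statistics in this table. The sketch below computes Cohen's d with a pooled standard deviation, which is an assumed choice of formula; the effect size reported for Figure 6 is missing from this copy, so the result cannot be checked against it.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d from group summaries, using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Water access (%): Moderate-High (50.9, SD 35.3, n = 524) versus
# Transitional (45.1, SD 37.9, n = 503), values taken from Table 5.
d = cohens_d(50.9, 35.3, 524, 45.1, 37.9, 503)
```

With these inputs d comes out at about 0.16, a small effect by conventional benchmarks, in line with the caption's point that the two clusters reach similar water outcomes.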
Table 6.
Model comparison ( repeated cross-validation).
| Model | | |
|---|---|---|
| OLS (5 raw) | | |
| OLS (16 eng.) | | |
| Ridge (16 eng.) | | |
| RF (5 raw) | | |
| RF (16 eng.) | | |
| GBR (16 eng.) | | |
| ExtraTrees (16 eng.) | | |
Table 7.
Three-way variance decomposition of water access ().
| Source | Share (%) | Estimation Method |
|---|---|---|
| Infrastructure (measurable) | 27.4 | (out-of-fold) |
| Provincial context | 34.1 | ICC of residuals |
| Commune-level unmeasured | 38.5 | Remainder |
| Total | 100.0 | |
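The arithmetic linking the three shares can be sketched as follows. It assumes that the provincial share equals the intraclass correlation (ICC) of the residuals multiplied by the unexplained fraction, 1 minus the out-of-fold fit, and that the commune-level share is what remains. The ICC itself is not reported in this table, so the value used below is back-calculated from the published shares rather than taken from the source.

```python
def three_way_shares(r2_oof: float, icc_residual: float) -> dict:
    """Split total variance into three percentage shares (assumed scheme).

    r2_oof: out-of-fold share of variance explained by infrastructure.
    icc_residual: share of residual variance lying between provinces.
    """
    unexplained = 1.0 - r2_oof
    provincial = icc_residual * unexplained
    commune = unexplained - provincial
    return {
        "infrastructure": 100.0 * r2_oof,
        "provincial": 100.0 * provincial,
        "commune": 100.0 * commune,
    }


# Back out the ICC implied by the table: 34.1 / (100 - 27.4).
implied_icc = 0.341 / (1.0 - 0.274)
shares = three_way_shares(0.274, implied_icc)
```

Under this scheme the implied residual ICC is roughly 0.47, meaning that close to half of the variance left unexplained by infrastructure would sit between provinces.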
Table 8.
Theil inequality decomposition by cluster.
| Cluster | Theil-T | Gini | CV |
|---|---|---|---|
| High Capability | 0.038 | 0.128 | 0.248 |
| Moderate–High | 0.297 | 0.397 | 0.694 |
| Transitional | 0.431 | 0.473 | 0.842 |
| Overall | 0.282 | 0.371 | — |
| Between-cluster | 0.029 (10.1%) | — | — |
| Within-cluster | 0.254 (89.9%) | — | — |
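The between-cluster and within-cluster rows rely on the additive decomposability of the Theil-T index. The sketch below shows the standard decomposition on toy data; the values are invented for illustration and are not the study's commune data.

```python
import math


def theil_t(values):
    """Theil-T index; zero values contribute nothing (0 * ln 0 is taken as 0)."""
    mu = sum(values) / len(values)
    return sum((v / mu) * math.log(v / mu) for v in values if v > 0) / len(values)


def theil_decomposition(groups):
    """Return (total, between, within) for a dict of group name -> values.

    Each group is weighted by its share of the overall total,
    s_g = n_g * mu_g / (n * mu). Group means must be positive.
    """
    all_values = [v for vals in groups.values() for v in vals]
    n = len(all_values)
    mu = sum(all_values) / n
    between = 0.0
    within = 0.0
    for vals in groups.values():
        mu_g = sum(vals) / len(vals)
        share = (len(vals) * mu_g) / (n * mu)
        between += share * math.log(mu_g / mu)
        within += share * theil_t(vals)
    return theil_t(all_values), between, within


# Toy water-access percentages for two hypothetical clusters.
toy = {"High": [70.0, 80.0, 90.0], "Transitional": [5.0, 40.0, 90.0]}
total, between, within = theil_decomposition(toy)
between_share = 100.0 * between / total
```

The between and within components sum to the overall index, which is why the table can report them as percentage shares. The small gap between 0.029 + 0.254 and the overall 0.282 in the table is consistent with rounding.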
Table 9.
Infrastructure bottleneck analysis ().
| Feature | Primary (%) | Binding (%) | Mean Gain (pp) |
|---|---|---|---|
| Waste collection | 77.1 | 86.6 | +20.7 |
| Electricity | 8.2 | 19.0 | +4.8 |
| Secure housing | 7.5 | 20.2 | +3.5 |
| Distance (log) | 5.5 | 22.5 | +7.4 |
| Wastewater evac. | 1.7 | 10.8 | +5.2 |
Table 10.
Policy prioritization matrix (six archetypes).
| Archetype | n | % | Water | Gap | Recommended Intervention |
|---|---|---|---|---|---|
| Sustain | 164 | 12.4 | 91.2 | | Monitor and maintain existing systems |
| Diagnose | 133 | 10.0 | 67.3 | | Identify institutional barriers; governance reform |
| Scale | 240 | 18.1 | 80.4 | | Replicate successful conversion; expand infrastructure |
| Consolidate | 259 | 19.6 | 82.1 | | Strengthen trajectory; prevent regression |
| Target | 265 | 20.0 | 20.3 | | Address specific conversion bottlenecks |
| Transform | 263 | 19.9 | 12.8 | | Comprehensive multi-sector intervention |