A step-by-step empirical evaluation of the integrated dataset and the developed hybrid architecture is presented. First, the distributions of key indicators, intergroup differences, and the feature correlation structure are analyzed to verify the correctness of the preprocessing and the validity of the binary classification problem formulation. Next, the results of comparative training across the baseline, SOTA, and proposed models within a single experimental protocol are presented using F1, ROC-AUC, and integrated quality metrics. Additionally, the error structure, sensitivity to the probability threshold, and factorial interpretation through the permutational importance of features are analyzed. The results obtained allow us to quantify the computational advantage of the hybrid architecture and its applicability for early detection and the prioritization of highly vulnerable areas.
3.1. Analysis of Data Distribution and Statistical Testing of Feature Structure
The primary statistical analysis of the integrated dataset aims to assess its structural integrity and the distributional characteristics of the feature space. The variability of indicators, their consistency across domains, and basic intergroup differences for the target variable are considered. The conducted verification allows us to confirm the correctness of the integration of sources and create a reliable basis for the subsequent construction and comparative evaluation of classification models.
Figure 4 reflects the average number of social categories A–E per territory. It is evident that the distribution structure is markedly heterogeneous: the largest contributions are made by categories B (≈3138) and C (≈2230), followed by D (≈1220), while A (≈435) and, especially, E (≈265) are represented by significantly smaller values.
This configuration means that different combinations of categories form the “social profile” of territories and cannot be reduced to a single type. Therefore, management decisions should be based on targeted prioritization and take into account the relative shares of vulnerable groups (in particular, the D+E components), and not just the absolute population values. From a practical perspective, this indicates that territorial vulnerability cannot be addressed through uniform policy measures. Instead, differentiated intervention strategies are needed that take into account the specific composition of social groups in each territory.
Figure 5 shows the distribution of the D+E share, that is, the share of the population belonging to the most vulnerable social categories D and E, across all territories.
The bulk of observations is concentrated in the low–moderate range (approximately 0.3–0.4). At the same time, on the right, there is a long tail with isolated areas with significantly elevated D+E shares (even reaching extreme values). This asymmetry and pronounced inter-area variability mean that the D+E indicator does differentiate areas by vulnerability, thereby justifying (i) the goal of early identification of areas with a high D+E share, (ii) the formation of a binary target using a quantile threshold, and (iii) the need for priority ranking, since the differences between areas are uneven and of practical significance. In practice, this confirms that the model can be effectively used for the early identification of high-risk areas, allowing policymakers to prioritize regions with a disproportionately high concentration of vulnerable populations.
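The construction of the D+E share and of a quantile-based binary target can be sketched as follows. This is a minimal illustration, not the exact procedure used in the study: the quantile level `q = 0.67`, the strict inequality, and all territories except the first (which reuses the average profile from Figure 4) are assumptions made for the example.

```python
def de_share(counts):
    """Share of categories D and E in the total A-E count of one territory."""
    total = sum(counts.values())
    return (counts["D"] + counts["E"]) / total if total else 0.0


def quantile_threshold(values, q):
    """Empirical q-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def binarize(shares, q=0.67):
    """Label a territory 1 (High) if its share exceeds the q-quantile.

    q = 0.67 is an assumed level chosen for illustration only.
    """
    cut = quantile_threshold(shares, q)
    return [1 if s > cut else 0 for s in shares], cut


# First entry: average category counts reported for Figure 4.
# Remaining entries: hypothetical territories.
territories = [
    {"A": 435, "B": 3138, "C": 2230, "D": 1220, "E": 265},
    {"A": 500, "B": 3000, "C": 2500, "D": 600, "E": 100},
    {"A": 200, "B": 1500, "C": 1200, "D": 1400, "E": 700},
    {"A": 300, "B": 2800, "C": 2000, "D": 900, "E": 200},
]
shares = [de_share(t) for t in territories]
labels, cut = binarize(shares)
print([round(s, 3) for s in shares], round(cut, 3), labels)
```

Working with shares rather than absolute counts keeps large and small territories comparable, which is the point made above about relative versus absolute values.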
Table 6 presents the key control metrics for integration quality during the construction of a unified analytical matrix and confirms that the data are suitable for subsequent model training and the calculation of the priority index.
The final indicators presented in Table 6 confirm the correctness of the analytical matrix formation and the preservation of full coverage of the study objects. The total number of territorial units with social categories A–E is 166, which defines the general population for the analysis and allows for monitoring potential data losses at the stages of cleaning and integration. Matching by the KATO administrative identifier succeeded for 160 territories; the remaining discrepancies reflect individual cases of key inconsistency and require additional normalization of the reference directories, but do not affect the completeness of the final sample. The target variable is generated for all 166 objects, thereby excluding selection bias due to the lack of annotation and ensuring proper classifier training. Additionally, it was established that each territory contains at least 10 constructed informative features, confirming the sufficient saturation of the rows and the stability of the feature space under multi-source data conditions. The combination of these metrics demonstrates the correctness of the integration, the completeness of the feature representation, and the suitability of the matrix for classification and subsequent territory ranking. In practice, this ensures that the resulting model operates on a complete and representative dataset, reducing the risk of biased decisions caused by the absence or exclusion of certain areas.
Table 7 presents descriptive statistics for key indicators, including the number of observations, mean values, standard deviations, quartiles, and extreme values. These indicators serve an important analytical function by quantitatively confirming the inter-territorial heterogeneity of socioeconomic parameters. At the same time, the obtained statistics substantiate the need for normalization procedures and robust transformations, as well as for the correct interpretation of features in the presence of asymmetric distributions and outliers, thereby ensuring the methodological validity of subsequent modeling.
An analysis of the indicators presented in
Table 7 reveals several statistically and substantively significant patterns characterizing the structure of territorial differences. First, the indicator for the share of vulnerable population categories (D+E) has an average of approximately 0.195, a median of 0.177, and a maximum approaching 1.0, indicating significant variability and right-skewed asymmetry in the distribution. The significant spread of values confirms that this indicator has sufficient discriminatory power and is justifiably used as the basis for binarizing the target variable and subsequently ranking territories. Socioeconomic indices also reflect structural heterogeneity. The employment index is characterized by a relatively low average level with significant inter-territorial variation, indicating differences in the degree of economic participation of the population, even with comparable levels of social vulnerability. The fiscal indicator tax_balance_mln exhibits significant dispersion and extreme values, with a relatively moderate median, which is typical of financial indicators with a high concentration of resources in individual administrative centers. This pattern indicates the need for robust normalization methods and transformations during modeling. Resource indicators exhibit pronounced sparseness. The availability coefficient for available land is zero in most cases, while high values are recorded for a limited number of territories. This demonstrates the indicator’s inherently binary nature, where the information content is determined more by the availability of a resource than by its smooth quantitative scale. This distribution structure justifies the use of a fuzzy interpretation of availability levels. Similarly, infrastructure and tariff factors exhibit strong asymmetry and extreme values, requiring careful processing to prevent biased learning. 
At the same time, the indicator of social infrastructure coverage shows almost constant values across most territories, indicating limited independent discriminatory power but retaining its value as a contextual indicator in the complex model.
Agroeconomic parameters reflect the large-scale heterogeneity in the production profiles of territories. Significant variations in livestock numbers and the concentration of processing facilities in certain locations confirm the spatial differentiation of resource potential and the presence of industrial hubs. Such distributions strengthen the case for nonlinear analysis methods that account for factor interactions. Investment indicators also demonstrate significant variability, reflecting differences in the sustainability and development potential of the territories. Moreover, some technical characteristics, with virtually constant values, act more as control variables and do not independently contribute to class discrimination. Taken together,
Table 7 confirms that the integrated dataset includes characteristics with varying statistical natures, including symmetric and asymmetric distributions, sparse structures, and outliers. From an applied perspective, this diversity justifies the chosen strategy of robust preprocessing, the fuzzification of key factors, and the use of hybrid ensemble boosting methods that handle heterogeneous data and account for nonlinear combinations of characteristics while maintaining interpretable risk and sustainability indices.
Figure 6 presents the SHAP-based feature importance analysis at the global and local levels, providing quantitative insight into the model’s decision-making process. At the global level (left panel), the most influential feature is opv_population_registered, with an average |SHAP| value of 0.95, significantly outperforming all other variables. The second most important feature is project_investment_mln with an average contribution of approximately 0.78, followed by animal_output_total (≈0.72) and crop_population_screening (≈0.55). Infrastructure-related indicators, such as tariff_drinking_business (≈0.47) and labor-related variables (opv_workers ≈0.43), also show significant contributions. The remaining features, including livestock subcategories and land-use indicators, have moderate importance, ranging from 0.25 to 0.40. This distribution indicates a clear dominance of demographic and investment-related factors in the model’s overall behavior. At the local level (right panel), for the selected high-risk area, the opv_population_registered variable contributes approximately +2.5 to the logarithm of the classification odds ratio, making it the main driver of a positive prediction. The next most influential features are crop_population_screening (≈+1.1) and project_investment_mln (≈+0.9). Additional positive contributions are made by opv_workers (≈+0.8) and animal_output_total (≈+0.6). Notably, the fuzzy_risk feature contributes approximately +0.5, confirming its active role in the hybrid model. Other variables, including animal_eggs_output, distance_to_district_km, and livestock_diversity, have smaller but significant contributions, ranging from +0.2 to +0.4. Overall, the quantitative analysis of SHAP shows that while a small subset of features (primarily demographic and economic indicators) dominates globally, local forecasts are shaped by the combined influence of multiple factors. 
The inclusion of fuzzy_risk among the key factors influencing local forecasts confirms the interpretability and practical relevance of the extended fuzzy feature space.
Figure 7 presents an analysis of the fuzzy resilience component in the proposed model. The left panel shows the distribution of the fuzzy resilience index across all 166 territories, and the right panel presents the activation frequency of resilience rules (s1–s3) for both all territories and a subset of 20 priority territories. As shown in the left panel, the fuzzy resilience index is zero for all observations, with no variation across territories. This result is further confirmed by the right panel, where the activation frequency of all resilience rules (s1, s2, s3) is zero. This indicates that, with the current rule specification and parameterization, the resilience component is not activated for any territory in the dataset. This finding bears directly on the interpretability of the fuzzy layer. The results indicate that the resilience subcomponent, as currently defined, remains inactive due to overly strict conjunction conditions in the rule base and/or insufficient variability of the relevant input characteristics. As a result, the interpretability of the fuzzy layer in this model is primarily determined by the risk component, which remains fully functional and contributes to both global and local explanations. Importantly, this behavior does not invalidate the overall hybrid framework, but rather points to a limitation in the current specification of the resilience rules. Future work will revisit this component by relaxing rule constraints, redesigning membership functions, or introducing alternative aggregation strategies to ensure the meaningful activation of resilience factors. Overall, the figure makes the internal behavior of the fuzzy layer explicit and supports a more precise interpretation of the model structure, in which the risk dimension determines the current contribution of the fuzzy subsystem.
Table 8 presents a sensitivity analysis of the fuzzy membership overlap settings and their impact on the predictive performance of the proposed model. Three configurations were considered: narrow overlap, basic overlap, and wide overlap, which differ in the position of the low-leverage parameter c and the high-leverage parameter a. The results show that the narrow and basic overlap settings provide similar classification performance in terms of accuracy (0.7619), precision (0.6111), recall (0.7857), and F1-score (0.6875), while the basic overlap achieves the highest ROC-AUC value of 0.8316 compared to 0.8291 for the narrow overlap. In contrast, the wide overlap configuration leads to a decrease in the overall classification quality: accuracy decreases to 0.7143, precision to 0.5500, and F1-score to 0.6471, although recall remains unchanged at 0.7857. The selected threshold level also shifts downwards: from 0.24 and 0.23 in the first two settings to 0.12 in the wide overlap condition.
3.2. Application of Analysis Methods to Check the Structure of Prepared Data
Figure 8 presents the Spearman rank correlation matrix (ρ) between the basic indicators and fuzzy aggregates, reflecting monotonic (not necessarily linear) dependencies in the data. The analysis shows that for most feature pairs, the absolute values of ρ are small, indicating low multicollinearity and confirming that the features carry distinct information and can be used together in the model without significant redundancy. The most pronounced relationship is observed between opv_employment_index and the fuzzy components fuzzy_risk and fuzzy_priority, consistent with the economic meaning of the employment indicator as a factor that reduces risk and prioritizes sustainability. Additionally, fuzzy_risk and fuzzy_priority are almost perfectly anticorrelated (ρ close to −1), which is the expected consequence of their complementary construction (one value is defined by the other) and confirms the internal logical consistency of the fuzzy layer. Overall, the matrix is used as a quality-control tool to verify the correctness of feature formation and to identify key relationships that support model interpretation and fuzzy rule selection.
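To make the anticorrelation argument concrete, the following sketch computes Spearman's ρ as the Pearson correlation of tie-averaged ranks and applies it to a pair of complementary variables. The values are hypothetical, and the exact complementary form `1 - fuzzy_risk` is an assumption made for illustration; the construction used in the study may differ, which would explain why the observed coefficient is close to, rather than exactly, −1.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values; the complementary form is an assumption.
fuzzy_risk = [0.10, 0.35, 0.35, 0.60, 0.80, 0.95]
fuzzy_priority = [1.0 - r for r in fuzzy_risk]
rho = spearman(fuzzy_risk, fuzzy_priority)
print(round(rho, 6))
```

Because ρ depends only on ranks, any strictly decreasing transformation of one variable into the other produces the same coefficient, so a value near −1 is a property of the construction rather than an empirical finding.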
Table 9 evaluates the monotonic relationship between the key features and the target indicator of high vulnerability (y) using Spearman's coefficient ρ and tests the statistical significance of this relationship using a p-value with a sample size of N = 166. The most pronounced and statistically significant association is observed for practical_priority_score, confirming its meaningful validity as an integral indicator, consistent with the target logic of vulnerability and suitable for ranking territories. Several factors show moderate but significant relationships: project_investment_mln has a negative association, which is interpreted as a decrease in vulnerability with higher investment activity; tariff_pressure is positively associated with vulnerability, indicating the role of the tariff burden as a potential risk amplifier; free_land_ratio shows a negative correlation, which is consistent with the hypothesis of a resource "cushion" of territories in the presence of free land. The remaining features show weak or statistically insignificant correlations at the sample level (e.g., livestock_total_heads: p = 0.094; opv_employment_index: p = 0.1389; srs_coverage_pct and tax_balance_mln: p > 0.4), which does not mean they are useless in the model, since in a multivariate setting they can manifest through nonlinear interactions. The forecast_projects feature (ρ = 0) is effectively a constant and does not contribute discriminatory information regarding the target.
Table 10 presents the results of the nonparametric Kruskal–Wallis test (H statistic) for testing the hypothesis of equal distributions of indicators across vulnerability groups. This test is appropriate when there is potential asymmetry in the distribution and when outliers are present. The largest difference is demonstrated by practical_priority_score (H = 36.9221, p < 0.001), confirming its ability to separate territories by risk level reliably and justifying the use of this integral indicator for ranking. Significant between-group differences were also found for opv_employment_index (H = 16.8792, p = 0.0002), consistent with the role of employment/economic inclusion as a factor associated with reducing vulnerability, as well as for project_investment_mln (H = 8.4972, p = 0.0143) and tariff_pressure (H = 7.0932, p = 0.0288), indicating statistically significant differentiation of the groups by investment activity and tariff burden.
In contrast, livestock_total_heads, free_land_ratio, crop_processing_capacity_tpy, tax_balance_mln, and srs_coverage_pct yielded insignificant p-values (p > 0.2), indicating no robust between-group differences in the distributions of these features in the univariate setting; however, these features may contribute through combinations and nonlinear interactions in the multivariate model. The forecast_projects feature is constant (which formally yields H = 0) and does not provide any distinguishing information. Overall, significant H values for key variables confirm that the features differentiate risk groups, supporting the validity of the classification problem statement and the subsequent ranking of territories.
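The relationship between the reported H statistics and their p-values can be checked with a short calculation. Under the usual large-sample approximation, H is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups. The sketch below assumes k = 3 (two degrees of freedom); this is an inference from the reported numbers, which match that case, and not something stated explicitly in the text. For two degrees of freedom the chi-square survival function reduces to exp(−H/2), so no statistical library is needed.

```python
import math


def kruskal_p_value_df2(h_stat):
    """Upper-tail chi-square probability with 2 degrees of freedom.

    For df = 2 the survival function is exactly exp(-H / 2).
    The choice df = 2 (three groups) is an assumption.
    """
    return math.exp(-h_stat / 2.0)


# H statistics as reported for Table 10.
reported_h = {
    "practical_priority_score": 36.9221,
    "opv_employment_index": 16.8792,
    "project_investment_mln": 8.4972,
    "tariff_pressure": 7.0932,
}

p_values = {name: kruskal_p_value_df2(h) for name, h in reported_h.items()}
for name, p in p_values.items():
    print(f"{name}: H = {reported_h[name]:.4f}, p = {p:.2e}")
```

Reporting very small p-values as a bound such as p < 0.001 avoids writing them as zero, since the approximation never yields an exact zero for a finite H.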
3.3. Training Models for Calculating the Efficiency of Methods
The models were compared using a single experimental protocol across three solution classes: baseline (classical algorithms), SOTA tabular ensembles, and the proposed approach. Quality assessment was performed on a held-out test set using metrics reflecting both the accuracy of identifying vulnerable areas and the model's ability to rank risks correctly: F1 (the balance of precision and recall for the positive class), ROC-AUC (discriminatory ability based on probability estimates), and the integral criterion (Integrated Score), which aggregates key quality indicators (including F1, AUC, Recall, and Precision) into a single scale for managerial comparison. For the proposed model, the probability binarization threshold was further optimized to balance missed high-risk areas against false positives; the optimal threshold was 0.35.
The results show that the Proposed Fuzzy-XGBoost model achieves the best performance, with Test F1 = 0.7333, Test ROC-AUC = 0.8291, and Integrated Score = 0.7680, ranking first in the Integrated Score and thus providing the most favorable tradeoff between detection accuracy and ranking robustness. The closest competitor is AdaBoost with an Integrated Score of 0.7288, indicating the high competitiveness of ensemble methods. However, the proposed approach retains a measurable advantage through the addition of an interpretable fuzzy layer and a more precise decision threshold.
Figure 9 visualizes comparative results across three scales—Test F1, Test ROC-AUC, and Integrated Score for a set of baseline, SOTA-based, and proposed models. The diagram shows that Proposed Fuzzy-XGBoost leads in both F1 and AUC and, most importantly, provides the greatest overall score, demonstrating not a partial advantage in one metric but a stable dominance in the overall criterion directly related to the task of early detection and the prioritization of territories.
Figure 10 shows the confusion matrices (test set, N = 42) for the group of models, where the positive class corresponds to High (high vulnerability) and the negative class corresponds to Low. For Proposed Fuzzy-XGBoost, the obtained TN = 23, FP = 5, FN = 3, TP = 11 yield Sensitivity = TP/(TP + FN) = 0.786 and Specificity = TN/(TN + FP) = 0.821. This means that the proposed model simultaneously (i) better identifies vulnerable areas (the low FN count minimizes missed high-risk areas) and (ii) maintains acceptable robustness to false alarms (FP control).
Comparison with competitors confirms the advantage of the proposed approach, specifically along the critical management axis of “not missing high risks.” AdaBoost has TN = 23, FP = 5, FN = 4, TP = 10 (Sensitivity ≈ 0.714, Specificity ≈ 0.821), similar specificity, but misses more vulnerable areas. HistGradientBoosting and XGBoost (SOTA baseline) have the same error structure: TN = 24, FP = 4, FN = 6, TP = 8 (Sensitivity ≈ 0.571, Specificity ≈0.857): they “protect” low risks slightly better (higher specificity), but at the cost of a significant increase in FN, which reduces their suitability for the early detection of vulnerable areas. Random Forest (TN = 23, FP = 5, FN = 7, TP = 7) demonstrates even lower sensitivity (≈0.50), that is, it misses half of the high-risk objects. Gradient Boosting (TN = 20, FP = 8, FN = 6, TP = 8) worsens both specificity (≈0.714) and sensitivity (≈0.571). Extra Trees (TN = 25, FP = 3, FN = 8, TP = 6) demonstrates high specificity (≈0.893) but low sensitivity (≈0.429), that is, it “conservatively” avoids false alarms but systematically under-detects high risk. KNN and Logistic Regression also exhibit low sensitivity (≈0.357 and ≈0.429, respectively), making them less preferable for social support prioritization tasks. The main conclusion from the figure is that the proposed model provides the most balanced error structure, emphasizing minimizing FN while maintaining sufficient specificity, which aligns with the applied goal of reliably identifying areas requiring priority support measures.
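The rates quoted above follow directly from the confusion-matrix counts. The sketch below recomputes them for the models whose four counts are given in the text; the function and dictionary names are illustrative and not part of the study's code.

```python
def rates(tn, fp, fn, tp):
    """Standard binary-classification rates from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall for the High class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts (TN, FP, FN, TP) as reported for the test set, N = 42.
matrices = {
    "Proposed Fuzzy-XGBoost": (23, 5, 3, 11),
    "AdaBoost": (23, 5, 4, 10),
    "XGBoost (SOTA baseline)": (24, 4, 6, 8),
    "Random Forest": (23, 5, 7, 7),
    "Gradient Boosting": (20, 8, 6, 8),
    "Extra Trees": (25, 3, 8, 6),
}

results = {name: rates(*counts) for name, counts in matrices.items()}
for name, r in results.items():
    print(f"{name}: sens = {r['sensitivity']:.3f}, spec = {r['specificity']:.3f}, "
          f"F1 = {r['f1']:.4f}")
```

With only 14 positive cases in the test set, each additional missed territory changes sensitivity by about 0.07, so differences of one or two cases between models should be read with that granularity in mind.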
Figure 11 shows how the classification quality, measured by the F1 metric, changes as the threshold t is varied, transforming the predicted probability p into a binary decision: High if p ≥ t and Low otherwise.
The curve has a pronounced dependence on the threshold. At too low t values, the model begins to label too many areas as "High," increasing false alarms and decreasing precision. At the same time, at too high t values, the model becomes overly conservative, leading to more missed high-risk areas (FN) and a drop in recall. The maximum/plateau of F1 is observed in the low-to-medium threshold range, reflecting the specifics of the early detection of vulnerable areas, where the balance between precision and recall for the positive class is important. The red dotted line marks the selected working value of t, which provides the best (or close to the best) compromise for F1. The choice of this value is justified by its location in a region of high stability in the curve; small changes in the threshold around the selected value do not lead to a sharp deterioration in F1, which increases the reliability of the solution in the face of possible data fluctuations. An important conclusion from the figure is that the standard fixed threshold of 0.5 turns out to be suboptimal, i.e., at t = 0.5, F1 is lower than in the optimum region, since such a threshold does not take into account (i) the asymmetry of the cost of errors in the management problem (missing a truly vulnerable area is often more critical than a false positive) and (ii) the features of the probability distribution produced by the model. Thus, Figure 11 provides direct justification for the threshold optimization procedure as a mandatory stage of the applied pipeline: a correctly chosen t improves the quality of identifying highly vulnerable areas and makes the final ranking more managerially reliable than using the default threshold.
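A threshold sweep of this kind can be sketched in a few lines. The labels and probabilities below are hypothetical and serve only to show the mechanics; the grid step of 0.05 and the rule "predict High when the probability is at least t" are assumptions for the example, not the study's settings.

```python
def f1_at_threshold(y_true, probs, t):
    """F1 for the positive class when predicting 1 for probabilities >= t."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= t and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= t and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < t and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sweep(y_true, probs, grid):
    """F1 evaluated at every threshold of the grid."""
    return [(t, f1_at_threshold(y_true, probs, t)) for t in grid]


def best_threshold(y_true, probs, grid):
    """Threshold with the highest F1; the lowest such threshold wins ties."""
    return max(sweep(y_true, probs, grid), key=lambda pair: pair[1])


# Hypothetical validation labels and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.90, 0.60, 0.40, 0.30, 0.45, 0.20, 0.15, 0.10, 0.05, 0.02]
grid = [i / 100 for i in range(5, 100, 5)]

best_t, best_f1 = best_threshold(y_true, probs, grid)
default_f1 = f1_at_threshold(y_true, probs, 0.5)
print(best_t, round(best_f1, 3), round(default_f1, 3))
```

In practice the threshold would be selected on validation folds rather than on the test set, so that the reported test F1 is not optimistically biased by the selection step.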
Table 11 presents a complete comparative quality profile for all models examined and records: (i) the feature space used (raw screening vs. fuzzy-expanded), (ii) the probability threshold for decision binarization, (iii) the cross-validation (CV) and hold-out test scores, and (iv) the composite Integrated Score as a single criterion of application suitability.
The main result of the table is the consistent leadership of Proposed Fuzzy-XGBoost, the only model trained on fuzzy-enhanced features and using the optimized threshold = 0.35: on the test, it achieves Accuracy = 0.8095, Precision = 0.6875, Recall = 0.7857, F1 = 0.7333, ROC-AUC = 0.8291 and the maximum Integrated Score = 0.768, which indicates the best balance between identifying high-risk areas and ranking stability. The closest competitor, AdaBoost (Integrated Score = 0.7288), demonstrates comparable metrics but is inferior to the proposed approach in key applied benchmarks (in particular, F1/Recall on the test), indicating that the standard threshold of 0.5 misses a greater number of vulnerable areas. Moreover, models demonstrating high CV values (for example, XGBoost (SOTA baseline) with CV Accuracy = 0.799 and CV ROC-AUC = 0.8295) significantly lose recall in the test (Test Recall = 0.5714, Test F1 = 0.6154), which emphasizes the importance of not only the “average” quality in CV but also the error structure and threshold tuning for the target task. HistGradientBoosting shows a similar CV profile but also loses Recall/F1 on the test set, confirming that without the fuzzy layer and adaptive threshold, the model becomes more conservative. Simpler algorithms (Random Forest, Gradient Boosting, Extra Trees, KNN, Logistic Regression) exhibit either low recall or weak discrimination (ROC-AUC), limiting their suitability for early detection and prioritization. Overall, the table confirms the graphical analysis: the combination of an interpretable fuzzy layer, boosting, and threshold optimization yields the most practical results, ensuring both high-quality identification of vulnerable areas and the stability of the final ranking.
To strengthen the baseline comparison, we added repeated CV comparisons with standalone XGBoost, LightGBM, and CatBoost on the raw feature space (Table 12). The best baseline among these gradient boosting ensembles was LightGBM (raw features, threshold 0.50), with mean F1 = 0.636 and ROC-AUC = 0.839. The Fuzzy XGBoost configuration with the optimized threshold τ* demonstrates comparable quality (mean F1 = 0.617, mean ROC-AUC = 0.829) and retains an advantage in interpretability due to the explicit fuzzy risk representation.
Table 13 presents the results of a controlled study that compares four XGBoost classifier configurations with different feature representations and thresholding strategies. The comparison highlights the effects of fuzzy feature expansion and probability threshold optimization. The baseline configuration (Raw + 0.50) achieves the highest F1 score of 0.714 and the best ROC-AUC value of 0.824, indicating strong performance when using the original feature space with the default decision threshold. Applying threshold optimization to the original features (Raw + optimized τ*) increases recall (0.786) but decreases precision (0.524), resulting in a lower F1 score of 0.629. This highlights the trade-off that occurs when optimizing recall in asymmetric classification settings. Configurations with enhanced fuzzy feature representation exhibit different behavior. The Fuzzy + optimized τ* model improves recall (0.786) compared to the default-threshold model, but achieves a slightly lower F1 score (0.688) than the baseline model without thresholding. Meanwhile, the Fuzzy + 0.50 configuration demonstrates the worst results (F1 = 0.538), indicating that fuzzy features alone without thresholding adjustment are insufficient for optimal classification.
Figure 12 reflects the permutation importance of features relative to the F1 metric for the final model. For each feature, the deterioration in F1 is measured by randomly shuffling its values while holding the other variables fixed. This interpretation approach does not reflect the model’s “weights,” but rather the feature’s actual contribution to the quality of identifying highly vulnerable areas using the test protocol. The diagram shows that opv_population_registered has the greatest significance (the largest drop in F1 with permutation), indicating that indicators related to population coverage/registration and the basic contours of social accounting are key for distinguishing risk classes. These are followed by features reflecting the territory’s production and agricultural profile and resource endowment: crop_households_screening, opv_workers, total_land_ha, as well as indicators of agricultural output (animal_output_total, animal_meat_output). Project_investment_mln also demonstrates a significant contribution, consistent with the role of investment activity as a factor in resilience and the potential to reduce vulnerability. The crop_infra_yes_ratio indicator underscores the importance of infrastructure provision in the agricultural sector, while livestock_total_heads and distance_to_district_km highlight the influence of economic activity scale and territorial accessibility/remoteness.
The lower part of the list (e.g., involved_lph, animal_eggs_output, lph_count_local, tariff_drinking_business) has lower permutation importance, meaning a limited contribution to the final F1 in the presence of stronger factors; however, such variables can be useful in local scenarios or as refiners when formulating targeting measures. Overall, Figure 12 is used for factor interpretation: it shows which measurable contours (social accounting, employment/workers, land resources, livestock production, investment, infrastructure, and accessibility) most determine the likelihood of high vulnerability, thereby forming manageable intervention points for targeted support programs. The Top 15 Features by Permutation Importance chart provides a numerical breakdown of the factor ranking by permutation importance relative to the F1 metric, thereby enhancing the interpretation of Figure 12. The importance value reflects the expected drop in F1 when a specific feature's values are randomly shuffled while holding the others constant, thereby measuring its contribution to the model's ability to identify highly vulnerable areas correctly. The most significant factor is opv_population_registered (0.1274); its dominance indicates that population coverage and the scale of basic social registration carry the maximum discriminatory signal for the target class. The next group of features has similar importance values: crop_households_screening (0.0538), opv_workers (0.0511), and total_land_ha (0.0496) form the core associated with economic activity and the resource base of the territory. Their comparable values indicate that the model is based not on a single indicator but on a consistent set of factors describing the economic potential and employment structure.
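The permutation procedure described above can be sketched as follows. This is a minimal illustration on a toy rule-based model and synthetic data, not the study's model or dataset; the number of repeats, the seed, and all names are assumptions made for the example.

```python
import random


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance_f1(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in F1 when one column is shuffled and the others are kept."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = f1_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Toy stand-in for a fitted classifier: uses feature 0 only."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data: feature 0 determines the label, feature 1 is irrelevant.
n = 40
X = [[i / n, (i * 7 % n) / n] for i in range(n)]
y = [toy_predict(row) for row in X]

importances = permutation_importance_f1(toy_predict, X, y)
print([round(v, 3) for v in importances])
```

A feature the model never uses gets an importance of exactly zero, while correlated features can share or mask each other's importance, which is one reason the ranking is best read together with the correlation analysis.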
Figure 13 visualizes the final ranking of the 20 territories with the highest intervention priority based on the Final Priority Index, an integrated indicator that aggregates the model’s probabilistic vulnerability assessment and fuzzy risk/resilience indicators into a single management prioritization scale. The higher the index value, the greater the territory’s expected need for targeted support measures, all other things being equal. The diagram is constructed as ordered horizontal columns, allowing for the direct comparison of territories and identifying the top of the list as the most critical for immediate response.
The key point of the result is the transition from the abstract “High/Low” classification to an operational planning tool that assigns each territory a numerical priority assessment. This eliminates ambiguity in resource allocation: instead of a binary decision, a scale is formed on which it is possible to (i) set thresholds for different levels of intervention, (ii) prioritize surveys and programs, (iii) justify the selection of territories in reporting, and (iv) compare the expected effects under a limited budget. Since the index is constructed on a single attribute space with a deterministic normalization/aggregation procedure, the resulting rating is reproducible. It can be recalculated regularly as the data are updated, which supports the monitoring of dynamics and control over the effectiveness of measures. Thus, Figure 13 presents the main applied output of the study: a ready-made list of territories for targeted social policy planning, with priorities formed quantitatively, transparently, and consistently with the model’s vulnerability assessment and interpretable risk factors.
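To make the idea of a deterministic, reproducible aggregation concrete, a minimal sketch is given below. The exact aggregation formula is not stated in this section, so the weighted form, the weights w_p, w_risk, and w_res, the clipping to [0, 1], and the territory codes and values are all hypothetical placeholders chosen only for illustration.

```python
def priority_index(p, risk, resilience, w_p=0.6, w_risk=0.4, w_res=0.2):
    """Hypothetical aggregation (an assumption, not the paper's formula):
    weighted sum of probability and fuzzy risk, discounted by resilience,
    clipped to the [0, 1] range."""
    raw = w_p * p + w_risk * risk - w_res * resilience
    return max(0.0, min(1.0, raw))

# Made-up territories; "kato" values are placeholder codes, not real identifiers.
territories = [
    {"kato": "T-001", "p": 0.99, "risk": 0.45, "resilience": 0.0},
    {"kato": "T-002", "p": 0.62, "risk": 0.31, "resilience": 0.0},
    {"kato": "T-003", "p": 0.95, "risk": 0.40, "resilience": 0.5},
]

def rank_territories(rows):
    """Sort territories by the hypothetical index, highest priority first."""
    scored = [
        dict(row, priority=priority_index(row["p"], row["risk"], row["resilience"]))
        for row in rows
    ]
    return sorted(scored, key=lambda row: row["priority"], reverse=True)

ranked = rank_territories(territories)
top_codes = [row["kato"] for row in ranked]
```

Because the procedure contains no random component, rerunning it on the same inputs returns the same ordering, which is the property that makes regular recalculation on updated data meaningful.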
Figure 14 is a scatterplot in which the x-axis represents the fuzzy risk index, the y-axis represents the model-predicted probability of high vulnerability (p_i), and the color of each dot encodes the resulting priority index. This visualization is used to verify that the expert fuzzy layer and the statistical classifier do not contradict each other but form a consistent decision-making framework. The key observation is a consistent separation of objects by probability: dots with high p_i values form the upper region of the graph and dots with low p_i values the lower one, with elevated values of the resulting priority concentrated where significant fuzzy risk and a high probabilistic vulnerability assessment occur simultaneously. This indicates that fuzzy risk functions as a meaningful, interpretable signal that supports the model’s statistical inference and enhances managerial interpretation: areas with a high fuzzy risk index and a relatively high p_i receive a higher color priority level, consistent with the logic of targeted interventions.
At the same time, the figure also reveals an important practical detail: there are points for which the fuzzy risk index and p_i are not completely synchronous (for example, moderate fuzzy risk with high p_i, or vice versa). This is expected and methodologically correct, because the fuzzy risk index is formed from expertly defined rules for a limited set of key factors, while p_i is the result of multivariate learning on an extended feature space. Such cases are interpreted either as areas where the statistical profile of the data reveals additional combinations of factors not fully reflected in the fuzzy rules, or as areas where the expert contour signals a risk with insufficient statistical confidence. In both scenarios, the color scale of the final index provides a compromise that combines the two sources of information. Thus, Figure 14 demonstrates the correctness of integrating the expert (fuzzy) and statistical (classification) components: a high priority is formed predominantly in the zone where both assessments consistently indicate risk, which increases confidence in the results and makes the final ranking managerially explainable.
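The distinction between concordant and non-synchronous points can be expressed as a simple labeling rule. The sketch below is illustrative only: the cut-offs p_cut and risk_cut, the label names, and the sample points are assumptions and are not taken from the study.

```python
def agreement_label(p, risk, p_cut=0.5, risk_cut=0.3):
    """Classify a territory by whether the statistical and expert signals agree.
    Both cut-offs are hypothetical and would need to be set by the analyst."""
    high_p = p >= p_cut
    high_risk = risk >= risk_cut
    if high_p and high_risk:
        return "confirmed by both"
    if high_p:
        return "data-driven only"      # high probability, moderate expert signal
    if high_risk:
        return "expert-driven only"    # expert rules fire, low model probability
    return "low on both"

# Made-up (probability, fuzzy risk) pairs covering the four cases.
points = [(0.99, 0.45), (0.95, 0.10), (0.20, 0.40), (0.10, 0.05)]
labels = [agreement_label(p, risk) for p, risk in points]
```

Territories labeled as data-driven only or expert-driven only correspond to the non-synchronous points discussed above and are natural candidates for additional review.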
Figure 15 illustrates the scenario analysis for the selected priority area: the x-axis shows the management scenarios for changing the factors, and the y-axis shows the probability of high vulnerability (p_i) calculated by the model. The dotted line shows the working decision threshold, relative to which the transition of the area to the high-risk class is interpreted. Comparison with the Base scenario demonstrates how the model responds to targeted interventions: increasing employment (Employment +15%), adjusting the coverage/SRS indicator (SRS +10%), reducing the tariff burden (Tariff −10%), improving transport accessibility/reducing remoteness (Distance −20%), and a comprehensive intervention (Combined policy).
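The scenarios listed above amount to re-scoring the same territory after scaling selected inputs. A minimal sketch follows, with loud assumptions: the logistic scorer, its coefficients, the feature names, and the baseline values are hypothetical stand-ins for the trained model, and the Combined policy scenario is assumed here to apply all four single-factor changes together.

```python
import math

# Hypothetical coefficients of a toy logistic scorer (not the trained model).
COEF = {"employment": -1.2, "srs": -0.8, "tariff": 0.9, "distance": 0.6}
INTERCEPT = 0.5

def predict_proba(features):
    """Probability of high vulnerability under the toy logistic scorer."""
    z = INTERCEPT + sum(COEF[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Each scenario maps a feature name to a multiplicative change.
SCENARIOS = {
    "Base": {},
    "Employment +15%": {"employment": 1.15},
    "SRS +10%": {"srs": 1.10},
    "Tariff -10%": {"tariff": 0.90},
    "Distance -20%": {"distance": 0.80},
}
# Assumption: the combined policy applies every single-factor change at once.
SCENARIOS["Combined policy"] = {
    name: factor
    for changes in SCENARIOS.values()
    for name, factor in changes.items()
}

def apply_scenario(features, multipliers):
    """Return a copy of the features with the scenario multipliers applied."""
    return {name: value * multipliers.get(name, 1.0) for name, value in features.items()}

# Made-up normalized inputs for one territory.
base = {"employment": 0.4, "srs": 0.5, "tariff": 0.7, "distance": 0.8}
results = {
    name: predict_proba(apply_scenario(base, multipliers))
    for name, multipliers in SCENARIOS.items()
}
```

Comparing each entry of results with the Base value gives the scenario effect on a single probability scale; with a real classifier the direction and size of each effect depend on the fitted model rather than on the signs assumed here.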
The key point of the figure is to demonstrate that the model supports counterfactual assessment. When the input control variables change, p_i is recalculated, allowing the potential impact of alternative measures to be compared on a single numerical scale. In the example shown, the probability changes have a small amplitude (the values are close to each other), which is interpreted as an indication that, for this area, the risk is shaped not by a single dominant factor but by a combination of conditions, so that individual local improvements have a limited immediate effect. At the same time, the integrated scenario reflects the principle of systemic impact: even with moderate shifts in individual factors, combining measures produces the most consistent change in the forecast and serves as the basis for selecting an intervention package. Thus, Figure 15 demonstrates that the model is applicable not only as a diagnostic mechanism (assessing current risk) but also as a tool for planning and comparing interventions: it is possible to test the sensitivity of the vulnerability probability to controllable factors, rank measures by expected effect, and justify which combinations of changes are more appropriate under limited resources.
The practical ranking of territorial units represents the final applied result of the study: a ranked list of rural districts linked to administrative affiliation (district), a unique KATO identifier, three quantitative decision components (the probability of high vulnerability p_i, the fuzzy risk index, and the fuzzy resilience index), and the final integral indicator, the Final Priority Index. The top of the ranking is concentrated among territories with very high p_i values (in some cases, p_i ≈ 0.99), indicating a confident classification as high-vulnerability at the chosen decision threshold (Table 14).
At the same time, the fuzzy risk values are in the medium–high range (approximately 0.31–0.45), reflecting a consistent expert signal on key risk factors. The resilience index equals zero in the presented sample, indicating the absence of activated resilience rules for these territories and, consequently, the absence of compensating factors in the expert part. The resulting Final Priority Index (≈0.77–0.88) aggregates these components and provides a comparable scale of management prioritization: the higher the index, the more justified the inclusion of the territory within the contour of priority measures (additional surveys, targeted support programs, resource planning). The table’s structure makes the results verifiable and operationalizable: the presence of the KATO identifier ensures unambiguous integration into departmental contours, and the simultaneous presentation of p_i, the fuzzy risk index, the resilience index, and the final index allows for a distinction between situations of “high risk according to data with a moderate expert signal” and “high risk confirmed by both contours,” which is important for selecting the type of intervention. Taken together, the table provides a ready-made basis for developing a roadmap of management measures, as it translates the model’s output into a specific list of territories with a quantitative justification for priority. The conclusion of the work is supported by the final quality of the best Proposed Fuzzy-XGBoost model (Test F1 = 0.7333, Test ROC-AUC = 0.8291, Integrated Score = 0.7680): the model not only diagnoses vulnerability but also generates a reproducible index suitable for the practical planning of targeted social policy.
Overall, the obtained results demonstrate that the proposed hybrid framework is effective not only for forecasting but also for practical decision support. Improved identification of high-risk areas reduces the likelihood of missing critical cases, which is crucial for timely intervention and efficient resource allocation. Furthermore, integrating fuzzy indices provides an interpretable understanding of the main vulnerability factors, allowing decision-makers to understand both the classification results and their causes. Thus, the proposed approach can be directly applied to real-world administrative and regional planning problems.