A Reproducible Hybrid Architecture of Fuzzy Logic and XGBoost for Explainable Tabular Classification of Territorial Vulnerability

Akynbekova, Aiman; Mukhanova, Ayagoz; Muratkhan, Raikhan; Diyarova, Lunara; Baigubenova, Saya; Murzabekova, Gulden; Orazymbetova, Gulaim; Satybaldieva, Aliya; Abdikadyr, Zhanat

doi:10.3390/computers15040259


by Aiman Akynbekova ¹, Ayagoz Mukhanova ¹,*, Raikhan Muratkhan ²,*, Lunara Diyarova ³,*, Saya Baigubenova ³, Gulden Murzabekova ⁴, Gulaim Orazymbetova ⁵, Aliya Satybaldieva ⁵ and Zhanat Abdikadyr ⁶

¹ Department of Information Systems, Lev Nikolaevich Gumilyov Eurasian National University, Astana 010000, Kazakhstan

² Department of Applied Mathematics and Informatica, Karaganda Buketov University, Karaganda 010003, Kazakhstan

³ Institute of digital economy and sustainable development, Zhangir Khan West Kazakhstan Agrarian-Technical University, Uralsk 090009, Kazakhstan

⁴ Department of Computer Sciences, Saken Seifullin Kazakh Agrotechnical Research University, Astana 010000, Kazakhstan

⁵ Department of Physics and Computer Science, Taraz University Named After Muhammad Khaydar Dulati, Taraz 010008, Kazakhstan

⁶ Department of Epidemiology and Biostatistics, Non-Profit Joint Stock Company Astana Medical University, Astana 010000, Kazakhstan

* Authors to whom correspondence should be addressed.

Computers 2026, 15(4), 259; https://doi.org/10.3390/computers15040259

Submission received: 16 March 2026 / Revised: 15 April 2026 / Accepted: 15 April 2026 / Published: 20 April 2026

(This article belongs to the Special Issue Machine Learning: Techniques, Industry Applications, Code Sharing, and Future Trends)


Abstract

This study proposes a reproducible hybrid computational model for the explainable classification of territorial vulnerability using heterogeneous tabular data. The approach integrates fuzzy logic and extreme gradient boosting in a two-stage architecture that balances interpretability and predictive performance. First, a fuzzy transformation is applied to construct interpretable risk and resilience indicators based on multi-source administrative indicators. The analytical dataset was formed by integrating 11 heterogeneous administrative sources into a single matrix of 166 territorial units and 76 features. The model was evaluated on a stratified 75/25 split of the training and test sets using the F1 score, ROC-AUC, precision, recall, and integrated quality criterion. Experimental results show that the proposed Fuzzy-XGBoost framework achieved an F1 score of 0.7333 on the test dataset, an ROC-AUC of 0.8291, and an Integrated Score of 0.768, outperforming the strongest baseline and improving recall in highly vulnerable areas. Furthermore, probabilistic threshold optimization identified an operating point at τ = 0.35, reducing the number of missed high-risk cases while maintaining acceptable specificity. The results demonstrate that fuzzy feature expansion combined with gradient boosting provides an efficient and interpretable solution for tabular risk classification and decision support problems under heterogeneity and uncertainty.

Keywords:

fuzzy logic; hybrid machine learning; threshold optimization; decision support systems; priority ranking

1. Introduction

The rapid digitalization of public administration and regional analytics has led to the accumulation of large volumes of heterogeneous tabular data obtained from administrative registers, statistical systems, infrastructure records, and budgetary sources. These data offer new opportunities for evidence-based decision-making, territorial-level priority setting, and early risk identification. However, their analytical use remains challenging due to missing values, inconsistent measurement scales, mixed data types, and uneven source coverage across territorial units. In such circumstances, traditional deterministic approaches become insufficient, as they assume precise input data and stable relationships, which are rarely observed in real-world administrative settings [1,2].

To address issues of uncertainty and poorly formalized relationships, soft computing methods, particularly fuzzy logic, are widely used. Fuzzy systems allow the representation of linguistic concepts such as “low,” “medium,” and “high,” which are more consistent with the nature of socioeconomic indicators than hard numerical thresholds. Through membership functions and rule-based inference, fuzzy logic provides interpretable intermediate representations and enables the incorporation of expert knowledge into analytical pipelines [3,4,5,6]. This makes it particularly suitable for spatial diagnostics, where vulnerability arises from the interaction of multiple heterogeneous factors rather than from isolated indicators.

Meanwhile, machine learning methods have demonstrated high predictive performance in multidimensional tabular classification problems characterized by nonlinear dependencies and complex feature interactions. Among these, gradient boosting algorithms such as XGBoost have become state-of-the-art due to their robustness, flexibility, and performance on structured data [7,8]. Recent studies have also explored hybrid machine learning frameworks that combine predictive modeling with more interpretable analytical frameworks [9].

In parallel, the growing importance of explainable artificial intelligence underscores that high predictive performance alone is insufficient in decision-sensitive areas. Analytical systems must also provide transparent reasoning, interpretable feature contributions, and decision support capabilities relevant to real-world stakeholders [10,11,12]. This requirement is particularly important in territorial vulnerability assessment, where classification results can influence support allocation, intervention prioritization, and regional planning.

Recent studies confirm that combining machine learning with multidimensional territorial indicators improves the identification of socioeconomic vulnerability patterns and strengthens analytical support for regional development [13,14]. At the same time, territorial vulnerability is inherently multidimensional and arises from the interaction of demographic, economic, infrastructural, and social factors. Therefore, it requires integrated analytical models rather than isolated indicator-based approaches [15,16].

Despite these advances, some limitations remain in the current literature. Fuzzy logic-based approaches often rely heavily on expert-defined rules, which can limit scalability, reproducibility, and adaptability to complex empirical datasets. In contrast, purely data-driven models, including gradient boosting, typically operate as black-box predictors and do not provide explicit mechanisms for the interpretable aggregation of domain-specific factors. Furthermore, many existing studies treat classification thresholds as fixed parameters, without considering the asymmetric costs of false negatives and false positives in practical applications such as risk detection and resource allocation [17,18]. As a result, most existing approaches either focus on interpretability or on predictive performance, but rarely integrate both into a unified, reproducible analytical framework [19,20,21,22].

To address these gaps, this study proposes a reproducible hybrid architecture integrating fuzzy logic and gradient boosting for explainable tabular classification of spatial vulnerability. The proposed approach consists of a two-stage computational framework. In the first stage, fuzzy logic is used to construct interpretable risk and resilience indices based on heterogeneous input metrics. In the second stage, these indices are incorporated into an expanded feature space and used to train an XGBoost classifier that captures nonlinear dependencies and complex feature interactions. Furthermore, the framework incorporates probabilistic threshold optimization to account for asymmetric decision costs and improve the detection of high-risk areas.

The main contributions of this study are as follows. First, a hybrid Fuzzy–XGBoost architecture is developed that integrates semantic interpretability and data-driven predictive modeling within a single pipeline. Second, a fuzzy feature augmentation strategy is proposed, in which interpretable aggregate indices are embedded into the machine learning feature space rather than used as stand-alone expert estimates. Third, a data-driven calibration mechanism for fuzzy rule weights is presented using statistical measures of dependence. Fourth, threshold optimization is incorporated as an integral component of the analytical workflow, enabling decision-driven classification. Finally, the study presents a fully reproducible computational protocol that integrates data preprocessing, fuzzy transformation, classification, and priority ranking.

The remainder of the paper is organized as follows. Section 2 describes the dataset, preprocessing procedures, fuzzy modeling, and the proposed hybrid architecture. Section 3 presents the experimental results and comparative evaluation. Section 4 discusses the methodological aspects and limitations of the proposed approach. Section 5 concludes the study and outlines directions for future research.

2. Materials and Methods

The proposed methodology consists of five sequential steps: data integration and preprocessing, fuzzy feature construction, data-driven coefficient calibration, classification using XGBoost, and threshold optimization for decision support [22,23,24].

2.1. Dataset Description

This study uses a multi-source tabular dataset formed by integrating 11 disparate departmental and registry sources in .xlsx format into a single, unified analytical matrix. The observation unit is a territorial unit: a settlement, rural district, or administrative territory of the study region. After the data were merged on a consistent identifier, a matrix of 166 territorial units × 76 attributes in “objects × attributes” format was obtained. The data sources cover various domains of socioeconomic information, including socio-demographic and registry characteristics of households, integrated vulnerability categories with gradations A–E, agricultural indicators for crop and livestock production, land resources and availability of vacant plots, tariff and infrastructure burdens, project activity, budget and tax parameters, regulatory information, and pension and contribution indicators. The resulting dataset thus reflects the complex, multidimensional structure of territorial development and is designed to support a model that balances high classification accuracy with interpretable results. All 166 territorial units were used in the analysis, ensuring full coverage of objects for modeling and ranking. The feature space includes 76 numerical and categorical variables. Some features were taken directly from the sources, including tariffs, capacity, population, and area. Others are derived indicators, such as ratios, shares, indices, and differences, which improve comparability across territories and the model’s robustness to scale differences.

A key feature of the data preparation is the unification of the final matrix despite uneven source coverage. To maintain sample integrity and prevent territory loss, a robust missing-value handling procedure was applied. Numerical values were filled with the median for the corresponding feature, while categorical values were replaced with the safe “Unknown” category. This strategy ensures the ranking is complete and minimizes bias from excluding observations. The problem statement is formulated as a binary classification problem. The target variable, high_vulnerability, takes the value 1 if the territory is classified as highly vulnerable and 0 otherwise. The annotation is operationally generated based on the proportion of the population belonging to the most disadvantaged categories D and E, using a quantile threshold of 0.67 for the distribution of this indicator across territories. This approach provides a formal and reproducible criterion for identifying the upper portion of the risk distribution. It is consistent with the task of early identification of territories requiring priority support measures.
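The target construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names, the toy category counts, and the linear-interpolation quantile convention are all assumptions made for the example.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics (assumed convention)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def share_de(counts, eps=1e-9):
    """Share of the most disadvantaged categories D and E among A-E."""
    total = sum(counts[c] for c in "ABCDE")
    return (counts["D"] + counts["E"]) / (total + eps)


def high_vulnerability(territories, q=0.67):
    """Label a territory 1 if its D+E share reaches the q-quantile across territories."""
    shares = [share_de(t) for t in territories]
    threshold = quantile(shares, q)
    labels = [1 if s >= threshold else 0 for s in shares]
    return labels, threshold


# Toy category counts for four hypothetical territories (not taken from the dataset).
territories = [
    {"A": 40, "B": 300, "C": 200, "D": 50, "E": 10},
    {"A": 10, "B": 100, "C": 150, "D": 200, "E": 40},
    {"A": 60, "B": 500, "C": 300, "D": 100, "E": 40},
    {"A": 5, "B": 50, "C": 80, "D": 150, "E": 15},
]
labels, threshold = high_vulnerability(territories)
```

Because the threshold is a quantile of the observed distribution, roughly the top third of territories is labeled as highly vulnerable regardless of the absolute level of the D+E share.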

The study utilizes two levels of data representation. The first level includes the original 11 .xlsx files, each representing a separate information domain. The second level is an integrated tabular dataset in DataFrame format, used to train the model, calculate probabilities, and generate the Final Priority Index. Additionally, interpretable derived components were generated, including fuzzy risk and resilience indices, which are used as aggregate features and expand the feature space for machine learning algorithms. To assess the model’s quality, a stratified data partition was used to preserve class proportions. The training set included 124 territories, representing 75 percent of the total, while the test set included 42 territories, representing 25 percent. This partitioning ensures accurate validation and allows the model to be evaluated using binary classification metrics, including the F1 score and ROC-AUC, which is particularly important given the asymmetric cost of errors in identifying highly vulnerable territories. Figure 1 summarizes the coverage of data sources by territory and highlights the uneven observability characteristic of administrative data obtained from multiple sources. This confirms the feasibility of using matrix aggregation and robust imputation to maintain full territorial coverage. The x-axis represents the number of territories, and the y-axis represents the names of sources.

As shown in Figure 1, several sources provide nearly complete coverage and form the core of the analytical matrix. These include pension contributions (166 territories), vulnerability categories ABCDE (161), two household registries (150 each), crop data (148), livestock data (143), and project summary data (142). Other sources, such as utility tariffs, have moderate coverage (116 territories) but still provide useful infrastructure-related information. Overall, the figure highlights the uneven observability characteristic of multi-source administrative data and justifies the use of matrix pooling and robust imputation to preserve full territorial coverage and the integrity of the final ranking.
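The stratified 75/25 partition described in this subsection can be illustrated with a small standard-library sketch. The class counts below (55 positive and 111 negative territories) are an assumption chosen to be consistent with a 0.67 quantile threshold on 166 units; the function name and the per-class rounding rule are likewise illustrative rather than the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx) so that each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Per-class test size, rounded to the nearest integer (assumed rule).
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Assumed class balance: 55 highly vulnerable and 111 other territories.
labels = [1] * 55 + [0] * 111
train_idx, test_idx = stratified_split(labels)
```

With these assumed counts the split reproduces the 124/42 partition reported in the text. A library routine with a stratification option would serve the same purpose in practice.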

2.2. Mathematical Model of Data Preprocessing

Consider a set of territories (rural districts). For each territory, numerical and categorical indicators are collected. The following notation is used:

$X = \{x_{ij}\}$ is the original matrix of features, where $i$ is the territory number and $j$ is the indicator number;

$X^{fuzzy}$ is the extended matrix obtained after adding fuzzy indices/variables;

$y_i \in \{0, 1\}$ is the target class (1 is high vulnerability, 0 is not high);

$p_i = P(y_i = 1 \mid X_i^{fuzzy})$ is the probability of high vulnerability output by the model;

$t^{*}$ is the optimal threshold that transforms the probability $p_i$ into the class $\hat{y}_i$;

$R_i$ is the fuzzy risk index;

$S_i$ is the fuzzy resilience index;

$P_i^{fuzzy}$ is the complementary priority indicator;

$I$ is the integral quality criterion combining several metrics.

1. Basic standard pipeline for data processing and classification model construction. Figure 2 provides a high-level overview of the complete analytical process, from collecting and integrating data from multiple sources to constructing fuzzy features, training the model, and final ranking. It is intended to familiarize the reader with the main stages of the proposed methodology before introducing the mathematical details. The architecture comprises nine interconnected stages that form a logically complete analytical pipeline.

Stage S1 (Multi-Source Data Acquisition) collects heterogeneous data sources, including administrative tables in .xlsx format and a methodological document with screening criteria. This step forms the initial information basis of the model.

Stage S2 (Entity Resolution and Matrix Integration) ensures the alignment of territorial entities (KATO, district, rural district) and the integration of data into a single analytical matrix, eliminating duplicate and inconsistent identifiers.

Stage S3 (Statistical Quality Control) performs statistical quality control: checking variable type correctness, performing median imputation for missing values, checking for the absence of NaNs, and validating values for logical consistency.

Stage S4 (Indicator Engineering and Normalization) includes the construction of derived indicators (proportional indicators, indices, aggregated metrics) and their normalization (min-max scaling), ensuring feature comparability.

Stage S5 (Fuzzy Inference Layer) implements a fuzzy inference layer, incorporating triangular membership functions and a set of rules that describe the risk and resilience of territories. This layer transforms quantitative indicators into interpretable fuzzy estimates.

Stage S6 (Data-Driven Coefficient Estimation) estimates weighting coefficients based on mutual information and the absolute value of the Spearman coefficient. Negative contributions are truncated, and L1 normalization is then applied to stabilize the weighting structure.

Stage S7 (Target Operationalization and Validation Protocol) operationalizes the target variable (e.g., the proportion of the population in high-vulnerability conditions $\geq Q_{0.67}$) and forms a validation protocol (stratified partitioning and cross-validation).

Stage S8 (Final Feature Tensor Assembly) combines the original features, fuzzy transformations, and calibrated coefficients into a single final feature tensor.

Stage S9 (Model Training: Proposed Fuzzy-XGBoost), the final stage, trains a hybrid model integrating fuzzy logic and XGBoost gradient boosting to build a robust predictive algorithm.

The presented architecture ensures systematic processing, interpretable risk and stability factors, and a reproducible computational protocol through consistent quality control and a formalized assembly of the feature space. As shown in Figure 3, the proposed approach adds the following on top of the base matrix $X$: (S5) fuzzy inference forming $R_i$ and $S_i$; (S6) data-driven calibration of rule/factor coefficients (MI + |Spearman|, positive clipping, L1 normalization); (S8) assembly of the final feature tensor; and (S9) training XGBoost on $X^{fuzzy}$. Next (S7), the working threshold $t^{*}$ is selected to maximize F1; in the updated version, $t^{*} = 0.30$. Basic preprocessing and operationalization of the target are covered by S3, S4, and S7. To make indicators measured in different units (e.g., “number”, “percentage”, “tenge”) comparable, min–max normalization is applied (1):

$$x_{ij}^{*} = \frac{x_{ij} - \min_j}{\max_j - \min_j + \varepsilon} \tag{1}$$

where $\min_j$ and $\max_j$ are the minimum and maximum values of feature $j$ over all territories, and $\varepsilon > 0$ is a small number that protects against division by zero (if $\max_j = \min_j$). Gaps are filled with robust statistics: if a feature value is missing, the median for that feature is used (2):

$$x_{ij} := \operatorname{median}_j, \quad \text{if } x_{ij} \text{ is absent} \tag{2}$$
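Equations (1) and (2) can be sketched as two small column-wise helpers. This is a minimal illustration under stated assumptions: missing values are represented by `None`, imputation is applied before scaling, and the function names, the value of `eps`, and the toy tariff column are hypothetical.

```python
from statistics import median


def impute_median(column):
    """Eq. (2): replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max(column, eps=1e-9):
    """Eq. (1): rescale a column to [0, 1]; eps keeps constant columns finite."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo + eps) for v in column]


# Hypothetical tariff column with two territories missing from the source.
tariff = [120.0, None, 80.0, 200.0, None, 100.0]
filled = impute_median(tariff)
scaled = min_max(filled)
```

In a train/test setting, the median, minimum, and maximum would normally be estimated on the training part only and then reused on the test part; the sketch omits this for brevity.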

The median is chosen because it is less sensitive to outliers than the mean. Calculation of the share of vulnerable categories. Let $A_i, B_i, C_i, D_i, E_i$ be the numbers or proportions of vulnerability categories for territory $i$. Then the share of the most vulnerable categories is (3):

$$\mathrm{shareDE}_i = \frac{D_i + E_i}{A_i + B_i + C_i + D_i + E_i + \varepsilon} \tag{3}$$

Binarization of the target using quantiles. High vulnerability is defined as falling in the upper part of the $\mathrm{shareDE}$ distribution (4). The threshold is taken as the quantile $Q$:

$$y_i = \mathbb{1}[\mathrm{shareDE}_i \geq Q] \tag{4}$$

where $\mathbb{1}[\cdot]$ is the indicator function, equal to 1 if the condition is met and 0 otherwise. This rule corresponds to the task of the early identification of territories with the highest vulnerability concentration. Fuzzy layer (S5). Triangular membership function (5):

$$\mu_{tri}(x; a, b, c) = \max\left(\min\left(\frac{x - a}{b - a}, \frac{c - x}{c - b}\right), 0\right) \tag{5}$$
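A short sketch of the triangular membership function follows. It adds a small stabilizer `eps` to the denominators, as the paper does in its later restatement of this function. The three linguistic terms and their parameters are hypothetical, chosen only so that neighbouring terms overlap on the normalized scale [0, 1].

```python
def tri(x, a, b, c, eps=1e-9):
    """Triangular membership with support [a, c] and peak at b (requires a < b < c)."""
    left = (x - a) / (b - a + eps)
    right = (c - x) / (c - b + eps)
    return max(min(left, right), 0.0)


# Hypothetical term parameters (a, b, c); the outer terms extend beyond [0, 1]
# so that they reach full membership at the ends of the normalized scale.
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(x):
    """Map a normalized feature value to its degree of membership in each term."""
    return {name: tri(x, *params) for name, params in TERMS.items()}


memberships = fuzzify(0.25)
```

With these assumed parameters, a value of 0.25 belongs equally to "low" and "medium" and not at all to "high", which illustrates the smooth transition between levels that a hard threshold would not provide.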

Aggregated fuzzy risk (via calibrated coefficients S6): Let $r_{ik} \in [0, 1]$ be the activations of the risk rules obtained from (5). Then (6):

$$R_i = \sum_{k=1}^{K} w_k r_{ik}, \quad \sum_{k=1}^{K} w_k = 1, \quad w_k \geq 0 \tag{6}$$

Aggregated fuzzy stability (via calibrated coefficients S6): Let $s_{im} \in [0, 1]$ be the activations of the stability rules. Then (7):

$$S_i = \sum_{m=1}^{M} v_m s_{im}, \quad \sum_{m=1}^{M} v_m = 1, \quad v_m \geq 0 \tag{7}$$

Complementary priority indicator for fuzzy part (8):

$$P_i^{fuzzy} = 1 - R_i \tag{8}$$
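Equations (6) to (8) amount to convex combinations of rule activations. The sketch below illustrates this for a single territory; the activation values and weights are made up for the example, and the helper name `aggregate` is hypothetical.

```python
def aggregate(activations, weights, tol=1e-9):
    """Convex combination of rule activations, as in Eqs. (6) and (7)."""
    if len(activations) != len(weights):
        raise ValueError("one weight per rule is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * r for w, r in zip(weights, activations))


# Made-up activations for one territory: three risk rules, two stability rules.
risk_activations = [0.8, 0.4, 0.1]
risk_weights = [0.5, 0.3, 0.2]
stability_activations = [0.2, 0.6]
stability_weights = [0.6, 0.4]

R = aggregate(risk_activations, risk_weights)            # Eq. (6)
S = aggregate(stability_activations, stability_weights)  # Eq. (7)
P_fuzzy = 1.0 - R                                        # Eq. (8)
```

Because the weights are non-negative and sum to 1 and every activation lies in [0, 1], both indices stay in [0, 1] and can be read on the same scale as the membership degrees.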

Data-driven estimation of coefficients (S6). Rule/factor weights are estimated from the data by combining two measures of dependence on the target variable: mutual information (MI) and the absolute value of Spearman’s rank correlation, $|\rho|$. For each risk rule (9):

$$q_k = MI(r_{\cdot k}, y) + |\rho(r_{\cdot k}, y)|, \quad q_k^{+} = \max(q_k, 0), \quad w_k = \frac{q_k^{+}}{\sum_{j=1}^{K} q_j^{+}} \tag{9}$$

Similarly, for the stability rules (10):

$$g_m = MI(s_{\cdot m}, y) + |\rho(s_{\cdot m}, y)|, \quad g_m^{+} = \max(g_m, 0), \quad v_m = \frac{g_m^{+}}{\sum_{l=1}^{M} g_l^{+}} \tag{10}$$
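The calibration in Eqs. (9) and (10) can be sketched with the standard library alone. The paper does not state which mutual information estimator is used, so the equal-width binned plug-in estimate below is an assumption, as are the function names, the number of bins, the uniform fallback when all scores are zero, and the toy data.

```python
import math
from collections import Counter


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def binned_mi(x, y, bins=3):
    """Plug-in mutual information (nats) after equal-width binning of x in [0, 1]."""
    xb = [min(int(v * bins), bins - 1) for v in x]
    n = len(x)
    joint, px, py = Counter(zip(xb, y)), Counter(xb), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def calibrate(rule_columns, y):
    """Scores q_k = MI + |rho|, positive clipping, then L1 normalization."""
    scores = [binned_mi(col, y) + abs(spearman(col, y)) for col in rule_columns]
    clipped = [max(s, 0.0) for s in scores]
    total = sum(clipped)
    if total == 0:  # assumed fallback: equal weights when no rule is informative
        return [1.0 / len(clipped)] * len(clipped)
    return [s / total for s in clipped]


# Toy data: the first rule tracks the target, the second is unrelated to it.
y = [1, 1, 1, 0, 0, 0]
rule_informative = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
rule_noise = [0.5, 0.1, 0.9, 0.5, 0.1, 0.9]
weights = calibrate([rule_informative, rule_noise], y)
```

With a plug-in estimate, both terms of the score are non-negative by construction, so the clipping step only matters for estimators that can return small negative values.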

In (9) and (10), positive clipping is the $\max(\cdot, 0)$ operation, and L1 normalization ensures that the weights sum to 1. Validation protocol and threshold (S7). The optimal threshold maximizes F1 (11):

$$t^{*} = \arg\max_{t \in [0, 1]} F1\left(y, \mathbb{1}[p \geq t]\right) \tag{11}$$

The updated version of the working solution adopted $t^{*} = 0.30$. Integral quality criterion (for model comparison). The integral assessment $I$ with data-driven weights is (12):

$$I = \alpha_{F1} \cdot F1 + \alpha_{AUC} \cdot AUC + \alpha_{R} \cdot Recall + \alpha_{P} \cdot Precision, \quad \sum \alpha = 1, \quad \alpha \geq 0 \tag{12}$$

The $\alpha$ weights are automatically calculated from the distribution of metrics (entropy/information weighting), so that the resulting scale reflects the models’ actual distinguishability in the data. Final solution and applied index (S9 and practical conclusion). Binary decision of the classifier (13):

$$\hat{y}_i = \mathbb{1}[p_i \geq t^{*}] \tag{13}$$
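The threshold search in Eq. (11) and the decision rule in Eq. (13) can be sketched as a grid search over candidate thresholds. The grid resolution, the tie-breaking rule (the lowest threshold among ties), the function names, and the toy probabilities are assumptions made for this illustration.

```python
def f1_at(y_true, probs, t):
    """F1 of the decision rule y_hat = 1[p >= t]."""
    pred = [1 if p >= t else 0 for p in probs]
    tp = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs, grid=None):
    """Eq. (11) on a finite grid; max() keeps the first (lowest) threshold among ties."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: f1_at(y_true, probs, t))


# Toy predicted probabilities (hypothetical), imbalanced toward the negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.45, 0.3, 0.35, 0.2, 0.1, 0.05, 0.6]

t_star = best_threshold(y_true, probs)
y_hat = [1 if p >= t_star else 0 for p in probs]  # Eq. (13)
```

To avoid an optimistic estimate, the threshold would be selected on training or validation folds and then applied unchanged to the held-out test set.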

Priority index for ranking (14):

$$Priority_i = 0.45 \cdot norm(p_i) + 0.35 \cdot norm(P_i^{fuzzy}) + 0.20 \cdot norm(R_i) \tag{14}$$

where $norm(\cdot)$ rescales a quantity to the range $[0, 1]$.
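The priority index in Eq. (14) can be sketched as follows. The text does not specify how $norm(\cdot)$ is computed, so min–max scaling across territories is assumed here; the function names and the three-territory input values are hypothetical.

```python
def norm(values, eps=1e-9):
    """Assumed form of norm(.): min-max rescaling across territories."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


def priority_index(p, p_fuzzy, risk, weights=(0.45, 0.35, 0.20)):
    """Eq. (14): weighted sum of the three normalized components per territory."""
    columns = [norm(p), norm(p_fuzzy), norm(risk)]
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(p))
    ]


# Hypothetical component values for three territories.
p = [0.9, 0.2, 0.5]
p_fuzzy = [0.7, 0.3, 0.5]
risk = [0.8, 0.1, 0.4]

scores = priority_index(p, p_fuzzy, risk)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Sorting territories by descending score gives the ranked list used for decision support, with the highest-priority territory first.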

2. A fuzzy-hybrid pipeline. In real-world management, many factors are described not strictly (“high/low”), but approximately: “low income”, “high workload”, “insufficient infrastructure”. Fuzzy logic allows us to formalize such concepts through membership functions and rules. However, a fuzzy system alone may not be robust enough to handle complex nonlinear feature interactions. Therefore, it is combined with gradient boosting (XGBoost), a strong tree-based classifier. The result is a model that preserves explainability (via $R_i$, $S_i$) while improving classification quality (via XGBoost on $X^{fuzzy}$).

Figure 3 summarizes the proposed hybrid pipeline and clarifies how fuzzy feature construction, classifier training, and threshold calibration interact within a single workflow. These fuzzy outputs are aggregated into interpretable components related to risk, resilience, and priority, combined with the original normalized features. The resulting combined feature space is then used by the XGBoost classifier to predict the probability of high vulnerability. The resulting probability is converted to a binary decision using the optimized threshold $t^{*}$, determined during the training stage to maximize F1. The final model output includes not only probability and class, but also a priority index calculated as a weighted combination of normalized probability, an applied priority component, and fuzzy risk. This allows for the formation of a ranked list of territories for management decision-making.

From a methodological perspective, the architecture combines the interpretability of fuzzy logic with the predictive power of gradient boosting. The use of fuzzy membership functions allows for the accurate description of management characteristics without hard thresholds. At the same time, integration with XGBoost ensures that complex nonlinear interactions among factors are accounted for. As a result, the model simultaneously provides explainability through fuzzy risk and resilience indices and high classification quality in an extended feature space. A triangular function (15) is used to convert a numerical value $x$ into a degree of membership in a linguistic term (e.g., “low/medium/high”).

$$\mu_{tri}(x; a, b, c) = \max\left(\min\left(\frac{x - a}{b - a + \varepsilon}, \frac{c - x}{c - b + \varepsilon}\right), 0\right) \tag{15}$$

where $x$ is the normalized value of a feature, $a < b < c$ are the term parameters, and $\varepsilon > 0$ is a stabilizer. Fuzzy risk (weighted rule aggregation). Let $r_{ik} \in [0, 1]$ be the degree of triggering of the $k$-th risk rule for territory $i$, obtained from (15) and the rule logic; these activations are aggregated into $R_i$ as in (6).

The proposed data preprocessing framework offers several methodological advantages. It improves interpretability by generating fuzzy risk and resilience indices with clear meanings and understandable levels from low to medium to high. It also improves robustness to uncertainty and edge cases through triangular membership functions, which provide smooth transitions between risk levels and reduce sensitivity to noise and outliers. Furthermore, fuzzy feature expansion increases the informativeness of the input space by introducing aggregated nonlinear representations, which help XGBoost capture complex factor interactions while maintaining interpretability. The framework also accounts for asymmetric error costs through threshold optimization and remains fully reproducible thanks to formalized validation, normalization, imputation, and fuzzification steps.

3. Results

A step-by-step empirical evaluation of the integrated dataset and the developed hybrid architecture is presented. First, the distributions of key indicators, intergroup differences, and the feature correlation structure are analyzed to verify the correctness of the preprocessing and the validity of the binary classification problem formulation. Next, the results of comparative training across the baseline, SOTA, and proposed models within a single experimental protocol are presented using F1, ROC-AUC, and integrated quality metrics. Additionally, the error structure, sensitivity to the probability threshold, and factorial interpretation through the permutation importance of features are analyzed. The results obtained allow us to quantify the computational advantage of the hybrid architecture and its applicability for early detection and the prioritization of highly vulnerable areas.

To ensure correct interpretation of the results with a small test sample size, an additional uncertainty assessment was performed (Table 1). For the basic Fuzzy + $t^{*}$ configuration, a bootstrap 95% confidence interval for F1 was obtained: [0.467; 0.848], with a point estimate of 0.688. In addition, nested repeated stratified CV showed an average F1 = 0.676 ± 0.091 and an average ROC-AUC = 0.828 ± 0.066. Thus, the conclusions in the revised version are based not on a single holdout observation, but on interval and repeated validation assessments.
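A percentile bootstrap for F1 on a small test set can be sketched as follows. The paper does not specify which bootstrap variant was used, so the percentile method, the number of resamples, and the seed are assumptions. The 42 labels and predictions are illustrative counts chosen so that the point estimate equals 0.6875, in line with the reported 0.688; they are not the study's actual predictions.

```python
import random


def f1_score(y_true, y_pred):
    tp = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test cases with replacement, recompute F1."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Illustrative 42-case test set: 11 true positives, 3 false negatives,
# 7 false positives, 21 true negatives (hypothetical counts).
y_true = [1] * 14 + [0] * 28
y_pred = [1] * 11 + [0] * 3 + [1] * 7 + [0] * 21

point = f1_score(y_true, y_pred)
lower, upper = bootstrap_f1_ci(y_true, y_pred)
```

With only 42 test cases, the resulting interval is wide, which is why the text supplements the single holdout estimate with repeated cross-validation.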

Table 2 shows the results of nested repeated cross-validation used as an additional check of the generalization ability for the four supervised XGBoost conditions. Across repeated iterations, the best average F1-measure was obtained for the Fuzzy + optimized τ* configuration (mean F1 = 0.67594, standard deviation = 0.09098), followed closely by the Raw + optimized τ* configuration (mean F1 = 0.67047, standard deviation = 0.07901). The two default threshold settings yielded slightly lower average F1-measures of 0.65632 for Raw + 0.50 and 0.65568 for Fuzzy + 0.50. In terms of ranking quality, all four configurations yielded very similar average ROC-AUC values, ranging from 0.82759 to 0.83070. The highest mean ROC-AUC value was observed for the Raw + 0.50 setting (0.83070 ± 0.06657), while the Fuzzy + optimized τ* condition reached 0.82759 ± 0.06618. This indicates that the main differences between the conditions arise not so much from the probability ranking itself as from the interaction between the feature representation and the decision threshold strategy. The recall values further show the effect of threshold optimization. The highest mean recall value was achieved with the Fuzzy + optimized τ* setting (0.76000 ± 0.15275), followed by Raw + optimized τ* (0.74909 ± 0.14441). In contrast, both conditions with the default threshold yielded substantially lower mean recall values of 0.62909, with standard deviations of 0.11713 and 0.12287, respectively. The median value of the optimized threshold was 0.22 for the Fuzzy + optimized τ* condition and 0.21 for the Raw + optimized τ* condition, while the default settings remained fixed at 0.50.

Table 3 presents the sensitivity analysis of the overlap of fuzzy membership functions with changes in the c (low leverage) and a (high leverage) parameters. The results show that the narrow and basic overlap settings yield identical accuracy (0.761905), precision (0.611111), recall (0.785714), and F1-score (0.6875), indicating stable classification performance with moderate changes in the overlap structure. The only difference between the two settings is observed in the ROC-AUC, where the basic overlap achieves the highest value of 0.831633 compared to 0.829082 for the narrow overlap condition. In contrast, the wide overlap setting results in lower classification quality. While recall remains unchanged at 0.785714, precision decreases to 0.550000, accuracy to 0.714286, and the F1 score to 0.647059. The selected decision threshold also shifts downward: from 0.24 in the narrow overlap mode and 0.23 in the basic overlap mode to 0.12 in the wide overlap configuration. This indicates that excessive overlap between fuzzy membership functions reduces discrimination clarity and requires a softer threshold to maintain sensitivity.

Table 4 presents the bootstrap 95% confidence intervals for the main evaluation metrics in the four compared simulation conditions. The results provide an interpretation of the validation set’s performance, accounting for uncertainty, and show that the estimated metrics exhibit moderate variability due to the limited sample size. For the Raw + 0.50 configuration, the F1 point estimate is 0.714, with a 95% confidence interval of [0.476; 0.882], while the ROC-AUC is 0.824, with an interval of [0.677; 0.944]. This condition shows the highest F1 point estimate among the compared settings, although the interval’s width indicates substantial uncertainty. The Raw + optimized τ* configuration yields a lower F1 score of 0.629 [0.428; 0.800], but achieves a higher recall of 0.786, with the upper bound of the confidence interval reaching 1.000, confirming that threshold optimization biases the model towards increased sensitivity at the expense of precision. The Fuzzy + 0.50 condition yields the weakest F1 score of 0.538, with a wide confidence interval of [0.261; 0.750], indicating lower stability without threshold adjustment. In contrast, the Fuzzy + optimized τ* setting improves F1 to 0.688 [0.467; 0.848] and recall to 0.786 [0.545; 1.000], demonstrating that threshold calibration substantially improves the performance of the fuzzy-enhanced representation. At the same time, the ROC-AUC values remain relatively close across all four conditions, ranging from 0.804 to 0.824, suggesting that the main differences arise not from the quality of the probability ranking per se, but from the final classification threshold and feature representation.

Table 5 presents the results of a controlled ablation study comparing four XGBoost classifier configurations with different feature representations (original vs. fuzzy logic-enhanced) and thresholding strategies (default vs. optimized). The analysis is designed to isolate the individual and combined effects of fuzzy feature augmentation and probability threshold calibration. The baseline configuration (original + 0.50) demonstrates the best overall performance, with the highest F1 score of 0.714, precision of 0.714, and ROC-AUC of 0.824. This indicates that the original feature space with the default decision threshold provides a strong benchmark. When threshold optimization is applied to the original features (original + optimized τ*), recall increases from 0.714 to 0.786, but precision decreases from 0.714 to 0.524, resulting in a decrease in the F1 score to 0.629. This confirms that threshold tuning biases the model toward greater sensitivity at the expense of precision. Among the fuzzy-enhanced configurations, the Fuzzy + optimized τ* model achieves an F1 score of 0.688, with a recall of 0.786 and a precision of 0.611, demonstrating a more balanced trade-off compared to the original optimized configuration. In contrast, the Fuzzy + 0.50 configuration performs the worst: an F1 score of 0.538 and a recall reduced to 0.500, indicating that fuzzy features alone, without threshold tuning, do not provide sufficient discriminatory power.

3.1. Analysis of Data Distribution and Statistical Testing of Feature Structure

The primary statistical analysis of the integrated dataset aims to assess its structural integrity and the distributional characteristics of the feature space. The variability of indicators, their consistency across domains, and basic intergroup differences for the target variable are considered. The conducted verification allows us to confirm the correctness of the integration of sources and create a reliable basis for the subsequent construction and comparative evaluation of classification models. Figure 4 reflects the average number of social categories A–E per territory. It is evident that the distribution structure is markedly heterogeneous: the largest contributions are made by categories B (≈3138) and C (≈2230), followed by D (≈1220), while A (≈435) and, especially, E (≈265) are represented by significantly smaller values.

This configuration means that different combinations of categories form the “social profile” of territories and cannot be reduced to a single type. Therefore, management decisions should be based on targeted prioritization and take into account the relative shares of vulnerable groups (in particular, the D+E components), and not just the absolute population values. From a practical perspective, this indicates that territorial vulnerability cannot be addressed through uniform policy measures. Instead, differentiated intervention strategies are needed that take into account the specific composition of social groups in each territory. Figure 5 shows the distribution of $\mathrm{share}_{D+E}$, the share of the population belonging to the most vulnerable social categories D+E, across all territories.

The bulk of observations is concentrated in the low–moderate range (approximately 0.3–0.4). At the same time, on the right, there is a long tail with isolated areas with significantly elevated D+E shares (even reaching extreme values). This asymmetry and pronounced inter-area variability mean that the D+E indicator does differentiate areas by vulnerability, thereby justifying (i) the goal of early identification of areas with a high D+E share, (ii) the formation of a binary target using a quantile threshold, and (iii) the need for priority ranking, since the differences between areas are uneven and of practical significance. In practice, this confirms that the model can be effectively used for the early identification of high-risk areas, allowing policymakers to prioritize regions with a disproportionately high concentration of vulnerable populations. Table 6 presents the key control metrics for integration quality during the construction of a unified analytical matrix and confirms that the data are suitable for subsequent model training and the calculation of the priority index.
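The construction of the D+E share and a quantile-based binary target can be sketched as follows. The quantile level `q` used below is a hypothetical placeholder, since the text states only that a quantile threshold is used; the function names are likewise illustrative.

```python
def share_de(a, b, c, d, e):
    """Share of the most vulnerable categories D+E in the A-E total."""
    total = a + b + c + d + e
    return (d + e) / total if total else 0.0

def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def binary_target(shares, q):
    """Label a territory 1 (High) if its share reaches the q-quantile."""
    cut = quantile(shares, q)
    return [1 if s >= cut else 0 for s in shares]

# Average category counts per territory reported for Figure 4.
avg_share = share_de(435, 3138, 2230, 1220, 265)
print(round(avg_share, 4))

# Toy shares with an assumed quantile level q = 0.75 (not from the paper).
toy_shares = [0.1, 0.2, 0.3, 0.4, 0.5]
labels = binary_target(toy_shares, q=0.75)
print(labels)
```

Note that the share computed from the average counts is a ratio of means and need not equal the mean of per-territory shares.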

The final indicators presented in Table 6 confirm the correctness of the analytical matrix formation and the preservation of full coverage of the study objects. The total number of territorial units with social categories A–E is 166, which defines the general population for the analysis and allows for monitoring potential data losses at the stages of cleaning and integration. Correct matching by the KATO administrative identifier was performed for 160 territories; the remaining discrepancies indicate individual cases of key inconsistency and require additional normalization of the reference directories, but do not affect the completeness of the final sample. The target variable is generated for all 166 objects, thereby excluding selection bias due to the lack of annotation and ensuring proper classifier training. Additionally, it was established that each territory contains at least 10 constructed informative features, confirming the sufficient saturation of the rows and the stability of the feature space under multi-source data conditions. The combination of these metrics demonstrates the correctness of the integration, the completeness of the feature representation, and the suitability of the matrix for classification and subsequent territory ranking. In practice, this ensures that the resulting model operates on a complete and representative dataset, reducing the risk of biased decisions caused by the absence or exclusion of certain areas.

Table 7 presents descriptive statistics for key indicators, including the number of observations, mean values, standard deviations, quartiles, and extreme values. These indicators serve an important analytical function by quantitatively confirming the inter-territorial heterogeneity of socioeconomic parameters. At the same time, the obtained statistics substantiate the need for normalization procedures and robust transformations, as well as for the correct interpretation of features in the presence of asymmetric distributions and outliers, thereby ensuring the methodological validity of subsequent modeling.

An analysis of the indicators presented in Table 7 reveals several statistically and substantively significant patterns characterizing the structure of territorial differences. First, the indicator for the share of vulnerable population categories (D+E) has an average of approximately 0.195, a median of 0.177, and a maximum approaching 1.0, indicating significant variability and right-skewed asymmetry in the distribution. The significant spread of values confirms that this indicator has sufficient discriminatory power and is justifiably used as the basis for binarizing the target variable and subsequently ranking territories. Socioeconomic indices also reflect structural heterogeneity. The employment index is characterized by a relatively low average level with significant inter-territorial variation, indicating differences in the degree of economic participation of the population, even with comparable levels of social vulnerability. The fiscal indicator tax_balance_mln exhibits significant dispersion and extreme values, with a relatively moderate median, which is typical of financial indicators with a high concentration of resources in individual administrative centers. This pattern indicates the need for robust normalization methods and transformations during modeling. Resource indicators exhibit pronounced sparseness. The availability coefficient for available land is zero in most cases, while high values are recorded for a limited number of territories. This demonstrates the indicator’s inherently binary nature, where the information content is determined more by the availability of a resource than by its smooth quantitative scale. This distribution structure justifies the use of a fuzzy interpretation of availability levels. Similarly, infrastructure and tariff factors exhibit strong asymmetry and extreme values, requiring careful processing to prevent biased learning. 
At the same time, the indicator of social infrastructure coverage shows almost constant values across most territories, indicating limited independent discriminatory power but retaining its value as a contextual indicator in the complex model.
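One common form of the robust normalization mentioned above is median/IQR scaling, sketched here. The specific transformation used in the study is not stated, so this is an assumed example; the fallback for a zero interquartile range (typical for sparse indicators that are zero for most territories) is also an illustrative choice.

```python
import statistics

def robust_scale(values):
    """Center by the median and scale by the interquartile range (IQR).

    If the IQR is zero (e.g. a sparse feature that is mostly zero), the
    values are only centered, which is an assumption made for this sketch.
    """
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return [v - med for v in values]
    return [(v - med) / iqr for v in values]

# A toy heavy-tailed feature: the extreme value does not distort the scale
# of the bulk, unlike mean/standard-deviation scaling.
scaled = robust_scale([1, 2, 3, 4, 100])
print(scaled)

# A sparse feature with zero IQR falls back to centering only.
sparse = robust_scale([0, 0, 0, 0, 5])
print(sparse)
```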

Agroeconomic parameters reflect the large-scale heterogeneity in the production profiles of territories. Significant variations in livestock numbers and the concentration of processing facilities in certain locations confirm the spatial differentiation of resource potential and the presence of industrial hubs. Such distributions strengthen the case for nonlinear analysis methods that account for factor interactions. Investment indicators also demonstrate significant variability, reflecting differences in the sustainability and development potential of the territories. Moreover, some technical characteristics, with virtually constant values, act more as control variables and do not independently contribute to class discrimination. Taken together, Table 7 confirms that the integrated dataset includes characteristics with varying statistical natures, including symmetric and asymmetric distributions, sparse structures, and outliers. From an applied perspective, this diversity justifies the chosen strategy: robust preprocessing, the fuzzification of key factors, and the use of ensemble boosting methods that account for nonlinear combinations of characteristics while maintaining interpretable risk and sustainability indices.

Figure 6 presents the SHAP-based feature importance analysis at the global and local levels, providing quantitative insight into the model’s decision-making process. At the global level (left panel), the most influential feature is opv_population_registered, with an average |SHAP| value of 0.95, well above all other variables. The second most important feature is project_investment_mln with an average contribution of approximately 0.78, followed by animal_output_total (≈0.72) and crop_population_screening (≈0.55). Infrastructure-related indicators, such as tariff_drinking_business (≈0.47), and labor-related variables (opv_workers ≈0.43) also show significant contributions. The remaining features, including livestock subcategories and land-use indicators, have moderate importance, ranging from 0.25 to 0.40. This distribution indicates a clear dominance of demographic and investment-related factors in the model’s overall behavior. At the local level (right panel), for the selected high-risk area, the opv_population_registered variable contributes approximately +2.5 to the log-odds of the positive class, making it the main driver of a positive prediction. The next most influential features are crop_population_screening (≈+1.1) and project_investment_mln (≈+0.9). Additional positive contributions are made by opv_workers (≈+0.8) and animal_output_total (≈+0.6). Notably, the fuzzy_risk feature contributes approximately +0.5, confirming its active role in the hybrid model. Other variables, including animal_eggs_output, distance_to_district_km, and livestock_diversity, have smaller but non-negligible contributions, ranging from +0.2 to +0.4. Overall, the quantitative SHAP analysis shows that while a small subset of features (primarily demographic and economic indicators) dominates globally, local forecasts are shaped by the combined influence of multiple factors. The inclusion of fuzzy_risk among the key factors influencing local forecasts confirms the interpretability and practical relevance of the extended fuzzy feature space.

Figure 7 presents an analysis of the fuzzy resilience component in the proposed model. The left panel shows the distribution of the fuzzy resilience index across all 166 territories, and the right panel presents the activation frequency of resilience rules (s1–s3) for both all territories and a subset of 20 priority territories. As shown in the left panel, the fuzzy resilience index is zero for all observations, with no variation across territories. This result is further confirmed by the right panel, where the activation frequency of all resilience rules (s1, s2, s3) is zero. This indicates that, with the current rule specification and parameterization, the resilience component is not activated for any territory in the dataset. This finding bears directly on the interpretability of the fuzzy layer. The results indicate that the resilience subcomponent, as currently defined, remains inactive due to overly strict conjunction conditions in the rule base and/or insufficient variability of the relevant input characteristics. As a result, the interpretability of the fuzzy layer in this model is primarily determined by the risk component, which remains fully functional and contributes to both global and local explanations. Importantly, this behavior does not invalidate the overall hybrid framework, but rather points to a limitation in the current specification of the resilience rules. Future work will revisit this component by relaxing rule constraints, redesigning membership functions, or introducing alternative aggregation strategies to ensure the meaningful activation of resilience factors. Overall, the figure clearly demonstrates the internal behavior of the fuzzy layer and supports a more precise interpretation of the model structure, where the risk dimension determines the current contribution of the fuzzy subsystem.
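Why a strict conjunction can leave every rule inactive is easy to see in a small sketch. It assumes a min t-norm for the AND operator and uses made-up membership degrees; neither the operator nor the values are taken from the paper's rule base, and the mean aggregator stands in for one possible "alternative aggregation strategy".

```python
def mean(memberships):
    """Arithmetic mean of antecedent membership degrees."""
    return sum(memberships) / len(memberships)

def activation_frequency(rows, aggregate=min, eps=1e-9):
    """Fraction of territories whose rule firing strength exceeds eps.

    `rows` holds, per territory, the membership degrees of the rule's
    antecedents; `aggregate` combines them (min = strict AND).
    """
    fired = sum(1 for m in rows if aggregate(m) > eps)
    return fired / len(rows)

# Hypothetical antecedent memberships for three territories. Each row has
# at least one antecedent with zero membership.
rows = [
    (0.8, 0.0, 0.6),
    (0.0, 0.9, 0.7),
    (0.5, 0.4, 0.0),
]

strict = activation_frequency(rows, aggregate=min)    # strict conjunction
relaxed = activation_frequency(rows, aggregate=mean)  # relaxed aggregation
print(strict, relaxed)
```

Under the min operator a single zero antecedent silences the whole rule, so a rule with several demanding conditions can stay at zero for all territories even when most conditions are partly satisfied.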

Table 8 presents a sensitivity analysis of the fuzzy membership overlap settings and their impact on the predictive performance of the proposed model. Three configurations were considered: narrow overlap, basic overlap, and wide overlap, which differ in the position of the low-leverage parameter c and the high-leverage parameter a. The results show that the narrow and basic overlap settings provide similar classification performance in terms of accuracy (0.7619), precision (0.6111), recall (0.7857), and F1-score (0.6875), while the basic overlap achieves the highest ROC-AUC value of 0.8316 compared to 0.8291 for the narrow overlap. In contrast, the wide overlap configuration leads to a decrease in the overall classification quality: accuracy decreases to 0.7143, precision to 0.5500, and F1-score to 0.6471, although recall remains unchanged at 0.7857. The selected threshold level also shifts downwards: from 0.24 and 0.23 in the first two settings to 0.12 in the wide overlap condition.
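The role of the two overlap parameters can be illustrated with shoulder-shaped membership functions. This is a sketch under stated assumptions: `c` is taken to be the point where the "low" membership reaches zero and `a` the point where the "high" membership starts to rise, inputs are assumed to lie in [0, 1], and the three (a, c) pairs are hypothetical values, not the settings from the table.

```python
def low_membership(x, start, c):
    """Left shoulder: 1 up to `start`, falling linearly to 0 at `c`."""
    if x <= start:
        return 1.0
    if x >= c:
        return 0.0
    return (c - x) / (c - start)

def high_membership(x, a, end):
    """Right shoulder: 0 up to `a`, rising linearly to 1 at `end`."""
    if x <= a:
        return 0.0
    if x >= end:
        return 1.0
    return (x - a) / (end - a)

def overlap_width(a, c):
    """Length of the interval where both memberships are non-zero."""
    return max(0.0, c - a)

# Hypothetical (a, c) pairs for the three overlap configurations.
settings = {"narrow": (0.45, 0.55), "basic": (0.40, 0.60), "wide": (0.25, 0.75)}
widths = {name: overlap_width(a, c) for name, (a, c) in settings.items()}

for name, (a, c) in settings.items():
    x = 0.5  # a mid-range input, ambiguous between "low" and "high"
    print(name, round(low_membership(x, 0.0, c), 3), round(high_membership(x, a, 1.0), 3))
```

As the overlap widens, a mid-range input belongs more strongly to both sets at once, which is one way the sharper class separation seen in the narrow and basic settings can be lost.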

3.2. Application of Analysis Methods to Check the Structure of Prepared Data

Figure 8 presents the Spearman rank correlation matrix ($\rho$) between the basic indicators and fuzzy aggregates, reflecting monotonic (not necessarily linear) dependencies in the data. The analysis shows that for most feature pairs, the absolute values $|\rho|$ are small, indicating low multicollinearity and confirming that the features carry distinct information and can be used together in the model without significant redundancy. The most pronounced relationship is observed between opv_employment_index and the fuzzy components: $\rho \approx -0.85$ for fuzzy_risk and $\rho \approx +0.85$ for fuzzy_priority, consistent with the economic meaning of the employment indicator as a factor that reduces risk and prioritizes sustainability. Additionally, fuzzy_risk and fuzzy_priority are almost perfectly anticorrelated ($\rho \approx -1.00$), which is the expected consequence of their complementary construction (one value is defined by the other) and confirms the internal logical consistency of the fuzzy layer. Overall, the matrix is used as a QC tool to verify the correctness of feature formation and to identify key relationships that support model interpretation and fuzzy rule selection.
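The near-perfect anticorrelation of two complementary indices follows directly from how Spearman's coefficient is computed. The sketch below implements it as the Pearson correlation of average ranks; the toy values and the relation `priority = 1 - risk` are assumptions made for illustration, not the paper's definition of fuzzy_priority.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical index values; the complement reverses the rank order exactly.
risk = [0.10, 0.40, 0.35, 0.80, 0.60]
priority = [1 - r for r in risk]
rho = spearman(risk, priority)
print(rho)
```

Any strictly decreasing transformation of one variable gives the same result, so a coefficient of about -1 here checks the construction rather than revealing an empirical relationship.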

Table 9 evaluates the monotonic relationship between the key features and the target indicator of high vulnerability (y) using Spearman’s coefficient $\rho$ and tests the statistical significance of this relationship using a p-value with a sample size of $N = 166$. The most pronounced and statistically significant association is observed for practical_priority_score ($\rho = 0.526$, $p \approx 0$), confirming its meaningful validity as an integral indicator, consistent with the target logic of vulnerability and suitable for ranking territories. Several factors show moderate but significant relationships: project_investment_mln has a negative association ($\rho = -0.2207$, $p = 0.0043$), which is interpreted as a decrease in vulnerability with higher investment activity; tariff_pressure is positively associated with vulnerability ($\rho = 0.1916$, $p = 0.0134$), indicating the role of the tariff burden as a potential risk amplifier; free_land_ratio shows a negative correlation ($\rho = -0.1765$, $p = 0.0229$), which is consistent with the hypothesis of a resource “cushion” of territories in the presence of free land. The remaining features show weak or statistically insignificant correlations at the sample level (e.g., livestock_total_heads: p = 0.094; opv_employment_index: p = 0.1389; srs_coverage_pct and tax_balance_mln: p > 0.4), which does not mean they are useless in the model, since in a multivariate setting they can manifest through nonlinear interactions. The forecast_projects feature (ρ = 0) is effectively a constant and does not contribute discriminatory information regarding the target.

Table 10 presents the results of the nonparametric Kruskal–Wallis test (H statistic) for testing the hypothesis of equal distributions of indicators across vulnerability groups. This test is appropriate when there is potential asymmetry in the distribution and when outliers are present. The largest difference is demonstrated by practical_priority_score (H = 36.9221, p ≈ 0), confirming its ability to separate territories by risk level reliably and justifying the use of this integral indicator for ranking. Significant between-group differences were also found for opv_employment_index (H = 16.8792, p = 0.0002), consistent with the role of employment/economic inclusion as a factor associated with reducing vulnerability, as well as for project_investment_mln (H = 8.4972, p = 0.0143) and tariff_pressure (H = 7.0932, p = 0.0288), indicating statistically significant differentiation of the groups by investment activity and tariff burden.

In contrast, livestock_total_heads, free_land_ratio, crop_processing_capacity_tpy, tax_balance_mln, and srs_coverage_pct yielded insignificant p-values (p > 0.2), indicating no robust differences in their distributions between the vulnerability groups in the univariate setting; however, these features may contribute through combinations and nonlinear interactions in the multivariate model. The forecast_projects feature is constant (which formally yields H = 0) and does not provide any distinguishing information. Overall, significant H values for key variables confirm that the features differentiate risk groups, supporting the validity of the classification problem statement and the subsequent ranking of territories.
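The H statistic and its p-value can be sketched as follows. Two assumptions are made: no tie correction is applied to H, and the test compares three groups (two degrees of freedom), for which the chi-square tail has the closed form $p = e^{-H/2}$. The number of groups is not stated in the text; it is inferred here only because this form matches the reported H and p pairs.

```python
import math

def kruskal_h(groups):
    """Kruskal-Wallis H from pooled average ranks (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    positions = {}
    for i, v in enumerate(pooled, start=1):
        positions.setdefault(v, []).append(i)
    rank = {v: sum(p) / len(p) for v, p in positions.items()}
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def p_value_df2(h):
    """Chi-square upper tail for 2 degrees of freedom (three groups)."""
    return math.exp(-h / 2)

# Toy data: three fully separated groups.
h_toy = kruskal_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_toy, 3), round(p_value_df2(h_toy), 4))

# H values reported in the text, checked under the df = 2 assumption.
for name, h in [("opv_employment_index", 16.8792),
                ("project_investment_mln", 8.4972),
                ("tariff_pressure", 7.0932)]:
    print(name, round(p_value_df2(h), 4))
```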

3.3. Training Models for Calculating the Efficiency of Methods

The models were compared using a single experimental protocol across three solution classes: baseline (classical algorithms), state-of-the-art tabular ensembles, and the proposed approach. Quality assessment was performed on a held-out test set using metrics reflecting both the accuracy of identifying vulnerable areas and the model’s ability to rank risks correctly: F1 (the balance of precision and recall for the positive class), ROC-AUC (discriminatory ability based on probability estimates), and the integral criterion $I$, which aggregates key quality indicators (including F1, AUC, Recall, and Precision) into a single scale for managerial comparison. For the proposed model, the probability binarization threshold was further optimized to achieve a balance between missed high-risk areas and false positives; the optimal threshold was $t^{*} = 0.35$.

The results show that the Proposed Fuzzy-XGBoost model achieves the best performance, with Test F1 = 0.7333, Test ROC-AUC = 0.8291, and Integrated Score = 0.7680, ranking first in the Integrated Score and thus providing the most favorable tradeoff between detection accuracy and ranking robustness. The closest competitor is AdaBoost with an Integrated Score of 0.7288, indicating the high competitiveness of ensemble methods. However, the proposed approach retains a measurable advantage through the addition of an interpretable fuzzy layer and a more precise decision threshold.

Figure 9 visualizes comparative results across three scales—Test F1, Test ROC-AUC, and Integrated Score for a set of baseline, SOTA-based, and proposed models. The diagram shows that Proposed Fuzzy-XGBoost leads in both F1 and AUC and, most importantly, provides the greatest overall score, demonstrating not a partial advantage in one metric but a stable dominance in the overall criterion directly related to the task of early detection and the prioritization of territories.

Figure 10 shows the confusion matrices (test set, N = 42) for the group of models where the positive class corresponds to High (high vulnerability) and the negative class corresponds to Low. For Proposed Fuzzy-XGBoost, the obtained TN = 23, FP = 5, FN = 3, TP = 11 yields $\mathrm{Sensitivity} = \frac{TP}{TP + FN} = 0.786$ and $\mathrm{Specificity} = \frac{TN}{TN + FP} = 0.821$. This means that the proposed model simultaneously (i) better identifies vulnerable areas (it misses fewer high-risk cases, owing to the low FN) and (ii) maintains acceptable robustness to false alarms (FP control).

Comparison with competitors confirms the advantage of the proposed approach, specifically along the critical management axis of “not missing high risks.” AdaBoost has TN = 23, FP = 5, FN = 4, TP = 10 (Sensitivity ≈ 0.714, Specificity ≈ 0.821), similar specificity, but misses more vulnerable areas. HistGradientBoosting and XGBoost (SOTA baseline) have the same error structure: TN = 24, FP = 4, FN = 6, TP = 8 (Sensitivity ≈ 0.571, Specificity ≈0.857): they “protect” low risks slightly better (higher specificity), but at the cost of a significant increase in FN, which reduces their suitability for the early detection of vulnerable areas. Random Forest (TN = 23, FP = 5, FN = 7, TP = 7) demonstrates even lower sensitivity (≈0.50), that is, it misses half of the high-risk objects. Gradient Boosting (TN = 20, FP = 8, FN = 6, TP = 8) worsens both specificity (≈0.714) and sensitivity (≈0.571). Extra Trees (TN = 25, FP = 3, FN = 8, TP = 6) demonstrates high specificity (≈0.893) but low sensitivity (≈0.429), that is, it “conservatively” avoids false alarms but systematically under-detects high risk. KNN and Logistic Regression also exhibit low sensitivity (≈0.357 and ≈0.429, respectively), making them less preferable for social support prioritization tasks. The main conclusion from the figure is that the proposed model provides the most balanced error structure, emphasizing minimizing FN while maintaining sufficient specificity, which aligns with the applied goal of reliably identifying areas requiring priority support measures.
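The figures quoted from the confusion matrices can be reproduced with a few lines of arithmetic. The helper below is an illustrative sketch (the function name is hypothetical); the counts are the ones reported for the proposed model and for AdaBoost.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Standard binary metrics derived from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),             # recall for the High class
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Counts reported for the test set (N = 42).
proposed = confusion_metrics(tn=23, fp=5, fn=3, tp=11)
adaboost = confusion_metrics(tn=23, fp=5, fn=4, tp=10)

for name, m in [("Proposed Fuzzy-XGBoost", proposed), ("AdaBoost", adaboost)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

The two models share the same false-positive count, so the difference between them comes from a single additional high-risk territory detected by the proposed model.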

Figure 11 shows how the classification quality, measured by the F1 metric, changes as the threshold $t$ is varied, transforming the probability $p_{i}$ into a binary decision $\hat{y}_{i} = \mathbb{1}[p_{i} \geq t]$.

The curve has a pronounced dependence on the threshold. At too low $t$ values, the model begins to label too many areas as “High,” increasing false alarms and decreasing precision. At the same time, at too high $t$ values, the model becomes overly conservative, leading to more missed cases (FN) and a drop in recall. The maximum/plateau of F1 is observed in the low-to-medium threshold range, reflecting the specifics of the early detection of vulnerable areas, where the balance between precision and recall for the positive class is important. The red dotted line marks the selected working value of $t^{*} = 0.35$, which provides the best (or close to the best) compromise for F1. The choice of $t^{*}$ is supported by its location in a stable region of the curve; small changes in the threshold around the selected value do not lead to a sharp deterioration in F1, which increases the reliability of the solution in the face of possible data fluctuations. An important conclusion from the figure is that the standard fixed threshold of 0.5 turns out to be suboptimal, i.e., at $t = 0.5$, F1 is lower than in the optimum region, since such a threshold does not take into account (i) the asymmetry of the cost of errors in the management problem (missing a truly vulnerable area is often more critical than a false positive) and (ii) the features of the probability distribution produced by the model. Thus, Figure 11 provides direct justification for the threshold optimization procedure as a mandatory stage of the application pipeline: a correctly chosen $t^{*}$ improves the quality of identifying highly vulnerable areas and makes the final ranking more managerially reliable than using the default threshold. Table 11 presents a complete comparative quality profile for all models examined and records: (i) the feature space used (raw screening vs. fuzzy-expanded), (ii) the probability threshold for decision binarization, (iii) the cross-validation (CV) and hold-out test scores, and (iv) the composite Integrated Score as a single criterion of application suitability.
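A threshold sweep of this kind can be sketched in a few lines. The probabilities and labels below are toy values constructed for illustration, so the optimum they produce says nothing about the real model; the grid step of 0.05 is also an assumption.

```python
def f1_at_threshold(y_true, probs, t):
    """F1 for the positive class after binarizing probabilities at t."""
    y_pred = [1 if p >= t else 0 for p in probs]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, probs, grid):
    """Grid value with the highest F1 (first one wins on ties)."""
    return max(grid, key=lambda t: f1_at_threshold(y_true, probs, t))

# Hypothetical predicted probabilities and true labels (toy data).
probs = [0.04, 0.12, 0.22, 0.31, 0.38, 0.46, 0.57, 0.72, 0.81, 0.93]
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

grid = [k / 100 for k in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
t_star = best_threshold(y_true, probs, grid)
print(t_star, round(f1_at_threshold(y_true, probs, t_star), 3))
print(0.5, round(f1_at_threshold(y_true, probs, 0.5), 3))
```

In practice the threshold would be selected on training or validation folds rather than on the test set, so that the reported test metrics are not optimistically biased.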

The main result of the table is the consistent leadership of Proposed Fuzzy-XGBoost, the only model trained on fuzzy-enhanced features and using the optimized threshold $t^{*} = 0.35$: on the test set, it achieves Accuracy = 0.8095, Precision = 0.6875, Recall = 0.7857, F1 = 0.7333, ROC-AUC = 0.8291 and the maximum Integrated Score = 0.768, which indicates the best balance between identifying high-risk areas and ranking stability. The closest competitor, AdaBoost (Integrated Score = 0.7288), demonstrates comparable metrics but is inferior to the proposed approach in key applied benchmarks (in particular, F1/Recall on the test set), indicating that the standard threshold of 0.5 misses a greater number of vulnerable areas. Moreover, models demonstrating high CV values (for example, XGBoost (SOTA baseline) with CV Accuracy = 0.799 and CV ROC-AUC = 0.8295) significantly lose recall on the test set (Test Recall = 0.5714, Test F1 = 0.6154), which emphasizes the importance of not only the “average” quality in CV but also the error structure and threshold tuning for the target task. HistGradientBoosting shows a similar CV profile but also loses Recall/F1 on the test set, confirming that without the fuzzy layer and adaptive threshold, the model becomes more conservative. Simpler algorithms (Random Forest, Gradient Boosting, Extra Trees, KNN, Logistic Regression) exhibit either low recall or weak discrimination (ROC-AUC), limiting their suitability for early detection and prioritization. Overall, the table confirms the graphical analysis: the combination of an interpretable fuzzy layer, boosting, and threshold optimization yields the most practical results, ensuring both high-quality identification of vulnerable areas and the stability of the final ranking.

To strengthen the baseline comparison, we added repeated CV comparisons with standalone XGBoost, LightGBM, and CatBoost on the raw feature space (Table 12). The best baseline among these gradient boosting ensembles was the LightGBM (raw, 0.50) model, with mean F1 = 0.636 and mean ROC-AUC = 0.839. The Fuzzy XGBoost (optimized τ*) configuration demonstrates comparable quality (mean F1 = 0.617, mean ROC-AUC = 0.829) and retains an advantage in interpretability due to the explicit fuzzy risk representation.

Table 13 presents the results of a controlled study that compares four XGBoost classifier configurations with different feature representations and thresholding strategies. The comparison highlights the effects of fuzzy feature expansion and probability threshold optimization. The baseline configuration (Raw + 0.50) achieves the highest F1 score of 0.714 and the best ROC-AUC value of 0.824, indicating strong performance when using the original feature space with the default decision threshold. Applying threshold optimization to the original features (Raw + optimized τ*) increases recall (0.786) but decreases precision (0.524), resulting in a lower F1 score of 0.629. This highlights the trade-off that occurs when optimizing recall in asymmetric classification settings. Configurations with enhanced fuzzy feature representation exhibit different behavior. The Fuzzy + optimized τ* model improves recall (0.786) compared to the default-threshold model, but achieves a slightly lower F1 score (0.688) than the baseline model without thresholding. Meanwhile, the Fuzzy + 0.50 configuration demonstrates the worst results (F1 = 0.538), indicating that fuzzy features alone without thresholding adjustment are insufficient for optimal classification.

Figure 12 reflects the permutation importance of features relative to the F1 metric for the final model. For each feature, the deterioration in F1 is measured by randomly shuffling its values while holding the other variables fixed. This interpretation approach does not reflect the model’s “weights,” but rather the feature’s actual contribution to the quality of identifying highly vulnerable areas using the test protocol. The diagram shows that opv_population_registered has the greatest significance (the largest drop in F1 with permutation), indicating that indicators related to population coverage/registration and the basic contours of social accounting are key for distinguishing risk classes. These are followed by features reflecting the territory’s production and agricultural profile and resource endowment: crop_households_screening, opv_workers, total_land_ha, as well as indicators of agricultural output (animal_output_total, animal_meat_output). Project_investment_mln also demonstrates a significant contribution, consistent with the role of investment activity as a factor in resilience and the potential to reduce vulnerability. The crop_infra_yes_ratio indicator underscores the importance of infrastructure provision in the agricultural sector, while livestock_total_heads and distance_to_district_km highlight the influence of economic activity scale and territorial accessibility/remoteness.

The lower part of the list (e.g., involved_lph, animal_eggs_output, lph_count_local, tariff_drinking_business) has lower permutation importance, meaning a limited contribution to the final F1 in the presence of stronger factors; however, such variables can be useful in local scenarios or as refiners when formulating targeting measures. Overall, Figure 12 is used for factor interpretation: it shows which measurable contours (social accounting, employment/workers, land resources, livestock production, investment, infrastructure, and accessibility) most determine the likelihood of high vulnerability, thereby forming manageable intervention points for targeted support programs. The Top 15 Features by Permutation Importance chart provides a numerical breakdown of the factor ranking by permutation importance relative to the F1 metric, thereby enhancing the interpretation of Figure 12. The importance value reflects the expected drop in F1 when a specific feature’s values are randomly shuffled while holding the others constant, thereby measuring its contribution to the model’s ability to identify highly vulnerable areas correctly. The most significant factor is opv_population_registered (0.1274); its dominance indicates that the population coverage/census contour and the scale of basic social registration carry the maximum discriminatory signal for the target class. The next group of features has similar importance values: crop_households_screening (0.0538), opv_workers (0.0511), and total_land_ha (0.0496) form the core associated with economic activity and the resource base of the territory. Their comparable values indicate that the model is based not on a single indicator but on a consistent set of factors describing the economic potential and employment structure.
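Permutation importance relative to F1, as described above, can be sketched without any modelling library. The `predict` function below is a hypothetical stand-in for the trained classifier (it looks only at the first column), and the data, number of repeats, and seed are toy assumptions.

```python
import random

def f1(y_true, y_pred):
    """F1 for the positive class; returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def permutation_importance(predict, X, y, col, n_repeats=30, seed=0):
    """Mean drop in F1 when column `col` is shuffled, others held fixed."""
    rng = random.Random(seed)
    base = f1(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - f1(y, [predict(row) for row in X_perm]))
    return sum(drops) / n_repeats

def predict(row):
    """Hypothetical fitted model: uses feature 0 only, ignores feature 1."""
    return 1 if row[0] > 0.5 else 0

# Toy data: feature 0 determines the label, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
     [0.6, 3.0], [0.7, 6.0], [0.8, 0.5], [0.9, 7.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

imp_used = permutation_importance(predict, X, y, col=0)
imp_ignored = permutation_importance(predict, X, y, col=1)
print(round(imp_used, 4), imp_ignored)
```

A feature the model never consults gets an importance of exactly zero, which is why this measure reflects the fitted model's reliance on a feature rather than the feature's association with the target.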

Figure 13 visualizes the final ranking of the 20 territories with the highest intervention priority based on the Final Priority Index, an integrated indicator that aggregates the model’s probabilistic vulnerability assessment and fuzzy risk/resilience indicators into a single management prioritization scale. The higher the index value, the greater the territory’s expected need for targeted support measures, all other things being equal. The diagram is constructed as ordered horizontal columns, allowing for the direct comparison of territories and identifying the top of the list as the most critical for immediate response.

The key point of the result is the transition from the abstract “High/Low” classification to an operational planning tool that assigns each territory a numerical priority assessment. This eliminates ambiguity in resource allocation: instead of a binary decision, a scale is formed on which it is possible to (i) set thresholds for different levels of intervention, (ii) prioritize surveys and programs, (iii) justify the selection of territories in reporting, and (iv) compare the expected effects under a limited budget. Since the index is constructed from a single attribute space and a deterministic normalization/aggregation procedure, the resulting rating is reproducible. It can be recalculated regularly as data are updated, which supports monitoring of dynamics and control over the effectiveness of measures. Thus, Figure 13 presents the main applied output of the study: a ready-made list of territories for targeted social policy planning, with priorities formed quantitatively, transparently, and consistently with the model’s vulnerability assessment and interpretable risk factors.
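To make the idea of a deterministic, recomputable ranking concrete, the following sketch aggregates the three decision components into one score and sorts territories by it. The weighted form, the weights `w_p`, `w_r`, `w_s`, and the unit identifiers are assumptions made for illustration only; they are not the aggregation formula of the Final Priority Index used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Territory:
    kato: str          # hypothetical identifier, not a real KATO code
    p_high: float      # model probability of high vulnerability, p_i
    risk: float        # fuzzy risk index, R_i
    resilience: float  # fuzzy resilience index, S_i


def priority_index(t, w_p=0.6, w_r=0.3, w_s=0.1):
    """Assumed aggregation: reward p_i and R_i, discount by S_i, clip to [0, 1]."""
    raw = w_p * t.p_high + w_r * t.risk - w_s * t.resilience
    return min(1.0, max(0.0, raw))


def rank_territories(territories, top_k=20):
    """Sort by descending priority; ties are broken by identifier for determinism."""
    ranked = sorted(territories, key=lambda t: (-priority_index(t), t.kato))
    return [(t.kato, round(priority_index(t), 4)) for t in ranked[:top_k]]


# Hypothetical units used only to exercise the functions.
units = [
    Territory("unit-A", p_high=0.99, risk=0.45, resilience=0.0),
    Territory("unit-B", p_high=0.40, risk=0.31, resilience=0.0),
    Territory("unit-C", p_high=0.99, risk=0.31, resilience=0.2),
]
ranking = rank_territories(units, top_k=2)
```

Because the score depends only on the inputs and fixed weights, rerunning the function on updated data reproduces the ranking exactly, which is the property the text relies on for regular recalculation.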

Figure 14 is a scatterplot where the x-axis represents the fuzzy risk index $R_{i}$, the y-axis represents the model-predicted probability of high vulnerability $p_{i} = P(y_{i} = 1 \mid X_{i})$, and the color of the dot encodes the resulting priority index (Priority index). This visualization is used to verify that the expert fuzzy layer and the statistical classifier do not contradict each other but form a consistent decision-making framework. The key observation is the presence of a consistent division of objects by probability: dots with high $p_{i}$ values form the upper region of the graph, and low $p_{i}$ values the lower one, with elevated values of the resulting priority concentrated where significant risk and a high probabilistic vulnerability assessment simultaneously manifest. This confirms that fuzzy risk functions as a meaningful, interpretable signal that supports the model’s statistical inference and enhances managerial interpretation: areas with high $p_{i}$ and relatively high $R_{i}$ receive a higher color priority level, consistent with the logic of targeted interventions.

At the same time, the figure also reveals an important practical detail: there are points for which $R_{i}$ and $p_{i}$ may not be completely synchronous (for example, moderate $R_{i}$ with high $p_{i}$, or vice versa). This is expected and methodologically correct, because $R_{i}$ is formed from expertly defined rules for a limited set of key factors, while $p_{i}$ is the result of multivariate learning on an extended feature space; such cases are interpreted as areas where the statistical profile of the data reveals additional combinations of factors not fully reflected in the fuzzy rules, or where the expert contour signals a risk with insufficient statistical confidence. In both scenarios, the final index’s color scale provides a compromise that carefully combines the sources of information. Thus, Figure 14 demonstrates the correctness of integrating the expert (fuzzy) and statistical (classification) components: a high priority is formed predominantly in the zone where both assessments consistently indicate risk, which increases confidence in the results and makes the final ranking managerially explainable.
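The distinction between points where the two contours agree and points where they diverge can be made explicit with a small helper. This is a sketch under stated assumptions: the probability threshold of 0.35 is the operating point reported in the study, while the `risk_cutoff` of 0.30 and the example units are hypothetical values chosen for illustration.

```python
from collections import Counter


def contour_agreement(p_high, risk, p_threshold=0.35, risk_cutoff=0.30):
    """Label a territory by which contour flags it (risk_cutoff is an assumed value)."""
    data_flag = p_high >= p_threshold      # statistical contour, based on p_i
    expert_flag = risk >= risk_cutoff      # expert contour, based on R_i
    if data_flag and expert_flag:
        return "both contours"
    if data_flag:
        return "data only"
    if expert_flag:
        return "expert only"
    return "neither"


# Hypothetical (p_i, R_i) pairs covering the four possible cases.
points = {
    "unit-A": (0.99, 0.45),
    "unit-B": (0.90, 0.15),
    "unit-C": (0.20, 0.40),
    "unit-D": (0.10, 0.05),
}
labels = {name: contour_agreement(p, r) for name, (p, r) in points.items()}
summary = Counter(labels.values())
```

Units labeled "data only" or "expert only" correspond to the non-synchronous points discussed above and are natural candidates for additional review before an intervention type is chosen.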

Figure 15 illustrates the scenario analysis for the selected priority area: the x-axis shows the management scenarios for changing the factors, and the y-axis shows the probability of high vulnerability ($p_{i}$) calculated by the model. The dotted line shows the working decision threshold $\tau^{*} = 0.35$, relative to which the transition of the area to the high-risk class is interpreted. Comparison with the Base scenario demonstrates how the model responds to targeted interventions: increasing employment (Employment +15%), adjusting the coverage/SRS indicator (SRS +10%), reducing the tariff burden (Tariff −10%), improving transport accessibility/reducing remoteness (Distance −20%), and a comprehensive intervention (Combined policy).
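The scenario comparison described above amounts to rescaling selected inputs and recomputing the predicted probability. The sketch below shows that mechanism with a stand-in logistic scorer; its coefficients, the intercept, and the base feature values are hypothetical, and in a real analysis the fitted Fuzzy-XGBoost model would replace `predict_proba`.

```python
import math

# Hypothetical coefficients of a stand-in logistic scorer (not fitted to the study data).
COEF = {"employment": -4.0, "srs_coverage": -0.02, "tariff": 0.0001, "distance": 0.01}
INTERCEPT = 1.0

# Multiplicative changes per scenario, following the scenario names in Figure 15.
SCENARIOS = {
    "Base": {},
    "Employment +15%": {"employment": 1.15},
    "SRS +10%": {"srs_coverage": 1.10},
    "Tariff -10%": {"tariff": 0.90},
    "Distance -20%": {"distance": 0.80},
    "Combined policy": {
        "employment": 1.15, "srs_coverage": 1.10, "tariff": 0.90, "distance": 0.80,
    },
}


def predict_proba(features):
    """Probability of high vulnerability under the stand-in logistic scorer."""
    z = INTERCEPT + sum(COEF[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def run_scenarios(base_features, scenarios, threshold=0.35):
    """Return {scenario: (probability, flagged_high)} after applying the multipliers."""
    results = {}
    for name, multipliers in scenarios.items():
        changed = {k: v * multipliers.get(k, 1.0) for k, v in base_features.items()}
        p = predict_proba(changed)
        results[name] = (p, p >= threshold)
    return results


# Hypothetical base profile of one territory.
base = {"employment": 0.08, "srs_coverage": 74.0, "tariff": 5000.0, "distance": 60.0}
results = run_scenarios(base, SCENARIOS)
```

Comparing each entry of `results` with the Base entry gives the per-scenario shift in probability on a single scale, and the boolean flag shows whether the territory would cross the decision threshold.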

The key point of the figure is to demonstrate that the model supports counterfactual assessment. When the input control variables change, $p_{i}$ is recalculated, allowing the potential impact of alternative measures to be compared on a single numerical scale. In the example shown, the probability changes have a small amplitude (the values are close to each other), which is interpreted as an indication that for this area, the risk is not shaped by a single dominant factor, but by a combination of conditions, and individual local improvements have a limited immediate effect. At the same time, the integrated scenario reflects the principle of systemic impact: even with moderate shifts in individual areas, combining measures ensures the most consistent change in the forecast, which serves as the basis for selecting an intervention package. Thus, Figure 15 demonstrates that the model is applicable not only as a diagnostic mechanism (assessing current risk), but also as a tool for planning and comparing interventions: it is possible to test the sensitivity of the vulnerability probability to controllable factors, rank measures by the expected effect, and justify which combinations of changes are more appropriate in conditions of limited resources.

The practical ranking of territorial units represents the final applied result of the study: a ranked list of rural districts linked to administrative affiliation (district) and a unique KATO identifier, with three quantitative decision components (the probability of high vulnerability $p_{i}$, the fuzzy risk index $R_{i}$, and the fuzzy resilience index $S_{i}$) and the final integral indicator, the Final Priority Index. The top of the ranking is concentrated among territories with very high $p_{i}$ values (in some cases, $p_{i}$ ≈ 0.99), indicating confident classification as high-vulnerability at the chosen decision threshold (Table 14).

At the same time, the $R_{i}$ values are in the medium–high range (approximately 0.31–0.45), reflecting a consistent expert signal on key risk factors. In this case, $S_{i}$ in the presented sample equals zero, indicating the absence of activated resilience rules for these territories and, consequently, the absence of compensating factors in the expert part. The resulting Final Priority Index (≈0.77–0.88) aggregates these components. It provides a comparable scale of management prioritization: the higher the index, the more justified the inclusion of the territory within the contour of priority measures (additional surveys, targeted support programs, resource planning). The table’s structure makes the results verifiable and operationalizable: the presence of the KATO identifier ensures unambiguous integration into departmental contours, and the simultaneous presentation of $p_{i}$, $R_{i}$, $S_{i}$, and the final index allows for a distinction between situations of “high risk according to data with a moderate expert signal” and “high risk confirmed by both contours,” which is important for selecting the type of intervention. Taken together, the table provides a ready-made basis for developing a roadmap for management measures, as it translates the model’s output into a specific list of territories with a quantitative justification for priority.

The conclusion of the work is confirmed by the final quality of the best Proposed Fuzzy-XGBoost model (Test F1 = 0.7333, Test ROC-AUC = 0.8291, Integrated Score = 0.7680): the model not only diagnoses vulnerability but also generates a reproducible index suitable for the practical planning of targeted social policy.
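As a quick consistency check on the reported test metrics, recall that F1 is the harmonic mean of precision $P$ and recall $R$. Substituting the test precision (0.6875) and test recall (0.7857) listed for the proposed model in Table 11 reproduces the reported F1:

```latex
F_1 = \frac{2 P R}{P + R}
    = \frac{2 \cdot 0.6875 \cdot 0.7857}{0.6875 + 0.7857}
    \approx \frac{1.0803}{1.4732}
    \approx 0.7333
```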

Overall, the obtained results demonstrate that the proposed hybrid framework is effective not only for forecasting but also for practical decision support. Improved identification of high-risk areas reduces the likelihood of missing critical cases, which is crucial for timely intervention and efficient resource allocation. Furthermore, integrating fuzzy indices provides an interpretable understanding of the main vulnerability factors, allowing decision-makers to understand both the classification results and their causes. Thus, the proposed approach can be directly applied to real-world administrative and regional planning problems.

4. Discussion

The results of this study demonstrate that the proposed Fuzzy-XGBoost hybrid framework offers a balanced trade-off between predictive performance and interpretability for the tabular classification of territorial vulnerability. While the model demonstrates competitive results (F1 ≈ 0.69–0.71 and ROC-AUC ≈ 0.81–0.83 across various configurations), the improvements over baseline approaches remain modest rather than substantial. This suggests that the primary contribution of the proposed approach lies not only in maximizing predictive accuracy but also in enhancing the transparency and interpretability of the decision-making process.

The hold-out ablation study (Table 5) shows that the baseline configuration (Raw + 0.50) achieves the highest F1 score (0.714), indicating that the original feature space already contains strong predictive signals. In contrast, the fuzzy-augmented configuration combined with threshold optimization (Fuzzy + optimized τ*) yields a slightly lower F1 score (0.688) but a higher recall (0.786), which is particularly important in risk-sensitive applications where false negatives are costly. This highlights a key trade-off: threshold optimization primarily affects the balance between precision and recall rather than improving the model’s internal ranking performance, as evidenced by the relatively stable ROC-AUC values across all configurations (0.804–0.824).
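The point that threshold optimization moves the precision/recall balance without changing ranking quality can be illustrated with a small grid search. This is a minimal sketch on hypothetical labels and scores; the grid, the tie-breaking rule, and the data are assumptions for illustration, and the study selects its threshold on out-of-fold training predictions rather than on toy values like these.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Precision, recall and F1 for the positive class at a given threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_f1_threshold(y_true, scores, grid):
    """Grid search for the F1-maximizing threshold; ties go to the lower threshold."""
    return max(grid, key=lambda t: (precision_recall_f1(y_true, scores, t)[2], -t))


# Hypothetical labels and predicted probabilities (4 positives, 6 negatives).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.90, 0.80, 0.40, 0.30, 0.35, 0.20, 0.15, 0.10, 0.05, 0.02]

grid = [i / 100 for i in range(5, 100, 5)]
best_threshold = best_f1_threshold(y_true, scores, grid)
default_metrics = precision_recall_f1(y_true, scores, 0.50)
tuned_metrics = precision_recall_f1(y_true, scores, best_threshold)
auc = roc_auc(y_true, scores)  # does not depend on any threshold
```

On this toy data the tuned threshold raises recall and lowers precision relative to 0.50, while `roc_auc` is computed from the scores alone and therefore stays the same whichever threshold is applied.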

Uncertainty analysis further suggests that the observed differences between configurations should be interpreted with caution. Bootstrap confidence intervals for F1 are relatively wide (e.g., [0.467; 0.848] for the proposed configuration), and nested repeated cross-validation shows moderate variability (F1 = 0.676 ± 0.091). These results indicate that the performance differences between configurations fall within the sampling uncertainty, which is large because of the limited sample size. Therefore, the proposed model should be viewed as competitive with, rather than definitively superior to, the other experimental conditions.
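A percentile bootstrap of the kind referred to above can be sketched as follows. The labels and predictions are hypothetical: they describe an assumed 42-unit test set (about 25% of 166) constructed so that the point estimate equals F1 = 0.6875, and they are not the study's actual predictions. The number of resamples and the percentile indexing are likewise illustrative choices.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling test units with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    low = stats[int(round((alpha / 2) * (n_boot - 1)))]
    high = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return low, high


# Hypothetical test set: 11 TP, 3 FN, 7 FP, 21 TN (42 units, 14 positives).
y_true = [1] * 14 + [0] * 28
y_pred = [1] * 11 + [0] * 3 + [1] * 7 + [0] * 21

point = f1_score(y_true, y_pred)
ci_low, ci_high = bootstrap_f1_ci(y_true, y_pred)
```

With only a few dozen test units, each resample changes the counts of true positives and errors noticeably, which is why intervals of this kind tend to be wide at this sample size.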

In terms of interpretability, the SHAP analysis shows that a small number of features dominate the global model behavior, particularly demographic and economic indicators such as population size and investment. Meanwhile, local explanations indicate that forecasts are influenced by a combination of factors, including fuzzy-logic-derived features. Notably, the fuzzy risk component contributes to individual forecasts, confirming that the expansion of fuzzy features provides explanatory value at the local level.

However, an important limitation of the current model is the inactive behavior of the fuzzy resilience component. Empirical analysis shows that the resilience index remains zero across all observations, indicating that the corresponding rule base is not activated under the current data conditions. This suggests that the current formulation of the resilience rules is overly restrictive and fails to account for dataset variability. As a result, the interpretability of the fuzzy layer is effectively determined solely by the risk component. This finding does not refute the hybrid framework but highlights the need to revise the resilience modeling strategy, for example, by relaxing rule constraints or redesigning membership functions.

A re-audit of the feature space revealed that the ‘forecast_projects’ feature has zero variance and carries no discriminatory information. In the revised experiments, this feature was excluded before training the models, and its constancy was explicitly documented in the data quality control part of Section 3.3 (Training Models for Calculating the Efficiency of Methods).
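Returning to the inactive resilience rules noted above, the following sketch shows how a strict conjunctive rule can remain at zero for every unit and how relaxing the membership breakpoints changes that. Everything here is an assumption for illustration: the linear shoulder shape, the breakpoints, the two normalized indicators, and the min-based AND are hypothetical and are not the rule base of the study.

```python
def high_shoulder(x, start, full):
    """Membership in 'high': 0 up to `start`, rising linearly to 1 at `full`."""
    if x <= start:
        return 0.0
    if x >= full:
        return 1.0
    return (x - start) / (full - start)


def resilience_activation(unit, start, full):
    """Min-conjunction (AND) of 'high' memberships over all indicators of a unit."""
    return min(high_shoulder(value, start, full) for value in unit.values())


def activation_rate(units, start, full):
    """Share of units for which the rule fires at all (activation > 0)."""
    fired = sum(1 for u in units if resilience_activation(u, start, full) > 0.0)
    return fired / len(units)


# Hypothetical indicators, already normalized to [0, 1].
units = [
    {"employment": 0.55, "investment": 0.62},
    {"employment": 0.30, "investment": 0.80},
    {"employment": 0.66, "investment": 0.58},
    {"employment": 0.20, "investment": 0.25},
]

strict_rate = activation_rate(units, start=0.7, full=0.9)   # restrictive breakpoints
relaxed_rate = activation_rate(units, start=0.5, full=0.8)  # relaxed breakpoints
```

A diagnostic of this kind, run on the real indicators, would show directly whether a zero index reflects genuinely absent resilience or breakpoints that no territory can satisfy simultaneously.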

Compared to existing studies on tabular classification and hybrid fuzzy-machine models, the proposed approach achieves comparable predictive performance while offering improved interpretability. Previous work has shown that combining fuzzy logic with machine learning can improve transparency, but often at the cost of increased model complexity or limited scalability. In this study, fuzzy feature augmentation is integrated directly into the feature space, enabling the model to remain compatible with standard machine learning pipelines. However, the results confirm that the benefits of fuzzy augmentation are context-dependent and do not always lead to significant performance improvements.

From a practical perspective, the proposed model is particularly relevant for decision support systems, where interpretability, transparency, and prioritization are crucial. The ability to generate probabilistic results, optimize decision thresholds, and produce ranked lists of territories provides policymakers and regional planners with useful information. Even without significant performance gains, the model’s interpretability can enhance trust and facilitate informed decision-making.

However, several limitations should be acknowledged. First, the relatively small dataset (166 observations) limits the statistical power of the estimates and may limit generalizability. Second, the current fuzzy resilience component requires revision to ensure meaningful activation. Third, the model was tested on a single dataset, and its robustness across different regional or subject-matter settings remains to be tested.

Future research will focus on expanding the dataset, improving the fuzzy rule base, and incorporating more advanced modeling methods, such as transformer-based architectures for tabular data. Furthermore, additional work will explore alternative calibration strategies and uncertainty-aware decision systems to improve forecast reliability in high-stakes applications.

5. Conclusions

This study presents a reproducible hybrid computational framework for the explainable classification of territorial vulnerability using heterogeneous tabular data. The proposed approach integrates a fuzzy semantic aggregation layer with the XGBoost classifier, enabling the combination of interpretable factor modeling and nonlinear probabilistic forecasting within a single analytical pipeline. Unlike traditional single-stage approaches, the framework was designed to jointly address the challenges of uncertainty representation, feature space refinement, probabilistic classification, and threshold-based decision making.

Empirical evaluation was conducted on a single dataset composed of 11 administrative sources and containing 166 territorial units described by 76 attributes. Experimental results on a 75/25 stratified split showed that the proposed Fuzzy-XGBoost model achieved the best overall results among the evaluated models, with an F1 score of 0.7333, an ROC-AUC of 0.8291, and an Integrated Score of 0.768. The results also showed that optimizing the probability threshold to τ = 0.35 improved sensitivity to highly vulnerable areas and reduced the number of false negatives compared to the standard threshold, which is particularly important in decision support systems with asymmetric error costs.

From a computational perspective, the main contribution of the study lies not only in the predictive performance of the final classifier but also in the workflow architecture itself. The fuzzy layer acts as an interpretable feature expansion mechanism that transforms raw administrative metrics into semantically meaningful aggregate descriptors, while the boosting component captures hidden nonlinear relationships in the expanded feature space. This combination enhances the practical utility of the model by linking explainability and predictive discrimination, rather than treating them as separate objectives. Feature contribution analysis also confirmed that the proposed framework preserves the meaningful structure of the data and supports interpretable analysis of the variables most closely related to vulnerability classification.

Another important result is that the proposed method provides a transferable template for hybrid intelligent systems working with real-world structured data. The study demonstrates that fuzzy formalization can be effectively integrated into supervised machine learning as a reproducible preprocessing and feature engineering layer, rather than being used solely as a stand-alone expert system. This makes the framework relevant not only for the territorial case under consideration, but also for broader tabular classification problems in which uncertainty, partial observability, and the need for interpretable risk ranking are central methodological constraints.

At the same time, several limitations should be acknowledged. The current experiments were conducted on a relatively small regional dataset (166 territorial units), and the model was validated within a single territorial context. Therefore, the generalizability of this model to other regions, time periods, and administrative structures requires further research. Furthermore, the final model performance may depend on the chosen imputation strategy, the structure of the membership functions, and the criterion used for threshold optimization. These aspects should be more systematically explored in future studies through robustness analysis, temporal validation, and cross-regional comparison.

Future research should extend this model in several directions. First, cross-regional and longitudinal validation is necessary to assess its transferability under structural data changes. Second, the ablation experiments should be extended to quantify more precisely the individual contributions of the fuzzy feature layer, the boosting model, and threshold optimization. Third, deeper integration with explainable AI methods could improve the local and global interpretability of the forecasts. Finally, the proposed framework can be generalized to broader classes of heterogeneous tabular decision support problems where ranking quality, uncertainty handling, and operational transparency are as important as classification accuracy.

Author Contributions

Conceptualization, A.A. and R.M.; methodology, A.A. and A.M.; software, G.M. and G.O.; validation, A.A., L.D., and Z.A.; formal analysis, A.M. and R.M.; investigation, A.A., S.B., and A.S.; resources, R.M. and L.D.; data curation, G.M. and G.O.; writing—original draft preparation, A.A.; writing—review and editing, A.M., R.M., and L.D.; visualization, G.M. and S.B.; supervision, R.M., L.D., and Z.A.; project administration, A.A.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are publicly available at: https://github.com/aimanakynbekova6-ui/Hybrid-Architecture-of-Fuzzy-Logic/tree/main (accessed on 10 February 2026).

Acknowledgments

The authors would like to express their sincere gratitude to Gulzira Abdikerimova for her valuable advice and constructive comments that significantly improved the quality of this work.

Conflicts of Interest

Author Zhanat Abdikadyr was employed by the company Non-Profit Joint Stock Company Astana Medical University. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AUC	Area Under the Curve
CV	Cross-Validation
F1	F1-Score
FN	False Negative
FP	False Positive
KATO	Classifier of Administrative-Territorial Objects
KNN	K-Nearest Neighbors
MI	Mutual Information
ML	Machine Learning
ROC-AUC	Area Under the Receiver Operating Characteristic Curve
SOTA	State-of-the-Art
SRS	Social Registry System
TN	True Negative
TP	True Positive
XGBoost	Extreme Gradient Boosting

Figure 1. Source coverage in unified analytical matrix.

Figure 2. Basic model workflow.

Figure 3. Architecture of the proposed Fuzzy-XGBoost model.

Figure 4. Average Population by Social Category.

Figure 5. Distribution of D+E Vulnerability Share.

Figure 6. Global and local SHAP visual summary.

Figure 7. Distribution of fuzzy resilience and rule activations.

Figure 8. Spearman Correlation Matrix.

Figure 9. Performance comparison of baseline and proposed models.

Figure 10. Confusion Matrices (Part 1).

Figure 11. F1 Sensitivity to Probability Threshold.

Figure 12. Top Feature Importance (Permutation, F1).

Figure 13. Top 20 Priority Territories for Policy Intervention.

Figure 14. Decision Surface: Fuzzy Risk vs. Predicted Probability.

Figure 15. Scenario Analysis for a Priority Territory.

Table 1. Repeated nested CV summary.

Condition	Feature Space	Threshold Type	Mean_F1	Std_F1	Mean_ROC_AUC	Std_ROC_AUC	Mean_Recall	Std_Recall	Median_Threshold
Fuzzy + optimized τ*	Fuzzy	Optimized on train-only inner OOF	0.67594	0.090979	0.827589	0.066183	0.76	0.152753	0.22
Raw + optimized τ*	Raw	Optimized on train-only inner OOF	0.670474	0.079012	0.828811	0.068672	0.749091	0.144409	0.21
Raw + 0.50	Raw	Default	0.656321	0.105673	0.830701	0.066571	0.629091	0.117128	0.5
Fuzzy + 0.50	Fuzzy	Default	0.655678	0.109447	0.828351	0.068902	0.629091	0.122867	0.5

Table 2. Nested repeated CV as an additional generalization check.

Condition	Feature Space	Threshold Type	Mean_F1	Std_F1	Mean_ROC_AUC	Std_ROC_AUC	Mean_Recall	Std_Recall	Median_Threshold
Fuzzy + optimized τ*	Fuzzy	Optimized on train-only inner OOF	0.67594	0.090979	0.827589	0.066183	0.76	0.152753	0.22
Raw + optimized τ*	Raw	Optimized on train-only inner OOF	0.670474	0.079012	0.828811	0.068672	0.749091	0.144409	0.21
Raw + 0.50	Raw	Default	0.656321	0.105673	0.830701	0.066571	0.629091	0.117128	0.5
Fuzzy + 0.50	Fuzzy	Default	0.655678	0.109447	0.828351	0.068902	0.629091	0.122867	0.5

Table 3. Impact of overlap settings on classification performance metrics.

Setting	Low Shoulder c	High Shoulder a	Selected Threshold	Accuracy	Precision	Recall	F1	ROC-AUC
Narrow overlap	0.45	0.55	0.24	0.761905	0.611111	0.785714	0.6875	0.829082
Baseline overlap	0.5	0.5	0.23	0.761905	0.611111	0.785714	0.6875	0.831633
Wide overlap	0.6	0.4	0.12	0.714286	0.55	0.785714	0.647059	0.829082

Table 4. Bootstrap 95% confidence intervals for the hold-out experiment.

Condition	Metric	Point Estimate	CI Low (2.5%)	CI High (97.5%)
Raw + 0.50	F1	0.714286	0.47619	0.882353
Raw + 0.50	Precision	0.714286	0.454293	0.92869
Raw + 0.50	Recall	0.714286	0.454545	0.928571
Raw + 0.50	ROC-AUC	0.82398	0.676505	0.944451
Raw + optimized τ*	F1	0.628571	0.428463	0.8
Raw + optimized τ*	Precision	0.52381	0.318122	0.739171
Raw + optimized τ*	Recall	0.785714	0.545455	1
Raw + optimized τ*	ROC-AUC	0.816327	0.658155	0.938284
Fuzzy + 0.50	F1	0.538462	0.26087	0.75
Fuzzy + 0.50	Precision	0.583333	0.285714	0.866667
Fuzzy + 0.50	Recall	0.5	0.230769	0.785714
Fuzzy + 0.50	ROC-AUC	0.803571	0.657825	0.930563
Fuzzy + optimized τ*	F1	0.6875	0.466667	0.848485
Fuzzy + optimized τ*	Precision	0.611111	0.375	0.833333
Fuzzy + optimized τ*	Recall	0.785714	0.545455	1
Fuzzy + optimized τ*	ROC-AUC	0.808673	0.652755	0.928399

Table 5. Hold-out ablation study with four controlled XGBoost conditions.

Condition	Feature Space	Threshold Type	Selected Threshold	Accuracy	Precision	Recall	F1	ROC-AUC
Raw + 0.50	Raw	Default	0.5	0.809524	0.714286	0.714286	0.714286	0.82398
Fuzzy + optimized τ*	Fuzzy	Optimized on train OOF	0.25	0.761905	0.611111	0.785714	0.6875	0.808673
Raw + optimized τ*	Raw	Optimized on train OOF	0.1	0.690476	0.52381	0.785714	0.628571	0.816327
Fuzzy + 0.50	Fuzzy	Default	0.5	0.714286	0.583333	0.5	0.538462	0.803571

Table 6. Integration quality control of analytical matrix.

Metric	Value
Total social units (ABCDE)	166
Matched KATO units	160
Rows with full target	166
Rows with at least 10 non-null engineered features	166

Table 7. Descriptive statistics of core indicators.

Indicator	Count	Mean	Std	Min	25%	50%	75%	Max
de_people_share	166	0.19527	0.108522	0.0	0.13767	0.17682	0.230460	1.0
opv_employment_index	166	0.08754	0.059336	0.0106	0.06378	0.07454	0.092900	0.31987
free_land_ratio	166	0.022598	0.106105	0.0	0.0	0.0	0.0	0.85572
srs_coverage_pct	166	74.05662	2.095171	67.5	73.9	73.9	73.9	92.2
tax_balance_mln	166	5.982070	42.74396	−55.9397	0.15722	0.15722	0.157229	498.280
tariff_pressure	166	5090.937	5145.457	0.0	1084.25	5170.77	5431.0795	51,671.67
livestock_total_heads	166	20,938.560	15,575.125	0.0	12,035.75	17,212.0	25,843.0	104,949.0
crop_processing_capacity_tpy	166	2178.034	20,449.16	0.0	0.0	0.0	0.0	253,200.0
project_investment_mln	166	315.5739	343.2260	9.5	72.4975	228.1	416.675	2946.4
practical_priority_score	166	0.409180	0.061451	0.2232	0.38156	0.41305	0.434817	0.688911

Table 8. Sensitivity to membership-function overlap.

Setting	Low Shoulder c	High Shoulder a	Selected Threshold	Accuracy	Precision	Recall	F1	ROC-AUC
Narrow overlap	0.45	0.55	0.24	0.761905	0.611111	0.785714	0.6875	0.829082
Baseline overlap	0.5	0.5	0.23	0.761905	0.611111	0.785714	0.6875	0.831633
Wide overlap	0.6	0.4	0.12	0.714286	0.55	0.785714	0.647059	0.829082

Table 9. Spearman’s association with vulnerability target (N = 166).

Feature	Spearman ρ	p-Value
practical_priority_score	0.526	0.0
project_investment_mln	−0.2207	0.0043
tariff_pressure	0.1916	0.0134
free_land_ratio	−0.1765	0.0229
livestock_total_heads	0.1304	0.094
opv_employment_index	−0.1154	0.1389
srs_coverage_pct	0.0603	0.4405
tax_balance_mln	−0.0401	0.6081
crop_processing_capacity_tpy	0.0224	0.7743
forecast_projects	0.0	0.0

Table 10. Kruskal–Wallis test across vulnerability groups.

Feature	Kruskal H	p-Value
practical_priority_score	36.9221	0.0
opv_employment_index	16.8792	0.0002
project_investment_mln	8.4972	0.0143
tariff_pressure	7.0932	0.0288
livestock_total_heads	3.0454	0.2181
free_land_ratio	2.2712	0.3212
crop_processing_capacity_tpy	1.8534	0.3959
tax_balance_mln	0.7895	0.6739
srs_coverage_pct	0.0001	1.0
forecast_projects	0.0	0.0
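
The statistic in Table 10 can be sketched with the standard library as follows. The p-values in the table are numerically consistent with a chi-square reference distribution with 2 degrees of freedom, for which the survival function reduces to exp(−H/2); that would correspond to three vulnerability groups, but this is an inference from the reported numbers and is not stated in this section. The function names are illustrative.

```python
from math import exp


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H with the usual tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    if n_total < 2:
        return float("nan")
    ranks = average_ranks(pooled)
    acc, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * acc - 3.0 * (n_total + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1.0 - ties / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else float("nan")


def p_value_df2(h):
    """Chi-square survival function for exactly 2 degrees of freedom (assumed here)."""
    return exp(-h / 2.0)


# Check against reported rows of Table 10 (H -> p-value, rounded to 4 decimals).
p_opv = round(p_value_df2(16.8792), 4)
p_invest = round(p_value_df2(8.4972), 4)
p_tariff = round(p_value_df2(7.0932), 4)
```

For a different number of groups the closed form above no longer applies and a general chi-square survival function would be needed.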

Table 11. Full comparative model evaluation.

Model	Feature Space	Probability Threshold	CV Accuracy	CV Precision	CV Recall	CV F1	CV ROC-AUC	Test Accuracy	Test Precision	Test Recall	Test F1	Test ROC-AUC	Integrated Score
Proposed Fuzzy-XGBoost	Fuzzy-enhanced features	0.35	0.759	0.662	0.6306	0.6269	0.8283	0.8095	0.6875	0.7857	0.7333	0.8291	0.768
AdaBoost	Raw screening features	0.5	0.7753	0.6897	0.6306	0.6443	0.814	0.7857	0.6667	0.7143	0.6897	0.8112	0.7288
HistGradientBoosting	Raw screening features	0.5	0.775	0.6932	0.6306	0.6491	0.8272	0.7619	0.6667	0.5714	0.6154	0.7985	0.6666
XGBoost (SOTA baseline)	Raw screening features	0.5	0.799	0.7344	0.6778	0.6842	0.8295	0.7619	0.6667	0.5714	0.6154	0.7883	0.6636
Random Forest	Raw screening features	0.5	0.767	0.7214	0.5333	0.6007	0.8341	0.7143	0.5833	0.5	0.5385	0.7985	0.6133
Gradient Boosting	Raw screening features	0.5	0.727	0.5924	0.5556	0.5499	0.7709	0.6667	0.5	0.5714	0.5333	0.7321	0.5973
Extra Trees	Raw screening features	0.5	0.7263	0.65	0.4833	0.528	0.8054	0.7381	0.6667	0.4286	0.5217	0.7653	0.5907
KNN	Raw screening features	0.5	0.7583	0.7543	0.4389	0.5454	0.7279	0.6667	0.5	0.3571	0.4167	0.6467	0.4821
Logistic Regression	Raw screening features	0.5	0.7507	0.6064	0.7528	0.6677	0.7589	0.619	0.4286	0.4286	0.4286	0.5765	0.473
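
As a worked check of the test-set columns in Table 11, the sketch below derives accuracy, precision, recall, and F1 from confusion-matrix counts. The counts are not reported in this section; they are inferred here from the rounded metrics, assuming a test set of 42 units with 14 high-vulnerability cases (consistent with a 25% hold-out of 166 units). The Integrated Score is omitted because its formula is not given in this section.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Inferred (not source-stated) counts that reproduce the reported test metrics.
fuzzy_xgb = classification_metrics(tp=11, fp=5, fn=3, tn=23)  # threshold 0.35
adaboost = classification_metrics(tp=10, fp=5, fn=4, tn=23)   # threshold 0.5

fuzzy_rounded = {k: round(v, 4) for k, v in fuzzy_xgb.items()}
ada_rounded = {k: round(v, 4) for k, v in adaboost.items()}
```

Under these inferred counts, the proposed model misses 3 of 14 high-vulnerability units against 4 for AdaBoost, with the same number of false alarms.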

Table 12. Repeated CV comparison with stronger boosting baselines.

Model	Mean F1	Std F1	Mean ROC-AUC	Std ROC-AUC	Mean Recall
LightGBM (raw, 0.50)	0.635634	0.08545	0.839396	0.058114	0.585455
Fuzzy XGBoost (optimized τ*)	0.617346	0.108274	0.82894	0.05864	0.676364
XGBoost (raw, 0.50)	0.617291	0.105322	0.831117	0.058139	0.581818
CatBoost (raw, 0.50)	0.604599	0.103209	0.824556	0.05913	0.563636

Table 13. Four-condition ablation study.

Condition	Feature Space	Threshold Type	Selected Threshold	Accuracy	Precision	Recall	F1	ROC-AUC
Raw + 0.50	Raw	Default	0.5	0.809524	0.714286	0.714286	0.714286	0.82398
Fuzzy + optimized τ*	Fuzzy	Optimized on train OOF	0.25	0.761905	0.611111	0.785714	0.6875	0.808673
Raw + optimized τ*	Raw	Optimized on train OOF	0.1	0.690476	0.52381	0.785714	0.628571	0.816327
Fuzzy + 0.50	Fuzzy	Default	0.5	0.714286	0.583333	0.5	0.538462	0.803571
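
The "Optimized on train OOF" threshold type in Table 13 can be sketched as a grid search over out-of-fold (OOF) probabilities from the training set. The selection criterion is not specified in this section, so the sketch assumes that F1 is maximized and that ties are resolved in favor of the lowest threshold; the grid, the function names, and the toy data are hypothetical.

```python
def f1_at(y_true, proba, tau):
    """F1 of the rule 'predict positive when probability >= tau'."""
    tp = sum(1 for y, p in zip(y_true, proba) if p >= tau and y == 1)
    fp = sum(1 for y, p in zip(y_true, proba) if p >= tau and y == 0)
    fn = sum(1 for y, p in zip(y_true, proba) if p < tau and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(y_true, oof_proba, grid=None):
    """Return (tau*, F1) maximizing F1 on out-of-fold predictions (assumed criterion)."""
    if grid is None:
        grid = [round(0.05 + 0.01 * i, 2) for i in range(91)]  # 0.05 .. 0.95
    best_tau, best_f1 = grid[0], -1.0
    for tau in grid:
        score = f1_at(y_true, oof_proba, tau)
        if score > best_f1:  # strict '>' keeps the lowest tau among ties
            best_tau, best_f1 = tau, score
    return best_tau, best_f1


# Hypothetical OOF probabilities for eight training units.
y_train = [0, 0, 0, 1, 0, 1, 1, 1]
oof = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]
tau_star, f1_star = select_threshold(y_train, oof)
```

Because the threshold is chosen on training-fold predictions only, the test set stays untouched until the final evaluation.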

Table 14. Practical ranking of territorial units.

Rank	District	Rural District	KATO	Predicted High-Vulnerability Probability	Fuzzy Risk	Final Priority Index
1	Shusky district	Dinmukhamed Kunaev	316647000	0.9913	0.3835	0.8814
2	Shusky district	Tasotkel	316651000	0.8892	0.4474	0.8711
3	Kordai district	Sortobinsky	314854000	0.9978	0.3157	0.8365
4	Bayzaksky district	Kyzylzhuldyzsky	313643000	0.7134	0.3306	0.8192
5	Kordai district	Karasaysky	314841000	0.9865	0.3157	0.8189
6	Shusky district	Dalakainarsky	316633000	0.9688	0.339	0.8043
7	Kordai district	Betkainarsky	314835000	0.9986	0.3157	0.801
8	Turar Ryskulov district	Akyrtobinsky	315033000	0.9057	0.3576	0.8004
9	Shusky district	Birliksky	316632000	0.9862	0.339	0.7997
10	Kordai district	Zhambylsky	314836000	0.9985	0.3157	0.7971
11	Kordai district	Karasusky	314843000	0.9601	0.3157	0.7883
12	Shusky district	Koragatinsky	316645000	0.9668	0.339	0.7852
13	Bayzaksky district	Kostyubinsky	313645000	0.9476	0.3306	0.7801
14	Bayzaksky district	Burylsky	313635000	0.9547	0.3306	0.7791
15	Shusky district	Tolebiysky	316630000	0.946	0.339	0.779
16	Merkensky district	Andas Batyr	315441000	0.7678	0.3814	0.7785
17	Shusky district	Zhanazholsky	316638000	0.9249	0.339	0.7757
18	Bayzaksky district	Dikhansky	313637000	0.9332	0.3306	0.775
19	Kordai district	Kasyksky	314845000	0.9978	0.3157	0.773
20	Bayzaksky district	Sukhanbaevsky	313647000	0.9543	0.3306	0.772

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
