1. Introduction
Modeling the spatial and temporal spread dynamics of infectious diseases has long been a central research topic in epidemiology and public health. While the classical Susceptible-Infectious-Recovered (SIR) model and its derivatives have provided a strong theoretical basis for the fundamental behavior of epidemics by representing interpersonal transmission processes at the macroscale [1,2], the COVID-19 pandemic has demonstrated that disease spread is shaped by multidimensional, heterogeneous, and space-specific processes [3]. Various studies have demonstrated that models that do not holistically consider components such as demographic structure, socioeconomic vulnerability, accessibility to healthcare, geographic proximity, and human mobility cannot accurately represent regional epidemiological risk [4,5,6,7].
Empirical findings from the COVID-19 period have shown that population mobility decisively affects not only the rate of infection spread but also regional spread patterns [8,9]. Studies demonstrating the effects of mobility restrictions on the national and international course of the epidemic support these findings [10]. The GIS-based spatial analysis literature has demonstrated that spatial heterogeneity is a key component of epidemiological processes by examining case clusters, neighborhood effects, regional hotspots, and indicators of socio-demographic vulnerability [11,12,13].
Deep learning methods have played an important role in modeling complex epidemiological patterns. The LSTM model has been widely used due to its ability to capture nonlinear dynamics [14,15,16], but most of these studies have relied on a limited range of data sources and have failed to holistically represent demographic, geographic, or healthcare infrastructure components [17,18]. In contrast, several studies have demonstrated that multimodal and multi-view learning approaches can model the relational structure among heterogeneous data sources at higher levels of abstraction [19,20,21,22,23,24]. Despite this powerful methodological framework, research on the application of multi-view learning approaches to epidemiological risk classification remains quite limited.
Epidemic data are primarily reported at the country level, and multi-country, harmonized datasets are lacking. Differences in reporting standards, socio-economic indicators, and geographic data quality create significant methodological challenges, particularly in cross-border epidemic assessments [25]. Therefore, the creation of new datasets that integrate multi-source qualitative and quantitative indicators is critical for both epidemic risk prediction and spatial public health planning.
In this study, a multi-source, spatially representative epidemic dataset consisting of demographic, health, geographic, and mobility indicators for Türkiye and neighboring countries was created. The dataset’s latitude-longitude spatial structure enables risk classification to be assessed not only through performance metrics but also by mapping spatial risk patterns. This allows for holistic assessment of border effects, regional similarities, clustering structures, and spatial continuity.
While multi-view learning, attention mechanisms, and CGAN-based augmentation are established paradigms in deep learning, the novelty of MV-RiskNet lies in its domain-specific architectural integration designed to address the unique challenges of cross-border data heterogeneity. Unlike existing models that primarily rely on raw case counts, which are prone to reporting biases across different countries, this study introduces a ‘Structural Vulnerability Index’ as a more stable and robust target variable. Furthermore, the proposed architecture explicitly preserves the unique signals of demographic, health, and mobility views through independent subnetworks before attention-based fusion. This ensures that high-dimensional mobility data does not suppress critical but low-dimensional health infrastructure indicators. Consequently, the primary contributions of this work are: (i) the construction of a unique multimodal dataset from reliable statistical sources for Türkiye and its neighbors; (ii) a multi-view framework tailored for regional epidemic risk mapping that handles non-aligned data sources; and (iii) an interpretability layer through attention analysis that provides policy-relevant insights by identifying regional risk drivers. Experimental studies have shown that MV-RiskNet achieves higher accuracy and F1-score than methods such as MLP, CNN, LSTM, Autoencoder, and Graph Convolutional Network (GCN). The general framework of the study is shown in Figure 1.
2. Related Works
The epidemic modeling literature has long been centered on SIR and its derivatives, which mathematically represent transmission processes over time [1,2]. While these early models provide a strong theoretical basis for epidemic dynamics, they offer only a limited representation of components such as spatial heterogeneity, multi-source data integration, and human mobility, which are critical in modern epidemic processes [3]. The COVID-19 pandemic has clearly demonstrated that beyond theoretical frameworks, population mobility, socio-demographic structure, health capacity, and environmental factors decisively influence epidemic spread [4,5,6].
Studies examining the relationship between mobility and transmission show that population mobility determines both short- and long-range spread rates and that travel restrictions play a critical role in epidemic control [7,8,9,10]. Studies examining the effects of mobility restrictions on national and global epidemic trajectories reveal that similar dynamics are observed in different countries [11,12]. The integration of geographic information systems (GIS) and spatial statistical methods into epidemiological analyses has been a significant milestone in understanding spatial heterogeneity. GIS-based studies have assessed the spatial characteristics of the epidemic in detail by examining case clusters, hotspots, spatial neighborhood effects, and socio-demographic vulnerability indicators [13,14,15,16,17]. Spatial regression-based models show that demographic structure, health capacity, and environmental factors can jointly explain epidemic severity [18,19].
Deep learning studies focusing on temporal prediction have also played a critical role in epidemic modeling. LSTM, GRU, and CNN-based approaches have successfully captured nonlinear patterns in epidemic time series and have been widely applied in short- and medium-term forecasting of COVID-19 [20,21,22,23]. Some studies have shown that classical deep learning models are more successful than machine learning methods in many scenarios [24,25].
However, most of these models are based on a single modality and cannot integrate different data layers such as demographic, geographic, healthcare infrastructure, and mobility into a holistic representation [26]. This limitation has increased the importance of multi-source and multi-view deep learning approaches that can model the relationships among heterogeneous data sources. Multi-source methods can provide high performance and interpretability by processing different data views in independent sub-networks and combining them with an appropriate fusion mechanism [27,28].
Multi-view deep learning approaches have produced promising results, particularly in biomedical and healthcare data analysis. However, applications of this powerful methodological framework to epidemic risk prediction are limited [29,30]. Furthermore, approaches that address the class imbalance problem commonly encountered in epidemic datasets have played a critical role. GAN-based methods, and especially Conditional GAN (CGAN)-based synthetic data augmentation, improve classification performance by increasing the sample diversity of underrepresented risk classes [31].
Overall, the literature indicates that holistic deep learning approaches that simultaneously address multi-source data integration, spatial awareness, and class imbalance in epidemic risk assessments are lacking. Therefore, models that combine multi-view representation with attention-based fusion and incorporate CGAN-assisted data augmentation have the potential to contribute to the literature. A summary of epidemic modeling methods is shown in Table 1.
3. Material
In this study, a dataset of multidimensional indicators for Türkiye and its neighboring countries (Greece, Bulgaria, Georgia, Armenia, Iran, and Iraq) was compiled from reliable statistical sources for regional epidemic risk prediction. This dataset, which includes demographic structure, economic indicators, health infrastructure, geographic location, and mobility, provides a holistic representation of the key factors affecting epidemic risk. Raw data were compiled from official governmental and international statistical databases such as TurkStat, Eurostat, WHO, UNData, and other international open data portals. Where administrative classifications differed between countries, the data were supplemented, when necessary, with verifiable open sources such as Wikipedia. Geographic coordinates (latitude–longitude) were obtained using Google Maps 3.64/Google Geocoding 4.0.1 services to represent the center point of each administrative unit.
Differences in definition, scale, and measurement across data components from different countries were resolved through a unification process, transforming all indicators into a single, integrated analysis structure. The target variable, Risk_Level (Low–Medium–High), was labeled as 0–1–2. Due to the significant imbalance in the class distribution, CGAN-based class-conditional synthetic sample generation was implemented during the training process; the generated samples were included only in the training set, increasing representativeness, particularly in the high-risk class. All modeling processes were implemented in Python 3.14.0.
3.1. The Dataset
The dataset used in this study was created by integrating multi-source indicators of epidemic risk for administrative regions in seven countries. Each row in the dataset represents a region. The columns consist of 10 numerical variables representing the demographic, economic, health, geographic, and mobility characteristics of the respective region.
The demographic view includes the variables Population, Population Density, Economy_Euro, and Economy_Population_Ratio. The health view consists of Hospital and Population_Per_Hospital variables, which reflect regional healthcare capacity. The geographic view includes Area, Latitude, and Longitude coordinates. This structure allows for the modeling of spatial proximity, neighborhood relationships, and regional similarities. The mobility view includes the Transportation variable, which represents the region’s accessibility and mobility potential.
The Risk_Level variable in the dataset is defined as Low–Medium–High, and these levels are labeled 0–1–2, respectively. This variable was also used as a condition variable in CGAN-based synthetic data generation. The multi-source and heterogeneous structure of the dataset allows for the application of multi-view deep learning approaches. The structure of the dataset and the view-based variable grouping are shown in Figure 2.
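The view-based grouping described above can be written down as a small lookup structure. The sketch below is illustrative only: the view abbreviations follow the ones used later for MV-RiskNet, the exact column spellings (for example `Population_Density`) are assumed, and the values in `example_row` are made up.

```python
# View-based grouping of the ten input columns listed in Section 3.1.
# Column spellings are assumptions; the real dataset may name them differently.
VIEWS = {
    "DEMO": ["Population", "Population_Density",
             "Economy_Euro", "Economy_Population_Ratio"],
    "HLTH": ["Hospital", "Population_Per_Hospital"],
    "GEO": ["Area", "Latitude", "Longitude"],
    "MOB": ["Transportation"],
}


def split_views(row):
    """Split one region's record (a dict) into per-view feature lists."""
    return {view: [row[col] for col in cols] for view, cols in VIEWS.items()}


# Hypothetical region record with made-up values, for illustration only.
example_row = {
    "Population": 500000, "Population_Density": 120.5,
    "Economy_Euro": 4.2e9, "Economy_Population_Ratio": 8400.0,
    "Hospital": 12, "Population_Per_Hospital": 41666.7,
    "Area": 4150.0, "Latitude": 40.0, "Longitude": 30.0,
    "Transportation": 3,
}
views = split_views(example_row)
```

Keeping the grouping in one place makes it easy to check that every column belongs to exactly one view before the per-view subnetworks are built.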
3.2. Target Variable Definition and Labeling
The target variable, Risk_Level, was defined using a standardized labeling procedure to ensure cross-border consistency and mitigate reporting biases across different healthcare systems. Instead of raw case counts, we used the incidence rate (total confirmed cases per 100,000 population) as the primary metric. To categorize regional risks into three classes (Low, Medium, and High), we applied two percentile-based thresholds ($\tau_1$ and $\tau_2$, with $\tau_1 < \tau_2$) calculated over the entire multi-country dataset. Formally, a region $i$ with incidence rate $r_i$ is labeled Low Risk if $r_i \le \tau_1$, Medium Risk if $\tau_1 < r_i \le \tau_2$, and High Risk if $r_i > \tau_2$.
This relative thresholding approach accounts for demographic differences and ensures that the model reflects the regional risk distribution. To prevent data leakage, the variables used for labeling (incidence rates and case counts) were strictly excluded from the input feature set of the MV-RiskNet model.
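As a rough illustration of the percentile-based labeling rule, the sketch below assigns the 0–1–2 labels from a list of incidence rates. The specific percentiles (33rd and 66th) are an assumption made for the example, since the text does not state which percentiles were used; the helper names and the sample rates are likewise hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def label_risk(rates, q_low=33.0, q_high=66.0):
    """Map incidence rates to 0 (Low), 1 (Medium), 2 (High).

    q_low and q_high are assumed percentiles, not values from the study.
    """
    t_low = percentile(rates, q_low)
    t_high = percentile(rates, q_high)
    labels = [0 if r <= t_low else 1 if r <= t_high else 2 for r in rates]
    return labels, (t_low, t_high)


# Made-up incidence rates (cases per 100,000 population).
rates = [120.0, 450.0, 80.0, 900.0, 300.0, 610.0]
labels, thresholds = label_risk(rates)
```

Because the thresholds are computed over the pooled data, the labels express risk relative to the other regions in the dataset rather than against a fixed absolute cutoff.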
To ensure the reliability of these labels and address potential spatial autocorrelation (where nearby regions share similar characteristics), we implemented a rigorous evaluation design. Beyond standard random splitting, we employed Spatial Cross-Validation (Blocked CV) by clustering regions into 15 distinct spatial blocks based on their geographic coordinates. Furthermore, a Country-Holdout (Leave-One-Country-Out) validation was conducted. This approach ensures that the model’s performance is not inflated by ‘easy’ spatial dependencies and demonstrates its ability to generalize across national borders.
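The Country-Holdout scheme can be sketched as a simple index generator. This is a minimal illustration with hypothetical names and a toy country list; the spatially blocked variant would work the same way with a block identifier per region in place of the country name.

```python
def leave_one_country_out(countries):
    """Yield (held_out, train_idx, test_idx) for each distinct country.

    `countries` holds one country name per region, in dataset row order.
    """
    for held_out in sorted(set(countries)):
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        test_idx = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train_idx, test_idx


# Toy example: six regions from four countries.
countries = ["Türkiye", "Türkiye", "Greece", "Bulgaria", "Greece", "Iran"]
folds = list(leave_one_country_out(countries))
```

Each fold tests on regions from a country the model has never seen during training, which is what makes this split stricter than a random one.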
3.3. Data Preprocessing
Because the dataset is multi-source and heterogeneous, a preprocessing phase was implemented before the modeling process. First, missing values and inconsistencies in the raw data collected from different countries were systematically evaluated. Missing values in numerical variables were imputed using the mean or the median, depending on the characteristics of the variable distributions, and distance-based imputation strategies were applied in geographically similar regions. For variables consistently reported at the country level, country-based imputation was preferred to maintain spatial integrity.
To reduce the impact of scale differences in model training and bring all features within a comparable range, continuous variables were standardized using z-score normalization. Normalization coefficients were calculated only on the training data, and data leakage was prevented by applying the same parameters to the test/validation data. This standardization function is given in Equation (1):

$$x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \qquad (1)$$

In the equation, $x_{ij}$ represents the raw value of feature $j$ for region $i$, and $\mu_j$ and $\sigma_j$ represent the mean and standard deviation calculated from the training set for the relevant variable. This transformation eliminates scale differences between variables, ensuring a more stable model learning process.
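A minimal sketch of leakage-free standardization follows: the mean and standard deviation are estimated on the training column only and then reused for held-out data. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample estimator was used.

```python
import statistics


def fit_scaler(train_column):
    """Estimate mean and standard deviation from training values only."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)  # population std; an assumption
    if std == 0:
        std = 1.0  # constant column: avoid division by zero
    return mean, std


def transform(column, mean, std):
    """Apply z-score scaling with previously fitted parameters."""
    return [(x - mean) / std for x in column]


def inverse_transform(column, mean, std):
    """Map standardized values back to the original scale."""
    return [x * std + mean for x in column]


train = [2.0, 4.0, 6.0]
test = [4.0, 8.0]
mean, std = fit_scaler(train)        # parameters come from train only
train_z = transform(train, mean, std)
test_z = transform(test, mean, std)  # same parameters, no refitting
```

The same fitted parameters are what an inverse transform needs in order to bring generated or predicted values back to the original units.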
The dataset was split into training, test, and validation subsets in a 70/15/15 ratio. A linear correlation analysis was performed on the disaggregated data to understand the relationships between variables. The correlation matrix shows significant positive relationships among demographic, economic, and mobility indicators. This suggests that epidemic risk is shaped not by a single factor but by multidimensional socio-demographic interactions. The correlation matrix of the dataset is shown in Figure 3.
3.4. Class Imbalance and Data Augmentation with CGAN
There is a significant sample imbalance between classes in the Risk_Level variable. Furthermore, limited representation of the high-risk category can negatively impact the decision boundaries of classification models and reduce the model’s performance in distinguishing this class. Therefore, a CGAN approach, which generates class-conditional synthetic samples, was used in the training process.
The CGAN-based data augmentation process consists of two stages: generating synthetic examples conditionally based on classes and simply adding these examples to the training set. The general framework of the process is shown in Figure 4. This structure generates examples only for underrepresented classes, while the validation and test subsets are kept independent of the synthetic data, preserving evaluation impartiality.
CGAN generates synthetic examples that mimic the statistical structure specific to each class by taking as input the noise vector $z$ and the class label $c$. The generator’s computational flow is given in Equation (2). The input $(z, c)$ is processed through successive linear transformations and activation functions to transform it into class-conditioned synthetic examples. The discriminator, on the other hand, evaluates whether an example is real or generated, along with its class information, and this structure is given in Equation (3). The CGAN training process proceeds within the framework of classical min–max game theory between the generator and discriminator, with the discriminator loss defined by Equation (4) and the generator loss defined by Equation (5).
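To make the adversarial objective concrete, the sketch below computes the two losses from discriminator outputs. It assumes the standard binary cross-entropy form of the conditional GAN objective and the commonly used non-saturating generator loss; the exact form of the study's Equations (4) and (5) may differ. Inputs are the discriminator's probabilities that a (sample, class label) pair is real.

```python
import math

EPS = 1e-12  # numerical guard against log(0)


def discriminator_loss(d_real, d_fake):
    """-(E[log D(x|c)] + E[log(1 - D(G(z|c)|c))]) over a batch."""
    real_term = sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating generator loss: -E[log D(G(z|c)|c)] (assumed variant)."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# Made-up discriminator outputs for a batch of real and generated samples.
d_real = [0.9, 0.8, 0.95]
d_fake = [0.1, 0.3, 0.2]
loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

When the discriminator cannot tell real from generated samples (all outputs near 0.5), the discriminator loss approaches 2 ln 2 and the generator loss approaches ln 2.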
The generated synthetic examples were returned to their original scale by applying inverse normalization and only then added to the training set. This process balanced the class distribution and improved the model’s ability to learn the high-risk class. The class-conditional generation process with CGAN is shown in Figure 5.
4. Methods
The proposed methodology distinguishes itself from standard classification approaches by implementing a multi-view learning paradigm that treats demographic, infrastructure, and mobility data as distinct information streams. This architectural choice allows for more robust feature fusion in cross-border epidemic contexts where data alignment is traditionally challenging. In this study, we propose an integrated model combining multi-view representation, data augmentation with CGAN, and a comparative evaluation of various deep learning models for modeling regional epidemiological risk across heterogeneous data sources. The proposed model defines a multi-source feature space consisting of demographic, health, mobility, and geographic indicators. The class imbalance problem is addressed with CGAN, and the proposed MV-RiskNet model, which processes multi-view representations, is used as the main classifier and compared with the models (MLP, AE-CLS, CNN, LSTM, GCN) shown in Figure 6.
4.1. MLP Model
MLP is a deep learning model consisting of multiple fully connected layers. The MLP used in this study classifies input features using Dense→ReLU→Dropout→Dense blocks. While MLP can capture nonlinear relationships, it cannot directly model spatial neighborhood structure, view-specific dependencies, or heterogeneity among data sources. Therefore, it is used only as a baseline.
4.2. Autoencoder-Classifier (AE-CLS)
The AE-CLS model is a two-stage architecture that combines an autoencoder structure that learns a low-dimensional latent representation from the input features and a classifier block that performs classification based on this representation. In the first stage, the encoder component compresses the high-dimensional input vector to produce a more compact latent representation, as shown in Equation (6). In the second stage, the classifier uses this latent representation generated by the encoder to estimate the risk level, and the classification process is given in Equation (7).
This approach allows high-dimensional features to be transferred to a more organized, lower-dimensional latent space. However, because it cannot independently model multi-view structures, it cannot distinguish between view-based relationships. Consequently, AE-CLS has a more limited representational power compared to the view-partitioned representation offered by MV-RiskNet.
4.3. CNN Model
CNN models have been widely used to extract patterns from epidemic time series, particularly during the COVID-19 period. The 1-dimensional CNN model used in this study processes regional feature vectors sequentially via Conv1D→ReLU→MaxPooling→Fully Connected blocks and serves as a basic feature extraction mechanism. CNN is effective at capturing local patterns, but offers limited representation capabilities compared to MV-RiskNet because it cannot directly model spatial neighborhood information or multi-view relationships.
4.4. LSTM Model
The LSTM model demonstrates strong performance in capturing time dependencies in sequential epidemic indicators. Therefore, the LSTM model was used as a baseline comparison model in this study. The model consists of a unidirectional LSTM layer, a dropout layer, and a fully connected classifier. While the LSTM model is successful in representing temporal relationships, it cannot directly handle regional proximity, mobility flows, or multi-view structure. Therefore, it provides only a limited basis of comparison for spatial epidemiological risk models.
4.5. GCN Model
GCN is a deep learning model that models information propagation between nodes through a graph structure. In GCN, node representations at each layer are updated using a normalized adjacency matrix containing the adjacency structure, and this update process is given in Equation (8). The GCN model used in this study uses only the adjacency matrix, which represents geographic neighborhood relationships, allowing spatial dependencies to be evaluated within a single view. This method was chosen to demonstrate the contribution of MV-RiskNet’s multi-view and attention-based fusion mechanism in isolation. GCN is advantageous in that it can directly model adjacency relationships, but it can only represent pairwise spatial proximity. Its ability to integrate multi-view information or learn higher-order relationships is limited.
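The layer-wise update can be illustrated with a small pure-Python sketch. It assumes the widely used propagation rule $H^{(l+1)} = \mathrm{ReLU}(\hat{A} H^{(l)} W^{(l)})$ with $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$; whether the study's Equation (8) adds self-loops or uses this exact normalization is not stated, so treat both as assumptions. The toy graph, features, and weights are made up.

```python
import math


def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Toy graph: three regions in a chain (0-1-2), two features per region.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]
adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weights)
```

Each output row mixes a region's own features with those of its graph neighbors, which is why a single adjacency matrix can only encode one kind of relationship at a time.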
4.6. The Proposed Model: MV-RiskNet
The MV-RiskNet model, based on a multi-view approach, is proposed to represent the multidimensional structure of regional epidemic risk in this study. MV-RiskNet generates view-specific latent representations by processing features obtained from four views: demographic (DEMO), health infrastructure (HLTH), mobility (MOB), and geographic location (GEO) in subnetworks. This model enables independent learning of the contribution of each view to epidemic risk and enables processing of heterogeneous data components.
For each view $v$, the input feature set $x^{(v)}$ is processed by the view’s subnetwork to produce a view-specific representation $h^{(v)}$, formulated in Equation (9). Subnetworks consist of multi-layer fully connected (MLP) blocks to capture the structural properties of the relevant view and independently model demographic structure, health capacity, spatial location, and mobility dynamics.
The view-specific representations are then combined with an attention-based fusion mechanism. This mechanism weights the representations by learning the relative importance of each view for epidemic risk. The formula for the final combined representation ($h$) is given in Equation (10). This allows the model to selectively evaluate the relationships between different data views and supports interpretability. The resulting combined representation is fed to the classifier network that estimates the final risk level, and the output probabilities are obtained with the softmax function, whose formula is given in Equation (11). The model is optimized with a weighted cross-entropy loss to mitigate the effect of class imbalance. This increases the model’s capacity to learn the high-risk class.
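The fusion step can be sketched as a softmax-weighted sum of the view representations. The scoring function used here (a dot product between each view representation and a learned vector) is an assumption chosen for simplicity, and all numbers are made up; the attention form in the study's Equation (10) may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(view_reprs, score_vector):
    """Fuse equal-length view representations with attention weights.

    view_reprs: dict mapping view name -> list of floats.
    score_vector: hypothetical learned vector used to score each view.
    """
    names = list(view_reprs)
    scores = [sum(w * h for w, h in zip(score_vector, view_reprs[n]))
              for n in names]
    alphas = softmax(scores)
    dim = len(score_vector)
    fused = [sum(a * view_reprs[n][k] for a, n in zip(alphas, names))
             for k in range(dim)]
    return fused, dict(zip(names, alphas))


# Made-up 2-dimensional representations for the four views.
view_reprs = {
    "DEMO": [0.2, 0.8],
    "HLTH": [0.9, 0.1],
    "GEO": [0.4, 0.4],
    "MOB": [0.6, 0.3],
}
fused, alphas = attention_fusion(view_reprs, score_vector=[1.0, -0.5])
```

The returned weights sum to one, so they can be read directly as the relative contribution of each view to a given prediction, which is the basis of the attention analysis.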
A high-level architectural visualization of the proposed MV-RiskNet model is shown in Figure 7. A detailed block diagram of the model, including the view-specific subnetworks, attention-based fusion, and classifier stages, is shown in Figure 8. MV-RiskNet was trained and compared in two different scenarios:
MV-RiskNet without CGAN (baseline): Using the original training data, only the contribution of the multi-view model was analyzed.
MV-RiskNet with CGAN: For underrepresented classes, synthetic examples generated by the CGAN were added to the training set, and the model was trained on a balanced data distribution. This version significantly improved the discriminability, especially for the high-risk class.
4.7. Performance Evaluation Metrics
To evaluate model performance, we used the Accuracy, Precision, Recall, F1-score, and ROC-AUC metrics, which are commonly used in multi-class classification problems. Accuracy is a fundamental measure of the model’s overall correctness and is calculated as the ratio of correctly classified examples to the total examples, as shown in Equation (12). However, in structures with class imbalance, Accuracy alone can be misleading. Therefore, the Precision and Recall metrics were additionally evaluated. Precision indicates the proportion of examples predicted as positive by the model that are actually positive, while Recall indicates the extent to which true positive examples are captured. These two metrics are formulated in Equation (13) and Equation (14), respectively, where $TP$, $FP$, and $FN$ denote the true positives, false positives, and false negatives of a given class.

$$\mathrm{Accuracy} = \frac{\text{number of correctly classified examples}}{\text{total number of examples}} \qquad (12)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (13)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (14)$$

The F1-score, which measures the balance between precision and recall, is a metric that more accurately reflects the model’s performance in class discrimination in imbalanced datasets and is formulated in Equation (15).

$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (15)$$

ROC-AUC analysis was also applied to examine inter-class discrimination performance. In this approach, each class is treated as a separate binary classification problem against all other classes, and an ROC curve is generated for each. The average of the resulting AUC values provides an additional indicator of the model’s overall multi-class discrimination capacity.
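The per-class metrics can be computed in a one-vs-rest fashion as sketched below. Macro averaging of the F1-score is an assumption, since the text does not state which averaging scheme was reported; the labels in the example are made up.

```python
def accuracy(y_true, y_pred):
    """Share of examples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def per_class_metrics(y_true, y_pred, num_classes=3):
    """Return {class: (precision, recall, f1)} using one-vs-rest counts."""
    results = {}
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[c] = (precision, recall, f1)
    return results


def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of the per-class F1-scores (assumed averaging)."""
    metrics = per_class_metrics(y_true, y_pred, num_classes)
    return sum(f1 for _, _, f1 in metrics.values()) / num_classes


# Made-up labels: 0 = Low, 1 = Medium, 2 = High.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred)
f1_macro = macro_f1(y_true, y_pred)
```

In this toy case the high-risk class has perfect precision but only 50% recall, a gap that accuracy alone would hide.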
5. Experimental Results
In this section, the proposed CGAN-assisted MV-RiskNet model is comprehensively compared with feature-based (MLP, CNN, LSTM, AE-CLS) and graph-based (GCN) approaches. All models were trained under the same training-test split and consistent hyperparameter settings, ensuring a fair comparison between the methods. Considering multi-class and unbalanced data structures, the most reliable performance indicator is the F1-score, and complementary metrics such as accuracy, precision, and recall are also reported. The experimental results reveal the limitations of both classical feature-based approaches and graph-based approaches. It is demonstrated that the proposed model, which integrates multi-view representation and a CGAN-based data augmentation mechanism, provides significant superiority in epidemic risk prediction.
The experimental procedures were implemented using Python and TensorFlow on a MacBook Pro workstation equipped with an Apple M1 Pro chip. The optimization was performed using the Adam optimizer with a learning rate of 0.001 for the classification model and 0.0002 for the CGAN component, with batch sizes set to 16 and 64, respectively. To ensure reproducibility, each sub-network in MV-RiskNet was designed with three fully connected layers (64, 32, and 16 units) using ReLU activation and a dropout rate of 0.2 to prevent overfitting. The training process consisted of 2000 epochs for CGAN pre-training and 25 epochs for the MV-RiskNet model. The random seed was fixed to 42 for all experiments. Furthermore, a de-identified version of the dataset and the implementation scripts will be made available in a public repository upon publication to support further research.
5.1. The General Classification Performance
In the experimental studies, MLP, CNN, LSTM, AE-CLS, and GCN models were selected as reference methods. While feature-based approaches (MLP, CNN, LSTM, AE-CLS) directly process the demographic, geographic, and mobility indicators that define the epidemiological risk structure, they have limited capacity to represent inter-regional relational dependencies. The MLP and CNN models achieved a certain level of discrimination in the low- and medium-risk classes thanks to their ability to capture local patterns in the input space, but performance losses were observed, particularly in the high-risk class, because they cannot holistically model multiple types of relationships (geographic proximity, structural similarity, mobility interactions). On this non-sequential data structure, the LSTM model exhibited a learning behavior sensitive to class imbalance and produced less stable results in terms of inter-class discrimination. The AE-CLS model offers a more stable structure because it produces low-dimensional latent representations. However, it failed to achieve optimal performance because it cannot explicitly model view-based relationships.
The graph-based GCN model obtained a rather limited view of the relationship by using only geographic adjacency information. This led to significant performance losses, particularly in distinguishing between medium and high-risk classes. The inadequacy of geographic neighborhood information alone in explaining epidemiological risk limited the model’s overall discriminatory capacity.
The proposed MV-RiskNet model outperformed all reference methods by integrating multiple view representations within the same learning framework and employing an attention-based fusion mechanism. In particular, its ability to combine heterogeneous information from different data views into a common representation significantly strengthened both the model’s generalization capacity and inter-class discrimination.
The CGAN-based data augmentation applied to the training data before model training reduced class imbalance, increased the diversity of representations in the high-risk class, and improved the performance of the MV-RiskNet model. While the gains provided by CGAN are relatively limited, the proposed MV-RiskNet model with CGAN achieved the highest values across all metrics. Both the multi-view representation and the class-balance-focused synthetic data generation played an important role in epidemic risk prediction, and the results are shown in Table 2.
To ensure a rigorous and fair evaluation, the performance of the proposed MV-RiskNet was benchmarked against several state-of-the-art baselines, including competitive tabular models such as XGBoost and LightGBM, and a tuned Early Fusion MLP. These models represent the current standard for tabular data classification. As shown in Table 2, while gradient-boosted trees (XGBoost and LightGBM) show strong predictive capabilities with accuracies of 91.55%, they still underperform compared to the proposed architecture. This performance gap highlights that standard tabular models fail to capture the multi-view spatial dependencies and regional interactions that our attention-based architecture explicitly models. Specifically, the integration of CGAN-based augmentation allows MV-RiskNet to reach a peak accuracy of 97.20%, outperforming the strongest traditional baselines by a significant margin.
To ensure a fair and rigorous comparison, the graph-based GCN model was implemented using both regional node features and spatial adjacency information. The adjacency matrix for all graph-based evaluations (GCN and MV-RiskNet) was explicitly constructed using a K-nearest neighbors (k = 3) approach based on geographical coordinates (latitude and longitude). Despite having access to the same feature set, the GCN model failed to achieve optimal performance because it lacks the multi-view attention mechanism required to weigh different types of regional interactions dynamically.
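A k-nearest-neighbors adjacency matrix of this kind can be built from coordinates as sketched below. Two details are assumptions not stated in the text: distances are measured with the haversine (great-circle) formula, and the graph is symmetrized so that an edge exists if either region lists the other among its k nearest. The coordinates are approximate and used only for illustration.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def knn_adjacency(coords, k=3):
    """Symmetric 0/1 adjacency linking each region to its k nearest regions."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: haversine_km(coords[i], coords[j]))
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetrize (assumed)
    return adj


# Approximate (latitude, longitude) pairs for six illustrative centers.
coords = [
    (41.01, 28.98),  # Istanbul
    (39.93, 32.86),  # Ankara
    (38.42, 27.14),  # Izmir
    (37.98, 23.73),  # Athens
    (42.70, 23.32),  # Sofia
    (41.72, 44.79),  # Tbilisi
]
adj = knn_adjacency(coords, k=3)
```

After symmetrization a region can end up with more than k neighbors, because it may be among the nearest neighbors of regions that are not among its own.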
Among the reference methods, the performance loss was most pronounced for the GCN model (66.7% accuracy), which, despite using spatial adjacency, lacked the dynamic weighting of multi-source interactions.
Random Forest (RF) was also included among the tabular baselines. As shown in
Table 2, RF achieved an accuracy of 83.33%, below the 91.55% obtained by the gradient-boosted trees (XGBoost and LightGBM), and all three remain behind our architecture. Standard models such as RF treat regions as independent feature vectors, whereas MV-RiskNet captures the spatial autocorrelation inherent in epidemic spread. With CGAN-based augmentation, MV-RiskNet reaches a peak accuracy of 97.20%, indicating that both generative data balancing and multi-view representation contribute to accurate epidemic risk prediction.
5.2. Ablation Study
To rigorously evaluate the individual contributions of the proposed components within the MV-RiskNet framework, a comprehensive ablation study was conducted. We specifically examined the impact of Conditional Generative Adversarial Network (CGAN) based data augmentation and the class weighting strategy on the overall predictive performance. The results for the four resulting configurations (each component switched on or off) are summarized in
Table 3.
As demonstrated in
Table 3, the baseline MV-RiskNet model (without CGAN) achieves a robust accuracy of 94.40%, confirming the strength of the multi-view attention-based architecture. However, applying a traditional class weighting strategy provided only a marginal improvement of 0.45%. In contrast, the integration of CGAN-generated synthetic samples led to a significant performance leap, increasing the accuracy to 97.20%.
This substantial gain indicates that the CGAN does not merely replicate existing data but effectively captures and reconstructs the structural feature manifold of underrepresented high-risk regions. By providing the model with more nuanced synthetic examples, CGAN allows the attention mechanism to better characterize regional risk boundaries that simple re-weighting cannot resolve. The highest performance (97.35%) was achieved by combining both generative augmentation and class weighting, demonstrating a strong synergy between the two methods.
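For reference, the class weighting strategy compared above can be sketched as inverse-frequency weighting, in which each class is weighted by n / (K · n_c) for n samples, K classes, and n_c samples in class c. This particular formula and the class counts below are assumptions for illustration; the exact weighting scheme used in the experiments is not specified in the text.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# Hypothetical label distribution with an underrepresented high-risk class.
labels = ["low"] * 50 + ["medium"] * 40 + ["high"] * 10
weights = balanced_class_weights(labels)

# Rare classes receive larger weights, so their errors cost more in the loss.
for name in ("low", "medium", "high"):
    print(name, round(weights[name], 3))
```

Re-weighting of this kind only rescales the loss contribution of existing samples, whereas generative augmentation adds new samples to the minority class.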
Furthermore, we evaluated the contribution of each data view (Health, Demography, Mobility, and Geography) to the overall risk prediction. As demonstrated in
Table 3, the removal of the Health Infrastructure view caused the most significant drop in accuracy, highlighting it as the primary predictor. Notably, removing the Mobility proxy led to a reduction in accuracy from 97.20% to 91.40%. This drop empirically confirms that even as a single proxy variable, mobility plays a non-redundant and critical role in capturing the dynamics of regional epidemic spread, justifying its emphasis in our analysis.
5.3. Spatial Generalization and Robustness Analysis
As shown in
Table 4, while the random split achieves a high accuracy of 97.20%, the performance remains robust even under more stringent testing conditions. The Spatial Blocked CV and Country-Holdout tests yielded 88.94% and 85.24% accuracy, respectively. The consistency of these results, despite the increased difficulty of predicting entirely unseen geographic clusters or neighboring countries, confirms that MV-RiskNet effectively learns structural epidemiological features rather than merely memorizing local spatial patterns.
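The two stricter protocols can both be expressed as group-wise holdout, where all regions sharing a group label are withheld together. The sketch below is illustrative only: the grid-based blocking rule, the 4-degree cell size, and the region records are hypothetical and are not taken from the experimental setup.

```python
import math

def grid_block(lat, lon, size_deg=4.0):
    """Map a coordinate to a square grid cell id (hypothetical blocking rule)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group."""
    for g in sorted(set(groups)):
        test_idx = [i for i, x in enumerate(groups) if x == g]
        train_idx = [i for i, x in enumerate(groups) if x != g]
        yield g, train_idx, test_idx

# Hypothetical region records: (country code, latitude, longitude).
regions = [("TR", 41.0, 29.0), ("TR", 39.9, 32.9), ("TR", 37.9, 40.2),
           ("GR", 38.0, 23.7), ("GR", 40.6, 22.9), ("BG", 42.7, 23.3),
           ("IR", 35.7, 51.4), ("IR", 38.1, 46.3)]

# Country-holdout: every region of one country is withheld for testing.
country_splits = list(leave_one_group_out([c for c, _, _ in regions]))

# Spatial blocking: regions falling in the same grid cell are withheld together.
block_splits = list(leave_one_group_out(
    [grid_block(lat, lon) for _, lat, lon in regions]))
```

In both cases no test region shares a group with any training region, which is what makes these protocols harder than a random split.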
5.4. Confusion Matrix Analysis
In addition to performance metrics, a confusion matrix analysis was performed on the test set to examine the model’s error distribution across classes, as shown in
Figure 9. In the MV-RiskNet model without CGAN, samples from the high-risk class were partially confused with the medium-risk class. After the class imbalance was corrected with CGAN, the retrained model classified the high-risk class correctly at a markedly higher rate and produced fewer misclassifications. Thus, CGAN-based data augmentation improved the discriminability of underrepresented classes. As shown in
Figure 9, the proposed model achieved 94.44% accuracy without CGAN and 97.22% accuracy with CGAN.
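To make the error analysis concrete, the following minimal sketch shows how a confusion matrix, per-class recall, and overall accuracy are derived from true and predicted labels. The labels are invented for illustration and do not reproduce the reported results.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

classes = ["low", "medium", "high"]
# Hypothetical labels: one high-risk sample is confused with medium risk.
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "low", "medium", "medium", "medium", "high"]

cm = confusion_matrix(y_true, y_pred, classes)
recall = per_class_recall(cm)
accuracy = sum(cm[i][i] for i in range(len(classes))) / len(y_true)
```

Here the off-diagonal entry in the high-risk row is the kind of high-to-medium confusion discussed above.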
5.5. Accuracy and Loss Curves of the Proposed Model
The learning behavior, accuracy, and loss curves of the proposed MV-RiskNet model with CGAN are shown in
Figure 10. The model’s optimization process is shown to be both fast and stable. The training and validation accuracy values increased simultaneously from the first epochs and stabilized at approximately 97.22%. This parallel increase demonstrates that the model effectively processes information from multiple view representations and achieves high generalization capacity despite class imbalance.
It was observed that the training loss decreased steadily and significantly, while the validation loss followed a consistent pattern. The low fluctuations in the validation loss indicated that the model did not exhibit overfitting, and that the decision boundaries were learned consistently. This confirms that CGAN-based data augmentation stabilizes the learning process by increasing the diversity of examples, particularly for the high-risk class, and helps the model distinguish between classes more reliably.
The generally close, regular, and stable progression of the accuracy and loss curves demonstrates that the MV-RiskNet model with CGAN can process multiple view representations in an integrated manner, effectively minimize class imbalance, and achieve high-performance learning without overfitting. The accuracy and loss curves shown in
Figure 10 clearly support these findings.
5.6. ROC-AUC Graph
The class-based discrimination performance of the proposed MV-RiskNet model was evaluated using one-vs-rest ROC analysis, a standard approach for multi-class problems. In this method, each class is treated as a separate binary classification problem against all other classes, and a separate ROC curve is obtained for each class. The ROC-AUC plots for the low-, medium-, and high-risk classes of both the MV-RiskNet model with CGAN (left) and the baseline model trained without synthetic data (right) are shown in
Figure 11.
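The one-vs-rest procedure can be sketched as follows. The AUC of each binary problem is computed here through its rank interpretation, i.e., the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and predicted probabilities are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, proba, classes):
    """One AUC per class, treating that class as positive and the rest as negative."""
    return {c: binary_auc([row[k] for row in proba], [y == c for y in y_true])
            for k, c in enumerate(classes)}

classes = ["low", "medium", "high"]
y_true = ["low", "low", "medium", "medium", "high", "high"]
# Hypothetical class probabilities, one row per sample (columns follow `classes`).
proba = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.3, 0.6],
    [0.1, 0.2, 0.7],
]
auc = one_vs_rest_auc(y_true, proba, classes)
```

The pairwise computation is quadratic in the number of samples, which is acceptable for a sketch; library implementations typically use a sort-based method instead.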
The results revealed that both models had extremely high class discrimination power. The AUC value of 1.000 for the low-risk class in the model using CGAN demonstrated almost perfect discrimination of this group. AUC values of 0.997 for the medium and high-risk classes demonstrated strong discriminatory performance, even in these epidemiologically more challenging classes. Similarly, the baseline model without CGAN achieved high AUC values across all classes, demonstrating consistent discrimination success. The high AUC values, particularly seen in the high-risk class, support the model’s ability to accurately identify rare and critical regions.
The ROC curves for both models lie close to the upper-left corner, demonstrating that the models offer balanced and consistent performance in terms of both sensitivity (TPR) and false positive rate (FPR). These results confirm the MV-RiskNet model’s robust performance in multi-class epidemic risk prediction, both overall and at the class level. Furthermore, the use of CGAN mitigated the data imbalance while maintaining the high performance of the baseline model.
5.7. Regional Risk Distribution and Mapping
The risk scores generated by the proposed MV-RiskNet model with CGAN were combined with the geographic coordinates of the seven countries considered in this study (Türkiye and its neighbors: Bulgaria, Greece, Armenia, Georgia, Iran, and Iraq) and mapped spatially. Low, medium, and high-risk classes are represented as green, orange, and red circles, respectively, and are shown in
Figure 12. This visualizes the epidemic susceptibility learned by the model in a spatial context.
The map results clearly revealed the spatial clustering of high-risk regions, particularly in Türkiye. In Türkiye, high-risk regions are concentrated primarily in the eastern and southeastern corridors, while moderate risk dominates in the western and inland regions, with the low-risk class distributed in sparser, localized clusters. It is noteworthy that in the Balkans (especially Bulgaria and Greece), high- and medium-risk regions are concentrated along border crossings and coastal/urban areas. In the Caucasus and Iran-Iraq region, risk patterns were observed to follow a spatial pattern consistent with domestic population density, transportation axes, and potential mobility corridors.
These findings demonstrate that the proposed MV-RiskNet with CGAN not only provides high classification accuracy but also consistently represents epidemic susceptibility, incorporating regional and geographic contexts. Thus, the model provides decision-makers with directly usable spatial output for regional prioritization of epidemic risk, resource allocation, and targeted response planning.
The reliability of this spatial mapping and the model’s interpretability were quantitatively validated to support these qualitative observations. The spatial distribution was tested using Global Moran’s I, yielding a significant positive index of 0.4913 (z = 9.54, p < 0.0001), confirming that the identified risk clusters follow statistically significant geographical patterns. Additionally, the stability of attention weights was verified across 10 independent runs with a low standard deviation (< 0.02), and an external permutation test identified ‘Healthcare Capacity per Capita’ as the primary risk driver (Importance: 0.454). These quantitative results ensure that the model’s spatial representations are both statistically grounded and stable. The consistency of the attention mechanism across different data views is further detailed in
Table 5.
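For readers unfamiliar with the statistic, Global Moran’s I for values x_i and spatial weights w_ij is I = (n / S0) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where S0 is the sum of all weights. The sketch below computes the index on a toy chain of four regions; the weights and values are hypothetical, and the z-score and p-value reported above would additionally require a permutation or analytical null distribution, which is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Hypothetical chain of four regions: 0-1, 1-2, and 2-3 are neighbors.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

clustered = morans_i([1, 1, 3, 3], chain)    # similar values sit next to each other
alternating = morans_i([1, 3, 1, 3], chain)  # neighbors always differ
```

A positive index, as in the clustered case, indicates that neighboring regions tend to have similar values; a negative index indicates the opposite.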
6. Discussion and Limitations
Despite the high predictive accuracy of MV-RiskNet, this study has several limitations. First, the feature space is relatively compact, consisting of approximately ten key indicators. While these variables were strategically selected based on established epidemiological literature, they may not capture all latent socio-economic complexities. Second, mobility is represented by a single proxy variable derived from regional transportation infrastructure. Although our ablation study confirms its significance, future research could benefit from more granular data, such as real-time GPS-based movement or flight connectivity, to further refine the mobility view. Third, due to the multi-national nature of the data sources and varying privacy regulations, the dataset is currently managed under restricted access; however, a de-identified version is intended for public release to support reproducibility.
To rigorously evaluate the architectural advantages of MV-RiskNet, we compared its performance against Random Forest (RF), which serves as a robust baseline for tabular datasets. As presented in
Table 2, while the RF model achieved a respectable accuracy of 83.33% and an F1-score of 83.32%, it remained well behind the proposed MV-RiskNet (97.20%). This gap of 13.87 percentage points suggests that standard machine learning models, which treat features as independent vectors, fail to capture the complex spatial correlations and inter-view dependencies inherent in cross-border epidemiological data. The superior performance of MV-RiskNet supports the integration of GCN layers for spatial adjacency and attention mechanisms for feature fusion.
Beyond the limitations noted above, the performance of MV-RiskNet is partly attributable to the CGAN-based data balancing. These constraints are stated explicitly to ensure a balanced assessment of the model and to prevent overestimation of its performance in diverse geographic contexts.
Regarding computational efficiency, the proposed framework is suited to deployment in resource-constrained environments. Training completes in approximately 15 min on a standard GPU, and inference requires minimal power and adds negligible latency (less than 0.05 s per region). The system can therefore be integrated into local public health infrastructures without the need for high-performance computing (HPC) resources.
7. Conclusions
This study proposes MV-RiskNet, a deep learning model for reliably estimating regional epidemic risk from multi-source epidemic data. The model minimizes view-based information loss by processing demographic, mobility, and geographic data views in separate computational pipelines and combines these views into a common representation space using a learnable attention-based fusion mechanism. The significant class imbalance observed in the dataset, particularly the limited sample representation of the high-risk category, is addressed by CGAN-based synthetic data generation, which significantly enhances the model’s discriminatory capacity.
Experimental studies have shown that the proposed model outperforms both feature-based methods (MLP, CNN, LSTM, AE-CLS) and the graph-based GCN model. MV-RiskNet with CGAN achieved the best results, with 97.22% accuracy and a 97.40% F1-score. ROC-AUC analyses revealed extremely high class discrimination, with AUC values of 1.000 for the low-risk class and 0.997 for the medium- and high-risk classes. The training curves indicate a robust learning process with fast convergence, low overfitting, and stable validation performance.
Geographic risk maps demonstrate that the model successfully captures the spatial context. Clustering patterns in high-risk areas exhibited a distribution consistent with epidemiologically significant factors such as population density, transportation axes, and mobility corridors. This spatial consistency demonstrates that the model not only provides high classification accuracy but also possesses a learning mechanism capable of representing real-world epidemiological patterns.
The current study has several limitations. The heterogeneity of the data sources used and the limited spatial/temporal resolution of some indicators may limit the model’s ability to perform more detailed risk assessments in certain contexts. Furthermore, assuming relationship structures are largely fixed prevents direct integration of epidemic dynamics over time.
Future work could focus on modeling epidemic risk with a temporal dimension by leveraging mobility and health infrastructure data with finer spatial resolution, developing multi-view models augmented with hyper-correlated graphical representations, and integrating spatiotemporal deep learning approaches. Furthermore, integrating the proposed approach into early warning systems, regional risk management platforms, and policy simulation tools offers significant potential to expand the practical application of the method.
In conclusion, it is demonstrated that the combined use of multi-view representation and synthetic data enrichment provides high accuracy, strong discrimination, and spatial consistency in epidemic risk estimation. The proposed MV-RiskNet model with CGAN stands out as a holistic approach that can generate meaningful and actionable risk indicators from multi-source epidemic data and provide concrete contributions to regional-level decision-making processes.