Article

Spatiotemporal Risk Assessment of H5 Avian Influenza in China: An Interpretable Machine Learning Approach to Uncover Multi-Scale Drivers

1
College of Veterinary Medicine, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing 100193, China
2
China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing 100193, China
*
Author to whom correspondence should be addressed.
Animals 2025, 15(16), 2447; https://doi.org/10.3390/ani15162447
Submission received: 22 July 2025 / Revised: 13 August 2025 / Accepted: 18 August 2025 / Published: 20 August 2025
(This article belongs to the Section Veterinary Clinical Studies)

Simple Summary

Avian influenza, particularly the H5 subtypes, poses a continuous threat to poultry farming and public health. Predicting where and when outbreaks might occur is difficult, especially in a large and diverse country like China. To tackle this challenge, we developed a smart computer model using artificial intelligence (XGBoost) to analyze over two decades of H5 avian influenza outbreak data. We combined this with multi-source information, including satellite data on poultry density, climate zones, wild bird habitats, and daily weather conditions. Our model successfully identified the key factors driving outbreak risk with high accuracy. We found that the density of poultry farms and specific climate zones are the most important background factors, while daily weather changes act as triggers. Notably, we discovered a surprising interaction: hot summer temperatures, usually considered low-risk, can become a significant danger in areas with many poultry farms. These findings were used to create a high-resolution risk map of China, highlighting hotspots for targeted surveillance. This research provides a valuable tool for developing better early warning systems to protect both animal and human health from the threat of avian influenza.

Abstract

Avian influenza (AI), particularly the H5 subtypes, poses a significant and persistent threat globally. While the influence of environmental factors on AI seasonality is recognized, a comprehensive understanding of the hierarchical and interactive effects of multi-scale drivers in a vast and ecologically diverse country like China remains limited. We developed an interpretable machine learning framework (XGBoost with SHAP) to analyze the spatiotemporal risk of 1800 H5 AI outbreaks in mainland China from 2000 to 2023. We integrated multi-source data, including dynamic poultry density, Köppen climate classifications, Important Bird and Biodiversity Areas (IBAs), and daily meteorological variables, to identify key drivers and quantify their nonlinear and synergistic effects. The model demonstrated high predictive accuracy (5-fold cross-validation $R^2$ = 0.776). Our analysis revealed that macro-scale ecological contexts, particularly poultry density and specific Köppen climate zones (e.g., Cwa), and strong seasonality were the dominant drivers of AI risk. We identified significant nonlinear relationships, such as a strong inverse relationship with temperature, and a critical synergistic interaction where high temperatures substantially amplified risk in areas with high poultry density. The final predictive map identified high-risk hotspots primarily concentrated in eastern and southern China. Our findings indicate that H5 AI risk is governed by a hierarchical interplay of multi-scale environmental drivers. This interpretable modeling approach provides a valuable tool for developing targeted surveillance and early warning systems to mitigate the threat of avian influenza.

1. Introduction

Avian influenza (AI) viruses, particularly highly pathogenic avian influenza (HPAI) H5 subtypes, represent a persistent threat to global food security, wildlife conservation, and public health [1]. Originating from a vast gene pool in wild aquatic birds, these viruses can cause devastating outbreaks in poultry populations, leading to substantial economic losses. In recent years, the scale of HPAI epidemics has become unprecedented. The 2021–2022 H5N1 wave, for instance, resulted in over 6227 cases across 37 European countries, surpassing all previous outbreaks in the region. This escalating situation, coupled with sporadic but severe zoonotic infections in humans, has intensified concerns regarding the pandemic potential of these viruses. Their ecology is deeply intertwined with natural reservoirs, domestic poultry systems, and a range of environmental factors [2,3]; therefore, elucidating the drivers of AI transmission remains a critical global priority.
The epidemiology of HPAI is governed by a confluence of drivers operating at multiple scales. The density of poultry populations, as the primary host, is a recognized determinant of risk at both regional and global scales. Likewise, meteorological variables are known to influence viral viability, while wild birds serve as critical reservoirs and dispersal vectors. Despite this knowledge, a robust predictive understanding of H5 AI risk is constrained by several persistent methodological and conceptual limitations [4].
First, the prevalent use of statistical models assuming linearity often fails to capture the complex, nonlinear, and threshold-dependent responses inherent in ecological systems. To overcome these limitations, machine learning and deep learning approaches are increasingly being adopted across various agricultural and ecological domains, from crop monitoring to disease detection [5,6,7]. Second, large-scale analyses have frequently been limited by a reliance on static data, thereby neglecting the dynamic nature of key anthropogenic factors such as poultry production. Most critically, the hierarchical and interactive effects among drivers remain insufficiently characterized [8]. The mechanisms by which stable, macro-scale contexts (e.g., climate zones) modulate the impact of transient, micro-scale triggers (e.g., daily weather fluctuations) are not yet fully elucidated [9,10,11]. A framework capable of simultaneously addressing these challenges is therefore essential for advancing predictive capabilities.
Prior risk-mapping studies in China have already linked HPAI circulation to agro-ecological contexts. In particular, Martin et al. [12] used boosted regression trees and logistic regression to show that clinical outbreaks were mainly associated with chicken density, human population density, and low elevation, whereas risk detected through active surveillance related more strongly to domestic duck density and surface water. Building on this foundational work, our study extends the temporal coverage to 2000–2023, incorporates dynamic (rather than static) poultry density data and detailed Köppen climate classifications, and leverages the XGBoost-SHAP framework to explicitly quantify the nonlinear effects and cross-scale interactions that were previously difficult to interpret.
To address these multifaceted challenges, this study develops and applies an interpretable machine learning framework to comprehensively investigate the drivers of H5 avian influenza risk in China, using over two decades of dynamic, multi-source data. The specific objectives are threefold:
  • To employ a powerful gradient boosting algorithm (XGBoost) [13] to capture complex, nonlinear driver responses and elucidate the hierarchical structure of risk, distinguishing between foundational macro-scale contexts and transient micro-scale triggers;
  • To leverage the SHAP (SHapley Additive exPlanations) framework to explicitly quantify the magnitude and form of both nonlinear relationships and synergistic interactions among key drivers;
  • To synthesize these complex model insights into a practical, high-resolution national risk map to inform targeted interventions.
Through this integrated approach, this study seeks to provide a more nuanced and holistic understanding of H5 AI epidemiology, offering valuable insights for the development of evidence-based surveillance and early warning systems.
The remainder of this paper is structured as follows. Section 2 details the data sources and our interpretable machine learning methodology. Section 3 presents the model’s predictive performance and the key findings regarding the hierarchical, nonlinear, and synergistic nature of the risk drivers. Section 4 discusses the advantages, limitations, and future perspectives of our study, and Section 5 concludes this paper.

2. Materials and Methods

2.1. Avian Influenza Outbreak Data

Our study is based on outbreak data for highly pathogenic avian influenza (HPAI) H5 subtypes in mainland China, obtained from the Food and Agriculture Organization (FAO)’s Global Animal Disease Information System (EMPRES-i). The dataset encompasses all reported events from 25 November 2003 to 11 May 2024. We focused specifically on HPAI H5 due to its significant impact on poultry production and public health.
The integrity of these data is fundamentally underpinned by the international surveillance framework of the World Organisation for Animal Health (WOAH, formerly OIE). As a WOAH-listed disease, the confirmation and reporting of avian influenza follow stringent, standardized procedures. Case confirmation adheres to scientific standards outlined in the WOAH’s Manual of Diagnostic Tests and Vaccines for Terrestrial Animals, which recommends molecular techniques like real-time RT-PCR for definitive virus detection [14]. Once confirmed, member countries are obligated to report outbreaks under the Terrestrial Animal Health Code [15]. This rigorous validation process ensures that the data used in our study are of high quality.
To construct our final analytical dataset, the raw data underwent a multi-step preprocessing pipeline. First, we spatially filtered the records to retain only those within mainland China and temporally selected events up to 11 May 2024. This resulted in a dataset exclusively comprising the following serotypes: H5N1, H5N2, H5N3, H5N6, and H5N8 HPAI. We then aggregated these closely related subtypes to model the overall HPAI H5 risk landscape. An outbreak event was defined as one or more confirmed cases at a unique location on a specific day; accordingly, case counts for reports sharing the same coordinates and date were summed. The “observation date” was used as the primary temporal marker for each event, as it most closely reflects the timing of field detection. This process yielded a final dataset of 1800 distinct outbreak events, with the aggregated case count serving as the model’s target variable.
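The aggregation rule described above (summing case counts for reports that share coordinates and date) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline; the field names `lat`, `lon`, `obs_date`, and `cases` and the sample records are hypothetical.

```python
from collections import defaultdict

def aggregate_outbreaks(reports):
    """Sum case counts for reports sharing the same coordinates and observation date."""
    totals = defaultdict(int)
    for r in reports:
        # One outbreak event = one unique (location, day) combination.
        key = (r["lat"], r["lon"], r["obs_date"])
        totals[key] += r["cases"]
    return [
        {"lat": lat, "lon": lon, "obs_date": day, "cases": n}
        for (lat, lon, day), n in sorted(totals.items())
    ]

# Hypothetical reports: the first two share coordinates and date.
reports = [
    {"lat": 30.5, "lon": 114.3, "obs_date": "2005-11-02", "cases": 3},
    {"lat": 30.5, "lon": 114.3, "obs_date": "2005-11-02", "cases": 4},
    {"lat": 23.1, "lon": 113.3, "obs_date": "2014-01-15", "cases": 1},
]
events = aggregate_outbreaks(reports)
```

With a pandas DataFrame, the same step would typically be a `groupby` on the three key columns followed by a sum of the case column.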

2.2. Environmental and Ecological Predictors

A comprehensive set of potential predictors was constructed by integrating multi-source geospatial and meteorological datasets, a common practice in modern ecological studies that increasingly leverages advanced remote sensing technologies for data acquisition [16,17]. All data layers were standardized to a consistent spatial resolution and projected to the WGS 84 coordinate system.

2.2.1. Meteorological Data

Daily meteorological data were sourced from the China Meteorological Administration (CMA), ensuring consistency and authority across the entire study period. To achieve complete temporal coverage, we constructed a continuous time-series dataset by integrating two official sources from the CMA.
For the period prior to 2014, we utilized the China National-Level Ground Meteorological Station Basic Meteorological Elements Daily Value Dataset (V3.0), provided by the CMA’s National Meteorological Information Center [18]. This foundational dataset is exceptionally reliable, as documented in its official evaluation report, which confirms that its data availability rate exceeds 99% and its accuracy rate approaches 100% following a rigorous, multi-stage quality control process [18]. For the period from 2014 onward, where a consolidated V3.0-style dataset was not yet available, we obtained data directly from the CMA’s operational data sharing service. This service provides real-time and recent historical data from the same network of national-level ground stations, ensuring methodological continuity and data compatibility with the earlier V3.0 dataset.
The final variables included daily mean temperature (°C), daily mean atmospheric pressure (hPa), and cumulative daily precipitation (mm). To ensure that these data accurately reflected local conditions, a boundary-based matching approach was employed. Each outbreak point was first located within its respective smallest administrative unit (e.g., county or district), and subsequently, data from the official weather station operating within that same administrative unit were assigned to the point. To capture potential lagged effects, data were extracted for both the outbreak date and the three preceding days.

2.2.2. Poultry Density Data

To account for the distribution and density of primary domestic hosts, a critical driver of AI transmission, we incorporated dynamic poultry density data from the Gridded Livestock of the World (GLW), a globally recognized, peer-reviewed dataset developed by the Food and Agriculture Organization (FAO) [19,20]. The GLW provides standardized, high-resolution (approx. 10 km) estimates of livestock populations, generated using a sophisticated Random Forest modeling approach based on the most recently compiled subnational census data [20]. This makes it a highly suitable and authoritative source for our large-scale analysis. We specifically focused on the combined density of chickens and ducks, as they are the most significant poultry species implicated in the epidemiology of H5 avian influenza in China [19].
To capture the spatiotemporal evolution of poultry farming across our long study period, we utilized the available GLW raster layers for the years 2010, 2015, and 2020. A dynamic temporal matching approach was employed: each outbreak event was matched to the most temporally proximate GLW map. For instance, an outbreak occurring between 2010 and 2012 was matched to the 2010 map, while an outbreak in 2018 was matched to the 2020 map. This method represents a significant improvement over using a single, static host distribution map, as it better reflects the changing landscape of poultry production over time.
For each matched outbreak location, the corresponding poultry density value (birds per km²) was extracted. Given that poultry density data are often characterized by a highly skewed distribution, a log-plus-one transformation, $\log(x + 1)$, was applied. This standard procedure normalizes the variable’s distribution and stabilizes its variance, thereby improving its performance and interpretability within the machine learning model.
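The temporal matching and transformation steps can be sketched as below, assuming that "most temporally proximate" means the smallest absolute difference in calendar years and that the logarithm is natural. The function names are illustrative and do not come from the original analysis.

```python
import math

GLW_YEARS = (2010, 2015, 2020)  # raster years named in the text

def nearest_glw_year(outbreak_year, available=GLW_YEARS):
    """Return the GLW layer year closest in time to the outbreak year."""
    # With layers five years apart and integer years, ties cannot occur.
    return min(available, key=lambda y: abs(y - outbreak_year))

def log1p_density(density):
    """Apply log(x + 1); assumes the natural logarithm."""
    return math.log1p(density)

matched = {year: nearest_glw_year(year) for year in (2004, 2012, 2013, 2018)}
```

Under these assumptions, outbreaks before 2010 fall back to the 2010 layer, which is the earliest one named in the text.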

2.2.3. Köppen–Geiger Climate Classification Data

To characterize the macro-climatic context of each outbreak, a high-resolution (1 km) Köppen–Geiger climate classification map was utilized [21]. The specific dataset employed was the latest version released in 2023, which provides climate zone projections for 1901–2099 based on constrained CMIP6 data. Each outbreak coordinate was spatially joined with these climate polygons to assign a specific classification code (e.g., “Dwa”, “Cfa”), thereby embedding the long-term climatic background as a categorical predictor in the model.

2.2.4. Important Bird and Biodiversity Area (IBA) Data

The potential influence of wild bird reservoirs was represented using geospatial data of Important Bird and Biodiversity Areas (IBAs) from BirdLife International (version updated March 2025). Two distinct features were engineered from this dataset:
  • A binary variable was created to indicate whether an outbreak occurred directly within an IBA polygon;
  • For outbreaks located outside these zones, the Euclidean distance (in kilometers) to the boundary of the nearest IBA was calculated.
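The two IBA features can be illustrated with plain planar geometry. The sketch assumes that coordinates have already been projected to a metric system expressed in kilometers and that polygons are simple vertex rings; assigning a distance of 0 to points inside an IBA is a choice made here for illustration, not something stated in the text.

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dist_to_segment(pt, a, b):
    """Euclidean distance from pt to the segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def iba_features(pt, polygons):
    """Return (within_iba, distance_km) for a point and a list of polygons."""
    if any(point_in_polygon(pt, p) for p in polygons):
        return 1, 0.0  # assumption: distance set to 0 inside an IBA
    d = min(
        dist_to_segment(pt, p[i], p[(i + 1) % len(p)])
        for p in polygons
        for i in range(len(p))
    )
    return 0, d

square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical 10 km x 10 km IBA
inside_feat = iba_features((5, 5), [square])
outside_feat = iba_features((13, 14), [square])
```

In practice a geospatial library such as GeoPandas (listed in Section 2.4) would handle projections and complex polygons; the version above only makes the two definitions concrete.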

2.3. Feature Engineering and Selection

A comprehensive set of predictors was engineered, grounded in established biological and ecological principles of AI transmission. The cornerstone of our feature engineering was the creation of a biologically informed temporal structure. A substantial body of evidence indicates that the typical incubation period of HPAI H5 viruses in poultry ranges from 1 to 3 days [14]. Based on this critical window of infection, we expanded each outbreak event into three parallel records representing the meteorological conditions at 1, 2, and 3 days prior. This fixed-lag approach was deliberately chosen for its direct biological interpretability. While more complex methods like Distributed Lag Nonlinear Models (DLNMs) exist for exploring a wider range of lag–response relationships [22], our focused 1–3 day window provides a clear and targeted test of the hypothesis that short-term environmental triggers during the incubation period are key drivers of outbreaks. A suite of features was derived from this structure, including lagged weather variables, weather dynamics indicators, and spatiotemporal controls, to capture a variety of environmental pressures. The justification for including these variable categories is detailed in Table 1.
This entire feature engineering process is formally summarized by Equation (1):
$$\mathbf{x}_{i,t-k} = \mathrm{Env}(s_i,\, t_i - k), \quad k = 1, 2, 3 \tag{1}$$
Here, $\mathbf{x}_{i,t-k}$ represents the complete feature vector for the $i$-th outbreak, which occurred at location $s_i$ and time $t_i$. The vector captures the environmental conditions lagged by $k$ days. The function $\mathrm{Env}(s, t)$ generates the full set of environmental predictors for a given location $s$ and time $t$. As specified, the lag $k$ covers the 1, 2, and 3 days preceding the outbreak, corresponding to the viral incubation period.
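The expansion in Equation (1) can be sketched as follows: each outbreak yields one record per lag, with predictors looked up for the lagged date. The `env` lookup, the site label, and the weather values are hypothetical placeholders.

```python
from datetime import date, timedelta

def expand_lags(outbreak, env, lags=(1, 2, 3)):
    """Build one record per lag k holding Env(s_i, t_i - k)."""
    records = []
    for k in lags:
        lag_date = outbreak["date"] - timedelta(days=k)
        record = {
            "site": outbreak["site"],
            "lag": k,
            "lag_date": lag_date,
            "cases": outbreak["cases"],
        }
        record.update(env(outbreak["site"], lag_date))
        records.append(record)
    return records

# Hypothetical station data keyed by (site, day).
weather = {
    ("county_A", date(2020, 3, 1)): {"temp_c": 6.0, "pressure_hpa": 1015.0, "precip_mm": 0.0},
    ("county_A", date(2020, 2, 29)): {"temp_c": 4.5, "pressure_hpa": 1018.0, "precip_mm": 1.2},
    ("county_A", date(2020, 2, 28)): {"temp_c": 3.0, "pressure_hpa": 1020.0, "precip_mm": 0.0},
}

def env(site, day):
    """Stand-in for Env(s, t): return the predictors for a site and day."""
    return weather[(site, day)]

outbreak = {"site": "county_A", "date": date(2020, 3, 2), "cases": 5}
lagged = expand_lags(outbreak, env)
```

Because the three records share the same target value, any cross-validation split should keep them in the same fold to avoid leakage; whether the original analysis did so is not stated.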
Following feature engineering, a multi-stage feature selection process was implemented to derive a parsimonious and mechanistically interpretable model. This process was designed to systematically reduce dimensionality while retaining predictors with the most significant and biologically plausible influence. These predictors include three main categories: foundational lagged variables (Equation (2)), a suite of weather dynamics indicators (Equation (3)), and cyclical temporal controls (Equation (4)). The pipeline began with an ecological stratification of all predictors into macro-, meso-, and micro-scale categories to delineate their hierarchical scales of influence. Subsequently, an ensemble ranking approach was employed to robustly evaluate feature importance and mitigate the bias of any single algorithm. This ensemble combined machine learning-based importance (Random Forest) to capture nonlinear contributions with a traditional statistical test (Spearman’s rank correlation, $p < 0.05$) to assess monotonic relationships. The final subset of 26 predictors was determined by integrating these rankings with domain knowledge on biological plausibility. The robustness of this feature set was further confirmed through a bootstrap stability assessment, ensuring the selected variables possess not only statistical significance but also plausible mechanistic links to AI transmission.
The specific mathematical definitions for the engineered feature categories are as follows. The foundational lagged variables, representing the mean temperature ($T$, in °C), mean atmospheric pressure ($P$, in hPa), and cumulative precipitation ($R$, in mm), are given by
$$\{T_{t-k},\; P_{t-k},\; R_{t-k}\}, \quad k \in \{1, 2, 3\} \tag{2}$$
The weather dynamics and stability indicators were defined for any meteorological variable $X \in \{T, P, R\}$:
$$
\begin{aligned}
\Delta X_t &= X_t - X_{t-1} \\
S_t^X &= \frac{1}{1 + |\Delta X_t|} \\
\bar{X}_t^{(3)} &= \frac{1}{3} \sum_{j=0}^{2} X_{t-j}
\end{aligned}
\tag{3}
$$
The cyclical temporal controls were defined for month $m_i$ and day of year $d_i$:
$$
\begin{aligned}
\mathrm{MonthSin}_i &= \sin\left(\frac{2\pi m_i}{12}\right), &
\mathrm{MonthCos}_i &= \cos\left(\frac{2\pi m_i}{12}\right) \\
\mathrm{DOYSin}_i &= \sin\left(\frac{2\pi d_i}{365}\right), &
\mathrm{DOYCos}_i &= \cos\left(\frac{2\pi d_i}{365}\right)
\end{aligned}
\tag{4}
$$
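A short sketch of these engineered features follows, reading the weather dynamics indicators as the day-over-day change, a stability score equal to 1 / (1 + |change|), and a 3-day trailing mean, and the temporal controls as sine and cosine encodings of month and day of year. Function names and the sample series are illustrative.

```python
import math

def weather_dynamics(series):
    """Indicators for the last day t of a daily series ordered oldest to newest."""
    delta = series[-1] - series[-2]        # change: X_t - X_{t-1}
    stability = 1.0 / (1.0 + abs(delta))   # equals 1 when the variable is unchanged
    mean3 = sum(series[-3:]) / 3.0         # mean of X_t, X_{t-1}, X_{t-2}
    return delta, stability, mean3

def cyclical(month, doy):
    """Sine/cosine encodings so that December and January are adjacent."""
    return {
        "MonthSin": math.sin(2 * math.pi * month / 12),
        "MonthCos": math.cos(2 * math.pi * month / 12),
        "DOYSin": math.sin(2 * math.pi * doy / 365),
        "DOYCos": math.cos(2 * math.pi * doy / 365),
    }

delta, stability, mean3 = weather_dynamics([4.0, 7.0, 5.0])  # hypothetical temperatures
march_end_of_year = cyclical(3, 365)
```

The paired sine and cosine terms place each date on a circle, so the model does not see an artificial jump between day 365 and day 1.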

2.4. Interpretable Machine Learning Model

To model the complex, nonlinear relationships between the multi-scale predictors and HPAI H5 outbreak counts, we employed XGBoost (Extreme Gradient Boosting), a powerful and widely used machine learning algorithm [13]. The utility of such advanced algorithms has been demonstrated in a wide range of smart agriculture applications, tackling complex tasks like phenotype analysis and automated weed control [27,28]. XGBoost is an ensemble method that builds a predictive model in the form of a collection of decision trees, which are added sequentially to correct the errors of the previous ones. This gradient boosting framework is highly effective at capturing complex interactions and nonlinear patterns without prior assumptions about the underlying functional form.
Our modeling approach is rooted in the principles of a Generalized Linear Model (GLM), specifically a Poisson regression, which is suitable for count data like outbreak events. The model links the set of predictors for the $i$-th observation, $\mathbf{x}_i$, to the expected outbreak count, $\lambda_i$, via a logarithmic link function. This relationship is expressed as
$$\ln(\lambda_i) = f(\mathbf{x}_i) \quad \text{or equivalently} \quad \lambda_i = \exp\bigl(f(\mathbf{x}_i)\bigr) \tag{5}$$
Here, the crucial difference from a standard GLM is that $f(\mathbf{x}_i)$ is not a simple linear combination of predictors. Instead, $f(\mathbf{x}_i)$ represents the complex, nonlinear function learned by the ensemble of XGBoost trees. It is the sum of the predictions from all the individual trees in the model.
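The log-link structure can be made concrete with a toy additive ensemble: tree outputs are summed on the log scale and exponentiated to give an expected count. The two stumps, their thresholds, and their leaf values are invented for illustration and are not fitted to any data.

```python
import math

# Hypothetical decision stumps; each returns an additive score on the log scale.
def tree_cold(x):
    return 0.4 if x["temp_c"] < 10.0 else -0.1

def tree_density(x):
    return 0.3 if x["log_poultry_density"] > 5.0 else -0.2

TREES = (tree_cold, tree_density)

def f(x):
    """Sum of the individual tree outputs."""
    return sum(tree(x) for tree in TREES)

def expected_count(x):
    """Inverse log link: lambda = exp(f(x)), always positive."""
    return math.exp(f(x))

x = {"temp_c": 2.0, "log_poultry_density": 6.0}
lam = expected_count(x)
```

Because the trees add on the log scale, each tree acts as a multiplicative factor on the expected count. XGBoost provides a `count:poisson` objective that uses this log link; whether that objective was used in this study is not stated in the text.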
To validate the choice of XGBoost, its performance was benchmarked against two traditional statistical models: a Panel Data model with fixed effects (PanelOLS) and a Gaussian Generalized Linear Model (GLM). All models were trained and evaluated on the same set of predictors using an identical 5-fold cross-validation scheme to ensure a fair comparison. As shown in Table 2, the XGBoost model achieved a mean $R^2$ of 0.776, substantially outperforming both the PanelOLS ($R^2$ = 0.458) and the GLM ($R^2$ = 0.257). This performance gap underscores the necessity of a machine learning approach capable of capturing the complex nonlinearities and interactions inherent in the ecological drivers of avian influenza, which traditional models fail to adequately address.
To interpret the outputs of this model and understand the contribution of each predictor to $f(\mathbf{x}_i)$, we integrated the SHAP (SHapley Additive exPlanations) framework [29]. All data processing, modeling, and visualization were conducted using Python (version 3.9.0). The machine learning model was implemented with the XGBoost library (version 2.1.4), and model interpretation was performed using the SHAP library (version 0.48.0). Key libraries for data manipulation and analysis included Pandas (version 2.2.3), NumPy (version 1.23.0), and Scikit-learn (version 1.6.1). Geospatial data were handled with GeoPandas (version 1.0.1), and figures were generated using Matplotlib (version 3.9.4) and Seaborn (version 0.13.2).

2.5. Model Validation and Diagnostics

The generalization ability and predictive performance of the final XGBoost model were rigorously evaluated using a 5-fold cross-validation (CV) scheme. We employed a spatial partitioning strategy for the folds to provide a more robust assessment of the model’s capacity to generalize across the diverse geographical contexts of the study area. Model performance was quantified using three standard metrics: the coefficient of determination ($R^2$), root mean square error (RMSE), and mean absolute error (MAE). The optimal set of model hyperparameters was selected through a systematic tuning process that aimed to maximize the cross-validated $R^2$ score. The final optimized hyperparameters are detailed in Appendix A.
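The evaluation scheme can be sketched as follows. The text does not specify how the spatial folds were formed, so the grid-block assignment below (5° blocks, fold chosen from the block indices) is purely an assumption; the three metrics follow their standard definitions.

```python
import math

def spatial_fold(lat, lon, n_folds=5, block_deg=5.0):
    """Assign a point to a fold via its grid block, so nearby points share a fold."""
    row = math.floor(lat / block_deg)
    col = math.floor(lon / block_deg)
    return (row + col) % n_folds  # hypothetical assignment rule

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": math.sqrt(ss_res / n), "mae": mae}

# Hypothetical held-out observations and predictions for one fold.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

With scikit-learn, the block identifier could be passed as the `groups` argument of `GroupKFold` to obtain the same kind of spatially grouped split.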
In addition to performance evaluation, a suite of diagnostic tests was conducted to ensure model stability and integrity. To assess multicollinearity among the predictors, we calculated the Variance Inflation Factor (VIF). To test for spatial autocorrelation in the model’s residuals, which could indicate unexplained spatial patterns, we calculated Moran’s I statistic. The standard formulas for these diagnostics are given in Equation (6):
$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
I = \frac{N}{W} \cdot \frac{\sum_i \sum_j w_{ij} (e_i - \bar{e})(e_j - \bar{e})}{\sum_i (e_i - \bar{e})^2}
\tag{6}
$$
In these diagnostics, $R_j^2$ is the coefficient of determination from regressing predictor $j$ on all other predictors [30]. For Moran’s $I$, $N$ is the number of spatial units, $e_i$ is the residual for unit $i$, $\bar{e}$ is the mean of the residuals, $w_{ij}$ is the spatial weight between units $i$ and $j$, and $W$ is the sum of all weights [31]. These diagnostics confirmed the statistical robustness of our final model.
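Both diagnostics can be computed directly from their definitions. The sketch below uses NumPy, fits each auxiliary regression with an intercept, and takes a user-supplied weight matrix; the toy design matrix, residuals, and chain-adjacency weights are hypothetical.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the remaining columns."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def morans_i(e, w):
    """Moran's I for residuals e and spatial weight matrix w."""
    e = np.asarray(e, dtype=float)
    w = np.asarray(w, dtype=float)
    z = e - e.mean()
    return (len(e) / w.sum()) * (z @ w @ z) / (z @ z)

# Two uncorrelated predictors, so both VIF values should be 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
# Four units in a line; neighbours share a weight of 1.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
clustered = morans_i([1, 1, -1, -1], W)
alternating = morans_i([1, -1, 1, -1], W)
```

Positive values of Moran's I indicate that similar residuals cluster in space, negative values indicate alternation, and values near zero are consistent with no residual spatial structure.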

3. Results

3.1. Descriptive Analysis: Spatiotemporal Overview of Outbreaks

A total of 1800 H5 subtype avian influenza outbreaks were recorded in mainland China during the study period (Table 3). The spatiotemporal distribution of these events was highly heterogeneous. Spatially, outbreaks were predominantly concentrated in the eastern and southern provinces, particularly within the humid subtropical (Cfa) and monsoon-influenced humid subtropical (Cwa) Köppen climate zones (Figure 1). Temporally, the outbreaks exhibited strong seasonality, with incidence peaking consistently during winter and spring (December–March), and showed significant inter-annual variability with distinct epidemic waves apparent during 2004–2006 and 2014–2017 (Figure 2).
An examination of the outbreak characteristics in Table 3 reveals that the number of cases per event was highly right-skewed, with a median of 1 but a mean of 5.4 (SD = 10.2), indicating that most reported events were small despite occasional large-scale outbreaks (max = 108). The outbreaks predominantly occurred under cool conditions, with a mean ambient temperature of 7.5 °C (interquartile range [IQR]: −0.3 to 15.6 °C). Furthermore, a clear spatial relationship with wild bird habitats was evident, with 50% of outbreaks occurring within 41.2 km of an Important Bird and Biodiversity Area (IBA) and 75% occurring within 85.3 km.

3.2. Predictive Performance

The predictive performance and generalization ability of the final XGBoost model were robustly evaluated through a 5-fold cross-validation scheme [27]. The model demonstrated a high degree of predictive accuracy, achieving a mean coefficient of determination ($R^2$) of 0.776, indicating that the selected predictors explained approximately 77.6% of the variance in H5 AI outbreaks (Table 4). Notably, the $R^2$ score for each of the five folds consistently exceeded the performance target of 0.7.
The model’s stability was underscored by the low standard deviation of the $R^2$ scores across the folds (SD = 0.065) and was visually confirmed by the narrow interquartile range in the boxplot (Table 4, Figure 3A). In terms of prediction error, the model yielded low mean values for both the root mean square error (RMSE) of 0.604 and the mean absolute error (MAE) of 0.306, which remained consistently low across all individual folds (Table 4, Figure 3B). Furthermore, rigorous overfitting controls employed during the hyperparameter optimization process resulted in an average training–test $R^2$ gap of only 0.082, confirming the model’s excellent generalization capability. A visual inspection of the cross-validated predicted versus observed values is expected to further confirm this high performance, with points clustering tightly around the line of perfect agreement (Figure 3C).

3.3. Identification of Key Drivers of Avian Influenza Risk

The hierarchy of risk drivers for H5 avian influenza was identified by ranking all predictors based on their mean absolute SHAP values, which quantify their overall contribution to the XGBoost model’s predictions (Figure 4). The analysis revealed a clear hierarchical structure, where a small number of macro-scale contextual and seasonal factors emerged as the dominant drivers of risk [5,6]. The Köppen_Cwa (monsoon-influenced humid subtropical) climate zone was the single most influential predictor (mean |SHAP| = 0.052), followed closely by the primary seasonality feature, Day_of_year_sin (mean |SHAP| = 0.050). Other features representing the long-term environmental context and annual cycles, such as Daily_pressure, Day_of_year, and other Köppen classifications (Köppen_Cwb, Köppen_BWk), also ranked among the top predictors. This underscores the foundational role of the baseline geographical landscape and strong seasonal periodicities in determining an area’s endemic risk level.
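The ranking criterion used here, the mean absolute SHAP value per feature, can be sketched as follows. The SHAP matrix and feature names below are made up for illustration and do not reproduce the values in Figure 4.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across observations (rows)."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical SHAP values: rows are observations, columns are features.
shap_rows = [
    [0.06, -0.01, 0.00],
    [-0.04, 0.03, 0.01],
]
ranking = rank_by_mean_abs_shap(shap_rows, ["feat_a", "feat_b", "feat_c"])
```

Taking absolute values before averaging means that a feature pushing predictions strongly in both directions still ranks high, even if its signed contributions cancel out.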
While the macro-scale context sets the stage, meso-scale ecological factors and micro-scale meteorological variables act as crucial secondary and modulating influences (Figure 4). Key meso-scale factors related to host and wild bird interfaces, including Poultry_density (mean |SHAP| = 0.021) and Distance_to_IBA_km (mean |SHAP| = 0.019), were identified as significant predictors, although they had a smaller magnitude of contribution compared to the top-tier drivers. Micro-scale weather variables representing short-term environmental stress, such as lagged temperature (Temperature_lag3day, Temperature_lag1day) and indicators of atmospheric instability (Pressure_change), were also found to be important contributors to risk. This multi-scale structure suggests that while a region’s climate and the time of year establish a foundational risk level, the actual occurrence of an outbreak is triggered by the interplay of more dynamic, localized factors.

3.4. Nonlinear Effects and Risk Windows of Key Drivers

To elucidate how the identified key drivers modulate AI risk, their marginal effects were examined using SHAP dependence plots (Figure 5). The analysis revealed clear nonlinear patterns for several key predictors, providing insights beyond simple linear correlations [32].
For ambient temperature, a distinct nonlinear trend was observed (Figure 5a). While the risk contribution remained close to zero at moderate and high temperatures (>15 °C), it began to systematically increase as temperatures fell below approximately 10 °C. This risk-enhancing effect of cold was most pronounced in the −10 °C to 0 °C range. The greater variance in SHAP values at these low temperatures suggests that the magnitude of this effect is likely dependent on other interacting factors.
Proximity to IBAs exhibited a clear negative relationship with risk (Figure 5b). The risk contribution was highest and most variable at distances less than 20 km, indicating that while close proximity is a significant risk factor, its impact is highly context-dependent. As the distance from an IBA increased, the risk contribution steadily decreased, with the effect diminishing and stabilizing beyond approximately 50–100 km.

3.5. Synergistic Effects Among Key Drivers

Beyond the individual nonlinear effects, our analysis delved into the synergistic interactions between predictors using SHAP interaction values. This revealed that key drivers often do not act in isolation but rather that their impacts are contingent on the context provided by other variables.
A particularly strong synergistic effect was identified between ambient temperature and poultry density (Figure 6), demonstrating that the influence of temperature on risk is highly dependent on host density. Specifically, within the 0–25 °C range, the marginal effect of temperature on risk was negligible regardless of poultry density. However, a powerful positive interaction emerged at temperatures above 30 °C. In these high-temperature conditions, the risk contribution from temperature in high-density scenarios increased sharply to a SHAP value approaching 0.3 while remaining near 0 in low-density settings (Figure 6). Conversely, no such synergistic amplification was observed in the low-temperature range (<0 °C). This suggests that while cold temperatures act as a general risk factor (as shown in Figure 5a), their effect is not amplified by high poultry density in the same way that high temperatures are. This finding reveals a critical mechanism: high host density acts as an amplifier, transforming otherwise low-risk, high-temperature conditions into a significant driver of H5 outbreaks.
To provide a comprehensive overview of all pairwise interactions, a SHAP interaction heatmap was generated (Figure 7). This visualization systematically maps the interaction strength between all key predictors. The heatmap highlights several notable synergies beyond the temperature–poultry relationship. For instance, strong interactions are visible between certain Köppen climate zones, such as Köppen_Dwc and Köppen_ET, and between climate zones and ecological factors like Distance_to_IBA_km. These complex interdependencies, revealed by the heatmap, underscore the importance of a holistic, context-aware approach to risk assessment, as the impact of any single driver is often modulated by the broader environmental and ecological setting.
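One common way to build such a heatmap is to average the absolute per-sample interaction values into a single feature-by-feature matrix. The sketch below assumes this aggregation; the source does not state exactly how the matrix was computed. The feature names other than Distance_to_IBA_km and all numeric values are hypothetical.

```python
def mean_abs_interaction(interaction_values):
    """Average |interaction| over samples.

    interaction_values: list of n_features x n_features matrices, one per sample.
    """
    n_samples = len(interaction_values)
    n_features = len(interaction_values[0])
    strength = [[0.0] * n_features for _ in range(n_features)]
    for matrix in interaction_values:
        for i in range(n_features):
            for j in range(n_features):
                strength[i][j] += abs(matrix[i][j]) / n_samples
    return strength


def strongest_pairs(strength, names, top_k=3):
    """Rank off-diagonal feature pairs by interaction strength (upper triangle only)."""
    pairs = [
        (strength[i][j], names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]
    return sorted(pairs, reverse=True)[:top_k]


# Hypothetical interaction matrices for two samples and three features
names = ["Temperature", "Poultry_density", "Distance_to_IBA_km"]
samples = [
    [[0.10, 0.15, 0.00], [0.15, 0.20, -0.02], [0.00, -0.02, 0.05]],
    [[-0.04, 0.01, 0.00], [0.01, 0.10, 0.04], [0.00, 0.04, -0.03]],
]

strength = mean_abs_interaction(samples)
ranked = strongest_pairs(strength, names)
print(ranked[0])
```

The diagonal of `strength` holds the averaged main effects and the off-diagonal cells hold the pairwise interaction strengths, matching the layout described for the heatmap.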

3.6. High-Resolution Spatiotemporal Risk Mapping

To synthesize the model’s findings into a practical decision-support tool, the trained XGBoost model was applied to a high-resolution nationwide grid, generating a predictive baseline risk map for H5 avian influenza across mainland China (Figure 8). The map reveals a distinct and highly heterogeneous geographical distribution of risk. High-risk areas are prominently concentrated in eastern and southern China, forming several key hotspots. These include the Yangtze River Delta, the Pearl River Delta, the Sichuan Basin, and the regions surrounding major lake systems such as Poyang Lake and Dongting Lake.
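The mapping step can be sketched as building a regular grid, looking up predictor values for each cell, and applying the trained model cell by cell. The code below is only an outline under stated assumptions: the grid extent, the 1° step, and the stand-in functions `toy_features` and `toy_predict` are hypothetical placeholders for the study's predictor layers and fitted XGBoost model.

```python
def make_grid(lat_min, lat_max, lon_min, lon_max, step):
    """Cell-centre coordinates of a regular latitude/longitude grid."""
    n_lat = int(round((lat_max - lat_min) / step))
    n_lon = int(round((lon_max - lon_min) / step))
    return [
        (lat_min + (i + 0.5) * step, lon_min + (j + 0.5) * step)
        for i in range(n_lat)
        for j in range(n_lon)
    ]


def predict_grid(cells, feature_lookup, predict):
    """Apply a prediction function to the features of every grid cell."""
    return [(lat, lon, predict(feature_lookup(lat, lon))) for lat, lon in cells]


def toy_features(lat, lon):
    # Hypothetical stand-in: pretend poultry density rises toward the east
    return {"log_poultry_density": max(0.0, (lon - 100.0) / 5.0)}


def toy_predict(features):
    # Hypothetical stand-in for the trained model's prediction
    return 0.1 * features["log_poultry_density"]


cells = make_grid(20.0, 22.0, 110.0, 113.0, step=1.0)
risk_surface = predict_grid(cells, toy_features, toy_predict)
for lat, lon, risk in risk_surface:
    print(f"{lat:.1f}N {lon:.1f}E risk={risk:.3f}")
```

Separating the feature lookup from the prediction function keeps the grid logic independent of how the predictor layers are stored.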
The spatial patterns delineated on the risk map show strong correspondence with the key drivers identified in the feature importance analysis. The predicted high-risk zones largely overlap with areas characterized by a convergence of multiple top-ranked risk factors: high poultry density, humid temperate/subtropical climate conditions (e.g., the Cfa and Cwa Köppen zones), and proximity to IBAs (cf. Figure 4) [21,33]. Conversely, western and northern regions, such as the Tibetan Plateau and the arid areas of Xinjiang and Inner Mongolia, which lack this combination of risk factors, consistently exhibit the lowest predicted risk levels. The map therefore serves as a visual confirmation of the hierarchical and interactive driver structure identified by the model.

4. Discussion

This study applied an interpretable machine learning framework to assess the spatiotemporal risk of HPAI H5 in mainland China. While previous studies have provided significant insights into HPAI dynamics, understanding the complex, multi-scale interplay of drivers across large, heterogeneous areas remains a challenging area of research [34]. Our work contributes to this field by analyzing over two decades of dynamic, multi-source data with an XGBoost-SHAP framework. This approach yielded a model with strong predictive performance and, more importantly, offered a way to explore the hierarchical, nonlinear, and synergistic relationships that may govern disease risk. The analysis suggests a potential hierarchical structure of risk drivers, explores nonlinear exposure–response patterns, and identifies possible synergistic effects. These findings were synthesized into a high-resolution risk map, which may offer valuable insights for the planning of surveillance and control activities.

4.1. Advantages of the Study

A primary finding from our model is the apparent importance of the macro-scale ecological landscape in shaping HPAI H5 risk [35]. Our analysis identified poultry density and specific Köppen climate zones as predictors with high feature importance, suggesting that the baseline risk of an outbreak may be strongly influenced by the geographical and agricultural context [21]. This observation is consistent with a substantial body of literature that has frequently identified high poultry density as a key factor in HPAI H5N1 amplification and spread in various settings [23]. Our work contributes to this by illustrating how this baseline risk appears to be modulated by meso-scale factors, such as the interface with wild birds proxied by IBAs [24], and potentially triggered by micro-scale meteorological events. A significant aspect of our study is the characterization of complex, nonlinear relationships. For instance, our model identified a distinct nonlinear relationship between ambient temperature and AI risk (Figure 5a). The risk contribution remained negligible at moderate to high temperatures but increased substantially as temperatures dropped into colder ranges (e.g., below 10 °C). This finding aligns with existing laboratory and ecological evidence on enhanced viral survival and stability in cold conditions [26,36]. Similarly, the non-monotonic effect of poultry density suggested by our analysis, which indicates a potential plateauing of risk at very high densities, provides a data-driven hypothesis that moves beyond a simple “more is worse” assumption. Such nonlinear insights, if further explored, could be valuable for refining targeted and resource-efficient control strategies. The comprehensive interaction heatmap (Figure 7) further revealed a landscape of complex interdependencies, such as those between different climate zones and ecological factors, underscoring the necessity of a context-aware approach to risk assessment.
Furthermore, our analysis suggests that these drivers may not operate in isolation but could exhibit significant synergistic effects, a concept of growing importance in ecological modeling [37]. A notable interaction identified by our model was between ambient temperature and poultry density (Figure 6). The results point towards a potential compound risk mechanism: high temperatures (e.g., >30 °C), which independently show a negligible association with risk, may act as a significant risk amplifier when they co-occur with high poultry density. This observation, if validated by further mechanistic or field studies, could be particularly important as it might challenge conventional assumptions about risk seasonality in certain high-density poultry farming systems [38,39]. It suggests a testable hypothesis that during summer months, surveillance resources could be more efficiently allocated by prioritizing regions with a high concentration of poultry. This ability to uncover and quantify potential interactions is a key feature of our interpretable machine learning approach.

4.2. Limitations

Nevertheless, our study is subject to several limitations that warrant acknowledgment. First, as an ecological study, its findings are subject to the ecological fallacy, and associations observed at a regional scale may not directly translate to individual farms. Second, the outbreak data, while being the best available, may be influenced by regional variations in surveillance intensity and reporting practices, a common challenge in large-scale epidemiological analyses [40,41,42]. Third, the predictor variables, while comprehensive, serve as proxies for more complex processes; for instance, “distance to IBA” is a simplification of intricate wild bird migratory patterns.

4.3. Future Perspectives

These limitations highlight valuable avenues for future research. A logical next step could involve integrating a phylodynamic analysis of viral genomic data to trace transmission pathways with greater precision. Moreover, incorporating more dynamic data layers, such as poultry trade networks or real-time migratory bird tracking data, could further refine the model’s predictive capabilities. Our aggregation of HPAI H5 subtypes, while necessary for this analysis, also points to the need for future subtype-specific risk modeling (e.g., for H5N1 vs. H5N6) as more granular data become available [43,44]. Finally, applying this validated modeling framework to future climate and land-use change scenarios could provide important insights for long-term preparedness and policy planning [45].

5. Conclusions

In conclusion, this study reveals that the spatiotemporal risk of H5 avian influenza in China is governed by a distinct hierarchy of multi-scale ecological drivers. The final interpretable machine learning model achieved a high degree of predictive accuracy (5-fold CV R² = 0.776), demonstrating that stable, macro-scale contexts, such as regional climate zones and poultry density, provide the foundational layer of risk, which is then modulated by transient, micro-scale meteorological triggers. Critically, the analysis moved beyond identifying linear risk factors to uncover a novel synergistic interaction: high ambient temperatures (>30 °C), typically considered low-risk, become a significant risk amplifier in areas with high poultry density. By synthesizing these hierarchical, nonlinear, and synergistic effects into a high-resolution risk map, this work provides an evidence-based tool for developing more targeted surveillance and control strategies. Future research should build upon this framework by integrating more dynamic variables, such as viral genomic data and specific intervention measures, to further enhance predictive accuracy and response planning [26,32,46,47].

Author Contributions

Conceptualization, X.X. and X.W.; methodology, X.W.; software, X.W.; validation, X.W. and Y.X.; formal analysis, X.W.; data curation, X.W.; writing—original draft preparation, X.W.; writing—review and editing, X.X. and Y.X.; visualization, X.W.; supervision, X.X.; project administration, X.X.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it was based on the analysis of publicly available, aggregated data from the Food and Agriculture Organization (FAO)’s Global Animal Disease Information System (EMPRES-i) and did not involve any new animal or human subjects.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw H5 outbreak data analyzed during the current study are publicly available from the Food and Agriculture Organization’s Global Animal Disease Information System (EMPRES-i). The processed data and the final set of predictor variables generated during this study are available from the corresponding author on reasonable request.

Acknowledgments

This thesis is the culmination of not only my own efforts but also the immense support and encouragement from many individuals to whom I am profoundly indebted. It is with the deepest gratitude that I acknowledge them here. I am, above all, indebted to my supervisor, Xi Xi. His intellectual rigor and profound scholarly insight have been a guiding light throughout this research. I am immensely grateful for his patient mentorship and unwavering encouragement, which have not only shaped this thesis but have also fundamentally influenced my approach to academic inquiry. I would also like to thank Lu Wang for her support during this research. To my dear friends, I offer my deepest thanks. In moments of doubt and difficulty, your unwavering companionship and invaluable counsel were my anchors. Your friendship is one of my life’s most precious treasures. To my family, I am eternally grateful. You have been my unwavering pillar of support and the warm harbor that provided me the tranquility to pursue this work. This accomplishment is as much yours as it is mine. My most profound and personal gratitude is reserved for Haoqi Chu, my partner. Thank you for the countless conversations that sharpened my thinking and the quiet companionship that soothed my anxieties, for walking with me from one high summer to the next.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
| --- | --- |
| AI | Avian Influenza |
| HPAI | Highly Pathogenic Avian Influenza |
| FAO | Food and Agriculture Organization |
| EMPRES-i | Global Animal Disease Information System |
| CMA | China Meteorological Administration |
| GLW | Gridded Livestock of the World |
| IBA | Important Bird and Biodiversity Area |
| XGBoost | Extreme Gradient Boosting |
| SHAP | SHapley Additive exPlanations |
| CV | Cross-Validation |
| RMSE | Root Mean Square Error |
| MAE | Mean Absolute Error |
| VIF | Variance Inflation Factor |

Appendix A. Model Hyperparameters

The final optimized hyperparameters for the XGBoost model, determined through a 5-fold cross-validation and tuning process, are detailed in Table A1.
Table A1. Final optimized hyperparameters for the XGBoost model.
| Hyperparameter | Final Value |
| --- | --- |
| n_estimators | 750 |
| max_depth | 4 |
| learning_rate | 0.0875 |
| min_child_weight | 2 |
| subsample | 0.9106 |
| colsample_bytree | 0.8703 |
| reg_alpha | 0.2717 |
| reg_lambda | 0.7399 |
| gamma | 0.0034 |
| scale_pos_weight | 0.975 |
| max_delta_step | 0 |
| objective | ‘count:poisson’ |
| random_state | 42 |
| n_jobs | 1 |
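For readers who wish to reuse these settings, the values in Table A1 can be written as a parameter dictionary. The sketch below assumes the scikit-learn-style wrapper `xgboost.XGBRegressor`; the source does not state which XGBoost interface was used, so this is an illustration rather than the authors' code. The import is guarded so that the dictionary can be inspected even where XGBoost is not installed.

```python
# Hyperparameter values transcribed from Table A1
xgb_params = {
    "n_estimators": 750,
    "max_depth": 4,
    "learning_rate": 0.0875,
    "min_child_weight": 2,
    "subsample": 0.9106,
    "colsample_bytree": 0.8703,
    "reg_alpha": 0.2717,
    "reg_lambda": 0.7399,
    "gamma": 0.0034,
    "scale_pos_weight": 0.975,
    "max_delta_step": 0,
    "objective": "count:poisson",  # Poisson regression objective for count data
    "random_state": 42,
    "n_jobs": 1,
}

try:
    # Assumption: the scikit-learn style wrapper is an acceptable interface
    from xgboost import XGBRegressor

    model = XGBRegressor(**xgb_params)
except ImportError:
    # XGBoost not installed; the dictionary above is still usable as a record
    model = None

print(len(xgb_params), "hyperparameters;", "model ready" if model else "xgboost not available")
```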

References

  1. Wu, A.Q.; Li, K.L.; Song, Z.Y.; Lou, X.; Hu, P.; Yang, W.; Wang, R.F. Deep Learning for Sustainable Aquaculture: Opportunities and Challenges. Sustainability 2025, 17, 5084.
  2. Yang, Z.X.; Li, Y.; Wang, R.F.; Hu, P.; Su, W.H. Deep Learning in Multimodal Fusion for Sustainable Plant Care: A Comprehensive Review. Sustainability 2025, 17, 5255.
  3. Geier, D.A.; Kern, J.K.; Geier, M.R. A Longitudinal Ecological Study of Seasonal Influenza Deaths in Relation to Climate Conditions in the United States from 1999 through 2011. Infect. Ecol. Epidemiol. 2018, 8, 1474708.
  4. Su, W.; Liu, T.; Geng, X.; Yang, G. Seasonal Pattern of Influenza and the Association with Meteorological Factors Based on Wavelet Analysis in Jinan City, Eastern China, 2013–2016. PeerJ 2020, 8, e8626.
  5. Yang, Z.Y.; Xia, W.K.; Chu, H.Q.; Su, W.H.; Wang, R.F.; Wang, H. A comprehensive review of deep learning applications in cotton industry: From field monitoring to smart processing. Plants 2025, 14, 1481.
  6. Wang, R.F.; Su, W.H. The application of deep learning in the whole potato production Chain: A Comprehensive review. Agriculture 2024, 14, 1225.
  7. Wang, Z.; Zhang, H.W.; Dai, Y.Q.; Cui, K.; Wang, H.; Chee, P.W.; Wang, R.F. Resource-Efficient Cotton Network: A Lightweight Deep Learning Framework for Cotton Disease and Pest Classification. Plants 2025, 14, 2082.
  8. Choi, Y.W.; Tuel, A.; Eltahir, E.A.B. On the Environmental Determinants of COVID-19 Seasonality. Geohealth 2021, 5, e2021GH000413.
  9. Gass, J.D.; Hill, N.J.; Damodaran, L.; Naumova, E.N.; Nutter, F.B.; Runstadler, J.A. Ecogeographic Drivers of the Spatial Spread of Highly Pathogenic Avian Influenza Outbreaks in Europe and the United States, 2016-Early 2022. Int. J. Environ. Res. Public Health 2023, 20, 6030.
  10. The Lancet Infectious Diseases. What is the Pandemic Potential of Avian Influenza A(H5N1)? Lancet Infect. Dis. 2024, 24, 437.
  11. Zhu, H.; Qi, F.; Wang, X.; Zhang, Y.; Chen, F.; Cai, Z.; Chen, Y.; Chen, K.; Chen, H.; Xie, Z.; et al. Study of the Driving Factors of the Abnormal Influenza A (H3N2) Epidemic in 2022 and Early Predictions in Xiamen, China. BMC Infect. Dis. 2024, 24, 1093.
  12. Martin, V.; Pfeiffer, D.U.; Zhou, X.; Xiao, X.; Prosser, D.J.; Guo, F.; Gilbert, M. Spatial Distribution and Risk Factors of Highly Pathogenic Avian Influenza (HPAI) H5N1 in China. PLoS Pathog. 2011, 7, e1001308.
  13. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  14. World Organisation for Animal Health (WOAH). Chapter 3.3.4: Avian influenza (including infection with high pathogenicity avian influenza viruses). In Manual of Diagnostic Tests and Vaccines for Terrestrial Animals; World Organisation for Animal Health (WOAH, Founded as OIE): Paris, France, 2021.
  15. World Organisation for Animal Health (WOAH). Terrestrial Animal Health Code, 28th ed.; World Organisation for Animal Health (WOAH, Founded as OIE): Paris, France, 2019.
  16. Cui, K.; Tang, W.; Zhu, R.; Wang, M.; Larsen, G.D.; Pauca, V.P.; Alqahtani, S.; Yang, F.; Segurado, D.; Fine, P.; et al. Efficient Localization and Spatial Distribution Modeling of Canopy Palms Using UAV Imagery. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4413815.
  17. Di, X.; Cui, K.; Wang, R.F. Toward Efficient UAV-Based Small Object Detection: A Lightweight Network with Enhanced Feature Fusion. Remote Sens. 2025, 17, 2235.
  18. National Meteorological Information Center (NMIC). Evaluation Report for China National-Level Ground Meteorological Station Basic Meteorological Elements Daily Value Dataset (V3.0); Technical Report; China Meteorological Administration: Beijing, China, 2012. (In Chinese)
  19. Gilbert, M.; Nicolas, G.; Cinardi, G.; Vanwambeke, S.; Boeckel, T.P.V.; Wint, G.R.W.; Robinson, T.P. Global distribution data for cattle, buffaloes, horses, sheep, goats, pigs, chickens and ducks in 2010. Sci. Data 2018, 5, 180227.
  20. Nicolas, G.; Robinson, T.P.; Wint, G.R.W.; Conchedda, G.; Cinardi, G.; Gilbert, M. Using Random Forest to Improve the Downscaling of Global Livestock Census Data. PLoS ONE 2016, 11, e0150424.
  21. Si, X.; Wang, L.; Mengersen, K.; Hu, W. Epidemiological Features of Seasonal Influenza Transmission among 11 Climate Zones in Chinese Mainland. Infect. Dis. Poverty 2024, 13, 4.
  22. Gasparrini, A.; Armstrong, B.; Kenward, M.G. Distributed Lag Non-linear Models. Stat. Med. 2010, 29, 2224–2234.
  23. Gilbert, M.; Pfeiffer, D.U. Risk factor modelling of the spatio-temporal patterns of highly pathogenic avian influenza (HPAIV) H5N1: A review. Spatio-Temporal Epidemiol. 2012, 3, 173–183.
  24. Prosser, D.J.; Teitelbaum, C.S.; Yin, S.; Hill, N.J.; Xiao, X. Climate Change Impacts on Bird Migration and Highly Pathogenic Avian Influenza. Nat. Microbiol. 2023, 8, 2223–2225.
  25. Humphreys, J.M.; Ramey, A.M.; Douglas, D.C.; Mullinax, J.M.; Soos, C.; Link, P.; Walther, P.; Prosser, D.J. Waterfowl Occurrence and Residence Time as Indicators of H5 and H7 Avian Influenza in North American Poultry. Sci. Rep. 2020, 10, 2592.
  26. Lowen, A.C.; Steel, J. Roles of Humidity and Temperature in Shaping Influenza Seasonality. J. Virol. 2014, 88, 7692–7695.
  27. Huo, Y.; Wang, R.F.; Zhao, C.T.; Hu, P.; Wang, H. Research on Obtaining Pepper Phenotypic Parameters Based on Improved YOLOX Algorithm. AgriEngineering 2025, 7, 209.
  28. Wang, R.F.; Tu, Y.H.; Li, X.C.; Chen, Z.Q.; Zhao, C.T.; Yang, C.; Su, W.H. An Intelligent Robot Based on Optimized YOLOv11l for Weed Control in Lettuce. In Proceedings of the 2025 ASABE Annual International Meeting, Toronto, ON, Canada, 13–16 July 2025; p. 1.
  29. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
  30. Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W. Applied Linear Statistical Models, 5th ed.; McGraw-Hill/Irwin: Boston, MA, USA, 2005.
  31. Moran, P.A.P. Notes on Continuous Stochastic Phenomena. Biometrika 1950, 37, 17–23.
  32. Li, J.; Rao, Y.; Sun, Q.; Wu, X.; Jin, J.; Bi, Y.; Chen, J.; Lei, F.; Liu, Q.; Duan, Z.; et al. Identification of Climate Factors Related to Human Infection with Avian Influenza A H7N9 and H5N1 Viruses in China. Sci. Rep. 2015, 5, 18094.
  33. Luo, H.; Cui, Y.; Yu, W.; Li, G.; Zhao, Q.; Geng, M.; Wang, H.; Ma, W. The Impact of Urbanization in China on Influenza Incidence across Neighboring Cities. J. Infect. 2025, 90, 106370.
  34. Yuan, H.; Kramer, S.C.; Lau, E.H.Y.; Cowling, B.J.; Yang, W. Modeling Influenza Seasonality in the Tropics and Subtropics. PLoS Comput. Biol. 2021, 17, e1009050.
  35. Flahault, A.; de Castaneda, R.R.; Bolon, I. Climate Change and Infectious Diseases. Public Health Rev. 2016, 37, 21.
  36. Yin, J.; Liu, T.; Tang, F.; Chen, D.; Sun, L.; Song, S.; Zhang, S.; Wu, J.; Li, Z.; Xing, W.; et al. Effects of Ambient Temperature on Influenza-like Illness: A Multicity Analysis in Shandong Province, China, 2014–2017. Front. Public Health 2022, 10, 1095436.
  37. Deyle, E.R.; Maher, M.C.; Hernandez, R.D.; Basu, S.; Sugihara, G. Global Environmental Drivers of Influenza. Proc. Natl. Acad. Sci. USA 2016, 113, 13081–13086.
  38. Mahmud, A.S.; Martinez, P.P.; Baker, R.E. The Impact of Current and Future Climates on Spatiotemporal Dynamics of Influenza in a Tropical Setting. PNAS Nexus 2023, 2, pgad307.
  39. Colijn, C.; Soebiyanto, R.P.; Clara, W.; Jara, J.; Castillo, L.; Sorto, O.R.; Marinero, S.; de Antinori, M.E.B.; McCracken, J.P.; Widdowson, M.A.; et al. The Role of Temperature and Humidity on Seasonal Influenza in Tropical Areas: Guatemala, El Salvador and Panama, 2008–2013. PLoS ONE 2014, 9, e100659.
  40. Elsobky, Y.; El Afandi, G.; Abdalla, E.; Byomi, A.; Reddy, G. Possible Ramifications of Climate Variability on HPAI-H5N1 Outbreak Occurrence: Case Study from the Menoufia, Egypt. PLoS ONE 2020, 15, e0240442.
  41. Wang, Z.; Wang, R.; Wang, M.; Lai, T.; Zhang, M. Self-supervised transformer-based pre-training method with General Plant Infection dataset. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Urumqi, China, 18–20 October 2024; pp. 189–202.
  42. Wang, R.F.; Qu, H.R.; Su, W.H. From Sensors to Insights: Technological Trends in Image-Based High-Throughput Plant Phenotyping. Smart Agric. Technol. 2025, 12, 101257.
  43. Yan, Z.L.; Liu, W.H.; Long, Y.X.; Ming, B.W.; Yang, Z.; Qin, P.Z.; Ou, C.Q.; Li, L. Effects of Meteorological Factors on Influenza Transmissibility by Virus Type/Subtype. BMC Public Health 2024, 24, 494.
  44. Wagatsuma, K. Effect of Short-term Ambient Temperature Exposure on Influenza A and B Incidence: A Time-series Analysis of Daily Surveillance Data in Kawasaki City, Japan. IJID Reg. 2024, 13, 100479.
  45. Morin, C.W.; Stoner-Duncan, B.; Winker, K.; Scotch, M.; Hess, J.J.; Meschke, J.S.; Ebi, K.L.; Rabinowitz, P.M. Avian Influenza Virus Ecology and Evolution through a Climatic Lens. Environ. Int. 2018, 119, 241–249.
  46. Lindner-Cendrowska, K.; Bröde, P. Impact of Biometeorological Conditions and Air Pollution on Influenza-like Illnesses Incidence in Warsaw. Int. J. Biometeorol. 2021, 65, 929–944.
  47. Soebiyanto, R.P.; Adimi, F.; Kiang, R.K. Modeling and Predicting Seasonal Influenza Transmission in Warm Regions Using Climatological Parameters. PLoS ONE 2010, 5, e9450.
Figure 1. Spatiotemporal distribution of H5 avian influenza outbreaks in China, 2000–2023: H5 avian influenza risk distribution across Köppen climate zones in China.
Figure 2. Temporal distribution of H5 avian influenza outbreaks, 2000–2023.
Figure 3. Model performance evaluation from 5-fold cross-validation.
Figure 4. Global feature importance of avian influenza risk drivers.
Figure 5. Nonlinear exposure–response curves for key predictors. (a) SHAP value vs. temperature; (b) SHAP value vs. distance to IBA.
Figure 6. Synergistic interaction effect between ambient temperature and poultry density on H5 avian influenza risk. The plot shows the SHAP value for temperature (y-axis) across its range (x-axis), with points colored by poultry density. The clear separation of red (high-density) and blue (low-density) points at high temperatures indicates a strong positive interaction.
Figure 7. Heatmap of pairwise SHAP interaction strengths among key predictors. The color of each off-diagonal cell indicates the strength of the interaction effect between the corresponding pair of features, with brighter, redder colors signifying stronger synergies. The diagonal represents the main effect of each feature.
Figure 8. High-resolution predictive risk map of H5 avian influenza in mainland China.
Table 1. Description and justification of predictor variable categories.
| Scale | Category and Variables | Justification and Supporting References |
| --- | --- | --- |
| Macro-scale | Geographic and Climatic Contexts (e.g., latitude, longitude, Köppen class) | Represents stable, large-scale environmental conditions that define baseline AI risk. The Köppen–Geiger classification summarizes long-term climate regimes governing viral persistence [21,23]. |
| Meso-scale | Host and Wild Bird Interface (e.g., poultry density, dist. to IBA) | Captures key ecological factors at a regional level. Poultry density is a fundamental determinant of disease amplification, while proximity to IBAs proxies for viral spillover risk from wild birds [23,24,25]. |
| Micro-scale | Meteorological Triggers (e.g., lagged temp., pressure change) | Represents transient weather conditions that can trigger outbreaks. Temperature and other factors influence the environmental survival and stability of the influenza virus [23,26]. |
| Temporal | Seasonality Controls (e.g., sine/cosine of day of year) | Models the well-documented seasonality of avian influenza. These cyclical features allow the model to learn patterns of risk peaking in cooler months in a continuous manner [4]. |
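The sine/cosine seasonality controls listed in Table 1 can be illustrated with a short encoding function. This is a minimal sketch: the function name `cyclic_encode` and the choice of a 365-day period are assumptions, as the exact period used in the study is not stated here.

```python
import math


def cyclic_encode(value, period):
    """Map a cyclic quantity (month, day of year) to a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


month_sin, month_cos = cyclic_encode(1, 12)          # January
dec31 = cyclic_encode(365, 365)                      # last day of the year
jan01 = cyclic_encode(1, 365)                        # first day of the year

# Distance between consecutive days across the year boundary stays small,
# which is what lets the model treat late December and early January as neighbours
gap = math.hypot(dec31[0] - jan01[0], dec31[1] - jan01[1])
print(round(month_sin, 3), round(month_cos, 3), round(gap, 4))
```

Using both sine and cosine is what makes the encoding unambiguous: either component alone would map two different times of year to the same value.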
Table 2. Comparative performance of the XGBoost model against traditional statistical models. All models were evaluated using a 5-fold cross-validation scheme on the same set of predictors. Metrics shown are the mean and standard deviation (SD) across the folds.
| Model | Mean R² (±SD) | Mean RMSE (±SD) | Mean MAE (±SD) |
| --- | --- | --- | --- |
| XGBoost (this study) | 0.776 (±0.039) | 0.604 (±0.065) | 0.306 (±0.060) |
| PanelOLS (fixed effects) | 0.458 (±0.052) | 0.881 (±0.095) | 0.573 (±0.048) |
| Gaussian GLM | 0.257 (±0.061) | 0.995 (±0.112) | 0.634 (±0.055) |
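For reference, the three metrics reported in Table 2 and their aggregation across folds can be computed as follows. This is a generic sketch using standard definitions; the observed and predicted values and the per-fold scores are hypothetical, and the use of the sample standard deviation across folds is an assumption, since the source does not specify which form was used.

```python
import math
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical observed and predicted case counts for one validation fold
y_true = [0, 1, 2, 5]
y_pred = [0, 1, 3, 4]
fold_r2 = r2_score(y_true, y_pred)
fold_rmse = rmse(y_true, y_pred)
fold_mae = mae(y_true, y_pred)

# Hypothetical per-fold R² scores, summarized as mean and sample SD
fold_scores = [0.74, 0.78, 0.80]
mean_r2, sd_r2 = mean(fold_scores), stdev(fold_scores)
print(round(fold_r2, 3), round(fold_rmse, 3), fold_mae, round(mean_r2, 3), round(sd_r2, 3))
```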
Table 3. Descriptive statistics for all variables used in the model.
VariableNMeanStd. Dev.MedianMinMaxQ25Q75
Cases18005.410.21110812
Mean temperature (°C)18007.510.114.57−14.534.73−0.315.6
Atmospheric pressure (hPa)1800955.7495.17997.75572.951038.9922.271012.55
Precipitation (mm)18002.339.1200145.800.1
Lagged temperature (°C)180014.7110.2514−15.533.66.9724.2
Lagged pressure (hPa)1800952.02109.96998.901038.8922.151011.5
Lagged precipitation (mm)18005.57109.2200327200.2
Poultry density (log-transformed)18005.952.286.5509.535.327.45
Distance to IBA (km)180044.837.441.2022516.985.3
Within IBA (binary)18000.0530.22500100
Longitude (°E)1800110.058.98112.2277.27131.47106.41115.97
Latitude (°N)18003062921482532
Month (sin-transformed)1800−0.0050.6150−11−0.50.5
Month (cos-transformed)18000.2020.7630.5−11−0.50.866
Day of year (sin-transformed)1800−0.0410.6040−11−0.5380.448
Day of year (cos-transformed)18000.2080.7680.556−11−0.3740.962
Köppen 14.0 (binary)18000.4750.500101
Köppen 11.0 (binary)18000.2550.43600101
Köppen 7.0 (binary)18000.070.25500100
Std. Dev.: standard deviation; Q25/Q75: 25th and 75th percentiles; IBA: Important Bird and Biodiversity Area. The Köppen variables are one-hot encoded representations of specific climate zones.
Table 4. Model performance metrics from 5-fold cross-validation.
| Metric | Mean | Std. Dev. |
| --- | --- | --- |
| R² | 0.776 | 0.039 |
| RMSE | 0.61 | 0.07 |
| MAE | 0.318 | 0.009 |

Share and Cite

MDPI and ACS Style

Wang, X.; Xu, Y.; Xi, X. Spatiotemporal Risk Assessment of H5 Avian Influenza in China: An Interpretable Machine Learning Approach to Uncover Multi-Scale Drivers. Animals 2025, 15, 2447. https://doi.org/10.3390/ani15162447
