Spatio-temporal Occurrence Modeling of Highly Pathogenic Avian Influenza Subtype H5n1: a Case Study in the Red River Delta, Vietnam

Highly Pathogenic Avian Influenza (HPAI) subtype H5N1 poses severe threats to both animals and humans. Investigating where, when and why the disease occurs is important to help animal health authorities develop effective control policies. This study takes into account spatial and temporal occurrence of HPAI H5N1 in the Red River Delta of Vietnam. A two-stage procedure was used: (1) logistic regression modeling to identify and quantify factors influencing the occurrence of HPAI H5N1; and (2) a geostatistical approach to develop monthly predictive maps. The results demonstrated that higher average monthly temperatures and poultry density in combination with lower average monthly precipitation, humidity in low elevation areas, and Ninh Binh are areas with higher probability of occurrence of HPAI H5N1.


Introduction
Highly Pathogenic Avian Influenza (HPAI) subtype H5N1 has posed a global threat to both animal and human populations [1].The HPAI H5N1 virus has caused unpredictable epidemics over the years.Understanding the mechanism affecting the occurrence of HPAI H5N1 plays a vital role in the prevention, control and eradication of the disease.However, this mechanism has not been fully understood in a scientific sense [2,3].In Vietnam, the occurrence and spread of HPAI H5N1 were thought to be associated with either increases in poultry production, trade and movement of live poultry before, during and after Tet (the national lunar new year celebration), or the expansion of free-range duck farming where domestic ducks came into contact with rice paddy fields during the rice crop harvest season [4,5].However, temporal distribution of the disease in Thailand showed this same trend, even though movement of poultry in high-risk areas was restricted [6,7].In addition, the US Centers for Disease Control and Prevention (CDC) found the HPAI H5N1 virus in live, healthy ducks and geese in Hanoi in 2001, two years prior to the disease occurrence [8].Although presently, the virus has been classified as inactive, mechanisms that activated the H5N1 virus and caused disease occurrence are still not clear.
Studies by Wilcox and Gubler (2005) and Wilcox and Colwell (2005) argued that weather variability across space and time, which were largely caused by agricultural intensification, industrialization and urbanization, likely promoted the survival and spread of pathogens in the environment [2,3].These unpredictable changes may directly cause circulation of the avian influenza virus through water birds [9].Although increased minimum temperature in January and decreased lower annual precipitation were respectively found to be associated with HPAI H5N1 in Europe and China [1,10], the mechanism of how space-time dynamics of these factors affected disease occurrence were not investigated.In other words, the variability of weather across space and time in relation with the occurrence of HPAI H5N1 has been overlooked.
This study seeks to understand the relationship between weather variability, as measured in terms of average monthly temperature, precipitation and humidity, and other anthropogenic and physical environmental factors, such as poultry density and elevation in association with the occurrence of the HPAI H5N1 virus.Records of past occurrences are incorporated with environmental and anthropogenic factors to predict probabilities and map the spatio-temporal distribution of HPAI H5N1.The study focuses on the Red River Delta in Vietnam, which was previously defined as a high-risk area for the disease [4,11].

Study Area and Data Sources
The Red River Delta (RRD) (Figure 1) is one of the two largest flood plains in Vietnam with dense population, including the capital city of Hanoi and the main port of Hai Phong.Agriculture and livestock are among the main economic activities, including primary crops-rice, corn, beans and rapeseed crops and poultry, pigs and cows husbandry.Three large waves of HPAI H5N1 outbreaks occurred the region over the period December 2003 to March 2010, corresponding to the first, the third and the fifth waves as defined at the national level, which happened in 2003-2004, 2004-2005 and 2007, respectively.These waves had high degrees of virulence which caused severe damages.By April 2004, HPAI H5N1 affected 57 of the 64 provinces in Vietnam, resulting in 44 million poultry culled which corresponds to approximately 17 percent of the total poultry population in the country [12].Economic losses to poultry producers were extensive and estimated at about 3,000 billion VND [13,14].The disease mainly impacted a large number of the poor households.Both decreases in demand for poultry products and significant declines in market prices due to the disease caused heavy losses to poultry producers [12].Human health also suffered.Several human cases of HPAI H5N1 were reported.By 19 November 2010, a total of 119 human cases were linked to HPAI H5N1, resulting in 59 deaths [15].Data on the occurrences of HPAI H5N1 collected and reported by the Vietnam Department of Animal Health (DAH) were provided through East West Center-National Science Foundation project.Each HPAI H5N1 occurrence was confirmed either by the National Centre for Veterinary Diagnostics or Regional Animal Health Offices by performing haemagglutination inhibition, real-time polymerase chain reaction (PCR), or real-time reverse transcriptase/polymerase chain reaction (RRT-PCR) tests [16].Although disease occurrences were first officially reported in the region at the end of 2003 by DAH, the dates and locations of occurrences were formally reported from the end of March 2004 [4].Therefore, this study focused on the period from the end of March 2004 to the end of February 2009 which included 333 confirmed HPAI H5N1 occurrences in the RRD.
All occurrence data were grouped accordingly to the start month and location at the commune level.Duplicate reports of the disease for the same month and location were discarded, resulting in 277 affected communes.The other 1,967 unaffected communes in the Delta were also considered as statistical units for the analysis.Therefore, the data included 2,244 infected and uninfected communes.Record of disease occurrences was derived for each commune on monthly basis from January to December and coded as 1 if the disease was reported within a month or 0 if there was no disease reported.The data file also included commune codes used for merging with other data of other factors to be analyzed.This step was done using Stata software version 12 (StataCorp LP., College Station, TX, USA).
Weather data on precipitation, humidity and temperature, measured daily from 2003 to 2006 at 30 weather stations throughout the Red River Delta and surrounding areas, were provided by the Hydro-Meteorological Data Center of Vietnam.Ordinary kriging within ArcMap version 10.1 (ESRI, Redlands, CA, USA) with a spherical semi-variogram model which was considered as one of the best approaches for climate data interpolation [17].This technique was performed to interpolate weather data for each month of the year in the Red River Delta and then converted to raster layers, resulting in 36 raster layers representing average monthly precipitation, humidity and temperature.The weather data consist of both spatial and temporal aspects since they vary across location and time.
Other factors included in the study were poultry density at the commune level and elevation, respectively, obtained from the 2006 Vietnam Agricultural Census provided by East West Center-National Science Foundation project and from the Shuttle Radar Topography Mission (SRTM) 90-m resolution Digital Elevation Model (DEM).These two factors contain the spatial aspect only.Figure 2 represents the spatial distribution of poultry density and elevation in the Red River Delta of Vietnam.Poultry density of each commune was divided into 4 groups and coded as 1 for no poultry, 2 for a group of 1 to 25 poultry/ha, 3 for a group of 26 to 100 poultry/ha and 4 for a group of more than 100 poultry/ha.Most of communes were categorized as poultry density group 3 and located inside the delta.The areas with no poultry were in the city center of Hanoi and other provinces such as Hai Duong, Hai Phong and Nam Dinh.Communes in poultry density group 2 were mainly distributed around big cities, mountain and coastline area by the east sea of Vietnam.
Following the idea of Le and Rambo on topographic classification [18], we categorized the Red River Delta topography into 4 groups of elevation less than 5 m, from 5 m to 15 m, from 15 m to 200 m, and above 200 m, respectively, coded from 1 to 4, representing coastal, lowland, midland and upland areas.The Red River Delta was characterized as a flat plain area with low elevation, mostly less than 15 m (Figure 2).The coastal area is located near the east sea of Vietnam, consisting of Hai Duong, Hai Phong, Thai Binh, Nam Dinh and an eastern part of Ninh Binh provinces.For the purpose of data analysis, Geographical Information System (GIS) technology was used to generate centroids from each commune polygon in the Delta, resulting in 2,244 points.The resulting commune centroid-based map was the basis for conducting temporal and spatial analysis (Figure 3).All layers detailing environmental characteristics (average monthly precipitation, average monthly humidity, average monthly temperature and elevation) were then extracted to the commune centroids and then exported to Stata software version 12 to be merged with poultry density data and disease occurrence records using commune codes.Details on variable description and sources were described in Table 1.A multilevel analysis technique was applied by performing a two-stage procedure: (1) logistic regression modeling followed by (2) a geostatistical analysis approach to interpolate results.Overall structure and analytical procedures of the study is shown in Figure 4 below.

Statistical Analysis
In the first stage, logistic regression models were fitted to the values obtained at the commune centroids to investigate critical factors affecting the occurrence of HPAI H5N1.The advantage of the logistic regression model is that it can explain the effects of the explanatory variables on the binary response.The logistic regression results provide odds ratios (ORs), standard errors, p-values and 95% confidence intervals.An OR equal to 1 indicates no relationship between the explanatory variable and the response variable, while an OR greater than 1 indicates a positive relationship while an OR less than 1 represents a negative relationship [20].
Computation of variance inflation factors (VIFs) for each explanatory variable in the model indicates the presence of multicollinearity.The critical threshold value of a VIF is 10.A VIF greater than 10 suggests severe collinearity and should be excluded from the model [20].Goodness-of-fit tests were also performed, reporting Akaike's information criterion (AIC) for each model.The AIC value helps select the optimal statistical model.A smaller AIC value implies a better fit model.The best fit model is indicated by the lowest AIC value [21].This model was then used to estimate predicted probabilities of HPAI H5N1 which were used in the geostatistical analysis performed in the second stage.All statistical analyses were performed by using the Stata software version 12 (StataCorp LP., College Station, TX, USA).

Geostatistical Analysis
The second stage in the analysis examined the spatio-temporal patterns of HPAI H5N1.The fundamental goal was to interpolate monthly-based probability predictions for the occurrence of the disease.The predicted probabilities of HPAI H5N1 obtained in the first stage of the analysis were fitted into the commune centroid-based map.A geostatistical analysis approach considers the physical location of each individual observation in a dataset and its value with respect to one another to interpolate values at unsampled locations.Given that HPAI H5N1 is a contagious infectious disease that is easily transmitted to surrounding areas over time, significant spatial autocorrelation may influence the estimates of the disease risk.The idea suggests that values of points located close together are more similar than those far away.This relationship was calculated through a variogram model which is the core of geostatistical kriging interpolation techniques [22,23].
Bayesian kriging is a useful tool for spatial interpolation in the context of epidemiology [24].Different from other kriging methods such as simple kriging or ordinary kriging which exclude uncertainty in the variogram model by using a fixed variogram parameter, Bayesian kriging considers variogram parameters as random variables and estimates the variogram model directly from data using restricted maximum likelihood (REML).Therefore, uncertainty in the variogram parameters is included in the final estimation [25].The use of a large number of simulations following Markov chain Monte Carlo techniques required in the Bayesian prediction approach yields more accurate predictions [25,26].The outcomes were monthly prediction maps of the occurrence of HPAI H5N1 in the Red River Delta.
The accuracy of Bayesian kriging was validated using the commonly accepted 10-fold cross validation technique [27].The original data were randomly divided into a training subset and a test subset at the proportion of 90%-10%, respectively.Bayesian kriging was performed using the training subset, then was cross-validated using the test subset.Level of prediction errors were then measured by mean absolute percentage error (MAPE): (1) where and are respectively the predicted and actual values at point t.The accuracy of the prediction is the converse of MAPE. (2)

Results
Logistic regression models provided important information about what factors critically affect the occurrence of HPAI H5N1 in the Red River Delta.Based on calculation of the Akaike's information criterion (AIC), it appeared that a model with all factors-average monthly temperature, humidity and precipitation, poultry density and elevation-was the best fitting model in terms of the lowest AIC value.The logistic regression results are shown in Table 2.The variance inflation factor (VIF) values indicated no problem with collinearity in the model.The results revealed that all five factors tested had statistically significant effects on the occurrence of HPAI H5N1 in the Red River Delta.Positive associations with odds ratios of 1.518 and 1.368 were found between the disease occurrence and average monthly temperature and poultry density, while other factors-average humidity, precipitation and elevation had negative relationships with odds ratios of 0.949, 0.975 and 0.716 respectively.The results suggested that higher average monthly temperature and poultry density would likely increase the probability of HPAI H5N1 while lower average monthly humidity, precipitation and elevation would likely increase the probability of occurrence of the disease.
Predicted probabilities of HPAI H5N1 occurrence (Table 3) were estimated from the logistic model.The results suggested that periods from October to January and April to June, contribute to the higher probability of HPAI H5N1.Of which, November, April and January are most vulnerable to disease occurrence (with high predicted probabilities of 0.0663, 0.0467 and 0.0471, respectively as compared to other months.A year average probability is estimated at 0.0102.Table 3. Monthly estimated probability of Highly Pathogenic Avian Influenza (HPAI) H5N1 in the Red River Delta.
All predicted probabilities were fitted to the commune centroid-based map to interpolate predictive spatio-temporal distribution of HPAI H5N1 in the Red River Delta using Bayesian kriging.The monthly prediction maps provided visual insight on the times and locations in which the disease would likely occur across the Delta.Empirical evaluation of mean absolute percentage error and accuracy based on a 10-fold cross-validation technique indicated the performance level of the prediction.Figure 5 representing the results of MAPE and accuracy of the prediction suggested a consistent trend across various months of the year.The cross-validation of the predictive maps showed accuracy around 85% which were considered as high values of the predictive models [1].

Discussion
This study presented a method for combining statistical and spatial analyses to identify significant factors affecting the occurrence of HPAI H5N1 and to produce maps of the monthly probabilities of HPAI H5N1 occurrence in the Red River Delta, Vietnam.The goal was to better understand the factors associated with the disease by addressing questions regarding why, when and where HPAI H5N1 would likely occur.
The logistic regression results suggested that HPAI H5N1 occurrences were significantly related to consistent and predictable circumstances.The key factors increasing the probability of the occurrence for HPAI H5N1 were lower average monthly precipitation, humidity and elevation and higher average monthly temperature and poultry density.These results were consistent with previous studies conducted separately in other regions or countries such as Europe, China, Thailand, and Vietnam, but at different scales.Increased minimum temperature and decreased precipitation in January were associated with a higher incidence of HPAI H5N1 in wild birds in Europe [1], while lower annual precipitation was related to the disease occurrence in mainland China [10].Lower elevation areas were more vulnerable to increased risk of contracting the disease in Europe and Thailand [1,6].Increased duck and chicken density were thought to increase the incidence of the disease in Thailand and Vietnam [4][5][6]11].
The occurrence of HPAI H5N1 could not be explained separately by a single factor, but rather by combinations of the key factors.As noted by Pfeiffer et al. (2007), a single factor, low temperature, would unlikely be associated with the first two waves of HPAI H5N1 which occurred around the Tet holiday, as temperature patterns vary substantially from the North with colder conditions to the South with warmer weather in Vietnam [4].However, the combination of all weather factors poultry density and elevation might explain the situation.Although the temperature in December and January are low (averaging around 16 degree Celsius), these months are characterized by low humidity, low precipitation and high poultry density to meet high demand for poultry for Tet holiday which favor the occurrence of the disease.November and April which were predicted as months with the highest probability of HPAI H5N1 occurrence have the same temperature patterns as in the south.These months are characterized by warmer conditions, lower precipitation and less humidity which appear to be associated with favorable conditions for the virus spread.
The temporal pattern together with geographical location provide interesting insights into the epidemic.Although the temporal aspect of poultry density and elevation was not recorded in the study, their combination with variable weather contributed the varying spatio-temporal distribution of the disease in the Red River Delta.The direct as well as indirect links between weather variability and the occurrence of HPAI H5N1 has been documented [1,9,10].Given the dynamic changes in the key factors over space and time, the probability of HPAI H5N1 occurrence changed accordingly.
The monthly prediction maps in Figure 6 show that the presence of factors contributing to the spread of HPAI H5N1 vary in terms of timing and location.It is noticeable that the region of highest probability, as identified in red, mostly occurs in coastal and lowland provinces throughout all months of the year.However, regions of highest probability and change across the months.As identified in the logistic regression model, weather factors significantly affected the occurrence of HPAI H5N1.Weather patterns in the Red River Delta vary and can be characterized as distinct seasons with Spring, Summer, Autumn and Winter.Figure 7 presents the range of weather patterns by month in the Red River Delta in the period from 2003 to 2006.The variable weather between months contributed to the spatial distribution of the disease.
In January, the predicted probability was high compared to other months with mean probability at 0.0216 (Table 3).The higher probability areas were predicted in provinces to the east and south of Hanoi capital such as Ha Nam, Hung Yen, Bac Ninh and Hai Duong and part of Hai Phong.Although temperature is lower than other months, precipitation and especially, humidity are also low which would likely favor the occurrence of HPAI H5N1.In addition, increased production, movement and trade of live poultry prior to the Tet holiday to meet the high demand for poultry in Hanoi, as noted by Pfeiffer et al. (2007) [4], may be associated with the spread of the H5N1 virus.The predicted spatial distribution is consistent with the first and the second waves of HPAI H5N1 which mostly occurred in Bac Ninh, Hung Yen, Hai Duong and Hai Phong in January.
These provinces still remain at higher risk for HPAI H5N1occurrence as compared to other areas in the Red River Delta in February and March.However, the probability of HPAI H5N1 appears to decrease when the number of poultry significantly declines after the Tet holiday and humidity is exceptionally high (close to 100%).The fifth wave was concentrated in areas where rice was harvested and free-grazing ducks were raised to meet the high demand for duck meat during the summer.These factors are in addition to those associated with weather and represent factors consistent with other studies [4][5][6][7]10,11] explaining the occurrence of HPAI H5N1.
The predicted spatial distribution of HPAI H5N1 shifts in the period from July to October.Starting from the southern areas of the Red River Delta, including Nam Dinh and Thai Binh provinces in July, a zone of high probability moves up North to Hung Yen, Hai Duong and Bac Ninh in August, then expands to the capital of Hanoi and Vinh Phuc province in September.The zone extends to Hai Phong province in October.The probability of HPAI H5N1 occurrence between July and September appears to be lower compared to other months.This period falls in the summer rice cropping which starts in June and ends with harvests in October and November.Hot, humid and rainy weather is typical for the Red River Delta during this time.This type of weather does not support the occurrence and circulation

Figure 1 .
Figure 1.The study area-The Red River Delta of Vietnam.

Figure 2 .
Figure 2. Spatial distribution of poultry density and elevation in the Red River Delta.

Figure 4 .
Figure 4. Data preparation and analytical procedures of the study.

Figure 5 .
Figure 5. Cross-validation for accuracy of Bayesian kriging for predictive distribution of HPAI H5N1 in the Red River Delta.

Figure 6 .
Figure 6.Monthly probability prediction of HPAI H5N1 occurrence in the Red River Delta.

Figure 7 .
Figure 7. Weather patterns by months in the Red River Delta from 2003 to 2006.

Table 1 .
Description and data sources of variables in the study.

Table 2 .
Logistic regression model of HPAI H5N1 in the Red River Delta, Vietnam.