Landscape epidemiology combines both disease ecology and landscape ecology to better understand the spatial aspects that can affect epidemiological processes across a disease’s geographical range and the spatial interactions involved [1
]. This is especially important when dealing with vector-borne diseases (VBD), such as Dengue fever (DF), because of the role that the vector, in this case a mosquito, plays in the distribution of the disease. DF is a neglected tropical disease (NTD) and one of the leading causes of illness and death in tropical regions of the world. It is caused by a virus of the flavivirus family and is commonly transmitted by mosquitoes; nearly 400 million people become infected each year, and roughly one-third of the world’s population lives in areas at risk. With vaccines currently in trials and not readily available, prevention relies on reducing the impact of the main vector, Ae. aegypti
]. DF is endemic to most tropical countries, and the virus has four serotypes. Secondary infection by another serotype can result in a deadlier form, known as Dengue hemorrhagic fever. Because of this, the National Institute of Allergy and Infectious Diseases (NIAID), a division of the U.S. Department of Health and Human Services, lists Dengue under category A, its highest-risk category for national security and public health.
The main vector of DF, the female Ae. aegypti
mosquito, is active during the day. Ae. albopictus is another mosquito vector capable of transmitting DF, but it has not been identified as having as significant a dispersal capacity, owing to its preference for feeding on animals rather than humans [3
]. The preference of Ae. aegypti
for daytime feeding and urban areas means that human-mosquito interaction is high. This explains why this mosquito is the focus of research around the world.
Colombia is one of the countries where Ae. aegypti
and DF are endemic, posing a serious public health concern and disease burden as a majority of the population live in areas at risk for DF and similar viral diseases transmitted by this vector [4
]. A major outbreak of DF occurred in the country in 2010, affecting over 150,000 people and causing 289 deaths [5
]. In 2014, 24,949 cases of DF in the Magdalena River watershed alone were reported by the National Institute of Health (Instituto Nacional de Salud, INS) of Colombia [6
]. With a projected national population of 36,127,443, this results in a national case rate of 6.9 cases per 10,000. The Magdalena watershed was chosen as the site for this study because of its natural separation from other geographical regions in the country, its wide range of climatic conditions, and the fact that it includes the main urban centers in Colombia and houses 80% of the country’s population.
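The case rate quoted above follows directly from the two reported figures. As a minimal sketch (the function name `cases_per_10k` is an illustrative assumption, not part of the original analysis), the arithmetic can be reproduced as follows:

```python
def cases_per_10k(cases: int, population: int) -> float:
    """Return the crude case rate per 10,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 10_000


# Figures quoted in the text: reported DF cases and projected population.
rate = cases_per_10k(24_949, 36_127_443)
print(round(rate, 1))
```

The same helper could be applied per municipality before mapping, provided that case counts and population estimates refer to the same year.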
Challenges for modelling DF in countries such as Colombia include the demographic, ecological, entomological, climatic, and social aspects of the human-mosquito-virus interrelationship triangle. Globally, the use of remote sensing (RS) satellite imagery has been one way of addressing these difficulties in recent decades. Advances in the quality and types of RS imagery have made it possible to enhance or replace the field collection of environmental data such as precipitation, temperature, and land use, especially in remote areas of the world [7
]. RS has been used to estimate mosquito abundance [8
] and Dengue incidence [9
]. Another challenge is the method used to combine the variety of data into an accurate model. Many methods have been proposed for disease intervention modeling, such as weighted linear combination used with geographic information systems, probabilistic layer analysis, decision tree analysis, as well as linear and logistic regression analysis [7
]. One such method, boosted regression tree analysis (BRT), has proven useful in a wide range of studies, including predicting forest productivity [10
], properties of wood composites [11
], crop disease outbreaks [12
], analysis of ecological data [13
], and epidemiological studies [14
]. More relevant to this paper, this methodology has been used in the classification of remotely sensed imagery [15
] and in studies of other vector-borne diseases, such as Leishmaniosis [16
] and Crimean-Congo hemorrhagic fever [17
]. Using Geographic Information Systems (GIS), these methods have previously divided the area under study into a grid of equally spaced squares. In epidemiological studies such as Cheong et al. [18
], the squares represent an area of 200 m²
, while in environmental modeling studies, such as Messina et al. [4
], the squares represent areas up to 5 km²
in size. While smaller grid cells give a finer resolution of a disease outbreak, their use requires increasingly more computing resources as the study area expands. No studies were found that use counties or municipalities to divide the area under study. The use of municipalities does add a bias to the design, known as the Modifiable Areal Unit Problem (MAUP), but it could be a first step in identifying ‘hot’ regions that would then justify a follow-up, finer-resolution study of the specific area of interest, using a grid as previously described.
Highly urbanized areas have been associated with higher levels of Dengue due to the adaptation of Ae. aegypti
to the human-vector-human transmission cycle [19
]. Physical environmental characteristics of urban environments also play a role in the adaptation of Ae. aegypti
to these environments and consequently in the incidence of DF [20
]. Therefore, using the Magdalena River watershed of Colombia as a study site and BRT as a statistical method of analysis, our research questions are:
Which environmental factors have the highest relative influence in association with Dengue fever?
What is the spatial distribution of the risk of Dengue fever based on these environmental factors?
What are the differences between using presence/absence and case counts of DF in this type of analysis?
package in R was used to calculate the RMSE between the predicted mean and actual cases in the training data, along with the Pearson correlation coefficient and p-value. The results are summarized in Table 1
, and show that the Poisson family out-performed the Bernoulli family models across all years. The average RMSE for all three years was also considerably lower for the Poisson model (mean = 28.267) compared to the Bernoulli model (mean = 98.732), reflecting a better model fit.
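The two fit statistics reported above can be illustrated with a short sketch. This is not the R code used in the study; it is a plain Python illustration with hypothetical case counts, and it omits the p-value calculation:

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observed and predicted case counts for four municipalities.
observed = [0, 2, 5, 9]
predicted = [1, 2, 4, 9]
print(rmse(predicted, observed), pearson_r(predicted, observed))
```

A lower RMSE together with a higher correlation, as reported for the Poisson models, indicates predictions that are both closer to and more consistently ordered with the reported counts.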
The maps shown in Figure 2
, Figure 3
and Figure 4
reflect the results shown in Table 1
. The left panel represents the cases per 10,000 population per square kilometer for each municipality by year. Dark green represents very low ratios of DF, and red reflects a higher incidence of DF. All maps used the same classification as the reported cases map for comparison, with an additional symbol (black) used for values outside the reported cases range. The center panel represents the results of the BRT analysis using the Poisson family, while the right panel represents the results of the BRT analysis using the Bernoulli family, or presence/absence data. While the BRT Poisson map shows a higher correlation with the reported cases and a lower RMSE than the BRT Bernoulli map, there is still some apparent underfitting of the model.
Those areas in the central and southern sections of the watershed, along the foothills between mountain ranges, show an expected coincidence between high levels of DF cases and high estimated probability of DF occurrence. As expected, the areas where low probability of occurrence coincided with low cases also coincided with the higher elevation areas of the watershed. As previously noted, elevation was shown to be a limiting factor in the spread of DF by Ae. aegypti.
The 20 relative importance variables for each year are listed in Table 2
, with population density (POP_DEN) and daytime LST minimum annual maximum municipality variables (LD1B) ranking high in both models. In the Poisson models, population density or daytime LST mean annual maximum municipality (LD3B) represented more than 50% of the relative influence in a given year, whereas there was no such clear distinction in the Bernoulli models. Nighttime temperatures (LNxy) were more common in the Bernoulli models, along with mean elevation (EL3). Population density within a municipality was a significant variable across all models for all years.
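Relative influence in BRT software is conventionally reported as a percentage, with the contributions of all predictors scaled to sum to 100. The sketch below shows only that scaling step; the raw improvement scores are hypothetical values chosen for illustration and are not taken from Table 2:

```python
def relative_influence(raw_improvement):
    """Scale raw per-variable improvement scores so they sum to 100.

    Returns a dict ordered from most to least influential variable.
    """
    total = sum(raw_improvement.values())
    if total <= 0:
        raise ValueError("total improvement must be positive")
    scaled = {name: 100.0 * value / total
              for name, value in raw_improvement.items()}
    return dict(sorted(scaled.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical raw scores; the variable codes follow the text.
raw = {"EL3": 6.0, "POP_DEN": 30.0, "LD1B": 4.0, "LD3B": 10.0}
for name, share in relative_influence(raw).items():
    print(f"{name}: {share:.1f}%")
```

Under this convention, a single variable exceeding 50% means that it accounts for more of the model's improvement than all other predictors combined.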
Using boosted regression tree analysis with the Magdalena River watershed of Colombia as a study site, this paper set out to identify environmental factors with high relative influence on DF incidence and to map the spatial distribution of DF risk. A comparison was also made between the standard presence/absence models found in the literature and a model that used the reported case counts by municipality. The results show that the interaction between population density, elevation, daytime LST, and nighttime LST played the most descriptive role in determining the niche of DF within this study. Population density was one of the highest relative influence variables across all of the years studied. This strong positive relationship can be explained by several factors, such as the preference of Ae. aegypti
to breed in water-filled artificial containers associated with human activity [38
], which may be more abundant where human population density is higher. In addition, greater population density means more opportunity for the mosquito to transmit the virus from an infected person to an uninfected one [39].
A strong influence from elevation was expected due to the previously established inverse relationship between elevation and temperature in connection to the widely-reported influence of temperature on Ae. aegypti
] and Dengue transmission [24
]. In a study conducted in Central Mexico, Moreno-Madriñán et al. [8
] used RS technology to detect a strong inverse relationship (supporting previously used in situ measurement studies) between elevation and Ae. aegypti
abundance. However, in the present study, a strong influence from mean elevation (EL3) was observed in the Bernoulli models, but not in the Poisson models. Assuming the Poisson models were indeed more accurate, the low influence of elevation may be related to the fact that most of the large cities and the most populated areas are located at higher elevations due to cultural habits in this region. As explained previously, population density was among the most influential variables; thus, this cultural habit could have confounded the effect of elevation in this study area. In addition, the urban heat island effect might have played a role in these large cities. Many cases reported at higher elevations may have been brought in by people traveling from other municipalities located at lower elevations. Indeed, several authors estimated an elevation limit for Ae. aegypti
to be between 1800 m ASL and 2000 m ASL [22
]. In Mexico, Lozano-Cifuentes et al. [42
] reported Ae. aegypti
rare but present at an elevation of 2130 m ASL while it had been previously reported at 1630 m ASL [43
]. In Colombia, the highest elevation previously reported was 2200 m ASL [44
], with a more recent report having found evidence above 2300 m ASL [30
]. In the data supplied for 2013, there were 15 municipalities above 2300 m ASL that listed one or more cases of DF.
Due to the aggregation of the environmental variables, several highly correlated sub-variables were generated and used. All raster pixels falling within a municipality border were aggregated to create a minimum, maximum, and mean temperature assigned to that municipality. In a linear or logistic regression analysis this would be a problem, but the BRT method used herein is a decision tree method with cross-validation, and it was able to overcome this limitation by using many weak classifiers to create a stronger classifier [33].
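The aggregation step described above can be sketched as a simple zonal summary. This is a simplified illustration, not the GIS workflow used in the study: it assumes each raster pixel has already been assigned a municipality identifier, and the identifiers and temperatures are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_municipality(pixels):
    """Summarize pixel values per municipality as min, max, and mean.

    `pixels` is an iterable of (municipality_id, value) pairs.
    """
    grouped = defaultdict(list)
    for municipality, value in pixels:
        grouped[municipality].append(value)
    return {
        municipality: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for municipality, values in grouped.items()
    }


# Hypothetical LST pixel values (degrees Celsius) for two municipalities.
pixels = [("A", 24.0), ("A", 30.0), ("A", 27.0), ("B", 18.0), ("B", 20.0)]
print(aggregate_by_municipality(pixels))
```

Because the three summaries are derived from the same pixels, they are strongly correlated by construction, which is the source of the collinearity noted above.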
The Poisson models consistently showed daytime LST variables to be highly determinant, which is in line with other literature that models mosquito habitat and, in particular, temperature in relation to disease [40
]. Daytime LST appears to be a more important limiting factor than nighttime LST, probably because the highest temperatures reached in many areas of this tropical site can easily exceed the upper limit of the comfort window for Ae. aegypti
and would be expected to be reached during the day. It is important to mention that Stanforth et al. [46
] detected, as anticipated, a positive relationship of DF to temperature and a negative relationship to elevation, using the same data set as this study but with a principal component analysis methodology.
While precipitation has been considered significant in other studies [37
], here precipitation did not have a significant relative influence. Stanforth et al. [46
] also detected a relatively low influence of precipitation. Such a low role of precipitation might be explained by the domestic and peri-domestic environment that Ae. aegypti
], in which there is an abundance of larval rearing sites filled with water by humans (e.g., water storage tanks, flower pots) [39
], thus making its reproduction less dependent on precipitation. Accordingly, it has been suggested that the strong dependency of this mosquito on water containers filled by humans makes it less susceptible to rain variability [19
]. This may also explain why Stanforth et al. [46
] found a negative association between the minimum annual precipitation and dengue incidence. This suggests that people may be more likely to store water during a dry season, providing more opportunities for larval rearing, whereas the faster surface water flow during the rainy season may not allow time for larval development [47].
All other variables were below the significance threshold, notably including all the land use and land cover variables (LULC). The latter was an unexpected finding since other studies, e.g., [18
], have reported LULC variables to be useful in determining the risk of DF. The discrepancy may be due to the greater number of variables and sub-variables used in our models. For instance, sub-variables of LST proved to be more important than LULC, potentially minimizing the combined effect of LULC. In addition, Cheong et al. [18
] only used LULC variables, so the weight of their LULC index could not be outperformed in their model by stronger variables—e.g., temperature—as was experienced in this study. The small amount of any one type of land use in a given municipality compared to the satellite coverage of the entire area for other variables, such as LST, may also have overshadowed the contribution of LULC variables. It is noteworthy that ‘Urban’ was not as determinant a variable, despite the known affinity of the vector for urban areas. This may be due to ‘Urban’ derived from satellite imagery representing all impervious surfaces, such as roads, rooftops, and sidewalks, and other areas the vector could not use for breeding sites.
As shown previously in Table 1
, the Poisson models had a lower RMSE and higher correlation over the study period than the Bernoulli models. Additionally, the Poisson models used actual cases reported for each municipality, thereby giving a more accurate picture of the spatial distribution of DF in the Magdalena River watershed. Generally, the municipalities in the central and southern sections of the watershed along the valleys between mountain ranges show an expected coincidence between high levels of DF cases and high estimated probability of risk. Likewise, as expected, the municipalities with low probability of occurrence generally coincided with low numbers of actual cases. While the models show a high correlation to the dependent variables, some under-fitting and over-fitting still occurred. In the case of the Bernoulli models, this could be due in part to a limitation of the current model, which can only accept a binary dependent variable, the presence or absence of DF cases, rather than the actual counts of DF for each municipality. Another reason could be the interaction and interdependence of the tree complexity, bagging fraction, and learning rate [48].
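The information lost when case counts are reduced to a binary response can be seen in a small sketch. The municipality names and counts below are hypothetical, and the helper name `to_presence_absence` is an assumption made for illustration:

```python
def to_presence_absence(case_counts):
    """Convert per-municipality case counts to 1 (presence) or 0 (absence)."""
    for municipality, count in case_counts.items():
        if count < 0:
            raise ValueError(f"negative count for {municipality}")
    return {municipality: int(count > 0)
            for municipality, count in case_counts.items()}


# Hypothetical annual counts: the count response keeps the magnitude,
# while the binary response treats 1 case and 500 cases identically.
counts = {"muni_a": 0, "muni_b": 1, "muni_c": 500}
print(to_presence_absence(counts))
```

A Bernoulli model trained on the binary response therefore cannot distinguish a municipality with a single reported case from one with a large outbreak, whereas a Poisson model trained on the counts can.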
Another strength of the BRT models described here is the output of a map of the likelihood of DF prevalence at the municipality level. Not only is this method less computationally intensive, but it also gives a more identifiable and user-friendly result for end users who may not be as familiar with other types of maps or analysis results. Since the model uses aggregated municipality data, this method is also easier for those who may not be able to use identifiable patient data to get reliable and useful results. Future research could consider using the method outlined here as a first step to identify municipalities at high risk, and then follow up using data fusion and higher resolution analysis methods, along with the same environmental data presented here.
Studies, such as Cheong et al. [18
], use geocoded point locations of reported cases of DF. In this research, only weekly counts by municipality were available, which is a limitation in many cases, but it makes this method easier and more accessible to researchers without advanced computing capability or access to geocoded disease data. Being able to use readily available municipality shapefiles and standardized weekly reporting would allow this method to be adopted by a larger group of public health practitioners in more areas, especially those in developing countries or those with less geospatial analysis experience.
As DF is an endemic disease in the Magdalena River watershed of Colombia, the asymptomatic infection rate and reporting bias may also affect the predictive ability of the models [35
]. Aggregating all cases into a single year could bias the models by including municipalities with misreported cases. However, such possible bias may be neutralized in follow-up studies with higher temporal resolution, where municipalities with few cases are also more likely to have frequent reports of absence, while those with high numbers of cases are more likely to be frequently reported with presence.