The resurgence of tropical diseases in the Americas, including Dengue Fever (DF), may be related to the suspension of vector eradication programs, such as those conducted by the Pan American Health Organization (PAHO) until the 1970s [1
]. Implementation of new control programs requires ongoing data collection. The cost of in situ data collection has increased the use of remote sensing (RS) and Geographic Information Science (GIS) applications to estimate disease spread and population vulnerability [2
]. Public health agencies can use such tools to implement monitoring and surveillance programs for multiple purposes, including estimating vector abundance, implementing prevention strategies, or controlling further dispersion of the disease and vector [4
]. Additionally, local metrics and research are needed to improve current vector disease monitoring practices before cost effective predictive or forewarning systems can be studied or implemented [6
]. Therefore, the goal of this study was to identify the variables that would most benefit future investigations of predictive modeling. The study identified variables suitable for inclusion in such investigations, but the statistical methods used here are not designed for predictive purposes.
One of the more commonly implemented RS tools for vector research has been the Moderate Resolution Imaging Spectroradiometer (MODIS). The sensor flies aboard two satellites, Aqua and Terra, of NASA’s Earth Observing System; both maintain sun-synchronous orbits, and together they are capable of collecting data over the entire Earth multiple times a day [10
]. The high temporal resolution comes at the cost of spatial resolution, 250 m for multispectral data and 1 km for land surface temperature (LST), but the sensor’s ability to collect a range of spectral wavelengths daily has proven useful for a variety of environmental assessments [11
]. MODIS data have already been used in a range of studies. Fuller et al. [12
] demonstrated how predictive modeling of DF can be achieved through temperature records and remote sensing products. Vector modeling could therefore be incorporated into current autonomous monitoring systems, such as the MODIS-based Famine Early Warning Systems Network (FEWS NET) [13
]. NASA scientists have developed such a malaria transmission model through the Goddard Space Flight Center’s Health Planet program, which incorporates input variables for parasites, hosts, vectors, and environmental and human factors [4
]. Similar vector warning products could be developed to improve disease dispersion monitoring practices for additional diseases [2
The Aedes aegypti mosquito vector is capable of spreading a variety of diseases, including yellow fever, chikungunya, Zika, and DF. Disease transmission rates of this vector are considered to be influenced by both cultural and climatic conditions. Although these diseases are commonly associated with tropical and subtropical climates because of the mosquito’s habitable zone, changing climate conditions may expand their dispersion potential [1
]. This demonstrates why improved methods to identify habitable zones of this vector will become increasingly important to improve future disease mitigation and prevention practices.
Since no vaccine for DF is currently available, mosquito control practices are critical to reduce infection [1
]. Identifying geographical areas in which to reduce mosquito populations and contact between the vector and people is essential to attenuate potential outbreaks. Previous research has identified flower pots, water storage containers, impervious surface catchments, and other anthropogenic water sources as opportunities for Aedes aegypti larval development [1
]. These features are common to domesticated or developed areas, increasing both the likelihood that the Aedes aegypti mosquito is found in urban or populated environments and the opportunity for mature specimens to feed on humans and disperse the virus [4
The vector’s habitat and disease transmission potential are largely set by given meteorological and elevation boundaries [3
]. Identification of these vector boundaries, and the pathogen it carries, is known as detection of the zoonotic niche [22
]. The variables defining this niche can be estimated through RS data to derive the geographic extent of the mosquito’s habitat [3
], and could become increasingly important to accurately monitor future transmission risk of DF. Similar habitat and dispersion studies are already being conducted for this and other vector diseases, including West Nile Virus and Malaria [2
]. The focus of this manuscript is the DF virus, although some of the incorporated variables and discussion directly address the Aedes aegypti mosquito vector, because we assume vector dispersion to be associated with the spread of the disease.
A study of DF in Singapore described an increase of 22%–184% in reported cases given a 2–10 °C rise in maximum or minimum temperature [8
]. Additional laboratory and field research in Cali, Colombia, has also demonstrated a strong relationship between DF dispersion and temperature range [26
]. Under lab conditions, Brady et al. [28
] reported low survival of the mosquito vector below 14–15 °C or above 35 °C [29
]. Rueda et al. [30
] similarly saw survival decline outside the 15–34 °C temperature range, but also documented an inverse relationship between body size and increasing temperature [30
]. These temperature limits may also impact the latitude and altitude of the mosquito’s habitat [21
]. In Mexico and Colombia, the mosquito has been reported to be abundant up to 1700 [18
] and 1800 m [32
], respectively. However, more recently it was reported present (although not abundant) at 2130 m in Central Mexico [18
] and 2300 m in Colombia [33
]. Cultural parameters can involve the presence of anthropogenic water storing activities, which may be responsible for up to 79% of the reported cases [3
]. Water storage tanks, potted plants, and similar peridomestic related containers can provide larval rearing sites proximal to populated areas.
Exploratory factor analysis has previously been used to investigate the relationship between a dependent variable and potential contributing, independent variables [34
]. Principal component analysis (PCA) is a factor analysis method used to identify trends within a multivariate dataset to reduce or derive new, efficient, component variables [5
]. Independent variables are plotted against the dependent variable during the PCA to identify linear relationships, or clusters, to build the component variable relationships. The amount of the original dataset variability a component variable encompasses is given by its eigenvalue; the larger the eigenvalue, the stronger the composite component. Composite variables with an eigenvalue of one (1) or greater are classified as principal components according to the Kaiser criterion, as they describe at least as much variability as any one of the original variables [35
]. Components which do not meet the Kaiser criterion can be considered ‘noise’ and excluded from later stages; therefore, fewer variables are used to describe the majority of the data [37
]. These methods have been shown to reduce the amount of data used and identify trends in the dataset [8
]. Other studies have used regression techniques for virus modeling, including geographically weighted regression and auto-regression [8
], and boosted regression trees [22
]. These methods can be computationally intense and difficult to implement. Although they were chosen for their respective studies due to their predictive ability, they sometimes incur complications with non-normalized data, such as that used in this study. Since the main goal of the present study was not prediction, but data reduction to identify strong determinant variables, the PCA methodology was identified to be more appropriate for this purpose [38
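The eigenvalue-based retention rule described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the SPSS procedure used in the study: the function name `kaiser_pca`, the six made-up variables, and the choice to run the PCA on the correlation matrix are all assumptions made for the example.

```python
import numpy as np

def kaiser_pca(X):
    """PCA on the correlation matrix, keeping components with eigenvalue >= 1."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)           # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= 1.0                         # Kaiser criterion
    # Loadings: eigenvectors scaled by the square root of their eigenvalue
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep].sum() / eigvals.sum()
    return eigvals, loadings, explained

# Hypothetical data: six variables driven by two latent factors plus noise
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 2))
X = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=n),
    latent[:, 0] + 0.3 * rng.normal(size=n),
    -latent[:, 0] + 0.3 * rng.normal(size=n),     # inverse relationship
    latent[:, 1] + 0.3 * rng.normal(size=n),
    latent[:, 1] + 0.3 * rng.normal(size=n),
    0.5 * latent[:, 1] + rng.normal(size=n),
])

eigvals, loadings, explained = kaiser_pca(X)
print("eigenvalues:", np.round(eigvals, 2))
print("components retained:", loadings.shape[1])
print("share of variance explained:", round(float(explained), 2))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue of one corresponds to the variability of a single standardized input variable, which is the rationale for the cutoff.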
This project was designed as an exploratory analysis to identify variables which may explain the annual incidence of DF within Colombia’s Magdalena River watershed during the 2012–2014 study period. This study site is characterized by conditions ranging from sea level to permanent snow, and from desert to tropical rainforest. The wide range of climate conditions, which extends beyond both the upper and lower climate limits of Aedes aegypti, makes this an isolated and ideal study site for this type of exploratory analysis. The methodology was intended to investigate whether the study site supports the habitat and transmission information obtained about the vector in previous laboratory studies and studies of other geographical areas. Although this PCA is descriptive rather than predictive, its results are intended to guide future research projects that develop predictive models or early warning systems for the disease.
The Stage 2 categorical PCA reduced the list of 150 independent variables to 14 variables which were able to more efficiently explain the variance in the dataset without extraneous variable noise. Among the more prominent variables found from Stage 3 were nighttime LST minimum cell-max zone value, elevation minimum value, vegetation min cell-max zone, and daytime LST max cell-mean zone. Four LULC variables were used in the Stage 3 assessment based on Stage 2 results and a priori knowledge from the literature, such as the relationship between the mosquito and urban populated areas, but all four variables demonstrated limited results [1
]. A complete list of the Stage 3 independent variables can be viewed, along with their component scores, in Figure 3; negative numbers indicate an inverse relationship, and the strongest component loading is shown in bold for each variable. The variables which load more heavily in the earlier components (for example, component 1 or 2) demonstrate a stronger relationship to the dependent variable.
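The compound variable names above can be read as two nested statistics. The sketch below assumes, since the construction is not spelled out in this section, that the first term is a per-cell statistic taken across the year's images and the second is a zonal statistic taken across the cells of a municipality. The function name `cell_zone_stat`, the tiny rasters, and the zone IDs are hypothetical.

```python
import numpy as np

def _range(a):
    return a.max() - a.min()

CELL_STATS = {"min": np.min, "max": np.max, "mean": np.mean}
ZONE_STATS = {"min": np.min, "max": np.max, "mean": np.mean, "range": _range}

def cell_zone_stat(stack, zones, cell, zone):
    """Per-cell statistic over time, then a zonal statistic per municipality.

    stack: array of shape (time, rows, cols), e.g. LST images for one year
    zones: array of shape (rows, cols) holding a municipality ID per cell
    """
    stack = np.asarray(stack, dtype=float)
    zones = np.asarray(zones)
    per_cell = CELL_STATS[cell](stack, axis=0)    # collapse the time axis
    return {int(z): float(ZONE_STATS[zone](per_cell[zones == z]))
            for z in np.unique(zones)}

# Hypothetical night LST values (deg C): three dates on a 2 x 2 grid
stack = np.array([
    [[10, 12], [20, 22]],
    [[8, 14], [18, 25]],
    [[11, 9], [21, 19]],
])
zones = np.array([[1, 1], [2, 2]])                # two municipalities

min_cell_max_zone = cell_zone_stat(stack, zones, cell="min", zone="max")
max_cell_mean_zone = cell_zone_stat(stack, zones, cell="max", zone="mean")
print(min_cell_max_zone)    # {1: 9.0, 2: 19.0}
print(max_cell_mean_zone)   # {1: 12.5, 2: 23.0}
```

Under this reading, "min cell-max zone" is the warmest of the coldest per-cell values within a municipality, which is one way an annual aggregate can still carry information about the lower temperature limit.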
The Stage 3 PCA descriptive results in Table 2 indicate the methods were fairly consistent between the three models. The results show both population-based models contain five components and explain over 73% of the variance within the dataset. The EBE dependent model contained four components, one fewer, and explained less variance than the other models. This may be due to the reduced number of input variables, namely the omission of population, but further analysis would be needed to confirm such a theory. The three models also had similar results when compared through the AUC, as seen in Table 3. These results suggest there were limited statistical differences between them. The ROC AUC also demonstrated that the exploratory methods used in this study provided results better than statistical chance, but could be improved. The ROC AUC values represent a ranking of “Poor–Fair” for the statistical method results, but are acceptable for an exploratory study [66
]. Given the similarity in PCA results, the component loadings shared by the three models support incorporating those strongly loading variables in future projects.
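For readers less familiar with the AUC comparison, the statistic can be computed directly as the share of positive/negative pairs that a score ranks correctly, where 0.5 corresponds to chance. The sketch below is illustrative only: the labels and scores are invented, the function name `roc_auc` is hypothetical, and how the study converted DF incidence into a binary outcome is not assumed here.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count one half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]            # every positive vs every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Hypothetical example: 1 = municipality flagged as high incidence
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.7, 0.8]   # made-up model scores
auc = roc_auc(labels, scores)
print(auc)   # 0.75
```

Here 12 of the 16 positive/negative pairs are ordered correctly, giving 0.75. The pairwise form is quadratic in the number of observations, which is acceptable for a municipality-level dataset but not for very large samples.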
Night LST min cell-max zone consistently ranked high across all three models. Night LST mean cell-min zone and daytime LST max cell-range zone were also highly loaded in the first or second component, supporting previous documentation on the influence of temperature range, or diurnal temperature, on mosquito survivability and their use for DF modeling [26
]. Since both warmer daytime and cooler nighttime temperatures were found to be highly influential in the results, the variability of temperature within an area may influence DF transmission. This is particularly relevant since the range zonal statistic was determinant in both day and night temperature variables. Although diurnal temperature variation has previously been demonstrated to influence the development of the mosquito, it should be noted these results are representative of data aggregated to an annual value, which may not be directly comparable [27
]. The term ‘temperature range’ is used hereafter to refer to the general temperature variability associated with the dependent variable. LST data documented an annual mean nighttime pixel range of −8.218–21.88 °C, and of 8.1–42.73 °C during the day, between 2012 and 2014. Although large ranges are detected in this study, they may be influenced by either the weather or the diverse landscapes within a municipality, and may not provide direct support for the previous laboratory-identified vector temperature limits [27
]. The importance of the temperature range on DF is further supported by the consistently strong inverse loading of elevation in the first component across all three models; high elevations contain colder temperatures and have been documented to impede mosquito habitation [3
]. Another influential explanatory variable was vegetation min cell-max zone, which loaded positively in the first component for all models and could represent areas capable of supporting flora and fauna [1
DF has been widely recognized as a mainly urban or peri-urban disease. The mosquito vector prefers populated spaces, due to the availability of anthropogenic water breeding sites and human food sources [1
]. However, these results showed urban LULC as one of the lowest ranked variables in all three of the models, and across all three of the study years. This unexpected result might have been due to the low annual temporal or spatial resolutions used. Similarly, the population and population density variables ranked low across all years, not loading higher than the fourth component. Conversely, the vegetation index results were superior to those of the LULC variables. However, these results do not necessarily contradict previous work, as the variables still loaded within the principal components of Stage 2. The lower than expected importance of urban LULC and population in this study suggests the need for future studies to investigate contributory urban attributes, particularly if using different spatial or temporal resolutions [1
]. For example, measures of local density, urban heat island intensity, or water storage potential could be investigated rather than an individual urban LULC variable.
The SPSS regression variable output from Stage 3 was mapped by municipality to demonstrate the geographical distribution of the project’s results, following the procedures of previous vulnerability studies [34
]. Figure 4 compares the population density and EBE models with the distribution of reported DF cases per 10,000. The population and population density models were similar in component and mapped results. All three of the models follow a similar spatial pattern across all years, indicating low risk in higher-elevation municipalities and increased risk in the more moderate temperature valleys, which coincide with areas more suitable for vector survival [4
]. The maps suggest the population-based models have a propensity to indicate higher risk in more densely populated areas, even if they exist above the elevation limit of the mosquito, such as Bogotá. The EBE model does not indicate as high a risk for densely populated areas, even those within the mosquito’s habitable elevation. Therefore, it is anticipated that the population models added weight to municipalities with larger urban areas, potentially due to the inclusion of the population independent variable, which the EBE model did not have. These mapped results show a distribution similar to that of the vector count, suggesting the models were able to appropriately identify the contributing independent variables, for example, the temperature range and elevation.
Both nighttime LST min cell-max zone and daytime LST max cell-mean zone were consistently ranked high in the top two components (Figure 3) of this study. These results are similar to previous laboratory studies on the Aedes aegypti mosquito which indicated temperature range can impact the vector’s ability to thrive [28
], and to field studies finding that higher minimum temperatures improve vector resilience [3
]. Due predominantly to the mountainous terrain, the Magdalena River watershed has a large temperature range which exceeds both the minimum and maximum temperature thresholds (which have been found to extend from 14 °C and 24 °C, respectively, depending on location or study parameters [3
]). This provides more influence on the mosquito’s ability to thrive [3
], and impacts the extrinsic incubation period and duration of the gonotrophic cycle [30
], than has been witnessed in previous studies. As noted, the temperature range variables were based on three annual assessments, so the results do not necessarily document the actual temperature limits of Aedes aegypti. Additionally, since the temperature range is identified in this study as indicative of DF, it supports the theory that vector habitat redistribution is impacted by climate change [4
Although the LULC forest classification often showed an inverse relationship in Stage 3’s results, the EVI variables (vegetation) had markedly better results in the top component of Stages 2 and 3 (Figure 3). It seems logical to expect a lower prevalence of DF transmission in dense forest, since population density is lower there. However, the vegetation index will have a more inclusive classification of features (grasslands, agriculture, etc.), which may allow for more anthropogenic uses, compared to the individual dense forested space variable. The positive relationship between vegetation and dengue cases might be explained by the increased vegetation in wetter areas, which in turn are more likely to provide larval rearing sites. This is particularly relevant near populated areas, where surrounding vegetation (for example, gardens, potted plants, or even subsistence agriculture) may be more likely to be irrigated and to have a greater human exposure potential than dense forest. Furthermore, greener areas are often associated with increased moisture and biodiversity, leading to more fauna and flora, which may also provide opportunities for suitable breeding sites [1
]. These water sources are commonly located in more populated or urbanized areas, which may have lower forest LULC-classified space. The lower resolution of the forest LULC product used, compared to EVI vegetation index, could also explain the difference.
The widely accepted view is that Aedes aegypti preferentially breeds in peridomestic water sources, such as storage tanks or flower pots [18
], which are more abundant in urban areas, and that the higher density of people around these sites makes transmission of DF easier [71
]. Indeed, urban spaces may even be oasis breeding sites in unexpected habitat ranges (temperature or elevation), due to impervious surfaces collecting water and intensifying the urban heat island (UHI) [72
]. The unexpectedly weak relationship detected here between urban LULC and DF in the Stage 3 models could be due to a low proportion of urbanized space within the municipalities, misclassification of suburban/rural communities in the LULC dataset, and/or the spatial resolution used in this study. Alternatively, it may warrant future investigation into whether more contact with the vector occurs outside, or on the fringe of, urban space but is reported in urban areas. The LULC product might primarily be identifying developed space here, such as infrastructure, rather than metropolitan characteristics. Populated areas can be very green, which would increase misclassification of the urban LULC [50
], while high vegetation index records would coincide with the populated spaces.
Contrary to what was expected, precipitation, population, and LULC variables were less determinant within the Stage 3 results. Precipitation frequently loaded within the lowest 3–5 components across all models. This could represent a limitation of the data resolution or methodology used to measure rainfall’s influence. Another potential limitation is the use of an annual aggregation of RS variables, which does not involve a temporal lag. Future studies using a higher temporal resolution should incorporate a time lag, corresponding to the gonotrophic cycle, to better identify precipitation trends. The low precipitation results could alternatively be explained by the dependence of this mosquito on peridomestic water sources for larval rearing sites. Use of containers filled with water by humans (e.g., water storage tanks, flower pots) may allow the mosquito to be less dependent on rainfall [74
], particularly during the moderate ENSO seasons experienced during the study period [55
]. LULC overall loaded in the weakest 4–5 components during this project, particularly the EBE model, even though the literature indicated landscape composition to be important in modeling DF [3
]. This may indicate that the low-resolution (500 m) MODIS LULC product is less capable of documenting environmental discontinuity for this type of study [3
]. This may impact the ability to identify urban LULC due to the large pixel resolutions and municipality aggregation used in this study [75
]. It should be noted that barren LULC frequently contained stronger results than urban, which might be due to a misclassification of urban, agriculture, or periphery/transitional landscapes [50
]. The yearly assessment of this project may also fail to identify seasonally transitional landscapes (land with intermittent use), which may impact the rate of transmission by bringing people to areas with a higher mosquito breeding potential [12
]. Seasonally transitional land could include agriculture or seasonal markets. Additionally, the majority of urbanized features in Colombia are built higher in elevation, where the mosquito has not been shown to naturally thrive [3
]. This cultural dynamic may reduce the effect of urban or similar LULC results within this study area. Future studies should continue to explore varying LULC variables, even though they were less influential during this project.
All three of the mapped models locate higher risk areas within the mountain valleys, which is supported by the knowledge that Aedes aegypti does not naturally thrive at high altitudes [3
]. However, the population variables appear to increase the identification of disease transmission risk in highly populated areas. Since impervious surfaces increase local thermal conditions in a phenomenon known as the urban heat island, it should be considered that urban environments hold the potential to be oasis breeding habitats. The increased temperature and water storage capacities of urban impervious features may provide a haven for mosquito habitats in higher elevations [3
]. Methods of transportation, particularly for goods, could bridge the gap between natural vector breeding sites in agricultural fields and municipalities outside the habitation zone. Human activities can also promote vector breeding or host-vector contact through the establishment of gathering spaces in high vector breeding sites, such as seasonal markets, from which visitors return infected to their urban residences, where the disease is reported [39
]. Additionally, the EBE model had a lower explanation of variance than the other models across all years, which may be attributed to the lack of a population independent variable. This suggests a need to strongly consider the use of population in future vector modeling or zoonotic niche studies. However, a priori knowledge of the areas would be required to take advantage of this type of information.
A supplementary study was conducted to ascertain the strength of the variables identified from the top two components in the Stage 3 analysis. A Stage 3 type PCA was conducted on the 6–7 variables which loaded most heavily (0.7 rounded or higher) in the top two components of the original Stage 3 (see Figure 3). The variables were predominantly from the temperature, elevation, and vegetation categories. These results were very similar to those of the original 14-variable PCA, although the top-two-component models had fewer components and a larger explanation of variance, which could be explained by the fewer input variables. The AUC results also demonstrate the similarity between the results, so the component results of the top-two-component models are not provided. These results simply supported the previously demonstrated influence of temperature, elevation, and vegetation.
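The variable selection used for the supplementary PCA can be expressed as a simple filter on the loading matrix. The sketch below assumes the cutoff applies to the absolute loading rounded to one decimal place, so that strong inverse loadings such as elevation are kept; the variable names, the loading values, and the function name `select_strong_variables` are all hypothetical and are not taken from Figure 3.

```python
import numpy as np

def select_strong_variables(names, loadings, n_components=2, cutoff=0.7):
    """Keep variables whose rounded |loading| reaches the cutoff
    in any of the first n_components components."""
    top = np.abs(np.asarray(loadings, dtype=float)[:, :n_components])
    strong = (np.round(top, 1) >= cutoff).any(axis=1)
    return [name for name, keep in zip(names, strong) if keep]

# Hypothetical loadings: rows are variables, columns are components 1 to 3
names = ["night_lst_min", "elevation_min", "evi_min",
         "day_lst_max", "precip_mean", "urban_lulc"]
loadings = [
    [0.82, 0.10, 0.05],
    [-0.88, 0.05, 0.11],    # strong inverse loading
    [0.68, 0.21, -0.02],    # rounds to 0.7, so it is retained
    [0.20, 0.74, 0.10],
    [0.12, 0.30, 0.71],     # strong only in component 3, so it is dropped
    [0.05, -0.15, 0.40],
]

selected = select_strong_variables(names, loadings)
print(selected)
# ['night_lst_min', 'elevation_min', 'evi_min', 'day_lst_max']
```

Re-running the PCA on only the selected columns would then reproduce the structure of the supplementary analysis under these assumptions.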
This analysis was able to identify variables which are statistically useful to estimate the likelihood of DF transmission in the Río Magdalena watershed in Colombia, as demonstrated by Figure 3. Stage 2 and 3 results indicate temperature is an important modeling variable within this study area. The results of the nighttime min cell and daytime max cell variables demonstrate that temperature range is particularly relevant, presumably because of how it relates to the habitation zone for the vector and to virus replication. Elevation min also loaded highly throughout the results as an inverse value, supporting previously identified altitude limitations to the vector. Vegetation was similarly an important variable to explain dengue cases, presumably due to a greater likelihood of larval rearing sites in wetter areas, which could also support increased quantities of vegetation. These results consistently suggest temperature and elevation are strongly related to the mosquito vector’s ability to thrive and transmit the disease.
The PCA, AUC, and mapped results suggest the variables used in this exploratory analysis should be considered in future DF monitoring or prediction tools. Whether to adopt a population or EBE-based model would depend on the investigator’s prior knowledge and focus. Population models appear to identify risk in more highly populated areas, sometimes despite these areas being located above the mosquito’s expected habitable elevation. Such methodology could provide future studies with a population weight to identify spaces where there is a higher rate of transmission, rather than simply the presence of the mosquito vector.
As an exploratory study, the conditions and methods depicted within this report were able to appropriately quantify and model DF records through the use of environmental and population data. The results were able to explain approximately 70% of the dataset variability across all of the models used. Future studies should include the variables identified in Stage 2 of this study, but consider modifying them in an attempt to further improve the model strength. Although elevation min proved to be a strong variable in this study, additional variations on elevation may prove beneficial in future studies. Future exploratory analysis could investigate whether a more computationally heavy variable could improve the analysis, such as the amount of municipality area which is within the vector’s habitable elevation limits, or surface slope; either would better quantify the amount of potential breeding space than the simpler minimum elevation variable used in this study. Additionally, this analysis was focused on variables estimated through RS; however, it is imperative for future studies to incorporate a more direct variable of socioeconomic status. Other statistical methods, such as geographically weighted regression or boosted regression trees, should also be explored to interpret the relationship between similar independent variables and DF. Still, this project was able to accomplish its goal of identifying niche variables associated with increased Aedes aegypti presence, as inferred through DF cases, within the Magdalena watershed. The results described by this exploratory analysis could be used to aid mitigation practices or to develop an early warning system for the transmission of DF by the Aedes aegypti mosquito in future studies.
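The suggested elevation variable, the share of each municipality lying within the vector's habitable elevation limits, could be derived from a digital elevation model along the following lines. This is a sketch under stated assumptions: the function name `habitable_fraction`, the tiny rasters, and the 1800 m threshold are illustrative only (the threshold echoes the abundance limit reported for Colombia earlier in the text but is not a recommended value), and all cells are assumed to have equal area.

```python
import numpy as np

def habitable_fraction(dem, zones, max_elev_m):
    """Fraction of each zone's cells at or below an elevation limit.

    dem:   array (rows, cols) of elevations in meters
    zones: array (rows, cols) of municipality IDs
    Assumes equal-area cells, so cell counts stand in for area.
    """
    dem = np.asarray(dem, dtype=float)
    zones = np.asarray(zones)
    return {int(z): float(np.mean(dem[zones == z] <= max_elev_m))
            for z in np.unique(zones)}

# Hypothetical elevations (m) for two municipalities
dem = np.array([[300, 900, 2600],
                [1500, 1900, 2800]])
zones = np.array([[1, 1, 2],
                  [1, 1, 2]])

fractions = habitable_fraction(dem, zones, max_elev_m=1800)
print(fractions)   # {1: 0.75, 2: 0.0}
```

Unlike the minimum elevation, this fraction distinguishes a municipality with a single low valley cell from one that lies mostly within the habitable band.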