Mapping Environmental Suitability of Scrub Typhus in Nepal Using MaxEnt and Random Forest Models

Being a globally emerging mite-borne zoonotic disease, scrub typhus is a serious public health concern in Nepal. Mapping environmental suitability and quantifying the human population under risk of the disease is important for prevention and control efforts. In this study, we model and map the environmental suitability of scrub typhus using the ecological niche approach, machine learning modeling techniques, and report locations of scrub typhus along with several climatic, topographic, Normalized Difference Vegetation Index (NDVI), and proximity explanatory variables and estimated population under the risk of disease at a national level. Both MaxEnt and RF technique results reveal robust predictive power with test The area under curve (AUC) and true skill statistics (TSS) of above 0.8 and 0.6, respectively. Spatial prediction reveals that environmentally suitable areas of scrub typhus are widely distributed across the country particularly in the low-land Tarai and less elevated river valleys. We found that areas close to agricultural land with gentle slopes have higher suitability of scrub typhus occurrence. Despite several speculations on the association between scrub typhus and proximity to earthquake epicenters, we did not find a significant role of proximity to earthquake epicenters in the distribution of scrub typhus in Nepal. About 43% of the population living in highly suitable areas for scrub typhus are at higher risk of infection, followed by 29% living in suitable areas of moderate-risk, and about 22% living in moderately suitable areas of lower risk. These findings could be useful in selecting priority areas for surveillance and control strategies effectively.


Introduction
Scrub typhus, an acute febrile zoonotic disease originating from Japan, is caused by bacteria called orientia tsutsugamushi 1899 [1]. It is transmitted through the bite of infected mites (the larval stage; under the risk of disease transmission in Nepal using the ecological niche approach and machine learning modeling techniques.

Materials and Methods
The overall framework of this study included data collection, processing, the fitting machine learning model, model evaluation, and prediction and generation of the scrub typhus suitability map ( Figure 1). We introduce the details of each step in the following sections. environmental suitable areas of scrub typhus transmission and associated factors in Nepal. In this study, we mapped the environmental suitability of scrub typhus and estimated human population under the risk of disease transmission in Nepal using the ecological niche approach and machine learning modeling techniques.

Materials and Methods
The overall framework of this study included data collection, processing, the fitting machine learning model, model evaluation, and prediction and generation of the scrub typhus suitability map ( Figure 1). We introduce the details of each step in the following sections.

Study Area
Nepal lies on southern slope of Himalaya between India and China in latitudes of 26°22′N to 30°27′N and longitudes 80°04′E to 88°12′E. It is a mountainous country with an area of 147181 square kilometers. Administratively, Nepal is composed of 7 provinces, 77 districts and 753 local bodies [27]. There is vivid land topography in Nepal with elevations ranging from 60 m in the southern plains to 8848 m at Mount Everest in the north. Based on the land topography, there are five distinct physiographic zones-high mountain, middle mountain, hill, Siwalik, and Tarai [28]. Climate of Nepal is broadly subtropical monsoon with distinct seasonality. Depending on the pattern of precipitation, Nepal has four distinct seasons: winter (December, January, and February), pre-monsoon (March-May), monsoon (June-September), and post-monsoon (October and November) [29]. Due to variations of altitude and topography, there are distinct microclimate zones from subtropical in the southern Tarai plain, temperate in the midland to the freezing nival climate in the north. Owing to different terrain and harsh climates, the northern area of Nepal is sparsely populated while the lowland southern Tarai and less elevated river valleys are highly populated.

Scrub Typhus Occurrence Location
We extracted the occurrence locations of scrub typhus from the line listing file received from the Epidemiology and Disease Control Division (EDCD), Government of Nepal. The line listing file contains the addresses of patients which include names of the VDCs/municipalities and their respective wards, including the district name. Then, the address data were geocoded using the "opencage" geocoder package (https://opencagedata.com/) in R. The "opencage" is an open access platform, which provides a geocoding facility up to 2500 addresses free of cost per day. In this process, we were able to geocode 123 villages/towns from where at least one case of scrub typhus had been reported in the government health system during 2015 to 2018. These scrub typhus occurrence locations ( Figure 2) were spatially representative, covering almost entire country.

Study Area
Nepal lies on southern slope of Himalaya between India and China in latitudes of 26 • 22 N to 30 • 27 N and longitudes 80 • 04 E to 88 • 12 E. It is a mountainous country with an area of 147181 square kilometers. Administratively, Nepal is composed of 7 provinces, 77 districts and 753 local bodies [27]. There is vivid land topography in Nepal with elevations ranging from 60 m in the southern plains to 8848 m at Mount Everest in the north. Based on the land topography, there are five distinct physiographic zones-high mountain, middle mountain, hill, Siwalik, and Tarai [28]. Climate of Nepal is broadly subtropical monsoon with distinct seasonality. Depending on the pattern of precipitation, Nepal has four distinct seasons: winter (December, January, and February), pre-monsoon (March-May), monsoon (June-September), and post-monsoon (October and November) [29]. Due to variations of altitude and topography, there are distinct microclimate zones from subtropical in the southern Tarai plain, temperate in the midland to the freezing nival climate in the north. Owing to different terrain and harsh climates, the northern area of Nepal is sparsely populated while the lowland southern Tarai and less elevated river valleys are highly populated.

Scrub Typhus Occurrence Location
We extracted the occurrence locations of scrub typhus from the line listing file received from the Epidemiology and Disease Control Division (EDCD), Government of Nepal. The line listing file contains the addresses of patients which include names of the VDCs/municipalities and their respective wards, including the district name. Then, the address data were geocoded using the "opencage" geocoder package (https://opencagedata.com/) in R. The "opencage" is an open access platform, which provides a geocoding facility up to 2500 addresses free of cost per day. In this process, we were able to geocode 123 villages/towns from where at least one case of scrub typhus had been reported in the government health system during 2015 to 2018. These scrub typhus occurrence locations ( Figure 2) were spatially representative, covering almost entire country.

Environmental Covariates
We used a range of predictor variables in this study. These factors are environmental information relevant to disease amplification and transmission-related with scrub typhus including topographical, climatic, and vegetation dynamics [8,9,11,30]. Complete list of potential environmental variables used in the study is presented in Table 1.

Environmental Covariates
We used a range of predictor variables in this study. These factors are environmental information relevant to disease amplification and transmission-related with scrub typhus including topographical, climatic, and vegetation dynamics [8,9,11,30]. Complete list of potential environmental variables used in the study is presented in Table 1. We used SRTM DEM with 90-m spatial resolution (http://srtm.csi.cgiar.org) to derive topographical variables including elevation, slope, and aspect. The DEM was directly used as a continuous elevation layer. For slope and aspect geo-processing, tools available in arc GIS were used to calculate topographic slope and aspect of the study area. We used the most widely used 19 bioclim layers from the WorldClim geoportal (http://worldclim.org) [31] to characterize climatic variations in the study area. We removed highly correlated variables with a threshold of Pearson correlation r >|0.7| and retaining five least correlated variables [32]. To characterize vegetation dynamics, we used monthly MODIS time series NDVI (MOD13A3) [33] from 2015-2018 synchronizing the time periods of scrub typhus geolocation data using the MODIStsp package [34]. We extracted three NDVI metrics NDVI min , NDVI mean , and NDVI max . In addition, we also extracted four proximity variables including distance from cropland, distanced from shrubland, distance from urban land and distance from earthquake epicenter with a magnitude higher than 5.4 for this study. The landcover map of 2010, which is publicly available in the geoportal of ICIMOD [35], was used to compute proximity of selected land covers while earthquake epicenters with magnitudes of 5.4 or higher that occurred in Nepal following the devastating Barpak earthquake till 2017 were retrieved from the NASA website (https://earthquake.usgs.gov). Euclidean distance function in Arc GIS 10.3 was used to compute the proximity for selected land cover and earthquake epicenter data. The spatial distributions of the selected environmental variables are presented in Figure 3. The relative influence of different environmental variables during the model fitting process was assessed based on the AUC test score. Further, we created response curve plots for the most important variables to examine the nonlinear relationship between environmental variables and predicted scrub typhus transmission risk. Finally, mean spatial predictions of all three modeling techniques based on the fitted model and selected 16 environmental variables were produced and exported in raster format. The spatial risk patterns of scrub typhus were visualized using ARC GIS based on the mean spatial prediction of these techniques. To assess the coherence of three selected modeling techniques in spatial prediction, Pearson's correlation coefficient was calculated [37] using the 10,000 random points generated from the entire study area.
To estimate the human population exposed at different levels of risk, we overlaid the reclassified final suitability map derived from the ensemble technique using the natural breaks [43] in ARC GIS with human population raster data retrieved from WorldPop (http://www.worldpop.org.uk/) geoportal.

Results
We fitted 100 models, 50 models in each technique based on 10-fold cross-validation. Both modeling techniques performed well with mean AUC value above 0.8 and mean TSS value above 0.6 in both training and test dataset simulations, indicating the robustness of the fitted models (Table 2). We observed insignificant variations in AUC and TSS between the training and test split.

Mapping and Modeling
We applied MaxEnt and RF machine learning modeling techniques to identify different environmental factors and predict the spatial distribution of scrub typhus transmission risk in Nepal. These machine learning modeling techniques are robust due to their high predictive capacity [16]. Previous studies have widely used these methods in spatial modeling and prediction of different fields including spatial prediction and disease mapping [9,32,36,37]. MaxEnt estimates disease distributions by finding the distribution of maximum entropy: the simplest possible distribution that is consistent with the mean and variance of the observed distribution. RF is a tree-based classification and regression tree (CART) [38] algorithm. CART recursively partitions the environmental space into a large number of subsets within which separate regression models are fitted and then recombined to give a complex final response. The RF is the improved version of CART to address the overfitting problem through the bagging concept. RF builts trees using randomly selected bootstrap samples of the training data (used to build the model), with the number of bootstrap samples equal to the number of trees (ntrees) selected. Each tree is split by randomly sampling a number of predictor variables to use (mtry) at each node and then choosing the best split [23].
We used 89 geo-occurrence locations of scrub typhus and a set of environmental variables to fit the model. As running the modeling requires a background or absence location, we generated 153 background points using the randomPoint function of the "dismo" package covering the entire study area, keeping a proximity threshold of background points from the presence points (3 × 3 km) [39]. As an absence location which represents a location of the least likely occurrence of disease is normally difficult to collect [40], background points are usually used as alternatives in the modeling mapping of disease distribution.
For an accuracy assessment, we divided the geo-occurrence points randomly into training and testing subsets in the proportion of 70 and 30 percent, respectively. The area under curve (AUC) of the receiving operating characteristics (ROC) and true skill statistics (TSS) metrics were used to evaluate the model accuracy [41]. The AUC measures the predictive performance of the model by comparing the model's predictive ability to the random prediction, and values range from 0 to 1 where 0.5 indicates random prediction and higher values correspond to a better model [22]. The TSS compares the number of correct predictions, minus predictions attributable to random guessing. In other words, it is the sum of sensitivity and specificity minus 1. Its value ranges from −1 to +1, where +1 indicates perfect score, 0 indicates random performance, and values of 0.5 or higher are generally considered acceptable model performance [42]. To account for variation in the model results that can arise from an arbitrary data split, we fitted each model 10 times, using a different subset of geo_occurrence points and a different random assignment of training based on the cross-validation (CV) approach.
The relative influence of different environmental variables during the model fitting process was assessed based on the AUC test score. Further, we created response curve plots for the most important variables to examine the nonlinear relationship between environmental variables and predicted scrub typhus transmission risk. Finally, mean spatial predictions of both modeling techniques based on the fitted model and selected 16 environmental variables were produced and exported in raster format. The spatial risk patterns of scrub typhus were visualized using ARC GIS based on the mean spatial prediction of these techniques. To assess the coherence of the selected modeling techniques in spatial prediction, Pearson's correlation coefficient was calculated [37] using the 10,000 random points generated from the entire study area.
To estimate the human population exposed at different levels of risk, we overlaid the reclassified final suitability map derived from the ensemble technique using the natural breaks [43] in ARC GIS with human population raster data retrieved from WorldPop (http://www.worldpop.org.uk/) geoportal.

Results
We fitted 100 models, 50 models in each technique based on 10-fold cross-validation. Both modeling techniques performed well with mean AUC value above 0.8 and mean TSS value above 0.6 in both training and test dataset simulations, indicating the robustness of the fitted models (Table 2). We observed insignificant variations in AUC and TSS between the training and test split. Among the finally selected 16 geographic and environmental variables proximity to cropland, elevation, and slope and distance to urban land were the major contributors in both models, although rank of importance was little different depending on the modeling techniques ( Figure 4).
The negative association of proximity to cropland and proximity to urban land to the probability of occurrence of scrub typhus is observed for Nepal. The marginal response curve for proximity to cropland decreases sharply until the value reaches 800 m and no response is observed from then ( Figure 5). However, the response curve of proximity to urban land is gentler and goes up to 1500 m.
The marginal response of elevation is positive until the height reaches around 200 m and then it becomes negative. The negative association part of the curve is more smooth and has a response up to 6000 m asl. The risk of scrub typhus initially increases with an increase in rainfall but decreases gradually once rainfall reaches 43 mm.
Among the finally selected 16 geographic and environmental variables proximity to cropland, elevation, and slope and distance to urban land were the major contributors in both models, although rank of importance was little different depending on the modeling techniques ( Figure 4). The negative association of proximity to cropland and proximity to urban land to the probability of occurrence of scrub typhus is observed for Nepal. The marginal response curve for proximity to cropland decreases sharply until the value reaches 800 m and no response is observed from then ( Figure 5). However, the response curve of proximity to urban land is gentler and goes up to 1500 m. The marginal response of elevation is positive until the height reaches around 200 m and then it becomes negative. The negative association part of the curve is more smooth and has a response up to 6000 m asl. The risk of scrub typhus initially increases with an increase in rainfall but decreases gradually once rainfall reaches 43 mm.  Figure 6 depicts the spatial distribution of environmentally suitable areas of scrub typhus in Nepal based on MaxEnt, RF, and ensemble techniques where suitability values range from 0 (low) to 1 (high). In general, there is a broad consistency in both methods. Despite little variation in the prediction, both models are strongly correlated with Pearson correlation values higher (r = 0.8, p < 0.05) than when model predictions were validated based on 10,000 randomly generated sample points. The predicted high suitable areas are mainly distributed in lowland tarai and less elevated hill regions across the country in both models. The highly suitable areas are continuously distributed in the western tarai and the lower hills of central Nepal while it is irregular in the east, mainly in the southern region of east tarai.  Figure 6 depicts the spatial distribution of environmentally suitable areas of scrub typhus in Nepal based on MaxEnt, RF, and ensemble techniques where suitability values range from 0 (low) to 1 (high). In general, there is a broad consistency in both methods. Despite little variation in the prediction, both models are strongly correlated with Pearson correlation values higher (r = 0.8, p < 0.05) than when model predictions were validated based on 10,000 randomly generated sample points. The predicted high suitable areas are mainly distributed in lowland tarai and less elevated hill regions across the country in both models. The highly suitable areas are continuously distributed in the western tarai and the lower hills of central Nepal while it is irregular in the east, mainly in the southern region of east tarai. Table 3. Estimated human population exposed at different levels of scrub typhus transmission suitability in Nepal.  (Table 3) while 29% and 21% are living in suitable and moderately suitable areas. Only 6% of the population are living in the areas of environmental unsuitability for scrub typhus transmission. The results show that about 43% of the population of Nepal are currently living in the highly suitable areas of scrub typhus transmission in Nepal (Table 3) while 29% and 21% are living in suitable and moderately suitable areas. Only 6% of the population are living in the areas of environmental unsuitability for scrub typhus transmission. Table 3. Estimated human population exposed at different levels of scrub typhus transmission suitability in Nepal.

Discussion
Scrub typhus has been a major public health problem in Nepal since a 2015 outbreak across the country. About 3000 cases with 26 deaths were recorded in the last four years since its first outbreak [7,26]. The number of scrub typhus reported districts have also increased from 16 to 60 in the same period, showing a rapid geographic expansion of scrub typhus in Nepal [25]. In the context of rising incidences and expanding geographic distribution, scrub typhus is expected to replace typhoid as a common cause of febrile illness in Nepal [44]. This study assessed the spatial distribution of environmentally suitable areas of scrub typhus in Nepal using the ecological niche modeling approach, machine learning techniques along with reported cases of scrub typhus and a set of environmental and geographic explanatory variables. Further, this study also estimated the population living under different levels of risk. Our results revealed that both modeling techniques, in general, could be useful in identifying environmental factors and quantifying the areas susceptible to scrub typhus outbreak. However, ensemble prediction is more comprehensive and reduces prediction uncertainties compared to the single algorithms [45]. The spatial distribution patterns of environmentally suitable areas of scrub typhus disease were found to be largely influenced by the interaction of several environmental factors, including proximity to cropland, proximity to urban land, slope, and elevation.
Our study shows that environmentally suitable areas of scrub typhus are widely distributed in Nepal mainly in the lowland Tarai and less elevated areas in mid-hills and mountains. These areas are major population concentration areas of Nepal (Figure 7). As a result, a significantly higher proportion of the population is under the risk of scrub typhus transmission despite a lower proportion of highly suitable areas compared to the total area of the country (Table 3). Previous studies also claimed these regions are high-risk areas of scrub typhus [7,46]. However, this is the first attempt that quantifies and maps its distribution in Nepal. Concurrent with these findings, about 81% of total reported cases were from lowland Tarai [7]. Similar to Nepal, subtropical southern districts of Bhutan have a higher risk of scrub typhus compared to the high mountain region in the north, although few cases have been recorded from throughout the country [47]. This indicates that lowland has a higher risk of scrub typhus; however, it might also occur in higher elevated mountains and hills.

Discussion
Scrub typhus has been a major public health problem in Nepal since a 2015 outbreak across the country. About 3000 cases with 26 deaths were recorded in the last four years since its first outbreak [7,26]. The number of scrub typhus reported districts have also increased from 16 to 60 in the same period, showing a rapid geographic expansion of scrub typhus in Nepal [25]. In the context of rising incidences and expanding geographic distribution, scrub typhus is expected to replace typhoid as a common cause of febrile illness in Nepal [44]. This study assessed the spatial distribution of environmentally suitable areas of scrub typhus in Nepal using the ecological niche modeling approach, machine learning techniques along with reported cases of scrub typhus and a set of environmental and geographic explanatory variables. Further, this study also estimated the population living under different levels of risk. Our results revealed that both modeling techniques, in general, could be useful in identifying environmental factors and quantifying the areas susceptible to scrub typhus outbreak. However, ensemble prediction is more comprehensive and reduces prediction uncertainties compared to the single algorithms [45]. The spatial distribution patterns of environmentally suitable areas of scrub typhus disease were found to be largely influenced by the interaction of several environmental factors, including proximity to cropland, proximity to urban land, slope, and elevation.
Our study shows that environmentally suitable areas of scrub typhus are widely distributed in Nepal mainly in the lowland Tarai and less elevated areas in mid-hills and mountains. These areas are major population concentration areas of Nepal (Figure 7). As a result, a significantly higher proportion of the population is under the risk of scrub typhus transmission despite a lower The spatial patterns of environmentally suitable areas are continuous in the west Tarai while it is irregular and patchy in the east. Unlike the west, our model predicts southern east Tarai as less suitable for scrub typhus. The possible reasons could be the lower elevation and gentler slope of this region. As a result, such areas remain wet and waterlogged for most part of the year, not favoring the growth and proliferation of the rodent population. The recent reports also showed a higher prevalence of scrub typhus in the west Tarai including Kailali and Kanchanpur districts and central hills, including Palpa, Syangja, and Tanahu districts [48].
Concurrent with previous studies, we found an elevated risk of scrub in the vicinity of cropland [14,25]. In line with our findings, the mosaics of cropland and vegetation are positively associated with the risk of scrub typhus in Taiwan [11]. The reason could be the availability of plenty of food essential for the proliferation of rodent hosts. A number of previous research have shown a positive association of scrub typhus with rodent density [10]. The reason could also be due to higher occupational exposure of farmers working in the cropland [49]. Previous studies also found that farmers work near the grassland scrubby vegetation and thus have higher chances of acquiring scrub typhus [25].
We found an inverse association between the probability of occurrence of scrub typhus and elevation in general; however, the association was positive up to around 200 m. Similar to our findings, the risk of scrub typhus incidence increases with an increase in elevation [11] in Taiwan. The possible reason of the inverse association might be the low temperatures in higher elevations. A decreasing trend of the proportion of cropland with an increasing altitudinal gradient in Nepal [50] might be another reason for the inverse association of elevation and probability of occurrence of scrub typhus.
Slope is another important environmental factor of scrub typhus. Overall, there is a negative association between slope and the probability of scrub typhus occurrence. This association can again be explained by a decreasing trend of cropland with increasing slope gradients in Nepal. However, the association is much smoother in MaxEnt compared to the RF and BRT model.
Although previous studies have suspected some association of scrub typhus with the proximity to earthquake epicenters [2,51] as the first worst outbreak occurred in Nepal immediately after the Gorkha earthquake of April 2015. The speculation was based on possible intimate contact between humans and rats that might have come out of their usual underground habitat after the earthquake. However, we did not find a significant role of proximity to earthquake epicenter with the occurrence of scrub typhus.
Scrub typhus is generally a rural disease [2], and rural settings provide conducive environments for growth and proliferation of host and pathogen. However, in recent years, many urban cases have been reported from different countries including South Korea, Taiwan, and China [30,44,52]. In Nepal, urban cases of scrub typhus are also increasingly being reported [2]. The U-shaped marginal response curve of proximity to urban areas in the MaxEnt model indicates an elevated risk near the urban areas and far from the urban areas. All these indicate that the disease is no longer a rural disease and outbreak may occur in both rural and urban settings. Urban scrub typhus may have a significant impact because of the large population in urban areas.
Based on previous research findings [14], we included three NDVI layers of -minium, mean, and maximum-as potential predictors of scrub typhus distribution in Nepal. However, we did not find a strong predictive role of these variables in the occurrence of scrub typhus. Similarly, unlike previous studies, the role of climate factors was less important [14]. The possible reason could be due to the absence of proximity of variables in previous research or different ecological settings.
This study has some strengths as well as limitations. This is the first spatially explicit scrub typhus research from Nepal, which has identified environmental risk factors and mapped the spatial distribution of disease transmission risk. The findings may help to close the knowledge gap on the spatial epidemiology of scrub typhus. The concerned health authority, including the Department of Epidemiology and Disease Control Division (EDCD), could use these findings to improve surveillance and control efforts targeting more locations that are predicted as the potential scrub typhus areas. However, due to the absence of complete disease data, we were unable to explore the data-driven exploratory analysis. Future studies are encouraged to focus on an exploratory analysis in one and the validation of filed data on the other.

Conclusions
We assessed various environmental risk factors responsible for the occurrence and spread of scrub typhus, and mapped disease transmission risk for the first time in Nepal. Our results revealed that the environmentally suitable areas of scrub typhus in Nepal are widely distributed throughout the country with higher risks at lowland Tarai and the less elevated hills and mountains. Proximity to cropland and urban areas were the most important risk factors, followed by slope and elevation. Despite several speculations on scrub typhus and its association with proximity to the April 2015 earthquake epicenter, we did not find the earthquake's important role in the distribution of scrub typhus risk in Nepal.