Ecological Niche Modelling Approaches: Challenges and Applications in Vector-Borne Diseases

Vector-borne diseases (VBDs) pose a major threat to human and animal health, with more than 80% of the global population being at risk of acquiring at least one major VBD. Being profoundly affected by the ongoing climate change and anthropogenic disturbances, modelling approaches become an essential tool to assess and compare multiple scenarios (past, present and future), and further the geographic risk of transmission of VBDs. Ecological niche modelling (ENM) is rapidly becoming the gold-standard method for this task. The purpose of this overview is to provide an insight of the use of ENM to assess the geographic risk of transmission of VBDs. We have summarised some fundamental concepts and common approaches to ENM of VBDS, and then focused with a critical view on a number of crucial issues which are often disregarded when modelling the niches of VBDs. Furthermore, we have briefly presented what we consider the most relevant uses of ENM when dealing with VBDs. Niche modelling of VBDs is far from being simple, and there is still a long way to improve. Therefore, this overview is expected to be a useful benchmark for niche modelling of VBDs in future research.


Introduction
Diseases transmitted by blood-sucking arthropods, considered as vector-borne diseases (VBDs), pose a major threat to human and animal health since the beginning of time [1]. This kind of diseases accounts for around the 17% of all communicable diseases, causing more than 700,000 human deaths every year [2]. For instance, in 2017 malaria affected an estimated 209 million people globally and caused near 620,000 deaths, while 40,500 deaths were attributed to dengue and an estimated 105 million people suffered the disease in over 129 countries [3,4]. The burden of VBDs is greater in tropical and subtropical areas, with disproportionally higher morbidity and mortality rates among poorer populations [2]. Concerning its economic impact, VBDs represent an immense toll in economies, restricting both rural and urban development [2], e.g., the estimated global cost for malaria was US$ 2.9 billion in 2015 [2], and US$ 8.9 billion for dengue in 2013 [5]. Furthermore, more than 80% of the global population is at risk of acquiring at least one major VBD [2].
The ongoing worldwide environmental changes resulting from human activities have a profound impact on the epidemiology of VBDs [6]. There is plenty of evidence indicating that anthropogenic changes have already altered the transmission of VBDs in a number of ways [7][8][9][10][11]. Besides human-caused disturbances of landscape and an environmental attributes of each pair of geographical coordinates in the physical world can be represented in an n-dimensional environmental space, delimiting its corresponding niche space and allowing reciprocal projections between geographic and environmental spaces [46], represented in Figure 1). Hence, the distribution of disease occurrences can be seen as the geographic projection of the ecological distribution of the pathogen as constrained by the ecological and geographic potential of each of its interacting species: reservoirs, vectors, etc. [45]. To date, ENM techniques have been used numerous times to model the distribution of vector-borne pathogens or their vector species, i.e., a non-exhaustive search of the combined terms "niche model AND vector" and "species distribution model AND vector" (although not equivalent, the terms niche model and species distribution model are frequently used as synonyms- [47]) in the Scopus and Web of Science databases yielded 187 and 226, and 106 and 90 published articles, respectively, in the last two decades.  24 February 2023, and https://doi.org/10.5061/dryad.2h3f2, accessed on 24 February 2023). Worldwide (a) and detailed (b) geographical distribution of the H. marginatum tick (red dots); (c) distribution in the environmental space (as defined by mean annual temperature and annual precipitation) of the environmental combinations available worldwide (grey dots, 100,000 random locations chosen worldwide), and the environmental space occupied by the H. marginatum tick (red dots, 475 non-duplicated georeferenced records).
The purpose of this overview is to provide an insight of the use of ENM to assess the geographic risk of transmission of VBDs, by focusing on a number of crucial issues which are often disregarded when dealing with ENM of VBDs. As proof of concept, we use studies from the three-year period from 2021 to 2023 as listed in Scopus, and, when necessary, relevant literature from other time periods. The reader should also consider the vast literature available on this subject (i.e., [39,[49][50][51][52][53][54]).

Fundamentals of ENM for VBDs
Ecological niche modelling (ENM) provides a basis for understanding the complexities of distributions of species in both geographic space and ecological dimensions [55]. In ecological niche theory, the fundamental niche (N F ) represents the set of abiotic environmental conditions necessary for the long-term persistence of the species population [24]. However, all the environmental conditions in N F may not be entirely available for the species, as it might be constrained by dispersal limitations, biotic interactions, and the evolutionary capacity of the species to adapt to new conditions. This heuristic configuration, named as "BAM" sensu Soberon and Peterson [56], centres on requirements of particular abiotic conditions ("A" for abiotic conditions) and how they relate to biotic interactions ("B" for biotic conditions) [56,57]. Finally, the species may be limited from occupying the entirety of its distributional potential by accessibility considerations ("M" for mobility) [56,57]. As such, the conjunction A∩B can be seen as the potential distribution of the species, while A∩B∩M would be a hypothesis of the actual distribution of the species and the current conditions present across it. This portion of the N F that is actually occupied by the species is referred as the realized niche, N R [24].
Since VBDs (and disease transmission systems in general) are largely the product of interactions among species (pathogens, vectors and hosts), their ecological and distributional dynamics differ from those of more "normal" species. A quite novel framework to ENM of disease systems proposes to treat them from the perspective of disease biogeography (further details in [45]). This framework not only emphasizes parasite-host interactions ("B") as essential to the process, but requires considering a crucial and formal hypothesis about the area that have been accessible to the species via dispersal over relevant periods of time ("M") [45,58,59]. A number of issues differentiates the transmission of VBDs from this general framework proposal for disease systems: increased complexity in the web of interactions among species (pathogen-vector-host), at least one obligate interaction (pathogen-vector), in many cases multiplicity of competent vectors and susceptible hosts, and abiotic conditions of great importance in shaping the distribution of climate-sensitive vectors.

Common Approaches to ENM of VBDs
Ecological niche modelling (ENM) is categorized depending on how explicitly biological processes (e.g., metabolism) are incorporated [60], and can take two main forms: process-based modelling and empirical reconstructions [61]. Process-based models (also known as mechanistic or biological) are based on detailed physiological information on the species that make up the system, thus estimating how habitat suitability for the species changes with the environment [40,61]. As aforementioned, parameterizing process-based models requires full knowledge of the factors influencing the distributions of the species involved [40,61]. Despite its inherent complexity, niche models based on ecological processes do not fully capture dispersal and biotic interactions, thus describing something closer to the fundamental niche than the realized niche (actual transmission) [40,62].
In contrast, empirical reconstructions (also known as correlative, statistical, or pattern matching models [40]) are based on associations between known geographic occurrences of species and the ecological characteristics of the landscapes in which they occur [61]. They differ from process-based models in that they do not assume known functional relationships between vital rates and environmental variables, allowing a wider range of environmental variables [40]. A major weakness of this approach is that it can be biased by sampling and the influence of other species not included in the study [61]. Since the appearance of "Maxent" [63], a popular modelling algorithm based on a maximum entropy algorithm, correlative models have gained terrain and are by far the most widely used [64].
Correlative methods can be classified into three categories depending on the type of species' occurrence data used: presence-absence, presence-background (or profile), and presence-only methods (reviewed in [24,60]). Briefly, presence-absence models compare environmental conditions where the organism occurs (i.e., presence) vs. where it did not occur upon observation (i.e., absence), providing the probability of finding the species at each place [65]. Since absence data are largely questionable in quality and of limited availability ( [24]; and discussed in [55]), a common approach is to simulate these data by generating random points across the study area and to use it in presence-background models. This kind of models compare the available environmental conditions in the study area (i.e., background) with the conditions used by the species (i.e., occurrences), providing an index of habitat suitability [60]. At last, presence-only models rely solely on the environmental values inferred from the species occurrence in the study area. Modelling algorithms for each category have been listed and reviewed elsewhere [24,60].
Furthermore, the assessment of the geographic risk of VBDs is generally approached from two alternative methodologies, either modelling components (component-based approach) or outcomes (black-box approach) [66]:

•
The black-box approach integrates across the entire transmission system, assuming that the final distribution of disease cases summarizes all biotic interactions involved in the transmission [44,61]. This approach is useful when transmission dynamics are poorly understood and data are limited [44,66], as happens for many VBDs (e.g., [67][68][69]). However, the black-box oversimplification in such intricate transmission systems may be perilous, as it neglects the ecological requirements of the individual component species [55,66].

•
The component-based approach parses the overall transmission cycle into the ecological niches of the individual component species (e.g., pathogens, vectors, hosts) [55]. This approach may allow to distinguish different reasons for presence or absence of disease transmission in an area (i.e., presence/absence of a competent vector and/or susceptible host), but requires potentially lacking in-depth knowledge of the disease system (e.g., the identity and ecologies of relevant species, transmission cycle) [55]. The latter is particularly true for diseases in which the competent vectors might be multiple or poorly known (for instance, see Celone et al. [70], for an approach with the mosquito Haemagogus janthinomys, vector of the Mayaro virus).
Choosing between these two approaches for disease distribution modelling should be carried out in accordance with the research question, data availability and implicit assumptions [44,66].

Accurate Identification of Occurrence Records
Much has been written regarding data sources (occurrence records and environmental data), its quality and concerns. It is not our purpose to make a review on this matter, so only a few relevant concepts will be highlighted (you can find further information in [24,55,60,66,71]). Ecological niche modelling relies on occurrence data for model calibration, which must be revised and curated thoroughly to only include valid occurrence records. Indeed, the problematic of niche modelling relying on biodiversity records with poor taxonomy and/or misidentification was already highlighted more than a decade ago [72]. The advised first step should be to thoroughly check the correct taxonomic identification of the species' records to include in the study. With exceptions, this step might represent a minor issue with regard to vertebrate hosts. However, when dealing with invertebrate vectors and pathogens, things get trickier. In these cases, specific taxonomic identification is often immersed in controversy. For instance, certain mosquito species of sanitary importance are frequently grouped in complexes of closely related taxa, as their morphology is barely indistinguishable, as the Culex pipiens complex [73] or the Anopheles gambiae complex [74]. Cryptic species usually have different ecology and host preferences [75], posing an undeniable impact on disease epidemiology and hence modelling procedures. Similarly, the classification of the triatomine kissing bugs transmitting Chagas disease is subdued to constant changes [76,77], which sometimes are related with ecological characteristics (i.e., [78]), as well as occurs with the Lutzomyia sand flies transmitting leishmaniasis [79]. Thus, whenever possible, the databases to be used must be filtered and reinforced with a molecular identification (i.e., [69,[80][81][82]). Despite the aforementioned, the issue is usually neglected, and most of the studies rely solely on morphological identification (e.g., [83][84][85][86][87][88][89][90][91][92][93][94][95][96], but see [97] for a mixed approach), or do not even describe the identification method [70,82,[98][99][100][101][102][103][104][105][106][107][108].
In the case of diseases, occurrence data are represented as disease cases, or serology or direct detection of pathogens or parasites [24,71]. These disease data should have a trustworthy metadata, including traceable diagnostic methods, data sources, transparent surveillance protocols, temporal details and quantified uncertainty (e.g., spatially error, sensitivity of the diagnostic method) [24]. Moreover, critical care must be taken to include only those cases that were infected by ways that are relevant to the models being developed [66], i.e., only those cases with vectorial transmission should be included if the models are to be meaningful estimates of the ecological niche of the disease in question.

Global versus Local Occurrence Data
A seldom treated, but crucial issue in ENM of VBDs, concerns the geographical extent of the occurrence data. Despite being acknowledged long ago in ecology [109,110]), fitting ENM with focus only on a part of the entire distribution of a species continues being a common approach [111]. As far as we know, this is what occurs with a large amount of the studies dealing with VBDs. As these kinds of studies are usually performed with public health purposes at regional or local scales, they often rely on regional/national datasets collected by administrations or NGOs whose range of action is defined by restricted geographical or political borders (i.e., country or even provinces' administrations) (e.g., [67][68][69]80,[83][84][85][86][87][88][89][90][91][92][93][94][95][96]98,99,101,[104][105][106][107][108][112][113][114][115][116][117]). The concern resides in the fact that ENM built with occurrence data restricted to artificial boundaries might consider only a subset of the environmental conditions experienced by a species across its entire range (i.e., "spatial niche truncation" [118]); therefore, providing an incomplete description of the environmental limits [109] and underestimating the environmental conditions that the species can withstand [111] (see an example in Figure 2). Spatial niche truncation is particularly problematic when the aim is to project the prediction to other areas or time-periods not found in the calibration area (that is to say, to extrapolate) [118]. Conversely, it has been argued that partial (truncated) datasets may be better to identify other distribution constraints, which may differ in different parts of the distribution range [60]. Such is the argument alleged by Tjaden et al. [119] when proposing a "non-tropical" ENM to assess the transmission of the chikungunya virus outside the tropics. According to these authors, the ENM restricted to non-tropical regions predicts chikungunya transmission in Europe with greater accuracy than the previous global ENMs that were based on predominantly tropical occurrence locations (e.g., [119,120]). In any case, if not appropriately justified (e.g., [119,121]), the prudent approach is to use global over local data in spite of the area of interest (for example, [70,81,82,97,100,102,103,122]; but see [123,124] for a dual approach). Otherwise, models built upon partial datasets risk providing a biased description of the species' niche [109,125], and thus failing to forecast the complete species range [126]. Spatial niche truncation is particularly problematic when the aim is to project the prediction to other areas or time-periods not found in the calibration area (that is to say, to extrapolate) [118]. Conversely, it has been argued that partial (truncated) datasets may be better to identify other distribution constraints, which may differ in different parts of the distribution range [60]. Such is the argument alleged by Tjaden et al. [119] when proposing a "non-tropical" ENM to assess the transmission of the chikungunya virus outside Figure 2. Representation of the environmental spaces available in (a) continental Spain (yellow dots), and in (b) the Canary Islands (light green dots) as defined by mean annual temperature and annual precipitation. The blue dots represent the environmental space occupied by the Hyalomma marginatum tick as restricted to the boundaries of continental Spain (77 non-duplicated occurrence records), whereas the red dots represent the environmental space occupied by its global occurrences (475 non-duplicated occurrence records). (a) A niche model of the H. marginatum tick, based solely on the occurrences in continental Spain (blue dots), will truncate the lower limit of the species' thermal tolerance. Most of these boundary-restricted occurrences (blue dots) have been recorded in areas with mean annual temperatures superior to 13 • C (blue dashed line), while the species withstands mean annual temperatures as lower as 6 • C (red dots, red dashed line). Thus, a projected distribution based on this truncated niche model will underestimate those environments available in Spain with mean annual temperatures between 6 • C and 13 • C (between red and blue dashed lines). (b) A similar situation will occur if this same truncated niche model should be transferred to the Canary Islands to identify areas suitable for the establishment of H. marginatum populations. Such a model will fail to predict those areas in Canary Islands presenting mean annual temperatures between 6 • C and 13 • C (between red and blue dashed lines).

Importance of Defining M for Background Selection
As aforementioned, presence-background models (with Maxent the most popular) contrast the environmental characteristics of sites of known occurrence against those associated with background locations where presence/absence is unmeasured [127]. Being this essential to predict the probability of presence, it is crucial to outline the calibration area as the area that has been accessible to the species (M) [58,128], as the species will be absent from outside of this area for reasons unrelated to A [129] (such as geographical barriers [130] or anthropogenic range contractions [131]). Defining M has strong implications in several aspects of ENM, for instance: (i) Being M the geographic extension across which the contrasts should be developed [132], it determines the area within which presences may exist and within which absences are meaningful, in that they represent sites with the broader background landscape actually likely to have been "tested" by the species for suitability, but not occupied [58]. Thus, to minimize the impact of assumptions about absences from areas that are not accessible to the species, the background sample should be chosen to reflect the environmental conditions that one is interested in contrasting against presences [128]. (ii) M has effects on model validation, as areas outside M (where the species cannot occur, owing to restrictions resulting from M but not A) will generally be predicted at lower suitability levels. In consequence, inclusion of these areas (which hold no presence data, but often includes absences that are more distant environmentally from the presences) makes the model look better than it actually is [58].
A conscientious estimation of M should be carried out prior to initiation of analyses, and for this a number of approaches have been proposed: • Geopolitical boundaries: a frequently applied approach is to use administrative boundaries (e.g., municipality, department, province and country) to delimit the calibration area [24,58]. However, this pragmatical use of geopolitical boundaries without an explicit, a priori hypothesis regarding the extent of M presents a major weakness: restricting models based on administrative areas does not account for the biology of the organism [24]. Indeed, most of the cases that applied this approach lacked an explicit assumption about M [67,68,80,[83][84][85][86]88,89,[91][92][93][94][95]98,99,101,104,[106][107][108][109][110]112,116]. Since organisms do not know about geopolitical borders, the perils of this common failure will result in models that are misaligned with the ecology of the organism, resulting in underestimations of the true potential of the disease spread [24] (somehow related to the spatial niche truncation mentioned before).

•
Occurrence buffering: an often-used approach consists of determining an optimal buffer area around the occurrences. The radius buffer can be selected in consideration of the dispersal potential of the species [69,81,87,97,101,105,114,117], or after assessing the performance of a series of test models based on buffer with increasing radii (e.g., [100,119]). • Biogeographical units: these geographical areas are categorized in terms of their biotas [133,134], being defined based on distinct sets of endemic taxa and communities [134]. These areas are usually limited by geographical barriers, altitudinal ranges or a vegetation type [134]. Therefore, the boundaries of the biotic regions within which a species is known to occur may be informative about the barriers that have constrained its distributional potential, resulting in a reasonable hypothesis of the areas that have been available for the species over relevant time periods [58]. To account for potential dispersal beyond these boundaries, an additional buffer can be created around each biogeographical unit occupied [135], or just around the occurrence points within certain distance from the borders of the established biogeographic area [136]. This approach is quite simple, and may prove the most operational [137]. In fact, it has been applied in a number of opportunities to define the accessible area of mosquitoes, kissing bugs and others (i.e., [87,102,117,124,136]).
• Niche-model reconstructions: a simple present-day niche model could be used to estimate the basic dimensions of a species' distributional potential, and then could be back-projected over the historical conditions that the species has experienced (e.g., Pleistocene Last Glacial Maximum, Last Interglacial) [58,66]. These estimates of past distributional areas can then be combined in a proxy of the long-term dispersal potential, providing a broad initial hypothesis of areas that have been accessible to the species (M) to be used in a second round of model calibration [58,66]. Despite this approach risks some circularity in its implementation, is operational and could be implemented readily [58]. Furthermore, this risk of circularity, owed to the lack of an explicit hypothesis of M in the initial round of modelling [58], could be accounted for by combining the two approaches already described: the biogeographical approach could be used to define M in the initial calibration round, to be later back-projected and used to estimate M in the second round of modelling. • Full dynamic dispersal models: a more realistic approach should join estimates of the niche with scenarios of dispersal potential through periods of environmental change, considering explicitly the spatially path-dependent nature of effects of environmental change on species' dispersal reach and consequent distributional potential [58]. Despite its computational challenges, a first akin simulation of this general framework was outlined by Barve et al. [58], and later extended by Machado-Stredel et al. [132]. This intricate approach of defining M is based on a cellular automata simulation where dispersers colonize (or died out) in a cell grid that has suitable conditions drawn from a preliminary estimate of the fundamental ecological niche. These processes are replicated several times to generate an account of accessed cells, to be later summarized in an estimate of M that houses the most frequently accessed cells. This simulation-based method presents a quantitative approach for estimating the accessible area of a species under biologically realistic assumptions, offering the possibility of incorporating relevant climate changes into this estimation of environmental suitability across space and time [132]. As far as we know, this approach has not been applied yet in the modelling of VBDs.

Sampling Bias and How to Deal with
Sampling bias is pervasive in most datasets of occurrence records, and represents a major concern when developing ENM (detailly explained in [55,128]). Briefly, a disproportionate geographical or environmental sampling will likely hamper the predictive ability of the niche model based on that sampling [55,138]. Inextricably linked to collection methods, the spatial pattern of occurrences is mainly driven by accessibility [139]. Indeed, the majority of records for all terrestrial groups fell within 2.5 km of a road [139]. The sampling bias not only affects the spatial distribution of occurrences from vertebrate hosts, but also from invertebrate vectors (e.g., insects [140]). This is even more relevant when considering the occurrence of pathogens, often recorded as disease reports. For instance, disease data are generally aggregated at administrative levels (e.g., province, country), where the geographic site of infection is usually not reported and is instead referred to the health facility where it was diagnosed [44]. At last, if the bias is not accounted for, the fitted model might be closer to a model of sampling effort than to a model of the true niche of the species [141].
The most straightforward approach would be to manipulate the occurrence data [141]. Perhaps not the ideal, because it disregards the distribution of the sampling effort, thinning of occurrence records either in environmental space or geographic space is a viable solution [143]. Geographic and environmental thinning are conceptually equivalent, as they use a distance measure to determine a filter size [144]. Geographical thinning involves two methods [143]: i) overlaying an equal-area grid on the study region and randomly sampling a set number of occurrence records (e.g., one) from each grid cell (Hijmans and Elith, in [143]); or ii) randomly removing occurrence records so that no two are closer than a given linear distance [145,146] (you can find an easy-to-implement method in [143]).
On the other hand, spatial bias usually leads to environmental bias because of the over-representation of certain environmental features of the more accessible and extensively surveyed areas [145]. Moreover, geographical thinning might enhance a disproportionate sampling in the environmental space [147,148]. Therefore, geographical thinning should be used with caution, and the use of an environmental thinning technique is highly recommended [144,147]. For this, Varela et al. [147] proposed to place a regular grid with a specified cell size to a multivariate environmental space, so as to randomly choose one occurrence from each grid cell and remove the rest from further analysis (the method is fully described in [147] and available in https://github.com/SaraVarela/envSample, accessed on 27 February 2023). For an application with VBDs see [90,102].
Like all thinning approaches (both geographical and environmental), the optimal degree of thinning remains subject to empirical determination [143]. However, the main drawback of these manipulations is the resulting discard of a number of occurrence records, which in many cases might be unaffordable [141]. An alternative approach, which avoids the loss of occurrence records, is to manipulate the selection of the background data so that these share the same bias as the occurrence records [149]. This approach stands on the assumption that a similarly biased background nullifies the bias in the occurrence records [141]. One largely used method consists of selecting the background data based on a target group [141], which restricts the background to locations with recorded presence of a particular group (usually a higher-rank taxon), including the modelled species. This method assumes that the presence records of the target group reflect the sampling probability distribution that led to the presence records of the modelled species [146] (e.g., [82]). Alternatively, a recently proposed background thickening method relies on increasing the density of uninformed background locations around presences [150]. Briefly, this method samples background locations from an estimate of sampling probability based exclusively on density of occurrences [150], emphasizing the comparison between a presence and its surroundings.
In all, it has been suggested that accounting for sampling bias in environmental space is more efficient than doing so in geographical space [151]. However, if many occurrence records are available, geographical filtering and balancing of occurrence data should be preferred relative to background manipulation [145]. Indeed, this method seems to be by far the most frequently used (e.g., [69,70,81,83,87,92,93,97,100,103,[105][106][107]112,116,117,119]). However, if only few spatially clustered occurrence records are available, the manipulation of the background dataset seems to be the second-best option [145].

Model Complexity and Fine Tuning of the Model Calibration
Ideal niche models oscillate in a delicate balance between the precision that is possible with highly complex models and the general predictive power that can derive only from sufficiently simple models [152]. Often, model complexity has been considered as unimportant [153]. However, models that are inappropriately complex or inappropriately simple show reduced ability to infer habitat quality, reduced ability to infer the relative importance of variables in constraining species' distributions, and reduced transferability to other time periods or geographical regions [153]. Indeed, many models have been developed that are overly complex and that lack predictive power [154]. Despite the choice of algorithms or estimation frameworks, model calibration is a key process in which variable selection and model settings are fine-tuned to obtain the best model performance [64,155]. Unfortunately, in spite of the recommendations for maximizing reproducibility of ENMs [64,156], the information concerning model calibration is lacking in a high number of studies dealing with VBDs [80,[82][83][84][85][86]88,89,91,92,96,98,107,112,113,124].
Given the resource-demanding and time-consuming process of model calibration, many empirical studies rely on default settings of a given algorithm/software package [64,157] (e.g., [67,68,70,93,94,101,116]). This click-and-run modelling framework is based on the use of "recipes" to model the distribution of any species, neglecting the biology of the organism in question [24], and thus a common practice ought to be avoided. Indeed, the species-specific tuning of model parameters affects model complexity [158] and have been shown to improve the performance of ENM [159]. Hence, in recent years, a number of initiatives have dealt with the automatization of the calibration process in order to increase robustness, performance and reliability of ENMs. Most of these initiatives are meant for Maxent, the most popular software for ENM, in the form of open-source R packages (e.g., ENMeval [155] and its updated version ENMeval 2.0 [160], wallace and its updated version wallace 2 [161,162], kuenm [163]). A few additional R packages allow for the fine-tuning of hyperparameters from other algorithms, such as SDMtune (artificial neural networks, boosted regression trees, maximum entropy and random forest) [158]. Therefore, testing of optimal parameters settings and model selection approaches have now become the rule in the field [155]. Some examples of its application in VBDs are found in [69,81,87,90,95,97,99,100,[102][103][104][105][106]108,114,117,119].
In order to optimize model complexity and performance, not only the optimal parameters settings should be determined, but also the predictors used to build the model should be carefully selected [158]. The number of predictors included in the model determine the specific dimensions of the environmental space to be modelled [55]. A highly dimensional environmental space will yield overly complex, and most probably overfitted, models [55,155]. In addition, highly collinear variables allow alternative model structures to yield very similar model fits [156]. Dealing with multicollinearity could be achieved with a number of strategies (revised in [155]), which will reduce the dimensionality of the environmental space and lead to a reasonable set of predictors [55]. However, a recent analysis advocates for exhaustive searches inspecting all possible combinations of environmental variables, in combination with the different value sets for other parameters of interest [155]. This proposed approach is somehow related to the information-theoretic approach of model selection and multimodel inference from traditional multivariate statistics [152]. Despite its relevance, as far as we know, the selection of environmental predictors is only supported by the R packages kuenm [163] and SDMtune [158].

Unveiling the Uncertainty Inherent to ENM
Model uncertainty is a consistent, but largely unassessed, concern in ENM [152]. Although it is mainly due to the many assumptions underlying the modelling approach [164], this is not the only cause. There are also other sources, such as the uncertainty inherent to the incoming data and their scale, the modelling algorithm and its parameterization, and the chosen climate models when dealing with future projections (general circulation models and representative concentration pathways) [42,152,164,165]. Furthermore, the aforementioned approaches to deal with model complexity and calibration allow to use multiple sets of parameters [163], which might result in more than one parametrization giving an adequate model. This use of multiple parameters settings and replicates gives rise to additional sources of variation in model outcomes [152]. Hence, measurement of certainty or uncertainty are crucial when performing rigorous model calibration processes [166].
An additional source of uncertainty that should be accounted for concerns areas of strict extrapolation. This means detecting areas with novel combinations of environments outside those in M, which are areas with conditions extending beyond those over which the model was calibrated [59]. Interpretation of predictions in transfer areas of strict extrapolation may lead to erroneous conclusions [59]. When the physiological response of the species in those areas is unknown, caution is advised about assuming that a species could potentially establish populations or not in such environmental conditions [166]. The environmental similarity between calibration and transfer regions should always be quantified prior to model transfer, so that the interpretation of final models can incorporate the levels of uncertainty and extrapolation caused by model transferability [167,168]. For this, we acknowledge at least two different metrics which identify nonanalog conditions between calibration and transfer regions highlighting regions in geographic space where strict extrapolation occurs: (i) the Multivariate Environmental Similarity Surface (MESS) readily incorporated in the Maxent software [169]. (ii) the Mobility-Oriented Parity (MOP), available in the kuenm R package [163], which modifies and extends the MESS approach [59].
The Mobility-Oriented Parity outstands as it reflects more closely the reality of environmental difference from the set of conditions represented across M and experienced by the species [59], offering more robust measures of extrapolative conditions [163]. This analysis of nonanalog conditions via extrapolation metrics is beginning to be regularly applied in the ENMs of VBDs (MESS [102,119,170]; MOP [69,81,87,97,103,117]).
Identifying areas with high associated uncertainty in model projections is crucial to giving a clear picture and avoiding misinterpretations of the potential distributions of disease vectors [55,166]. Even more when the primary objective is model transfer [171]. Evincing the uncertainty from various sources and manifested on multiple levels entails certain difficulty [166], but recent advances in hierarchical modelling could offer a solution [171]. Hierarchical partitioning analysis of the uncertainty, available via the kuenm R package [163], let error estimates to propagate through various submodels within one 'integrated statistical pipeline' [171], allowing to assess magnitudes, sources and patterns of variation from the different sources [166]. To sum up, model projections must be accompanied by a spatially referenced analysis of the uncertainties [164], such as the reader is given a fair idea of the confidence in predictions for any particular species or place [42]. Even though the predictive capacity of ENMs is only applied along with full appreciation of the inherent uncertainty [165], many studies still omit including their uncertainty results (for example, [80,82,[84][85][86][88][89][90][91][92][93][94][95][96]98,100,101,[104][105][106][107][108]113,116,124,141,172]; but this seems to be changing (e.g., [67,69,70,81,83,97,99,102,103,114,117]).

Main Applications
Ecological niche models are most often used in one of four ways [153]: (i) to estimate the relative suitability of habitat known to be occupied by the species, (ii) to estimate the relative suitability of habitat in geographic areas not known to be occupied by the species, (iii) to estimate changes in the suitability of habitat over time given a specific scenario for environmental change and (iv) as estimates of the species niche. In the following paragraphs, we briefly present what we consider the most relevant uses of ENM when dealing with VBDs.

Disease Distribution and Risk Mapping
If a comprehensive understanding of the natural history of the disease (pathogen/vector) system is available, a logical use of ENM is for the identification and characterization of distributional areas meeting the ecological requirements for disease transmission [173]. This approach not only aids to predict disease distribution, but also allows to characterise the ecological requirements of the species involved, which may be unknown or poorly described [61]. Depending on the approach (the previously discussed black box vs. component-based approaches), the modelling process may only include reports of human or animal disease to summarise the entire disease system, or may incorporate occurrence records of vector and hosts to identify the overlapped area among system components.
Since the access to trustworthy disease occurrence data are still restricted, the distribution modelling of vector-borne pathogens is limited in either of the approaches (black-box approach [67][68][69]98,112,119,172]; component-based approach [81,85,87,101]). As a matter of fact, the most frequent is the modelling of the various vector species involved in the transmission of VBDs (e.g., [70,83,84,86,[88][89][90][91][92]96,99,[102][103][104][105][106][107]114,115]). Even though still not perfect, the distribution modelling of either of the components of a VBD improves the identification of potentially risky areas suitable for transmission, allowing for disease control and planning in the form of informed interventions and appliance of control/palliative measures.

Filling Gaps in Transmission Cycles
As mentioned before, disease transmission can be conceptualized as the integration of the various ecological niches and distributions of each of the species participating in the transmission cycle [45]. Therefore, the characterization of distributional areas via ENM can be applied to identify unknown elements of the transmission cycle, such as vectors or hosts [173]. The geographical distribution of disease occurrences can provide an indication of which suspected taxa are potentially involved and which are not [173], i.e., when one of those species is unknown, ENM can be used in an exploratory sense to narrow possibilities and hypothesize which species may be involved [55]. An exemplary application of this approach can be found in Lippi et al. [116], where the authors analysed the contribution of the tick Dermacentor variabilis in the transmission of a pathogenic Rickettsia species from the spotted fever group (i.e., R. montanensis). Another good example is the one provided by Cuervo et al. [81], who analysed the geographical distribution of the West Nile virus in Spain, and could attribute the higher risk of transmission in southern Spain to the presence and abundance of the mosquito species Culex perexiguus.

Assessing the (Potential) Distribution of Invasive Species
An invasive species is defined as a species able to colonize geographical areas beyond their native distributional range, establishing viable populations [174]. During the last century, this extra-range dispersal of species has dramatically increased due to humanmediated processes (i.e., globalisation) [175]. The transmission of VBDs is not excluded from this phenomenon, as disease vectors and pathogens are spreading across continents due to human transport, land-use change and climate change [176]. The use of ENM can help to assess and foresee the geographical extent of these (potential) invasions by disease vectors and vector-borne pathogens [173]. Appropriately and correctly calibrated ENM can be transfer beyond the boundaries of the species' native range, seeking for environmental conditions that suite the ecological requirements for a species, even if the species is not present there [173,177]. However, when areas accessible to the species across its native distribution do not represent the full spectrum of environmental conditions that the species can tolerate (i.e., a truncated niche), the potentially invaded area is often underestimated [177,178]. To overcome this issue, the use of integrated data on closely related species (at a supraspecific level) allows a more complete characterization of fundamental niches, improving the predictive ability of potentially invaded areas [178]. Not surprisingly, the majority of studies dealing with invasive disease vectors refer to mosquito species, mainly of the genera Aedes and Anopheles. For instance, Adeleke et al. [100] found that wind speed limits the invaded area by Aedes albopictus in Europe, whereas Echeverry-Cardenas et al. [124] and Moo-Llanes et al. [117] analysed the potential geographic distribution of this same species in Colombia and Mexico, respectively. Finally, Valderrama et al. [108] assessed the potential distribution of Anopheles pseudopunctipennis in northern Chile.

Keeping up with Climate and Global Change: Changes in Time
The broad, correlative power of ENM provides a potent tool to address the question of likely geographic shifts in distributional areas of species or phenomena under scenarios of climate change [66,173]. To this aim, ENM appears more suitable than mechanistic models for the vast majority of disease systems, as they do not depend on a transmission model and the assumptions that are required [41,66]. Despite its extended use, many cautions and caveats are necessary when applying ENM for forecasting the future distribution of species as climate change proceeds. This subject is reviewed in detail by Peterson et al. [152], but three aspects that ought to be considered are: (i) Effects of niche truncation on model transfers to future climate conditions; (ii) Effects of model selection procedures on future-climate transfers of ecological niche models; (iii) Overall variance (uncertainty) in model outcomes.
In this sense, we have analysed 12 studies published in the last three years [68,87,94,95,101,105,109,112,115,124,142,170]. From these, only two studies considered the global occurrences of the modelled species [124,142]. Meanwhile, the rest restricted their data to local occurrences, mostly at the national level [68,87,94,95,101,105,109,112,115,170], suggesting that a niche truncation effect might have hampered their future projections. Furthermore, only five of them presented some kind of extrapolation assessment [69,87,112,115,170], but no overall variance of the model outcomes.
A further aspect to consider is the number of climate scenarios to be used in the niche model projections. It has been suggested that to represent a decent amount of uncertainty in climate model projections, a minimum of five non-related models ought to be selected [179]. Future climate projections represent, at least to some degree, a sample of uncertainty of future climate evolution [179]. Therefore, the use of multiple climate scenarios in the niche modelling procedures allows to capture its inherent variance. However, out of the studies analysed, only three of them considered more than one climate model [95,109,115]. Yet, all of them failed to present any uncertainty assessment of their future climate projections, meaning that the confidence of their predictions might be compromised.

Keeping up with Climate and Global Change: Changes in Space
Climate and global change should not be exclusively seen as a driver of geographical shifts in the future, since many native species are already shifting their ranges to keep up with climate change [180]. Invasive species are excluded from this consideration, as in general they have not reached a distributional equilibrium in the invaded range by fulfilling a niche similar to the original [181]. Thus, an additional application of ENM is to assess the availability of suitable habitats which might explain and forecast the (potential) distributional shifts of hosts and vector species. Distributional shifts might involve either range expansions or contractions. Range expansions are driven by the dispersal of individuals away from a population's core and can be particularly important following a shifting climate envelope, where edge patches may become thermally suitable at different times [182].
Conversely, during a contraction, a species' range gets fragmented due to climatic heterogeneities, which usually leads to different subpopulations residing in small refuge areas [183]. In this sense, ectothermic and thermophilic species are assumed to benefit from increasing temperatures, giving place to a range expansion at higher latitudes and altitudes [184]. Indeed, a northward expansion is projected in a number of tick species [93,170,184,185] (reviewed in [186]). However, the opposite is also feasible (range contraction), as the upper limits of a species thermal tolerance might be reached due to the continued warming, with negative consequences for thermophilic species (as an example, [187][188][189]). Indeed, a range contraction is expected for Lutzomyia intermedia (a tropical sand fly vectorizing Leishmania braziliensis), while other subtropical Lutzomyia species (L. neivai) is predicted to shift its range southwards in southern South America [189].
In summary, climate change will not always lead to range expansion of VBDs [189], and ENMs should be considered thoroughly to foresee the forthcoming changes.

Aiding to Disentangle Species Complexes
Due to the aforementioned complexities of taxonomic identification among numerous taxa of vectors, ENM may be useful for delimiting species and populations [55]. Since morphological differentiation is unreliable in these taxa complexes, identification relies on phenotypical or molecular characters. However, ecological characters can offer additional, complementary information [55,189] (reviewed in [190]), giving the opportunity to identify populations that are differentiated in ecological features [191]. For instance, such an approach has been used to analyse the climatic niches of closely related taxa of ticks [97], kissing bugs [82] and mosquitoes [192], providing an insight on the evolutionary patterns of these problematic taxa.
Furthermore, this approach aids to reach a more complete understanding of vectors' distribution based on the, often, limited available data [193]. For instance, ENMs provided clues on the ecological preferences of the species involved in the Anopheles maculipennis complex [80]. However, if during the diversification of closely related taxa, niche divergence is absent, ENM should rely on all available occurrences unified at the suprataxa level [193]. Such approach was the one followed to model the potential distribution of Culex pipiensrestuans complex in Canada [90].

Concluding Remarks
It is clear that niche modelling of VBDs is far from being simple, and that there is still a long way to improve. However, and despite the difficulties, ENM provides the best available tool for the assessment of diseases geographical distribution and disease risk mapping, as it has been already argued [44]. Indeed, this is what models are: educated guesses about the future [164]. By knowing the underlying assumptions, improving the calibration process, evaluating their potential effects and incorporating the inherent uncertainty in their interpretation, we can narrow the guesses to define the more probable futures. This overview is expected to be a useful benchmark for niche modelling of VBDs in future research.

Data Availability Statement:
The datasets generated for this study are available on request to the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.