Satellite Earth Observation Data in Epidemiological Modeling of Malaria, Dengue and West Nile Virus: A Scoping Review

Earth Observation (EO) data can be leveraged to estimate environmental variables that influence the transmission cycle of the pathogens that cause mosquito-borne diseases (MBDs). The aim of this scoping review is to examine the state of the art and identify knowledge gaps regarding the latest methods that use satellite EO data in epidemiological models of malaria, dengue and West Nile Virus (WNV). In total, 43 scientific papers met the inclusion criteria and were considered in this review. Researchers have examined a wide variety of methodologies, ranging from statistical models to machine learning algorithms. A number of studies used models and EO data that seemed promising and were claimed to be easily replicable in different geographic contexts, enabling the realization of systems at regional and national scales. There is a need to further leverage powerful new modeling approaches, such as artificial intelligence and ensemble modeling, and to explore new and enhanced EO sensors for the analysis of big satellite data, in order to develop accurate epidemiological models and contribute to reducing the burden of MBDs.


Introduction
Mosquito-Borne Diseases (MBDs) infect almost 700 million people every year, are recognized in over 100 countries on every continent except Antarctica, and cause millions of deaths annually [1]. The burden of MBDs is estimated to be highest in tropical and subtropical areas, disproportionately affecting the poorest populations. Despite global campaigns to eradicate MBDs [2], these diseases are re-emerging, and even emerging in countries where they were previously unknown. The reasons for this are manifold: changing climatic and ecological conditions, global travel and trade, human behavior [3] and rapid, unplanned urbanization [4] are key factors that influence the seasonal and geographic distribution of vector populations and therefore the transmission of the pathogens.
Most of the environmental variables (geographical, climatological, and hydrological) that influence the transmission cycle of MBDs between pathogenic agents, vectors and intermediate hosts can be monitored efficiently from satellites that carry specific instruments capable of capturing these parameters frequently and on a global scale [5][6][7]. Kazansky et al. listed the various satellite sensors that can provide environmental data and could contribute as an input to a Malaria Early Warning System [8].
It is worth noting that between 2014 and 2018 there was a remarkable growth in EO satellites, with approximately 700 satellites in space in total [9], operating advanced sensor payloads that provide enhanced spectral and spatial resolution, shorter revisit times and larger coverage, enabling improved Earth monitoring at the global level [10]. This growth in observations was followed by a substantial increase in the number of studies that exploit EO data to better understand the geographic distribution, abundance and dynamics of MBDs and the associated vectors and pathogens [11].
The scope of this paper is to review recent literature for identifying studies that utilized satellite EO data for epidemiological modeling of malaria, dengue and West Nile Virus (WNV). Epidemiological models and Early Warning Systems (EWSs) that utilize EO data have been used as tools for helping decision-makers to improve health system responses, take preventive measures in order to curtail the spread of MBDs and address the relevant priorities of the Sustainable Development Goals (SDGs) such as good health and well-being (SDG 3) and climate action (SDG 13) [12].
In this scoping review we focused solely on three MBDs, namely malaria, dengue and WNV. Other MBDs were excluded only to keep the paper concise. We believe that these choices are representative of a wide range of MBDs: malaria is most commonly spread by mosquitoes of the Anopheles genus, which also transmits Lymphatic Filariasis; the Aedes genus can transmit dengue fever, Chikungunya, Lymphatic Filariasis, Rift Valley Fever, Yellow Fever and Zika; and the Culex genus transmits Japanese Encephalitis, Lymphatic Filariasis and WNV. Therefore, the use of EO data in the epidemiology of malaria, dengue and WNV does not differ substantially from its use for the MBDs we did not consider.

Malaria
Malaria is the most prevalent, life-threatening and costly parasitic infection worldwide, with over 100 countries and territories at risk of malaria transmission [2]. The global malaria eradication campaign rolled out by the World Health Organization (WHO) led to important reductions in new malaria cases in endemic countries during 2000-2015. Despite this achievement, there were still 219 million cases of malaria in 90 countries and 435,000 deaths in 2017 [13]. Research has examined the effect of climatic conditions on malaria transmission [14,15], with ambient ground temperature and moisture affecting Anopheles mosquito populations and the incubation period [16]. Furthermore, intense rainfall can reduce larval density by flushing out first-stage larvae [17].
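The temperature dependence of the parasite's incubation period inside the mosquito is often summarized with a degree-day rule. The sketch below is only an illustration, assuming the classical Detinova/Macdonald parameters for Plasmodium falciparum (111 degree-days accumulated above a 16 °C threshold); these constants do not come from the studies reviewed here, and the function name is hypothetical.

```python
def sporogony_days(mean_temp_c: float,
                   degree_days: float = 111.0,
                   threshold_c: float = 16.0) -> float:
    """Duration (days) of the parasite's development in the mosquito.

    Degree-day model: development completes once `degree_days` have
    accumulated above `threshold_c`. Default parameters are the classical
    P. falciparum values (an assumption for illustration). Returns
    infinity when the temperature is at or below the threshold.
    """
    excess = mean_temp_c - threshold_c
    if excess <= 0:
        return float("inf")  # no development below the threshold
    return degree_days / excess

for t in (18.0, 22.0, 27.0):
    print(f"{t:.0f} °C -> {sporogony_days(t):.1f} days")
```

Under this simple model, warmer conditions shorten the incubation period, so a larger share of mosquitoes survives long enough to become infectious.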

Dengue
Dengue is a mosquito-borne viral infection endemic in many tropical and subtropical regions of the world [18]. Since 1970 dengue has spread rapidly and is now found in more than 100 countries, in regions including the Americas, the Eastern Mediterranean, South-East Asia and the Western Pacific [19]. Temperature affects the extrinsic incubation period of the virus in the Aedes mosquito [20], and rainfall is one of the most important environmental factors affecting the vector's reproduction cycle. Rainfall events can influence vector abundance by increasing the availability of juvenile mosquito habitats (e.g., containers with standing water in patios) [21], while drought conditions can increase larval habitats through increased household water storage [22].

West Nile Virus
WNV was first identified in the West Nile district of Uganda in 1937 and was considered a low-risk disease for humans and livestock until the 1990s [23]. Since then, WNV has spread rapidly across all continents except Antarctica [24]. In nature, WNV cycles between birds, which act as the principal hosts, and mosquito vectors, which transmit the virus to other birds. Humans, equines and other mammals act as incidental or dead-end hosts and are not involved in the transmission cycle. Most human infections are asymptomatic (around 80% of infected people) or lead to mild symptoms such as fever, headache, tiredness and body aches. Severe cases can develop neuroinvasive disease, including meningitis, encephalitis and acute flaccid paralysis, and can be fatal [23]. The transmission and geographic distribution of WNV depend on the presence of both the avian reservoir host and the mosquito vector, which are affected by environmental (abiotic and biotic) and socio-economic conditions [25]. The involvement of birds (in addition to mosquitoes), and the potential of EO techniques to contribute to understanding the movements of migratory birds, make the disease of particular interest in this context. WNV transmission can thrive under favorable environmental conditions; Culex pipiens can transmit WNV efficiently at a temperature of 30 °C [26].

Literature Search Strategy
This scoping review included epidemiological and entomological studies that utilized EO (climatic and environmental) data for mapping, modeling and forecasting malaria, dengue and WNV. Whereas a systematic review uses systematic methods to critically appraise a focused research question, a scoping review comprehensively maps evidence across a broader research question using diverse sources [27]. Accordingly, in this scoping review we sought to review recent literature to identify the current state of the art in epidemiological modeling of MBDs using satellite EO data. The search was limited to peer-reviewed literature in English published between 1 January 2012 and 31 December 2018. This period was selected because, from 2012 onward, the annual growth rate of publications relating to health and remote sensing has been steadily increasing, with malaria and dengue being the most frequent disease-specific keywords [11]. The Web of Science, PubMed and Scopus databases were searched electronically to retrieve relevant literature. Boolean operators combining multiple keywords salient to the research topic were used to query these databases. The keywords were "Earth observation", "Remote sensing", "Satellite data", "vector-borne disease*", "mosquito-borne disease*", "modeling", ("NDVI" OR "NDWI" OR "EVI") AND ("malaria" OR "dengue" OR "WNV"), "temperature", "precipitation", "malaria", "West Nile Virus" and "dengue". The results were combined using the Mendeley software, and duplicates were removed. Titles and abstracts were first examined to determine the relevance of the articles; thereafter, full texts were screened to ascertain whether the selection criteria were met. Finally, the reference lists of the reviewed papers were scanned for additional literature. All authors of this paper participated in each step of the selection procedure.
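The query-building and de-duplication steps can be sketched as follows. This is a minimal illustration only: the keyword groups, the record fields (`title`, `doi`), the example DOIs and the title-normalization rule are hypothetical. They are not the exact queries submitted to Web of Science, PubMed or Scopus, and the review itself merged results with Mendeley.

```python
import re

def or_group(terms):
    """Join terms into a parenthesized OR clause of quoted phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def build_query(*groups):
    """Combine OR clauses with AND, e.g. (indices) AND (diseases)."""
    return " AND ".join(or_group(g) for g in groups)

def normalize_title(title):
    """Lower-case and drop non-alphanumerics so near-identical titles match."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(records):
    """Drop records whose DOI or normalized title was already seen."""
    seen, unique = set(), []
    for rec in records:
        keys = {("title", normalize_title(rec["title"]))}
        if rec.get("doi"):
            keys.add(("doi", rec["doi"].lower()))
        if keys & seen:
            continue  # duplicate of a hit from an earlier database
        seen |= keys
        unique.append(rec)
    return unique

query = build_query(["NDVI", "NDWI", "EVI"], ["malaria", "dengue", "WNV"])
print(query)
# ("NDVI" OR "NDWI" OR "EVI") AND ("malaria" OR "dengue" OR "WNV")

# Hypothetical hits returned by two databases for the same paper.
hits = [
    {"title": "Malaria risk mapping with MODIS", "doi": "10.1000/x1"},
    {"title": "Malaria Risk Mapping with MODIS.", "doi": None},
    {"title": "Dengue and rainfall", "doi": "10.1000/x2"},
]
unique = deduplicate(hits)
print(len(unique))  # 2
```

Matching on either the DOI or a normalized title catches duplicates even when one database omits the DOI.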

Inclusion and Exclusion Criteria
The scoping review was conducted by adapting the methodological framework of Arksey and O'Malley [28] and Levac et al. [29]. This framework includes a transparent method for linking the purpose and the area of research [28] and uses an iterative team approach to select studies, together with a numerical summary and qualitative thematic analysis [29]. The selection process involved post hoc inclusion and exclusion criteria. To ensure consistency and eliminate studies outside the scope of this paper, the authors discussed and agreed on the initial criteria at the beginning of the selection process and refined them until the final selection.
The articles finally selected were:
1. Peer-reviewed articles published in English between 1 January 2012 and 31 December 2018.
2. Publications that integrated satellite EO-derived climatic and environmental predictors for analyzing mosquito-borne epidemics. Studies that did not use satellite EO data, or used only in situ data, were excluded from the review.
3. Studies whose models used disease incidence, prevalence or case counts, or entomological data, as response variables.
4. Articles referring to the impact of (inter-annual) climate variability on pathogen transmission, excluding those using climate scenarios, i.e., future projections under different climate change scenarios. We therefore focused only on studies that utilized historical data and built knowledge from past events.
5. Studies that used epidemiological models and reported the accuracy achieved. Studies that provided no evidence or information on the accuracy of their models were excluded.

State-of-the-Art Review
A total of 576 relevant articles were initially identified by the electronic search of Web of Science, Scopus and PubMed for the period January 2012 to December 2018 (Figure 1). The 112 articles in Figure 1 refer to records found by reviewing the references of the selected articles, as well as articles recommended by the authors. A total of 43 articles were finally selected as meeting the eligibility criteria for this scoping review; they are briefly described in Tables A2 and A3 in Appendix A.
Of the 43 studies, 20 analysed malaria, 15 dengue and 8 WNV. Figure 2 illustrates the number and geographic location of the selected studies together with the disease studied. The selected articles were organized into two main categories (Figure 3) according to the data used as dependent variables for disease prevalence: (a) epidemiological data (disease incidence, prevalence, case or mortality data) (n = 31) and (b) entomological data (n = 11); in addition, Stilianakis et al. examined both (a) and (b) [30], and Valiakos et al. additionally used wild bird data to complement the epidemiological data [31]. The first category (a) used clinical records from the general human population as the main data source. Here, most studies (n = 23) referred to the clinical data as "confirmed cases", meaning that the patients were confirmed through laboratory testing. Buczak et al. [32] and Arboleda et al. [33] also included "possible" cases, meaning that the patients exhibited some of the symptoms of the infection. Refs. [31,34–39] explicitly used laboratory-confirmed cases (microscopy/Rapid Diagnostic Tests (RDT)), while Sewe et al. used the number of deaths caused by malaria [40]. The second category (b) used entomological data on vector density, which is highly dependent on ambient climatic and environmental conditions and significantly influences pathogen transmission. Mosquitoes were collected using ovitraps or classical dipping techniques [41,42], or vector presence was quantified with indices such as the Breteau Index (BI) [33,43], the House Index (HI) or Container Index (CI) [33,43,44], and the Entomological Inoculation Rate (EIR) [45,46]; the EIR is a commonly used measure of the number of infective bites per person per unit time (usually a year) [47].
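The entomological indices mentioned above are simple ratios. The sketch below uses their standard textbook definitions (House Index: % of inspected houses with larvae or pupae; Container Index: % of water-holding containers positive; Breteau Index: positive containers per 100 houses; annual EIR: human biting rate × sporozoite rate × 365). The survey counts are invented for illustration and do not come from the reviewed studies.

```python
def house_index(houses_positive, houses_inspected):
    """HI: percentage of inspected houses with Aedes larvae or pupae."""
    return 100.0 * houses_positive / houses_inspected

def container_index(containers_positive, containers_inspected):
    """CI: percentage of water-holding containers with larvae or pupae."""
    return 100.0 * containers_positive / containers_inspected

def breteau_index(containers_positive, houses_inspected):
    """BI: number of positive containers per 100 inspected houses."""
    return 100.0 * containers_positive / houses_inspected

def annual_eir(bites_per_person_per_night, sporozoite_rate, nights=365):
    """EIR: infective bites per person per year (HBR x sporozoite rate x nights)."""
    return bites_per_person_per_night * sporozoite_rate * nights

# Hypothetical survey: 200 houses (30 positive), 500 containers (45 positive).
print(house_index(30, 200))       # 15.0
print(container_index(45, 500))   # 9.0
print(breteau_index(45, 200))     # 22.5
# Hypothetical: 2 bites/person/night, 2% of biting mosquitoes infective.
print(round(annual_eir(2.0, 0.02), 1))  # 14.6
```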
Different kinds of traps (e.g., light traps, magnet traps baited with octenol, CO2-baited traps, odour-baited MM-X traps) were used by the various studies [41,45,46,48–52] to capture mosquitoes and to assess vector density and infection status. Only one study [53] used crowdsourced data, providing information on both the vector (% of mosquito bites, % of mosquito larvae) and human cases (% of known human dengue cases) in the area. The share of each applied methodology with respect to the epidemiological or entomological data is illustrated in Figure 3. Our database search returned mainly data-driven and statistical approaches. Increased computational power and the growing number of open-source datasets over the past years have given rise to these data-driven models, which relate environmental variables to species occurrence or abundance [54]. These models represent input-output relationships built upon available datasets and do not require detailed knowledge of the complex interactions among climate, vector, host and pathogen. Mechanistic models, on the other hand, aim to capture the biological and environmental mechanisms using dynamic equations [55] in order to establish causality. To capture the full dynamics of the system, most of these models use weather data from ground stations when the study area has a small spatial extent, since in situ measurements generally represent input fluctuations with smaller error than indirect satellite-derived products. This scoping review focuses predominantly on data-driven and statistical models.
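In its simplest form, a data-driven model of this kind is a regression of case or mosquito counts on EO covariates. The sketch below fits a Poisson log-linear model with a single covariate (e.g., NDVI) by Newton-Raphson in pure Python. It is illustrative only: the data are synthetic and noise-free, and the reviewed studies used a range of statistical and machine learning models, not necessarily this one.

```python
import math

def fit_poisson(x, y, iters=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by Newton-Raphson (equivalent to IRLS here).

    x: covariate values (e.g., hypothetical NDVI per district-month)
    y: non-negative responses (e.g., hypothetical case counts)
    Returns (b0, b1).
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

ndvi = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cases = [math.exp(0.5 + 2.0 * v) for v in ndvi]  # noise-free synthetic means
b0, b1 = fit_poisson(ndvi, cases)
print(round(b0, 3), round(b1, 3))  # 0.5 2.0
```

Here exp(0.1 × b1) gives the multiplicative change in expected counts per 0.1 increase in NDVI, which is how such coefficients are usually reported.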

Environmental EO Predictors
The environmental EO-based predictors leveraged by the studies examined in this review are listed in Figure 4. Among the various climatic and environmental variables examined as possible predictors, the most frequently used were air, land and soil temperature (n = 45), precipitation (n = 34) and vegetation indices (n = 42), as detailed in Figure 4. Many mosquito-related processes are strongly influenced by temperature; for example, the rate of development of the virus inside the vector increases with warmer temperatures [26]. Air temperature was either approximated by the remotely sensed Land Surface Temperature (LST), which is widely used as a proxy, or obtained from in situ observations. LST is the radiative skin temperature of the land surface and an important climate variable; it is estimated from Top-of-Atmosphere brightness temperatures in the infrared bands of the satellite sensors [56]. Relating LST to air temperature is a complex task, as shown in [57], since the relationship is highly dependent on the geographic location of the study area. Only one study [30] reported an association between WNV infections and soil temperatures obtained from the ECMWF (European Centre for Medium-Range Weather Forecasts) Re-Analysis (ERA-Interim) datasets. Soil temperature data refer to temperatures at several depths below the ground surface [58]. Mendez-Lazaro et al. [59] and Laureano-Rosario et al. [60] examined the influence of Sea Surface Temperature (SST) instead, because their study areas, San Juan in Puerto Rico and Yucatan state in Mexico respectively, are close to the coast. As with LST, these studies showed that SST was significantly associated with the reported dengue cases. Bhatt et al. developed a dengue-specific temperature suitability index based on a biological model with temperature as an input [61].
This index included two temperature-dependent quantities affecting the dengue transmission cycle: (i) the lifespan of the Aedes vector and (ii) the Extrinsic Incubation Period (EIP).
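Because the LST-air temperature relationship is site-dependent, one pragmatic approach is to calibrate it locally against station data. The sketch below assumes a simple linear relation fitted by ordinary least squares at a few hypothetical stations; the values are invented, and real calibrations may need additional covariates (e.g., elevation or season).

```python
from statistics import mean

def fit_linear(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical paired observations at ground stations (°C).
lst_day = [28.0, 31.5, 35.0, 38.2, 41.0]  # satellite daytime LST
t_air = [24.1, 25.9, 27.6, 29.3, 30.7]    # station maximum air temperature

a, b = fit_linear(lst_day, t_air)
print(f"T_air ~ {a:.2f} + {b:.2f} * LST")
# Apply the local calibration to an LST pixel far from any station.
print(round(a + b * 33.0, 1))
```

A calibration fitted in one region should not be transferred to another without re-fitting, which is precisely the difficulty noted above.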
Vegetation and vegetation indices are another important parameter showing strong correlations with vector behavior and biological cycles. Many of the studies (n = 26) used the Normalized Difference Vegetation Index (NDVI), a proxy for vegetation density and distribution because it is sensitive to chlorophyll. NDVI is not restricted to studies of plants; various studies have coupled vegetation dynamics with biodiversity, animal species distributions [68], animal movement patterns (e.g., of migratory birds) and the performance of animal populations (reproduction or survival). NDVI data can be combined with other data to model the temporal and spatial dynamics of vectors [69]. The Enhanced Vegetation Index (EVI), which is sensitive to canopy structural variations [70], was also used (n = 8), followed by the Green Index (GI), the Soil Adjusted Vegetation Index (SAVI) (n = 2) and the quasi-yellowness index (p-YI) [34,71]. Vegetation information can also be derived from Land Use/Land Cover (LU/LC) maps to identify suitable vector breeding sites, as done by several studies (n = 14). LU/LC maps were also used to identify other factors that might influence MBD transmission, such as urban areas, health facilities and proximity to water bodies.
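The vegetation indices named above are band-ratio formulas computed per pixel from surface reflectance. A minimal sketch using their standard definitions: NDVI = (NIR − Red)/(NIR + Red); EVI with the MODIS coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1; and SAVI with the commonly used soil factor L = 0.5. The reflectance values are illustrative.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """EVI = G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L)."""
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)

def savi(nir, red, L=0.5):
    """SAVI = (1 + L) * (NIR - Red) / (NIR + Red + L)."""
    return (1.0 + L) * (nir - red) / (nir + red + L)

# Illustrative surface reflectances (0-1) for a vegetated pixel.
nir, red, blue = 0.40, 0.08, 0.04
print(round(ndvi(nir, red), 3))        # 0.667
print(round(evi(nir, red, blue), 3))   # 0.506
print(round(savi(nir, red), 3))        # 0.49
```

EVI's blue-band term reduces atmospheric effects, and SAVI's L term reduces soil-background effects where vegetation is sparse.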
Relative humidity plays an important role in vector survival rates and affects different species differently [72]. Relative humidity estimates were derived from EO sensor systems onboard satellites, such as the Indian National Satellite System (INSAT)-3D imager [63] and the Atmospheric Infrared Sounder (AIRS) onboard NASA's Aqua satellite [43], or from the ERA-Interim reanalysis, as in Stilianakis et al. [30]. Adde et al. estimated the minimum and maximum relative humidity [48], while Machault et al. calculated relative humidity using only in situ observations [44].
Evapotranspiration (ET) is the amount of water removed from the land surface and returned to the atmosphere through evaporation and transpiration [73]. Remote sensing techniques were used in [74] to estimate the actual ET (ETa), i.e., the quantity of water actually removed from the Earth's surface. High ETa is often associated with high levels of surface water and soil moisture availability, both of which indicate suitable breeding conditions for the vectors. In [65,75] ETa was derived from MODIS sensor data products, [66] used the global MOD16 ET product, and [48] used in situ data to assess ET.
In the coastal city of San Juan, [59] associated dengue cases with Sea Level Pressure (SLP) and Mean Sea Level (MSL). MSL was one of the variables significantly associated with human dengue cases: higher dengue incidence was related to MSL maxima, because coastal areas are more prone to flooding during seasonal peaks. Stilianakis et al. examined soil water content as a parameter [30], which did not show an association with the presence of infected mosquitoes. Many of the studies (n = 16) referred to the role of water bodies in the mosquito life cycle, because they serve as breeding sites for larval development. To delineate the extent of water bodies, several studies (n = 9) used the Normalized Difference Water Index (NDWI), which represents changes in liquid water content, while other studies (n = 7) additionally included proximity to different kinds of water bodies as a model parameter. Studies such as Diboulo et al. [46] and Giardina et al. [39] used permanent or semi-permanent water bodies, taking into account both natural and man-made containers. Other studies considered proximity to running water, such as rivers [76] or streams [31,77], and to stagnant waters and lakes [45,78].
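Two water-related predictors recur here: a water index and the distance to the nearest water body. The sketch below assumes Gao's NDWI, (NIR − SWIR)/(NIR + SWIR), which matches the "liquid water content" description above (McFeeters' open-water variant uses the Green and NIR bands instead). It computes proximity as the Euclidean distance from each grid cell to the nearest water cell, by brute force. The water mask and cell size are hypothetical.

```python
import math

def ndwi_gao(nir, swir):
    """Gao NDWI = (NIR - SWIR) / (NIR + SWIR); tracks liquid water in vegetation."""
    return (nir - swir) / (nir + swir)

def distance_to_water(water_mask, cell_size_m):
    """Euclidean distance (m) from every cell to the nearest water cell.

    water_mask: 2-D list of booleans (True = water), e.g. from thresholding
    a water index or from an LU/LC map. Brute force; fine for small grids.
    """
    water = [(r, c) for r, row in enumerate(water_mask)
             for c, is_water in enumerate(row) if is_water]
    if not water:
        raise ValueError("mask contains no water cells")
    return [[cell_size_m * min(math.hypot(r - wr, c - wc) for wr, wc in water)
             for c in range(len(row))]
            for r, row in enumerate(water_mask)]

# Hypothetical 3 x 4 mask on a 30 m grid with a single pond cell.
mask = [[False, False, False, False],
        [False, True, False, False],
        [False, False, False, False]]
for row in distance_to_water(mask, 30.0):
    print([round(d) for d in row])
```

For real scenes, a distance transform from an image-processing or GIS library would replace the brute-force loop.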
Topography is a significant factor in MBD transmission, as it affects the living conditions of Anopheles and Aedes aegypti mosquitoes and indicates the most suitable breeding sites [79,80]. Fourteen studies in this review [31–33,39,42,44,45,49,67,76–78,81,82] used Digital Elevation Models (DEMs) to extract the topographic parameters of elevation, aspect and slope. DEMs were also used to calculate the Topographic Wetness Index (TWI) [41,42,78], an index of the wetness of an area that takes into account the topographic slope and the upstream contributing area. Nmor et al. focused specifically on topographic variables for predicting malaria vector breeding sites [42]; it was the only study that considered the association of the topographic position index (TPI), curvature and convergence index with malaria vector habitats.
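The TWI is usually defined as TWI = ln(a / tan β), where a is the specific upstream contributing area (upslope area per unit contour width, in metres) and β is the local slope. A minimal sketch, assuming the slope is estimated from a small DEM by central differences and that the contributing area has already been computed by a flow-routing algorithm (not shown). The elevations and area are hypothetical.

```python
import math

def slope_rad(dem, r, c, cell_size):
    """Slope (radians) at interior cell (r, c) from central differences."""
    dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
    dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
    return math.atan(math.hypot(dzdx, dzdy))

def twi(specific_area, slope, min_slope=1e-3):
    """Topographic Wetness Index ln(a / tan(beta)).

    A small minimum slope avoids division by zero on flat cells.
    """
    return math.log(specific_area / math.tan(max(slope, min_slope)))

# Hypothetical 3 x 3 DEM (m) on a 30 m grid, sloping down to the east.
dem = [[102.0, 101.0, 100.0],
       [102.0, 101.0, 100.0],
       [102.0, 101.0, 100.0]]
beta = slope_rad(dem, 1, 1, 30.0)
print(round(math.degrees(beta), 2))                 # 1.91
print(round(twi(specific_area=150.0, slope=beta), 2))  # 8.41
```

High TWI values mark flat cells that drain large upslope areas, i.e., places where water tends to accumulate and breeding sites are more likely.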

Other Non-Environmental Predictors
Although climatic parameters are highly influential in MBD transmission, non-climatic factors such as the social, economic and demographic parameters listed in Table 1 can affect the magnitude and spatial extent of MBD transmission [55]. Quintero et al. [89,90] examined poor water and sanitation conditions that compel inhabitants to store water in open containers, thus creating breeding habitats for mosquitoes. Among the reviewed studies, Buczak et al. also included socio-economic variables, namely running water, hygienic services and electric lighting, in the final model [32]. Moreover, Homan et al. highlighted the impact of socio-economic risk factors on malaria spread and constructed a socio-economic status (SES) index [78]. The SES index was constructed using Principal Component Analysis (PCA) based on six variables: (a) rented or owned dwelling, (b) owned agricultural area, (c) highest education level in the household, (d) location of the kitchen, (e) wall structure and (f) floor cover. Furthermore, Bhatt et al. included multiple socio-economic variables to generate dengue risk maps [61]: urban accessibility data, which define people's travel time to a city using land- and water-based mass transit, relative poverty, and the demarcation of urban and peri-urban areas.
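A PCA-based SES index of this kind is typically the household score on the first principal component of the standardized variables. The sketch below illustrates that general idea and is not the exact procedure of [78]. It standardizes four invented binary indicators (1 = better-off), extracts the leading eigenvector of their correlation matrix by power iteration to stay dependency-free, and scores each household.

```python
import math

def standardize(col):
    """Z-score a column using the population standard deviation."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
    return [(v - m) / sd for v in col]

def first_pc(z_cols, iters=500):
    """Leading eigenvector of the correlation matrix via power iteration."""
    n, k = len(z_cols[0]), len(z_cols)
    corr = [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / n
             for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:
        v = [-x for x in v]  # orient so higher scores mean better-off
    return v

def ses_scores(columns):
    """Return (household scores on PC1, PC1 loadings)."""
    z = [standardize(col) for col in columns]
    loadings = first_pc(z)
    n = len(columns[0])
    scores = [sum(loadings[j] * z[j][h] for j in range(len(z)))
              for h in range(n)]
    return scores, loadings

# Hypothetical indicators for six households (1 = better-off).
owns_dwelling = [1, 1, 0, 0, 1, 0]
owns_land = [1, 0, 0, 0, 1, 1]
secondary_education = [1, 1, 0, 0, 1, 0]
improved_floor = [1, 1, 1, 0, 1, 0]
scores, loadings = ses_scores(
    [owns_dwelling, owns_land, secondary_education, improved_floor])
print([round(s, 2) for s in scores])
```

The resulting scores are often split into quantiles (e.g., poorest to least poor) before being entered as a model covariate.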
Furthermore, demographic data were used to estimate the vulnerability of the population and to identify geographical areas where the risk of a disease outbreak is higher [91]. Areas with higher population density are at higher risk of pathogen transmission [90]. Several studies (n = 5) used population data as independent variables [41,64,78,82,86], while others (n = 2) used population data to estimate vulnerability and risk level [38,85]. Population density data generally originated from national administrative departments. Marcantonio et al., whose area of interest (AOI) was Europe, additionally used satellite imagery to extract night-time light intensity as a proxy for human population density [83].
Two studies investigating WNV risk factors used bird data, since birds act as the principal hosts in the WNV transmission cycle. Tran et al. digitized and categorized bird migratory routes based on their flyway direction (western and eastern) [86], while Valiakos et al. used cases of birds positive for WNV antibodies as a predictor [31].
Moonlight is believed to affect vector behavior, increasing activity during the full moon and third-quarter moon phases. According to Mokraoui et al., there is a correlation between moon phases and dengue outbreaks [53]; this was the only study that examined whether moonlight is correlated with dengue outbreaks.

Satellite EO Systems Used for Assessing the Environmental Predictors
Satellite EO data were the main sources for assessing the environmental predictors. All studies used remotely sensed data to estimate environmental or climatic parameters, while some also incorporated in situ data. Low-resolution satellites (spatial resolution of 300 m or coarser) were the major optical data sources. The MODIS-based information products most often used for predicting mosquito-borne diseases were LST, NDVI, EVI and NDWI. The TRMM mission was the most important source of timely precipitation data, while Amek et al. also used Meteosat-7 data to estimate rainfall in 8 km grid cells [45]. Landsat-7 and Landsat-8 were both used to derive vegetation indices [33,76,77] and to generate LU/LC maps. Similarly, Adde et al. [48] and Yue et al. [85] generated detailed LU/LC maps using SPOT-5 (10 m) and GF-1 (16 m) data, respectively. Machault et al. used very high resolution satellite imagery (GeoEye-1, 0.41 m) to create detailed LU/LC maps and to downscale the study to the household level [44]. Homan et al. likewise used QuickBird imagery at 0.61 m spatial resolution to derive precise information on households' proximity to lakes and to the nearest clinic, and to produce detailed estimates of the NDVI and TWI indices in the neighborhood of the households [78].
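Deriving indices "in the neighborhood of the households" amounts to summarizing raster pixels within a buffer around each location. A minimal sketch, assuming a north-up raster in a projected coordinate system (metres) with square pixels, membership decided by pixel centers, and hypothetical raster values, coordinates and buffer radius. In practice, a GIS library would normally handle this.

```python
import math

def buffer_mean(raster, origin_x, origin_y, pixel_size, x, y, radius):
    """Mean of pixels whose centers lie within `radius` of point (x, y).

    raster: 2-D list, row 0 at the top (north); (origin_x, origin_y) is the
    upper-left corner in projected metres. Returns None if no pixel center
    falls inside the buffer.
    """
    values = []
    for r, row in enumerate(raster):
        cy = origin_y - (r + 0.5) * pixel_size       # pixel-center northing
        for c, v in enumerate(row):
            cx = origin_x + (c + 0.5) * pixel_size   # pixel-center easting
            if math.hypot(cx - x, cy - y) <= radius:
                values.append(v)
    return sum(values) / len(values) if values else None

# Hypothetical 5 x 5 NDVI raster, 50 m pixels, upper-left corner at (0, 250).
ndvi = [[0.2, 0.2, 0.2, 0.2, 0.2],
        [0.2, 0.2, 0.4, 0.2, 0.2],
        [0.2, 0.4, 0.6, 0.4, 0.2],
        [0.2, 0.2, 0.4, 0.2, 0.2],
        [0.2, 0.2, 0.2, 0.2, 0.2]]
# Household at the raster center with a 60 m buffer: center + 4 neighbours.
print(round(buffer_mean(ndvi, 0.0, 250.0, 50.0, 125.0, 125.0, 60.0), 2))  # 0.44
```

The buffer radius strongly affects the result and is usually chosen to reflect mosquito flight range.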

Results and Discussion
EO data were found to provide environmental and climatic variables that can contribute significantly to the epidemiological modeling of MBDs, including predictive mapping, the geographic distribution and abundance of pathogens and vectors, health risk assessment, understanding of transmission dynamics, and the identification, implementation and assessment of appropriate control strategies. The associations between predictors and MBDs were either positive or negative, as listed in Table 2. Furthermore, Merkord et al. successfully incorporated satellite EO data into the EPIDEMIA early warning system [67], while Lowe et al. highlighted the potential of integrating EO data into an EWS for Southern Brazil [82]. All studies used satellite remotely sensed data, while some also incorporated in situ data. Although in situ data tend to be accurate, they have been of limited use in the literature, most likely because they are single-point, sparsely distributed measurements, and the interpolation between points needed for larger study areas adds uncertainty to the prediction. In contrast, according to most of the studies, satellite-based observations provided large-area coverage and uninterrupted time series of the environmental data needed for prediction. The primary satellite data sources exploited by most of the studies in this review were medium to high resolution and freely available. It is worth noting that, despite the unexpectedly long lifespans of Terra and Aqua, the two most commonly used satellites, they are expected to cease operations within a few years and will need to be replaced by similar satellite/sensor systems, e.g., Suomi NPP/VIIRS, JPSS/VIIRS, Sentinel-2 and Sentinel-3, which have not yet been fully exploited.
The analysis shows that three MODIS-based indices have been widely used: the NDVI and EVI vegetation indices and the NDWI water index, with 500 m spatial and 16-day temporal resolution. The MODIS day and night LST products were also extensively exploited, providing continuous daily information at 1 km resolution. Furthermore, remotely sensed precipitation data at spatial resolutions of 0.25 to 5.0 degrees were derived from the TRMM mission, with a revisit time of 23 days at the equator and 46 days at the highest latitudes. The TRMM mission provided precipitation data over tropical and subtropical areas for 17 years; however, it is no longer available, as it was turned off in 2015.
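MODIS LST and vegetation-index layers are distributed as scaled integers, so a conversion step usually precedes modeling. A minimal sketch, assuming the scale factors and fill values documented for the MOD11 LST products (0.02 K per count, fill value 0) and the MOD13 NDVI/EVI products (0.0001 per count, fill value −3000). These should be checked against the user guide of the product collection actually used, and quality-assurance flags (not handled here) should also be applied.

```python
LST_SCALE = 0.02     # kelvin per count (MOD11 LST layers; assumed)
LST_FILL = 0         # no retrieval, e.g. cloud cover
VI_SCALE = 0.0001    # index units per count (MOD13 NDVI/EVI; assumed)
VI_FILL = -3000

def lst_celsius(dn):
    """Convert a MOD11 LST count to °C; None for fill values."""
    if dn == LST_FILL:
        return None
    return dn * LST_SCALE - 273.15

def vi_value(dn):
    """Convert a MOD13 NDVI/EVI count to index units; None for fill values."""
    if dn == VI_FILL:
        return None
    return dn * VI_SCALE

def mean_valid(values):
    """Average ignoring missing (None) retrievals; None if nothing is valid."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

print(round(lst_celsius(15000), 2))  # 26.85
print(round(vi_value(5230), 4))      # 0.523
# Hypothetical daily LST counts for one pixel, one day cloud-masked.
print(round(mean_valid([lst_celsius(d) for d in (15000, 0, 15100)]), 2))  # 27.85
```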

Predictors for Malaria
Most of the studies in this review took place in tropical, subtropical, temperate and Sub-Saharan climatic zones, as shown in Figure 2. Temperature was one of the most influential parameters affecting malaria occurrence in tropical regions [35,37,39,45,46,48,84,88]; only one study [38], located in Tanzania, reported that temperature performed poorly as a predictor. Lag times and temperature thresholds varied between studies, possibly because the studies were located in different climatic zones. Diboulo et al., whose study area in Burkina Faso has a Sub-Saharan climate, found that the density of the vector An. gambiae was positively associated with daytime temperature during the two months preceding mosquito collection, and negatively associated with night-time temperature during the current and two previous months [46]. In the tropical zone, [45], conducted in Western Kenya, found a three-month lag between temperature and peaks of malaria admissions, and a negative association between An. gambiae density and a mean daytime temperature of 29 °C. Adde et al., in French Guiana, concluded that a minimum temperature of 20 °C was always beneficial for An. darlingi breeding [48], while Ssempiira et al., in Uganda, observed that malaria incidence increased with daytime temperature, although very high temperatures above 29 °C led to a decline in malaria incidence [37]. This result was consistent with Amadi et al., in Baringo, Kenya, who found that average monthly minimum temperatures between 16.2 and 21 °C (1-month lag) created favorable conditions for an increase in malaria risk [84].
Precipitation also proved to be a significant predictor in the tropical zone, highly associated with malaria occurrence [35,37,40,84]. In contrast, Kabaria et al., conducted in Tanzania (tropical region), was the only study reporting that precipitation performed poorly as a predictor [38]. Lag periods also varied considerably for precipitation: Sewe et al. found different lag periods (0 to 12 weeks) in each of the three study regions within their Area of Interest (AOI) in Western Kenya [40], and Amadi et al. found positive associations between rainfall and malaria at a 2-month lag [84]. Kanyangarara et al. [77] and Midekisa et al. [65] were conducted in humid subtropical climates; Kanyangarara et al. found higher malaria risk during the rainy season at total monthly rainfall between 94 and 181 mm [77], while Midekisa et al. found positive associations between rainfall and malaria cases at a lag of one to three months [65].
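Lag periods like those reported above are often screened by correlating the disease series with the predictor shifted by 0, 1, 2, ... time steps and retaining the lag with the strongest association. The sketch below shows that screening step with Pearson correlation on hypothetical monthly series, in which cases are constructed to follow rainfall two months later. The reviewed studies used their own model-based lag selection, so this is only an illustration, and with short series such screening is prone to spurious results.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(predictor, cases, max_lag):
    """Correlation of cases[t] with predictor[t - lag] for lag = 0..max_lag."""
    out = {}
    for lag in range(max_lag + 1):
        x = predictor[:len(predictor) - lag]  # predictor leads by `lag` steps
        y = cases[lag:]
        out[lag] = pearson(x, y)
    return out

# Hypothetical monthly rainfall (mm); cases follow rainfall 2 months later.
rain = [10, 80, 150, 60, 20, 5, 30, 120, 170, 90, 25, 8]
cases = [40, 35] + [3 * r + 5 for r in rain[:-2]]

corrs = lagged_correlations(rain, cases, max_lag=3)
best = max(corrs, key=lambda k: abs(corrs[k]))
print(best)  # 2
```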
Vegetation indices (NDVI and EVI) were also tested in the literature and have been associated with malaria occurrence in tropical areas [35,37,40,76,84]. NDVI values between 0.3 and 0.4 showed an increased correlation with malaria risk [40,84]. EVI was also positively associated with reported malaria cases in humid subtropical areas, as in Midekisa et al. [65], and in dry Mediterranean climates such as Portugal, as in Benali et al. [52]. Malahlela et al. [34] found that SAVI was positively associated with malaria distribution, suggesting that the malaria vector prefers greener vegetation; their AOI, the Vhembe District in South Africa, has varied topography, with the north-western part in the semi-arid climatic zone and the south-eastern part in the subtropical zone.
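For readers less familiar with these indices, the sketch below computes NDVI, EVI and SAVI from surface reflectances using their standard formulas (EVI with the widely used MODIS coefficients, SAVI with the usual L = 0.5). It assumes reflectances scaled to 0–1; on Sentinel-2, for example, NIR, red and blue would typically come from bands B8, B4 and B2. The function names, the pixel values and the band-flag helper are illustrative only.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, l=1.0):
    """Enhanced Vegetation Index with the widely used MODIS coefficients (G=2.5, C1=6, C2=7.5, L=1)."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + l)


def savi(nir, red, l=0.5):
    """Soil-Adjusted Vegetation Index; L=0.5 is the usual default for intermediate vegetation cover."""
    return (1 + l) * (nir - red) / (nir + red + l)


def in_reported_ndvi_band(value, low=0.3, high=0.4):
    """Hypothetical helper: flag NDVI values inside the 0.3-0.4 range linked to higher malaria risk in [40,84]."""
    return low <= value <= high


# Surface reflectances (0-1) for one hypothetical pixel.
nir, red, blue = 0.40, 0.20, 0.10
print(round(ndvi(nir, red), 3), round(evi(nir, red, blue), 3), round(savi(nir, red), 3))
# prints 0.333 0.27 0.273
```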
Proximity to water bodies, however, was negatively associated with malaria incidence in tropical areas such as Uganda [37] and with vector density in Western Kenya [45], while water indices performed poorly as predictors both in temperate climates such as South Africa [34] and in tropical ones such as Tanzania [38]. For Kabaria et al., the percentage of dense/riverine vegetation was the most significant predictor [38].
Socioeconomic factors also proved to be significant covariates [78], with outdoor occupation being the most significant risk factor, followed by socioeconomic status (SES) and population density. This result is concordant with [92], which associated outdoor occupation with higher malaria risk, since people working outdoors are more exposed to infective mosquito bites.

Predictors for Dengue
Temperature was an important explanatory variable for predicting dengue cases in the tropical zone [43,59,64,71]. Ashby et al. reported that day and night temperature played an important role in determining the dengue fever niche, with day temperature limiting the reproduction rate of Aedes aegypti, the main vector of dengue fever [64], while Sarfraz et al., in Thailand, found that temperatures between 30 and 35 °C had a high impact on Aedes vector breeding [43]. Hii et al. reported a consistent and stable association between mean temperature and dengue incidence [87], and Mokraoui et al. highlighted the importance of temperature for estimating the dengue index, as longer dry seasons create more suitable sites for dengue outbreaks [53]. Ssempiira et al. found that both temperature indices, SMT and TCI, at a lag of 8–9 weeks were related to the dengue incidence rate [37]. Two studies were located in subtropical areas: Yue et al., in a coastal area of China, found that day and night temperature were significantly correlated with dengue fever outbreaks [85], while German et al. found that low temperatures were negatively associated with oviposition activity [50].
NDWI, an indirect proxy for precipitation and humidity, was associated with dengue occurrence in several studies conducted in subtropical climates. Scavuzzo et al. [51] and German et al. [50], both studying the subtropical city of Tartagal in Argentina, reported that NDWI was positively associated with the oviposition activity of the Aedes aegypti vector, while Yue et al. [85] found that NDWI was significantly positively correlated with dengue outbreaks.
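Two NDWI formulations are in common use, and they measure different things: Gao's version contrasts NIR with shortwave infrared and tracks vegetation water content, while McFeeters' version contrasts green with NIR and highlights open water. The sketch below shows both; which variant each reviewed study used is not restated here, and the pixel values are hypothetical. On Sentinel-2, green, NIR and SWIR would typically be bands B3, B8 and B11.

```python
def ndwi_gao(nir, swir):
    """NDWI after Gao: (NIR - SWIR) / (NIR + SWIR); tracks vegetation liquid water content."""
    return (nir - swir) / (nir + swir)


def ndwi_mcfeeters(green, nir):
    """NDWI after McFeeters: (Green - NIR) / (Green + NIR); positive values typically indicate open water."""
    return (green - nir) / (green + nir)


# Hypothetical reflectances for a moist vegetated pixel and an open-water pixel.
print(round(ndwi_gao(nir=0.40, swir=0.20), 3))         # 0.333
print(round(ndwi_mcfeeters(green=0.08, nir=0.02), 3))  # 0.6
```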
Various LU/LC classes and socioeconomic factors also proved to be significant. Yue et al. reported that land type was significantly correlated with dengue fever outbreaks and that human population density was one of the most significant predictors [85]. Machault et al., whose study took place in Tartane in the French Antilles, which has a tropical climate, found that the "sparsely vegetated soil" land use class was associated with the presence of water-filled containers, while the "asphalt" land use class was negatively associated with the presence of Aedes larvae-positive containers [44].

Predictors for WNV
Temperature was one of the most significant predictors of WNV incidence, which occurred mainly in temperate [30,31] and continental climates [75,81,83,86]; the latter are characterized by large seasonal temperature differences, with warm to hot summers and cold winters. Temperature was positively associated with WNV incidence [75,81,83,86], and Stilianakis et al., in Greece, reported that soil and air temperature were among the most significant predictors of WNV disease outbreaks [30]. Marcantonio et al. [83] and Young et al. [81], who examined Europe and the US Great Plains respectively, both characterized by warm and humid continental climates, identified precipitation as a significant predictor of WNV incidence. Moreover, according to Chuang et al. [75], also in the US Great Plains, cumulative actual evapotranspiration (ETa) showed a positive association with WNV relative risk. Elevation played an important role in the prediction of WNV incidence [81], and low elevation was positively associated with both human and wild bird cases [31].
The vegetation index NDVI showed positive associations with WNV risk according to Chuang et al. [75] and Young et al. [81], both located in the US Great Plains, while Conley et al., working in arid and semi-arid areas, reported that the seasonality of EVI was a significant predictor of the vector's habitat [41]. Among the land use predictors, irrigated croplands and populated forest were the most significant and were positively associated with WNV incidence [75]. Kanyangarara et al. identified the West Nile Fever (WNF) outbreak of the previous year as a risk factor [77].

Data-Driven Uncertainties and Limitations
The quality of the input data has been the major factor limiting model sensitivity and the accuracy of the predicted variables and risks. Because MBD prediction algorithms depend on temporal data, data availability and reliability were major concerns in many studies.
The main limitations reported were a lack of systematic epidemiological and entomological data collection [34,37,81], uncertainty in the ingested datasets due to under-diagnosis and underreporting [35,38], and a limited number of cases [30]. Furthermore, the entomological indices used in [43], namely the Container Index (CI), House Index (HI) and Breteau Index (BI), may have constrained the estimations, because they depend on sampling the immature stages of the vectors in water-holding containers. Sarfraz et al. suggested that a pupal survey might have been more suitable for investigating the risk, because pupal counts are more closely related to the production of adult mosquitoes and may therefore better reveal the real trend of the risk [43]. The InterVA verbal autopsy method used by Sewe et al. might have under- or overestimated the number of deaths [40].
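To make the dependence on container sampling concrete, the sketch below computes the three larval indices from a household survey using their conventional definitions: HI is the percentage of houses with at least one positive container, CI the percentage of water-holding containers that are positive, and BI the number of positive containers per 100 houses. The `HouseSurvey` record and the survey values are hypothetical and do not reproduce the data of [43].

```python
from dataclasses import dataclass


@dataclass
class HouseSurvey:
    wet_containers: int       # water-holding containers inspected in the house
    positive_containers: int  # containers holding Aedes larvae or pupae


def stegomyia_indices(houses):
    """House (HI), Container (CI) and Breteau (BI) indices from a larval survey."""
    n_houses = len(houses)
    wet = sum(h.wet_containers for h in houses)
    if n_houses == 0 or wet == 0:
        raise ValueError("need at least one house and one wet container")
    infested = sum(1 for h in houses if h.positive_containers > 0)
    positive = sum(h.positive_containers for h in houses)
    return {
        "HI": 100 * infested / n_houses,  # % of houses with at least one positive container
        "CI": 100 * positive / wet,       # % of wet containers that are positive
        "BI": 100 * positive / n_houses,  # positive containers per 100 houses
    }


survey = [HouseSurvey(4, 1), HouseSurvey(2, 0), HouseSurvey(5, 2), HouseSurvey(3, 0)]
print(stegomyia_indices(survey))  # HI = 50.0, BI = 75.0, CI ≈ 21.4
```

None of these indices counts pupae or adults, which is why they can be only loosely related to the adult population that actually transmits the virus.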
Some studies mentioned additional limiting factors related to the satellite imagery. Yue et al. reported quality problems due to image blurring and striping [85]. In addition, studies of subtropical and tropical regions that used optical data [34,48] faced problems caused by thick cloud cover. Fusing optical and Synthetic Aperture Radar (SAR) data appears to have resolved this problem to some extent.
This review shows that regression methods have been relatively easy to apply and automate. These approaches included autoregressive terms and seasonal functions to model serial correlation. Amadi et al. used a Generalized Linear Mixed Model (GLMM) with a Poisson distribution [84], while Sewe et al. [66] used a Generalized Additive Model (GAM), an extension of the Generalized Linear Model (GLM) in which the linear predictor includes smooth, non-parametric functions of the covariates [94]. Sewe et al. [66] compared the GAM with a boosted GAM (GAMBOOST) ensemble model; GAMBOOST performed better because it reduced over-fitting and could handle non-stationary data. Logistic regression was employed in [30,34,44,77,86] to model binary response variables, for example the presence or absence of infection. To analyze the spatial distribution of malaria, Malahlela et al. used Stepwise Logistic Regression (SLR) [34], one of the most commonly used methods for relating remotely sensed data to disease distribution [95,96].
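As a minimal sketch of the Poisson regression family described above, the code below fits a Poisson GLM with log link by iteratively reweighted least squares, using an intercept, one annual sine/cosine pair for seasonality and a one-month-lagged rainfall covariate. It is not the model of any reviewed study: it has no random effects (so it is a GLM, not a GLMM), the rainfall series and coefficients are hypothetical, and an autoregressive term could be appended as an extra column, for example the logarithm of the previous month's count plus one (an assumption for illustration). In practice a library such as statsmodels (`GLM` with `families.Poisson()`) would be used instead of a hand-written solver.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(t, rain_lagged, period=12):
    """Intercept, one annual sine/cosine pair for seasonality, and a lagged rainfall covariate."""
    w = 2 * math.pi * t / period
    return [1.0, math.sin(w), math.cos(w), rain_lagged]


def fit_poisson_glm(X, y, iters=25):
    """Poisson GLM with log link fitted by iteratively reweighted least squares (IRLS).

    Assumes column 0 of X is the intercept.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the overall mean rate
    for _ in range(iters):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictor
        mu = [math.exp(e) for e in eta]                            # expected counts
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]     # working response
        # Weighted normal equations (X^T W X) beta = X^T W z with W = diag(mu).
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)] for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        beta = solve(xtwx, xtwz)
    return beta


# Hypothetical monthly rainfall; expected counts are generated from known coefficients.
rain = [50 + 40 * math.sin(2 * math.pi * (t - 2) / 12) + 3 * (t % 5) for t in range(37)]
X = [design_row(t, rain[t - 1]) for t in range(1, 37)]  # rainfall lagged by one month
true_beta = [1.0, 0.5, -0.3, 0.01]
expected = [math.exp(sum(b * x for b, x in zip(true_beta, row))) for row in X]
print([round(b, 4) for b in fit_poisson_glm(X, expected)])  # approximately recovers true_beta
```

For a Poisson GLM with log link (the canonical link), IRLS coincides with Newton-Raphson, which is why a fixed, small number of iterations usually suffices.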
ARIMA models have been used to analyze and forecast time series data, and they perform well when the data appear non-stationary [97], because differencing removes trends before the autoregressive and moving-average terms are fitted. ARIMA models are also well suited to representing temporal patterns such as seasonality and serial correlation. Two studies used extensions of ARIMA. Kamya et al. used an ARIMAX model [35], a multivariate ARIMA that includes multiple predictors through current and past values of the independent variables [98], and Midekisa et al. [65] used a Seasonal Autoregressive Integrated Moving Average (SARIMA) approach, which adds a seasonal component, to relate lagged environmental variables to malaria cases. SARIMA models performed well and can be used when the time series of the dependent variable exhibits seasonal variation. However, SARIMA models may fail to provide accurate predictions if the preceding part of the time series exhibits abnormal variations [99]. SARIMA models are also strongly data-driven, requiring a sufficiently long historical time series for parameterization.
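The two SARIMA ingredients discussed above, seasonal differencing and an autoregressive term on the differenced series, can be illustrated with a deliberately reduced SARIMA(1,0,0)(0,1,0) model with a 12-month season. This sketch has no moving-average terms and no exogenous covariates (the part ARIMAX would add), and it estimates the AR coefficient by conditional least squares rather than maximum likelihood; the case series is hypothetical. Full models are available in, for example, statsmodels' `SARIMAX` class, which takes `order`, `seasonal_order` and `exog` arguments.

```python
def seasonal_difference(series, season=12):
    """Apply (1 - B^s): w_t = y_t - y_{t-s}, removing a stable seasonal pattern."""
    return [series[t] - series[t - season] for t in range(season, len(series))]


def fit_ar1(w):
    """Conditional least-squares estimate of phi in w_t = phi * w_{t-1} + e_t (zero mean assumed)."""
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    return num / den


def forecast_next(series, phi, season=12):
    """One-step forecast for SARIMA(1,0,0)(0,1,0)_s: y_{T+1} = y_{T+1-s} + phi * (y_T - y_{T-s})."""
    w_last = series[-1] - series[-1 - season]
    return series[-season] + phi * w_last


# Hypothetical monthly cases: a fixed seasonal profile plus a decaying anomaly (phi = 0.6).
profile = [10, 12, 15, 20, 25, 30, 28, 22, 18, 14, 11, 10]
y, anomaly = list(profile), 5.0
for t in range(12, 48):
    y.append(y[t - 12] + anomaly)
    anomaly *= 0.6
phi = fit_ar1(seasonal_difference(y))
print(round(phi, 3), round(forecast_next(y, phi), 3))
```

The forecast reuses last year's value for the same month and corrects it by the persisting anomaly, which also shows why an abnormal recent season propagates directly into the next prediction.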
Spatial statistics were used in [37,39,45,46,85] to analyze and predict values associated with spatial or spatiotemporal phenomena. Yue et al. analyzed spatial patterns of dengue fever using point density, average nearest neighbor, spatial autocorrelation and hot spot analysis [85], while Bayesian binomial models were used in [37,39,45,85].
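Spatial autocorrelation, one of the methods listed above, is most often summarized with global Moran's I. The sketch below implements the standard formula for areal units with a binary contiguity weights matrix; the four-district transect and its values are hypothetical, and significance testing (by permutation or a normal approximation), which a real analysis would need, is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z the deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def expected_morans_i(n):
    """Expected value of I under the null hypothesis of no spatial autocorrelation."""
    return -1 / (n - 1)


# Four districts along a transect; neighbours share a border (binary, symmetric, zero diagonal).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], W))  # smooth gradient, positive I (about 0.333)
print(morans_i([1, 4, 1, 4], W))  # alternating pattern, negative I (about -1.0)
```

Values above the null expectation of -1/(n - 1) indicate clustering of similar incidence values, values below it indicate a checkerboard-like pattern.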
Ruangudomsakul et al. were the only authors to estimate the dengue outbreak level using a Bayesian Network (BN) [71]. A BN is a probabilistic graphical model that uses Bayesian inference to perform probability computations; it models the conditional dependencies among variables with a Directed Acyclic Graph (DAG) [100]. The study tested three BN models: one built from expert knowledge, one learned with the Greedy Thick Thinning (GTT) algorithm, and one combining both. After comparing their performance, the authors recommended the model that combined GTT with expert knowledge for forecasting dengue at different outbreak levels.
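The sketch below illustrates how a BN turns a DAG plus conditional probability tables into a forecast: the joint distribution factorizes along the graph, and a query such as the probability of an outbreak given heavy rainfall is answered by exact enumeration. The four-node structure and every probability are hypothetical and are not the variables or tables of [71]; structure learning (such as GTT) is not shown.

```python
from itertools import product

# Hypothetical conditional probability tables for a four-node DAG:
#   R (heavy rainfall) -> B (abundant breeding sites) -> O (outbreak) <- T (high temperature)
P_R = {True: 0.3, False: 0.7}
P_T = {True: 0.4, False: 0.6}
P_B_GIVEN_R = {True: 0.8, False: 0.2}  # P(B = True | R)
P_O_GIVEN_BT = {  # P(O = True | B, T)
    (True, True): 0.7,
    (True, False): 0.3,
    (False, True): 0.2,
    (False, False): 0.05,
}


def bernoulli(p_true, value):
    """Probability of a boolean outcome given P(True)."""
    return p_true if value else 1 - p_true


def joint(r, t, b, o):
    """Joint probability, factorised along the DAG: P(R) P(T) P(B | R) P(O | B, T)."""
    return P_R[r] * P_T[t] * bernoulli(P_B_GIVEN_R[r], b) * bernoulli(P_O_GIVEN_BT[(b, t)], o)


def p_outbreak(evidence):
    """P(O = True | evidence) by exact enumeration over all variable assignments."""
    num = den = 0.0
    for r, t, b, o in product([True, False], repeat=4):
        assignment = {"R": r, "T": t, "B": b, "O": o}
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(r, t, b, o)
        den += p
        if o:
            num += p
    return num / den


print(round(p_outbreak({"R": True}), 3), round(p_outbreak({"R": False}), 3))  # prints 0.39 0.18
```

Exact enumeration grows exponentially with the number of variables, so practical BN tools use more efficient inference algorithms; the factorization it relies on is the same.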
Boosted Regression Trees (BRT), which can handle a wide variety of predictors, complex nonlinear relationships and missing data, were used in [38,41,61,64]. Kabaria et al. [38] used BRT to relate high-resolution urban LC classes and other satellite-derived environmental variables to malaria prevalence, while Ashby et al. used BRT to quantify dengue incidence risk, comparing Poisson and Bernoulli family models [64]. The Bernoulli model used disease presence/absence data and the Poisson model used actual case counts. The Poisson family gave a better model fit than the Bernoulli one, with a lower Root Mean Square Error (RMSE) and a higher correlation.
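The core idea of BRT, adding many small trees that each fit the current residuals, can be sketched with depth-1 trees (stumps) and a shrinkage (learning-rate) factor. This toy version uses squared-error loss for simplicity, whereas the studies above used Poisson or Bernoulli losses; it also omits deeper trees, subsampling and missing-value handling, so it is a sketch of the technique and not a substitute for dedicated BRT software. Data and parameter values are hypothetical.

```python
from math import sqrt


def fit_stump(X, r):
    """Depth-1 regression tree: the single split minimising squared error on targets r."""
    best = None
    for f in range(len(X[0])):
        thresholds = sorted({row[f] for row in X})[:-1]  # split between distinct values
        for thr in thresholds:
            left = [ri for row, ri in zip(X, r) if row[f] <= thr]
            right = [ri for row, ri in zip(X, r) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:
        raise ValueError("all features are constant; no split is possible")
    return best[1:]  # (feature, threshold, left value, right value)


def boost(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting of stumps under squared-error loss (residuals are the negative gradient)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        f, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((f, thr, lm, rm))
        pred = [p + learning_rate * (lm if row[f] <= thr else rm) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the shrunken contributions of all stumps on top of the base value."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[f] <= thr else rm) for f, thr, lm, rm in stumps)


def rmse(y, yhat):
    """Root Mean Square Error."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


# Hypothetical data: column 0 drives a step in incidence, column 1 is noise.
X = [[x, (7 * x) % 3] for x in range(1, 11)]
y = [0.0 if row[0] <= 5 else 10.0 for row in X]
model = boost(X, y)
print(rmse(y, [predict(model, row) for row in X]))  # small, on the order of 1e-4
```

Shrinking each tree's contribution slows learning but is what makes boosting resistant to overfitting a single noisy split.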
ANNs have been used for time series prediction and can reproduce and model nonlinear processes. They can also capture interactions between explanatory variables without those interactions being specified in advance. On the other hand, ANNs cannot explicitly identify causal relationships, because the internal behavior of the network is not readily interpretable, and they are more prone to overfitting when the input datasets are inadequate. Bui et al. tested multiple machine learning classifiers and ensemble techniques to relate malaria cases to socio-physical parameters and create malaria vulnerability maps [76]. They compared ANNs, Support Vector Machines (SVM), J48 decision trees, and ensembles using J48 as the base classifier (AdaBoost, Bagging and Random Subspace); the Random Subspace ensemble performed best. Scavuzzo et al. examined multiple ML algorithms to model temporal variations of oviposition in both urban and rural areas [51], comparing SVM, a multilayer perceptron ANN, decision trees and K-Nearest Neighbors (KNN) with two linear regression models; KNN outperformed the other methods. Furthermore, two studies [33,41] used the Maximum Entropy (MaxEnt) approach to predict the distribution of vector populations, and both compared MaxEnt with other models: Conley et al. found strong agreement between MaxEnt and BRT results [41], and Arboleda et al. showed that combining BRT and the Genetic Algorithm for Rule-set Prediction (GARP) yielded the best models [33].
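Since KNN performed best in [51], a minimal KNN regressor helps show why it is attractive for this kind of data: it makes no assumption about the functional form linking environmental variables to oviposition. The sketch standardizes features first, because temperature in °C and NDVI live on very different scales and would otherwise dominate or vanish in the distance. The feature choice, training data and value of k are hypothetical and do not reproduce the configuration of [51].

```python
from math import dist
from statistics import mean, pstdev


def make_scaler(rows):
    """Return a function that z-scores a feature vector using the training rows' statistics."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero for constant features

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, mus, sds)]

    return scale


def knn_regress(train_X, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query` in standardised feature space."""
    scale = make_scaler(train_X)
    scaled = [scale(row) for row in train_X]
    q = scale(query)
    nearest = sorted(zip(scaled, train_y), key=lambda pair: dist(pair[0], q))[:k]
    return mean([y for _, y in nearest])


# Hypothetical weekly features [mean temperature (°C), NDVI] and ovitrap egg counts.
train_X = [[20, 0.20], [22, 0.25], [24, 0.30], [26, 0.35], [28, 0.40], [30, 0.45]]
train_y = [5, 8, 12, 18, 25, 30]
print(knn_regress(train_X, train_y, [29, 0.43], k=2))  # prints 27.5
```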
Although various evaluation metrics were used (Appendix A, Tables A1–A3), we summarize here the accuracy of the most commonly used methods. Regression methods predicting malaria incidence varied widely from year to year and showed large spatial heterogeneity (R² ≈ 0.4–0.9). Classification methods predicting vector populations yielded a mean model accuracy of about 80% (reported with 95% confidence intervals) and an AUC of about 0.8. Methods predicting dengue incidence in humans achieved a mean R² of about 0.35, while mosquito population dynamics were modeled with R² ≈ 0.7. Regression models of human WNV incidence, whose dependent variable showed high temporal variability over the year, achieved R² ≈ 0.1–0.7, while early-warning models of vector population dynamics and their potential transmission risk achieved R² ≈ 0.5.
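Because the summary above compares R², accuracy and AUC across studies, it may help to recall how each is defined. The sketch below uses the common textbook definitions (AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative); individual studies may have computed these metrics differently, for instance on cross-validation folds, and the labels and scores here are hypothetical.

```python
def r_squared(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the binary label."""
    return sum((s >= threshold) == bool(lab) for lab, s in zip(labels, scores)) / len(labels)


def auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]          # hypothetical vector presence (1) / absence (0)
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical model scores
print(accuracy(labels, scores), auc(labels, scores))  # prints 0.5 0.75
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

Accuracy depends on the chosen threshold while AUC does not, which is one reason the two can tell different stories about the same classifier.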
The accuracy of the human incidence data used in the studies varied considerably, mainly because of data gaps and non-systematic, non-standardized data collection. The assumption that infections were locally acquired may also introduce bias, because people travel from one region to another. Studies that used entomological data seemed to yield slightly better accuracy. A possible explanation is that disease incidence is highly correlated with mosquito density, so models that used vector densities as response variables appeared to produce more accurate predictions.

Scalability and Transferability
It is essential to identify which methodologies are promising and can reportedly be replicated in different geographic regions, enabling predictive systems at regional and national scales. The review shows that some of the methods are scalable and transferable to other areas with similar climatic and MBD conditions. Buczak et al. described their automated FARM method as generic and extendable to any geographic area [32]. Rosa et al. [49] applied linear mixed models and stated that they are transferable to other areas with similar climate and land cover conditions, and that the principles used in the models' design could be applied in any area. Machault et al. [44] used logistic regression in a two-step approach and noted that the equations derived from the final model could be applied in regions with similar morphological and LU/LC conditions, while the ML algorithms used by Scavuzzo et al. (SVM, ANNs, KNN, decision trees) allow the methodology to be transferred to other regions as well [51]. In contrast, Sarfraz et al. noted that their fuzzy approach could not be extrapolated to a regional scale, since the environmental and social factors affecting dengue vector density vary significantly [43].

Conclusions
A wide range of predictors and modeling approaches were found in the literature for forecasting epidemic diseases such as malaria, dengue and WNV. Researchers have examined different methods and used different datasets to model MBDs. As this review shows, the main barriers until recently have been the temporal and spatial resolution of the data and data accuracy; most of the studies used data from satellite missions that are at the end of their operation or are no longer available. We therefore believe that state-of-the-art EO sensors and satellite systems should be considered for the prediction of MBDs. With the advent of Sentinel data, offered freely by the EU Copernicus program, a new challenge has arisen: the analysis of big satellite data and the employment of data science approaches. The Sentinels' enhanced capabilities for multipurpose environmental monitoring at various scales, along with their higher temporal and spectral resolutions, are expected to significantly increase the information available on predictors such as soil moisture, vegetation and water bodies, and therefore the accuracy of the models. Sentinel-1 SAR images are particularly advantageous because of their fine spatial resolution (about 5 m in range in Interferometric Wide swath mode) and, above all, their ability to acquire images every 6 days, day and night, regardless of atmospheric and cloud conditions. Li et al. [101] and Catry et al. [102] successfully fused optical and SAR data to generate LU/LC maps that better address the challenge of malaria elimination, while Catry et al. used SAR data to estimate the extent of wetlands in the Amazon river basin [103]. Moreover, Sentinel-2 images (10 m GSD, 5-day revisit time with two satellites) offer unique continuity and high accuracy for indices such as NDVI, EVI, SAVI and NDWI, complementing the widely used SPOT and Landsat missions.
In addition, the need to scale up predictions from the local to the regional or continental level could be addressed by combining medium-resolution (500 m–1 km GSD) Sentinel-3 data with Sentinel-1, Sentinel-2 and other operational high-resolution (HR) and very-high-resolution (VHR) satellite missions (SPOT, IKONOS, WorldView, etc.). In this regard, NASA's Global Precipitation Measurement (GPM) mission is worth mentioning: it extends the continuity of the TRMM mission and provides precipitation estimates more frequently and at finer resolution (its IMERG product offers 0.1° grids every 30 min, compared with 0.25° every 3 h for the TRMM Multi-satellite Precipitation Analysis). Last but not least, the Soil Moisture Active Passive (SMAP) mission, which has not yet been fully exploited, shows high potential, as it provides global soil moisture estimates at a spatial resolution of 3 km with a 2–3 day revisit time.
New computing technologies now offer the performance needed for time-series analysis of big satellite-derived datasets to estimate infectious disease trends, enabling more accurate predictions for MBDs. Despite the progress made in epidemic forecasting, there is still a need to exploit new powerful modeling approaches, such as artificial intelligence and ensemble modeling, in depth. These approaches should ingest long-term EO observations (space-based and in situ) and EO-derived variables, allowing the identification of highly complex relationships among the data and the risk factors influencing MBD transmission. The ability to process large volumes of data continuously, leveraging High Performance Computing environments and Data Cubes, will support the geographic upscaling and transferability of predictions to larger areas. This is currently a primary scientific challenge in EO which, if met, will lead to data-driven decisions of high societal benefit.

Disclaimer
The views expressed are purely those of the writer (N.I.S.) and may not in any circumstance be regarded as stating an official position of the European Commission.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: