Developing the Role of Earth Observation in Spatio-Temporal Mosquito Modelling to Identify Malaria Hot-Spots

: Anopheles mosquitoes are the vectors of human malaria, a disease responsible for a significant burden of global disease and over half a million deaths in 2020. Here, methods using a time series of cost-free Earth Observation (EO) data, 45,844 in situ mosquito monitoring captures, and the cloud processing platform Google Earth Engine are developed to identify the biogeographical variables driving the abundance and distribution of three malaria vectors— Anopheles gambiae s.l. , An. funestus , and An. paludis —in two highly endemic areas in the Democratic Republic of the Congo. EO-derived topographical and time series land surface temperature and rainfall data sets are analysed using Random Forests (RFs) to identify their relative importance in relation to the abundance of the three mosquito species, and they show how spatial and temporal distributions vary by site, by mosquito species, and by month. The observed relationships differed between species and study areas, with the overall number of biogeographical variables identiﬁed as important in relation to species abundance, being 30 for An. gambiae s.l. and An. funestus and 26 for An. paludis . Results indicate rainfall and land surface temperature to consistently be the variables of highest importance, with higher rainfall resulting in greater mosquito abundance through the creation of pools acting as mosquito larval habitats; however, proportional coverage of forest and grassland, as well as proximity to forests, are also consistently identiﬁed as important. Predictive application of the RF models generated monthly abundance maps for each species, identifying both spatial and temporal hot-spots of high abundance and, by proxy, increased malaria infection risk. Results indicate greater temporal variability in An. gambiae s.l. and An. paludis abundances in response to seasonal rainfall, whereas An. funestus is generally more temporally stable, with maximum predicted abundances of 122 for An. gambiae s.l. , 283 for An. funestus , and 120 for An. paludis . Model validation produced R 2 values of 0.717 for An. gambiae s.l. , 0.861 for An. funestus , and 0.448 for


Introduction
Malaria is a life-threatening vector-borne disease caused by five Plasmodium species in humans and is transmitted by infected female Anopheles mosquitoes. In 2020, there were an estimated 214 million malaria cases globally with 627,000 deaths, an increase of opportunities to acquire cloud-free imagery. Additionally, by using these optical (spectral) and Sentinel-1 Synthetic Aperture Radar (SAR) data, which provide structural information of target features synergistically, improved classifications of land cover, which can be a key driver of mosquito distributions, have been observed [39,40], compared with where either optical or SAR data have been used independently [41,42]. Importantly, Sentinel-1 and Sentinel-2 data are available cost-free, enabling wide uptake of these datasets and methods as a monitoring tool.
Recent advancements in data processing have further enabled the broad-scale application of remote sensing data to environmental studies via the cloud-computing platform Google Earth Engine (GEE) [43]. GEE provides vast computational power for processing satellite and ancillary datasets on a global scale [44] and holds ingested archives of satellite imagery and environmental datasets. Importantly, GEE functionality enables the processing of collections of satellite data rather than just individual images, including generating cloud-free composite images from a series of cloud-affected images. This is particularly valuable in persistently cloud-affected areas, such as many tropical malaria-endemic areas. These capabilities mean large-scale land cover mapping and environmental monitoring can be performed in a manner that was previously unfeasible. Additionally, GEE holds the Open Buildings V1 Polygons dataset, comprising buildings across Africa and enabling modelling results to be linked to individual buildings.
Previous work, by focusing principally either on mosquito-landscape-climate interactions at coarse resolutions to assess temporal variability or at higher resolutions to assess spatial variability for specific snap-shots in time, leaves a knowledge gap in relation to temporal variability in mosquito dynamics at high spatial resolutions. Additionally, these previous studies have not necessarily included a broad range of environmental covariates that are fully representative of the variables driving mosquito distributions. This data gap is addressed here, where, uniquely, a comprehensive suite of variables capturing landscape structures and proximity to key landscape features; temporal variability in rainfall patterns and land surface temperature; landscape conditions via aggregated vegetation; and water indices variables are used to model the abundance of multiple mosquito species in Africa, identifying bespoke sets of key variables for each. The higher spatial resolution of Sentinel-2, compared with MODIS or Landsat in previous studies, and the production of mosquito abundance measures also build on and advance previous work for individual buildings by relating risk to household location.
This project develops a new approach for broad-scale, repeatable monitoring of malaria risk hotspots in the DRC, using in situ mosquito data and cost-free remote sensing data. This is timely, as robust, repeatable methods are required to underpin future malaria monitoring systems and specifically to integrate in situ observations of mosquito abundance with the spatial and temporal coverage of Earth Observation (including land cover) and climate data. This is achieved through two specific objectives: (1) identifying key variables driving three Anopheles mosquito species distributions and abundances, and (2) estimating monthly mosquito distributions and determining how these vary spatially and temporally. To enable wider application of the developed methods by end-users without prior expertise in Earth Observation analysis, training materials and sample data sets enabling independent implementation of these methods are available in multiple languages in [45]. This can allow organisations such as disease control agencies to incorporate these analysis methods in their planning and management processes without the need for EO specialists, which historically may have limited opportunities to incorporate EO into malaria control programmes.

Study Area
This project focused on two 50 km × 50 km urban areas and their surroundings in the DRC: the medium-sized town of Lodja (population approx. 80,000) (latitude: −3.524661°, longitude: 23.596669°) and a smaller settlement, Kapolowe (latitude: −10.942157°, longitude: 26.952115°) (Figure 1). Both are meso-endemic for malaria, with parasite prevalence between 6 and 30% [46] and with high Plasmodium sporozoite rates for An. gambiae s.l (Lodja 2.9% (9/307), Kapolowe 6.3% (8/127)) and An. funestus (Lodja 0% (0/24), Kapolowe 8% (4/50)) [47]. The areas surrounding these two towns are typified by a mix of land cover components, including traditional smallholder shifting cultivation (cleared land, active fields, and fallow fields) along with settlements, grassy and bare areas, and a forest interface. The Lukene river flows immediately to the south of Lodja, and the area to the east of Kapolowe is dominated by Lake Tshangalele and large expanses of swamp. At Lodja, mean monthly rainfall (between 1991-2015) was <100 mm for June and July and 100-220 mm/month for the other months of the year, with mean temperatures of 24-26 °C all year round. In Kapolowe, mean monthly rainfall is more seasonal than that of Lodja, with a pronounced dry period between May and September, followed by a rainy season from October to April when the mean rainfall is approximately 30-200 mm/month [47]. The mean temperature in Kapolowe over the year varies more than that of Lodja, ranging from 18 °C in June and July up to 25 °C in October. In 2015, PermaNet 2.0 (deltamethrin) nets were distributed in Lodja, and in 2016, DawaPlus 2.0 (deltamethrin) nets were distributed in Kapolowe. Both distributions were part of provincial distributions. No indoor residual spraying was conducted at either site. The areas surrounding these two towns are typified by a mix of land cover components, including traditional smallholder shifting cultivation (cleared land, active fields, and fallow fields) along with settlements, grassy and bare areas, and a forest interface. The Lukene river flows immediately to the south of Lodja, and the area to the east of Kapolowe is dominated by Lake Tshangalele and large expanses of swamp. At Lodja, mean monthly rainfall (between 1991-2015) was <100 mm for June and July and 100-220 mm/month for the other months of the year, with mean temperatures of 24-26 • C all year round. In Kapolowe, mean monthly rainfall is more seasonal than that of Lodja, with a pronounced dry period between May and September, followed by a rainy season from October to April when the mean rainfall is approximately 30-200 mm/month [47]. The mean temperature in Kapolowe over the year varies more than that of Lodja, ranging from 18 • C in June and July up to 25 • C in October. In 2015, PermaNet 2.0 (deltamethrin) nets were distributed in Lodja, and in 2016, DawaPlus 2.0 (deltamethrin) nets were distributed in Kapolowe. Both distributions were part of provincial distributions. No indoor residual spraying was conducted at either site. Mosquitoes were collected from each of the Lodja and Kapolowe study areas using human landing catches (HLCs), as previously described in [47]. Field technicians undertook eight HLCs over four nights each month, with two different houses each night (total of eight different houses each month). At each house between 18:00 hrs and 06:00 hrs, two technicians conducted indoor HLCs, and two others conducted outdoor HLC. Although focused within the urban areas of Lodja and Kapolowe, the areas surrounding the sampling locations were heterogeneous in terms of land cover present, comprising built-up areas, trees, grassland, Remote Sens. 2023, 15, 43 6 of 31 fallow areas, static water (fish ponds), and flowing water. Sampling locations at Kapolowe were focused on the edges of or outside the built-up area, in close proximity to a mix of land cover types that are distributed across the wider study area (Figures 2 and 3). Houses were chosen by the collection teams based on proximity to known larval sites, acceptance by households, and other logistical concerns.

In Situ Mosquito Data Collection
cale (INRB, DRC) during 2015-2017 and 2019 in Lodja and 2016-2017 in Kapolowe. Mosquitoes were collected from each of the Lodja and Kapolowe study areas using human landing catches (HLCs), as previously described in [47]. Field technicians undertook eight HLCs over four nights each month, with two different houses each night (total of eight different houses each month). At each house between 18:00 hrs and 06:00 hrs, two technicians conducted indoor HLCs, and two others conducted outdoor HLC. Although focused within the urban areas of Lodja and Kapolowe, the areas surrounding the sampling locations were heterogeneous in terms of land cover present, comprising built-up areas, trees, grassland, fallow areas, static water (fish ponds), and flowing water. Sampling locations at Kapolowe were focused on the edges of or outside the built-up area, in close proximity to a mix of land cover types that are distributed across the wider study area (Figures 2 and  3). Houses were chosen by the collection teams based on proximity to known larval sites, acceptance by households, and other logistical concerns.   In Lodja, a similar proportion of An. gambiae s.l. and An. coluzzii was collected, and species identification for An. gambiae s.l. was not performed for mosquitoes from Kapolowe [47]. An. gambiae s.l. refers to the An. gambiae complex of eight species that are morphologically indistinguishable from each other and that can only be differentiated by In Lodja, a similar proportion of An. gambiae s.l. and An. coluzzii was collected, and species identification for An. gambiae s.l. was not performed for mosquitoes from Kapolowe [47]. An. gambiae s.l. refers to the An. gambiae complex of eight species that are morphologically indistinguishable from each other and that can only be differentiated by molecular analysis [48]. In this study An. gambiae were not identified at the species level; therefore, senus lato (s.l.) was used for the rest of the study. Monthly collection enabled the assessment of the variation in mosquito prevalence in response to current and preceding rainfall and land surface temperature conditions.

Environmental Variables
A series of biogeographical variables, previously identified as being ecologically relevant to the mosquito life cycle and malaria transmission, were modelled here in relation to mosquito abundance. These include topography, land cover, vegetation index, land surface temperature, and precipitation variables (Table 1).
For this study, all variables were considered to be temporally stable for the time-period of mosquito data collection with the exception of the monthly precipitation and land surface temperature data, which varied each month. Median vegetation index value datasets were generated from all available cloud-free Sentinel-2 imagery, acquired during the mosquito data collection periods for each study area, quantifying the relative 'typical' vegetation state across the study areas and the spatial variability thereof.

Land cover
Land cover Proportional coverage of each land cover class, derived from a land cover classification of the study area. The area over which land cover proportions was calculated differed for the different mosquito species based on their established flight distances. Proportional coverage of forest and fallow classes combined was also calculated.
Derived from land cover classification. [37] Proximity to water body Distance from pixel centroid to the nearest patch of flowing water, static water, and swamp.
Derived from land cover classification. [6,19]  This study used Google Earth Engine (GEE) as the principal source of remote sensing and environmental data sets and for performing data pre-processing and analysis. The utilized remote sensing datasets were Sentinel-1 (SAR) [52] and Sentinel-2 (optical) data [53], along with Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) monthly precipitation data [54], MODIS land surface temperature and emissivity data [55], and Shuttle Radar Topography Mission (SRTM) data [56,57] for topographical characterisation. The acquisition dates of the remote sensing datasets covered the period of mosquito data collection for each site.
To generate land cover maps of each study area, Sentinel-1 SAR data and Sentinel-2 optical images were utilised synergistically. The Sentinel-1 SAR data were descending orbit data, with VV and VH polarisations and the derived ratio (VH/VV), from the Ground Range Detected (GRD) data product in the Interferometric Wide Swath (IW) acquisition mode. These data underwent pre-processing (GRD border noise removal, thermal noise removal, radiometric calibration, and orthorectification to derive the backscatter coefficient (Sigma Naught dB) of each pixel) [58]. Median reducers in GEE were used to produce median pixel value composite raster layers from image collections for VV, VH, and VH/VV polarisations from all available data acquisitions in the time periods of 1 January 2016 to 31 December 2019 for Lodja and 1 January 2016 to 31 December 2017 for Kapolowe, prior to subsetting the data to the study area extents. At each location in the output median pixel value composite, the pixel value is the median of all unmasked pixels in the input imagery collection. Median values, derived from a full annual cycle of Sentinel-1 data, were used here to be representative of the general landscape condition throughout the year. This was performed instead of using data from a single date, which, depending on the vegetation phenological stage, may be less representative of the general landscape condition.
The Sentinel-2 (optical) Level 2A orthorectified surface reflectance data at a 10/20 m spatial resolution (depending on spectral band) was used in this study. All available Sentinel-2 images corresponding to the mosquito sampling periods for each site were selected and cloud and cloud-shadow masked using the Sentinel-2 Cloud Probability product, which is also available in GEE. A median pixel value composite image, also produced in GEE using a median reducer, was then generated from the collections of images for each site. For the Sentinel-2 imagery, spectral bands 2 (490 nm), 3 (560 nm),

Land Cover Classification
To perform the land cover classifications, the Sentinel-2 imagery (10 retained bands), the median Sentinel-1 SAR bands (VV, VH, and VH/VV), and the SRTM bands (elevation, aspect, and slope) were composited together to form a 16-band data stack. This data stack was then classified using a Random Forest (RF) classifier with 500 trees. An eight-class classification nomenclature was used specifically targeting land cover types relevant to the ecology of mosquitoes, for example, forests (with woodland edges being important resting areas), water (larval habitats), and built-up areas (with human populations susceptible to malaria infection) ( Table 2). Table 2. Land cover classes, adapted from [59].

Land Cover Class Description
Forest A forest stand with over 60% tree cover. Reference data for both training and validating the land cover classifications were collected from three sources: (1) field collected locations of known land cover classes (with locations recorded via GPS and collected during field mosquito sampling); (2) further reference locations derived from field-collected photographs; and (3) visual interpretations of VHR satellite imagery available through public portals, such as Google Earth and Bing Aerial map layers. The use of higher resolution imagery as reference data is an established technique [41,60], with Google Earth archives also previously being successfully used for this purpose [32,39].
For validation, a regular point grid with 2 km spacing was generated for both study areas, with the land cover at these locations identified via the VHR imagery, ensuring a spread of validation points across the full areas. Additional points were added for sparse and spatially clustered classes to ensure a minimum of 30 validation points were collected for each class (following [32]). An RF classifier was used to generate the land cover classifications using Google Earth Engine. RF classifiers are well established for mapping land cover and have also been used to generate land cover maps within a malaria study context [32]. Accuracy assessments of each land cover classification using the reference validation data were performed, with confusion matrices produced for each study area (Tables S1 and S2).
Proportional coverage (percentage fractional cover) of each land cover class across the study areas was calculated using a moving window to assess whether the increased presence of certain habitats, for example, water serving as a larval habitat, is related to increased mosquito abundance. Additionally, the proportional presence of forest and fallow classes combined was calculated. This was performed independently for each mosquito species, with different kernel sizes used based on established mosquito flight range. Mosquito flight range is species-specific and is influenced by environmental conditions (e.g., wind and vegetation) and physiological status (e.g., sugar or blood fed). The average flight range for An. gambiae is 846 m and for An. funestus is 300 m, with no information available for An. paludis [61]. Maximal flight distances of 3-10 km have also been reported for An. gambiae based on laboratory experiments using flight mills [62]; however, these may not give realistic values compared with field studies [61]. Here, the average flight range for An. gambiae (846 m) was also used for An. paludis. Finally, 'distance to feature' raster surfaces were generated, calculating for each pixel the distance from that pixel to the nearest patch of a specific land cover class. This was performed for the forest (i.e., distance to the nearest forest edge), fallow, forest and fallow classes combined, static water, flowing water, and swamp classes.

Vegetation and Water Indices
Two vegetation indices, NDVI [63] and SAVI [64], along with two water indices, NDWI [65] and MNDWI [66], were also calculated. These index values typically vary over time due to phenological and rainfall patterns. A median reducer was applied over the full time periods of mosquito data collection in each site for each of the indices. These median composites represent more 'typical' vegetation and water index values over the time period, avoiding the influence that specific or outlying phenological and rainfall patterns could have on single-date images.

Climatic Data
CHIRPS precipitation data were used to quantify the seasonal precipitation variability, both spatially and temporally, across the mosquito data collection period of the two study areas. The daily CHIRPS product was here converted to a monthly mean product (mean mm/day) for August 2014 to December 2019 for Lodja and for August 2015 to December 2017 for Kapolowe. An additional five months of data prior to the commencement of mosquito data collection was included to assess the influence of rainfall values in the months preceding mosquito data collection. The spatial resolution of these data is 0.05 • ; however, to avoid artificial pixel-edge boundaries, the data are here smoothed using a 10 km kernel size moving window to generate a smoothed precipitation surface for each month. Each output pixel value corresponds to the mean precipitation value of the kernel area centred on the location of that pixel for that given month. To quantify temporal variability in land surface temperature, MODIS MOD11A1.061 Terra Land Surface Temperature (LST) and Emissivity daily global 1 km data [55] were used to calculate monthly mean LST data sets for the period five months preceding and for the duration of mosquito data collection.

Random Forest Analysis and Predictive Risk Modelling
The remote-sensing-derived and environmental raster data sets were combined into a single data cube for each 50 km × 50 km study area at a 10 m spatial resolution. The coarser bands (such as the CHIRPS precipitation and MODIS LST data) were effectively resampled to a higher level of detail, with multiple 10 m pixels (of the same value) in the data cube corresponding to a single original CHIRPS or MODIS pixel extent.
The mosquito data were then split into separate calendar month datasets (48 in total for Lodja and 24 for Kapolowe). For each of these, the variables contained as layers in the data cube were extracted. For each month of mosquito data collection, the rainfall values for the current and preceding five months were extracted on a rolling basis and were recorded in a consistent manner (rainfall in current month, rainfall 0 , rainfall in preceding month, rainfall -1 , etc.). Extracted data from all months were then recompiled into species-specific data sets, combining the data from each of the two study areas for each species.
Modelling of the explanatory variables (comprising the topographical variables, land cover class distributions (proportional coverage of individual land cover classes, and the distance to the nearest patch of key land cover type), vegetation indices, and the current and preceding five months of precipitation and LST data) was performed in relation to mosquito abundance data for each mosquito species using RF analysis. Mosquito data from the two study sites were pooled for each species, with correlation plots for the explanatory variables (Table 1) produced (available in Supplementary Information, Figures S1 and S2, which differ slightly due to different species average flight ranges) prior to feature selection being performed to identify and retain a subset of only those variables that were statistically relevant to the mosquito abundance. Given the high correlation between NDVI and SAVI, as shown in the correlation plots, SAVI was disregarded from further analysis. Feature selection used the Boruta method, which implements a feature selection algorithm designed as a wrapper around an RF algorithm [67], and was performed in R (v4.0.3) using the Boruta (v7.0.0; [68]) and randomForest (v4.6.14, [69]) packages.
The Boruta algorithm provides a stable and unbiased selection of important and non-important features by comparing the relevance of the real features to that of random probes [69]. This works by firstly adding randomness to the data set by creating multiple shuffled copies of all features, which are termed as shadow features. It then trains an RF classifier on this extended dataset (including the shadow features) and applies a feature importance measure to evaluate the importance of each feature. In each iteration, this process assesses whether the real feature has a higher importance than its shadow features (whether the feature shows a higher Z score than the maximum Z score of the corresponding shadow features), removing features which are deemed unimportant. The process is repeated until the status of all features is identified as important or unimportant or until a user-defined limit of RF runs is achieved [70]. This process identifies all features that are relevant, either strongly or weakly, to the response variable. This differs from the minimal optimal method of more traditional feature selection methods, which, while producing a minimal set of features and minimising the error of the RF model, can over-prune and discard some relevant variables. The Boruta method of feature selection has been shown to be highly effective and has been recommended above alternative feature selection methods [71].
Once a series of parsimonious variables had been identified by the Boruta feature selection for each of the three mosquito species, each species-specific dataset was split randomly into separate training (80%) and validation (20%) datasets. RFs were used in a regression capacity to model the relationships between mosquito abundance and the parsimonious set of response variables.
RF hyperparameter tuning was performed, with the internal RF parameters (number of trees, minimum leaf population, maximum nodes, number of variables per split, and bag fraction) tuned iteratively to identify the best performing model, determined by the highest R 2 value when comparing predicted values to the observed values for the validation data set. For all sites and mosquito species, the RF model parameters that produced the highest R 2 values between the predicted and observed values are as follows: number of trees = 200, minimum leaf population = 1, maximum nodes = unlimited, and variables per split = √ number of variables. Optimal bag fraction values did vary for each species, with optimal values being 0.7 (An. gambiae s.l.), 0.91 (An. funestus), and 0.6 (An. paludis). RF parameter tuning results are available in Supplementary Information (Tables S3-S5).
With the optimal RF parameters established, each species-specific model was applied predictively [30] for both study areas. All analyses were performed within Google Earth Engine, except for stepwise variable removal using the Boruta method, which was performed in R (v4.2.2, [72]).

Open Buildings
Open Buildings V1 Polygons data [73], available via Google Earth Engine, were used to identify building footprints within the study areas. This dataset identifies building outlines from 50 cm high-resolution imagery and contains 516 million building detections across 19.4 million km 2 of the African continent. Each building has an associated confidence score as a measure of uncertainty that a feature is correctly identified as a building. All buildings with a confidence score of <60% were disregarded, retaining a total of 78,885 and 169,204 buildings for the Lodja and Kapolowe study areas, respectively. For each building, the mean predicted abundance for each species for each month was extracted using zonal statistics in QGIS (v3.28.0).

Land Cover Classification
The land cover classifications successfully characterised the rural complex and wetland areas across the Lodja and Kapolowe study areas (Figure 5), and the land cover class coverages are displayed in Table 4. The overall classification accuracies were 90.60% for Lodja and 90.04% for Kapolowe, and the error matrices are presented in Supplementary Information (Tables S1 and S2).

Random Forest Regression Analysis
The Boruta feature selection of explanatory variables achieved parsimonious models for the abundance of each species, retaining 30 (An. gambiae s.l.), 30 (An. funestus), and 26 (An. paludis) variables for the combined Lodja and Kapolowe data ( Table 5). RF regression analysis ranked the relative importance values of the biogeographical variables in relation to An. gambiae s.l., An. funestus, and An. paludis abundance. These importance rankings indicate that, for all species at both sites, the climatic variables of rainfall and land surface temperature (LST) were predominantly identified as being of the highest importance, although the month in relation to mosquito collection (i.e., the month in which a given mosquito sample was collected or any of the preceding five months) varied between species and sites.
An. paludis numbers were greatest in July-August 2015, December-January 2016, and August 2016 (Figure 4). Although far fewer An. funestus were collected in Lodja, the largest counts were collected in January-February 2015 and February 2017. For Kapolowe, An. gambiae s.l. abundance peaked in February-March 2016 and December-March 2017, with the highest counts of An. funestus observed in April 2017. Although fewer An. paludis were collected, the highest capture numbers were observed in February-April 2016.

Land Cover Classification
The land cover classifications successfully characterised the rural complex and wetland areas across the Lodja and Kapolowe study areas (Figure 5), and the land cover class

Predictive Risk Modelling
RF-predicted abundance surfaces were generated for each month and for each mosquito species at the two study sites (January 2015-December 2017 and January 2019-December 2019 for Lodja and January 2016-December 2017 for Kapolowe). The predicted surfaces for all months for all species are presented in Supplementary Information (Figures S3-S20). Specific examples showing the spatial variability in predicted abundance for the three species in Lodja in November 2015 are presented in Figure 6, and Figure 7 shows the seasonal variability in predicted abundances over a year, which here is for An. funestus in Kapolowe in 2016. Remote Sens. 2022, 14, x FOR PEER REVIEW 15 of 33

Random Forest Regression Analysis
The Boruta feature selection of explanatory variables achieved parsimonious models for the abundance of each species, retaining 30 (An. gambiae s.l.), 30 (An. funestus), and 26      An. gambiae s.l. exhibited considerable variability in predicted mosquito abundance, both spatially and temporally, in the Lodja and Kapolowe study areas. Temporal changes in predicted abundances in Lodja broadly followed patterns observed in the field survey data, with peaks consistently occurring in April-May and October-December each year (Figures S3-S6). Spatial variability in predicted abundance was also marked, with areas of higher abundance varying monthly. The highest abundances were observed around the Lukene River and the town of Lodja, for example, in November 2015 ( Figure 6) compared with surrounding areas. This could potentially be due to the increased opportunities for pools of water to form/remain in close proximity to the river, and due to the presence of fishponds (static water) found in close proximity to Lodja. Kapolowe also exhibited seasonal variability in predicted An. gambiae s.l. abundance, with lower abundances predicted for May-September in both 2016 and 2017, with higher abundances predicted for other months. The highest predicted abundances occurred in February in both 2016 and 2017 ( Figures S7 and S8).
An. funestus exhibited more stable seasonal patterns in predicted abundance for Lodja, with the highest localised abundance peaks occurring in January-  (Figures S13 and S14). Spatially, higher abundances were consistently observed for the Lake Tshangalele area, comprising static water and swamp.
Peaks in An. paludis predicted abundance for Lodja typically occurred in July-August, with broadly lower abundances predicted during the other periods ( Figures S15-S18), suggesting that these periods could consistently be periods of high risk for An. paludis in this area, when abundances of An. gambiae s.l. and An. funestus are conversely at their lower levels. Spatially, abundance peaks were distributed across the study area and did not exhibit consistent ties to specific landscape features, although higher peaks were sometimes (such as August 2015 and July-August 2016) observed in close proximity to the town of Lodja. This, along with the Random Forest variable importance rankings ( Table 5) is suggestive that abundances are driven principally by rainfall and land surface temperature patterns. For Kapolowe, a higher predicted abundance was generally observed in some sections of the study area from April to June, with broadly lower abundances predicted for all other months ( Figures S19 and S20). Spatially, the Lake Tshangalele area generally retains higher predicted abundances than the immediate surrounding area in April-October, although the highest predicted abundances were observed in areas to the east and southwest of the study area with higher levels of forest cover.
The predicted mosquito abundance values were extracted for the locations and times of the validation data, with predicted versus observed abundances compared in a series of scatter plots (Figure 8). This shows a high level of correspondence between the predicted and observed abundance values, with R 2 values of 0.707 for An. gambiae s.l., 0.861 for An. funestus, and a lower value of 0.448 for An. paludis. When peaks of predicted abundance occur in areas of low human population densities, malaria risk may remain low. However, if peaks of mosquito abundance coincide with areas of higher population presence, this can result in increased malaria risk in these communities. To assess the implications for the local community, predicted abundances were extracted for each building, for each species, and for each month. Assessing abundances on a building-by-building basis is valuable, as biting behaviour occurs mostly at night when people are in their homes. Therefore, identifying patterns of risk for individ- When peaks of predicted abundance occur in areas of low human population densities, malaria risk may remain low. However, if peaks of mosquito abundance coincide with areas of higher population presence, this can result in increased malaria risk in these communities. To assess the implications for the local community, predicted abundances were extracted for each building, for each species, and for each month. Assessing abundances on a buildingby-building basis is valuable, as biting behaviour occurs mostly at night when people are in their homes. Therefore, identifying patterns of risk for individual residences can aid specifically targeting these locations with preventive measures. Figure 9 shows, for a localised area of Kapolowe in February 2017, how An. gambiae s.l. abundance varied considerably over relatively short distances. Buildings on the edge of Kapolowe, which are closer to woodland and arable land cover, exhibited higher abundances than those of more central buildings.
Remote Sens. 2022, 14, x FOR PEER REVIEW 22 of 33 ual residences can aid specifically targeting these locations with preventive measures. Figure 9 shows, for a localised area of Kapolowe in February 2017, how An. gambiae s.l. abundance varied considerably over relatively short distances. Buildings on the edge of Kapolowe, which are closer to woodland and arable land cover, exhibited higher abundances than those of more central buildings.  Figures 10 and 11 display the distributions of predicted abundances for each mosquito species for all buildings within Lodja and Kapolowe, respectively, and their change over the duration of the study period. This shows how the predominant species (and therefore biting pressure from those species) differs throughout the study period. For example, Figure 10 shows that, in December 2016, far more buildings had a higher predicted abundance of An. gambiae s.l. than that of An. paludis; however, the opposite was true for July-August 2016. Conversely, An. funestus had a more static profile, reflecting its preference for more permanent water bodies.
The climate and temperature for Kapolowe fluctuate more than they do for Lodja, so the seasonal drop in An. gambiae s.l. numbers in May-September (the dry season) is more pronounced (Figure 11). This shows that a large number of buildings have higher abundances of An. gambiae s.l. during the wet season, with peaks in An. funestus occurring slightly later when An. gambiae s.l abundances drop, with a particularly high peak in abun-  Figures 10 and 11 display the distributions of predicted abundances for each mosquito species for all buildings within Lodja and Kapolowe, respectively, and their change over the duration of the study period. This shows how the predominant species (and therefore biting pressure from those species) differs throughout the study period. For example, Figure 10 shows that, in December 2016, far more buildings had a higher predicted abundance of An. gambiae s.l. than that of An. paludis; however, the opposite was true for July-August 2016. Conversely, An. funestus had a more static profile, reflecting its preference for more permanent water bodies.
Remote Sens. 2022, 14, x FOR PEER REVIEW 23 of 33 exhibited a more stable profile at Kapolowe; however, it did exhibit a very high peak abundance for a large number of buildings in April 2017.

Discussion
In this study, cost-free multi-temporal remote sensing data, Google Earth Engine, and in situ mosquito sampling data were utilised to identify variables influencing the distribution and abundance of three mosquito malaria transmission vectors and the relative importance of their association. This research, which would not have been possible using ground-based surveys alone, advances the understanding of how environmental dynamics in the study regions influence the distributions of these mosquitos, and it provides risk maps predicting seasonal hot-spots of mosquito abundance in this highly endemic malaria region in the DRC. The climate and temperature for Kapolowe fluctuate more than they do for Lodja, so the seasonal drop in An. gambiae s.l. numbers in May-September (the dry season) is more pronounced (Figure 11). This shows that a large number of buildings have higher abundances of An. gambiae s.l. during the wet season, with peaks in An. funestus occurring slightly later when An. gambiae s.l. abundances drop, with a particularly high peak in abundance in April 2017. Similar to at Lodja, this demonstrates that different species may be driving biting pressure at buildings at different locations and time periods. An. paludis exhibited a more stable profile at Kapolowe; however, it did exhibit a very high peak abundance for a large number of buildings in April 2017.

Discussion
In this study, cost-free multi-temporal remote sensing data, Google Earth Engine, and in situ mosquito sampling data were utilised to identify variables influencing the distribution and abundance of three mosquito malaria transmission vectors and the relative importance of their association. This research, which would not have been possible using ground-based surveys alone, advances the understanding of how environmental dynamics in the study regions influence the distributions of these mosquitos, and it provides risk maps predicting seasonal hot-spots of mosquito abundance in this highly endemic malaria region in the DRC.

Identifying Spatio-Temporal Drivers of Malaria Risk
Our first objective, to identify the key variables driving Anopheles mosquito species distribution and abundance, can be answered from our analysis. The distributions of An. gambiae s.l., An. funestus, and An. paludis are, in the Lodja and Kapolowe study areas, related to a number of biogeographical variables, but amongst these, principally, is a strong association with rainfall and land surface temperature. Previous studies have demonstrated that the amount and seasonality of rainfall in sub-tropical Africa drastically affect the occurrence and productivity of mosquito larval habitats, limiting the distribution of certain Anopheles species [8]. Rainfall is of particular importance, as higher rainfall generates more larval sites, particularly for species such as An. gambiae s.l., which rely on temporary pools for larval habitats. With the onset of higher rainfall, it follows that mosquito abundance increases. The time it takes from an egg to become adult is around 10-11 days [74], so mosquito numbers are expected to be higher if increased rainfall occurs during both the month in which mosquitos are sampled and during the preceding month, as was observed here for An. gambiae s.l. and An. paludis. An. funestus preferentially exploits permanent water bodies and is therefore less vulnerable to short-term fluctuations in rainfall. It should be noted that other previous studies, such as [75], did not find rainfall to be important. While conducting continental-scale species distribution modelling of An. gambiae, [75] found that temperature and vegetation (characterised by annual NDVI and NDVI variability) were important predictors but that precipitation was not, suggesting that their analysis shows this because rainfall can be highly correlated with other variables that characterise the aridity gradient.
Land surface temperature influences both mosquito abundance and malaria incidence. In general, African malaria vectors are most suited to a temperature range of 13-35 • C [76,77]. Land surface temperature, for example, was shown to be the main environmental variable driving malaria vector abundance in Ethiopia [78]. Temperature also influences the extrinsic incubation period of parasite development in the mosquito, which is a key factor in malaria transmission [79]. Land surface temperature has been shown to be a strong predictor of the deadliest and most common form of human malaria, Plasmodium falciparum [80].
Although rainfall is crucial for providing larval sites, the factors driving mosquito abundance are complex, and other factors are simultaneously at play, for example, vegetation, slope, and soil percolation, which vary in each area. Changes in development time from immature aquatic stages to adults can also be affected by several factors. In laboratory studies, temperature is known to influence developmental time and subsequent larvae and adult survival [81], but it can also influence larval competition in water pools, potentially leading to smaller adults and in turn affecting adult survival [82]. Furthermore, within a single egg batch, early and late hatching was reported with increasing development times for late hatches [83]. Late hatching is believed to be an adaptive strategy for dealing with unstable larval sites (e.g., desiccation) and competition. Although rainfall is identified here as a key variable influencing the abundance of all three mosquito species, other variables are also influential. For example, both the proportional cover of and the distance to forests and fallow was found to be of high importance for An. gambiae s.l., An. funestus, and An. paludis.
This was also reported for other Anopheles species [84], with forest edges providing refugia and non-human hosts for feeding.

Spatio-Temporal Risk Maps
The second objective was to predictively model mosquito distributions and their seasonal variability, and here, both spatial and temporal variability in predicted mosquito abundance for all three mosquito species is demonstrated. The predicted abundance maps illustrate the importance of considering temporal variables, such as seasonal rainfall and land surface temperature patterns, in addition to temporally static variables. Here, seasonal rainfall and land surface temperature are both identified as key drivers of mosquito abundance, resulting in clear variability in predicted abundances throughout the year in response to these variables. By improving the capability of identifying seasonal hot-spots of mosquito abundance, the improved targeting (both of specific localised areas of high risk and of high-risk periods during the year) of intervention measures, such as larval source management (LSM), is possible. Given increasing insecticide resistance and the resulting reduction in effectiveness of some intervention methods (theorised to contribute to the increase of five million recorded cases in 2016 compared with 2015 [3]), LSM interventions may become increasingly important for wide-scale malaria control. In many malariaendemic regions, disease control activities and monitoring over large geographical areas are exceptionally challenging for resource-constrained nations, such as the DRC, where a large proportion of mosquito-borne disease cases are found but where monitoring cannot currently be performed.
The response to getting the targets of the GTS back on track has been the 'High Burden to High Impact' (HBHI) initiative led by the WHO and Roll Back Malaria partnership [85]. This strategy has been adopted by high-burden countries and is underpinned amongst other factors by more timely and granular data that can be used to inform how limited resources can be used for maximum impact and that can improve targeted policies to empower countries to tailor the optimal mix of tools for a range of settings and engagement with sectors beyond health. The approach we used in this study, therefore, fits neatly with the HBHI initiative. By applying methods such as those presented here, high-risk areas, down to the individual building level, can be identified in a cost-effective manner, enabling the prioritisation of high-risk target regions for surveillance, monitoring, and treatment [8], thus shrinking the geographical extents that need targeting in control activities. This can, in turn, make an important contribution towards reducing morbidity and mortality from this preventative disease in the DRC, and it can have a greater impact on reducing malaria infection rates in children under the age of five, for whom the majority of deaths from malaria are recorded. As such, it is of particular use for policy makers in developing effective regional frameworks for malaria control and eradication [35], in line with the HBHI. The value of utilising malaria vector risk maps for intervention decision making was demonstrated in Zambia. Prioritising communities to receive indoor residual spraying (IRS) based on the predictive probabilities of An. funestus led to the greatest decrease in malaria, compared with concentrating IRS within a geographical area or prioritising IRS to communities based on observed malaria incidence [86]. It is possible that a similar approach could be adopted for other interventions.
Given that the mosquito species in this study bite in the evening, the type of a building can influence risk. A building that is identified as high-risk based on mosquito abundance may have less of a malaria risk if it is unoccupied during the evening, e.g., an industrial or commercial building. A second consideration is house structure, which can also influence malaria risk [87]. It was beyond the scope of this current study to determine these two points, but they will be factored into future mapping.

Transferability of the Method
This method was designed to be transferable to enable wide uptake by practitioners in the field. To accompany this manuscript, a user manual was developed in mul-tiple languages (English, French, and Spanish) and was made openly available (https: //doi.org/10.5281/zenodo.6973662, accessed on 1 August 2022), designed to provide non-specialists with step-by-step instructions, thus enabling independent implementation of these methods and avoiding reliance on external specialists [45]. By removing this limitation and empowering local disease control agencies to integrate Earth observation methods as a core element in their activities, it can enable them to independently generate frequent maps of predicted mosquito abundances and distributions for user-defined time periods and study areas as raster data products. This can enable the identification of hotspots of mosquito abundance-and therefore of potentially increased malaria transmission risk-both spatially and seasonally, allowing more specific targeting by disease control activities to concentrate on areas of the highest risk. As such, it is hoped that significant improvements can be made in targeting resource-constrained disease control programmes, reducing the burden of disease in malaria-endemic areas. Although the methodological framework presented is transferable, it does require input data from the local study areas in question to build site-specific predictive models. Although the utilized EO datasets are available globally via Google Earth Engine, in situ longitudinal mosquito data collection for the study areas of interest (here, collected monthly) from multiple locations within the study areas are required. These data must be collected over at least one year to enable variability in mosquito abundance to be related to temporally dynamic variables, such as annual rainfall patterns. Appropriate training and validation locations for conducting the land cover mapping stage of the analysis are also required. If these datasets are available, the presented methods should be implementable across other malaria-endemic regions and for other relevant mosquito species.

Future Development
This study identifies the key variables influencing mosquito distributions and abundance in the Lodja and Kapolowe study areas; however, the limited sample size of mosquito survey locations is acknowledged. Although predicted abundance maps for all three species at the two study sites were generated, and although the validation for the observed versus predicted abundance produced strong levels of correspondence, future work should look to expand the number of mosquito sampling locations, increasing their geographical spread and the longitudinal time period over which they are collected. There is limited understanding of the taxonomy of secondary African malaria vectors, as recent work has revealed ambiguous species identification of An. paludis samples, which are now subject to further genetic analysis. Host population abundance was not considered in this study with regarding malaria risk but could be factored into future work.
Further work could also usefully assess the transferability of these methods to different malaria-endemic areas and other mosquito species. This can enable the testing of different modelling methods at different locations to compare the accuracy, ease of application, transparency, and transferability between sites of different methods, which can progress towards the development of a more generic framework approach to modelling mosquito abundance using remote sensing datasets.
Additionally, the spatial resolution of the remote sensing datasets used here (10 m Sentinel 1 and 2 data) may preclude the detection of small water features, which could act as mosquito larval habitats, including features under tree cover. Evidence does, however, suggest that, in sub-Saharan Africa, water bodies under trees are not productive Anopheles habitats, with mosquitoes often preferring sunlit pools for larval habitats [7], potentially reducing the influence of these features. Future studies could additionally conduct dronebased surveys at higher levels of resolution to identify these features and to inform the extent to which the current method is detecting them. Further longer-term studies could also test the hypothesis that certain wetlands can act as refugia for mosquitoes (and therefore also for malaria transmission) during the dry season [32,88], serving as reservoirs for mosquitoes and facilitating the expansion of malaria transmission through the broader region in the wet season [19,32]. This potentially includes sites such as fish ponds, which are present close to Lodja, and permanent wetlands near Kapolowe. Sites such as these, once identified, could be ideal targets for the application of larval source management activities.

Conclusions
This study confirms the potential of cost-free remote sensing data, Google Earth Engine, and Random Forests to generate temporally repeatable predictive models of mosquito distribution and abundance over broad study areas to the individual building level. Although demonstrated here on a regional level, the computational capacity of Google Earth Engine offers opportunities for applying these methods to many different regions via cloud computing given suitable in situ data. This overcomes a major challenge in many malaria-endemic regions, where resource constraints inhibit broad-scale ground-based monitoring, and where cost and infrastructure requirements have historically limited the uptake of monitoring via remote sensing. Although the potential of such methods are often recognised, accessibility to and the practical application of these methods are often limited by a lack of Earth observation specialist knowledge. The development of the training materials describing the implementation of the methods described in this article can overcome this reliance on EO specialist knowledge, which historically may have limited the use of EO analysis in malaria programmes. By providing users with the means to implement these methods independently within the context of their own programmes, EO could play an increasingly significant role in informing malaria disease control activities globally, improving the targeting of disease control activities and reducing the burden of this disease in malaria-endemic regions.