Mapping the Habitat Suitability of West Nile Virus Vectors in Southern Quebec and Eastern Ontario, Canada, with Species Distribution Modeling and Satellite Earth Observation Data

: Transmission of vector-borne diseases (VBDs) relies on the presence of their vectors. Good knowledge of their habitat distribution could inform of their presence and then the potential transmission risk. In Canada, West Nile virus (WNV), a VBD transmitted by mosquitoes of the Culex genus to birds, humans, and other mammals, was ﬁrst reported in 2002. Since then, human cases have been reported every year. To reduce the health burden of the disease and to guide the vector control efforts, this work aims to provide a map of habitat suitability of the main vectors of WNV, Culex pipiens-restuans , in southern Quebec and eastern Ontario at 30 m spatial resolution. Landsat 8-OTI/TIRS images were combined with existing geographical data to characterize vegetated and paved areas in urban and peri-urban areas and to create a land use land cover map related to environmental determinants of Culex pipiens-restuans . Landscape metrics were calculated to characterize the neighborhood environment. They were used with 1008 presence sites of the vectors to build species distribution models with Maxent, a model based on the maximum entropy principle, and to predict habitat suitability for Culex pipiens-restuans in the study area. The performance of the models was very good, with a mean area under the curve of 0.92 and a continuous Boyce index of 0.97. A habitat suitability map of the whole study area was created for Culex pipiens-restuans . The resulting map and environment analysis highlight the importance of the edge of vegetation and mixed or paved areas for the bio-ecology of Culex pipiens-restuans .


Introduction
Vector-borne diseases are a major public health concern. According to the World Health Organization, they cause more than 700,000 deaths annually [1]. Mosquitoes are considered one of the deadliest of all vectors [2]. They are responsible for transmitting malaria, chikungunya, zika, dengue, yellow fever, West Nile virus (WNV), Rift Valley fever virus, and others [2].
Culex genus are very important vectors of arboviroses such as WNV in North America. WNV human infection was first reported in North America in 1999, in New York City [3]. It has since spread across North America and reached Canada in 2002 [4]. In eastern Canada, Culex pipiens Linnaeus and Culex restuans Theobald are considered the main vectors of WNV. They are primarily ornithophilic but also feed on humans and other mammals [5,6]. The adults are usually identified as Culex pipiens-restuans (Cx. pipiens-restuans) complex because they are morphologically difficult to distinguish [7]. Spatial distribution of these vectors is defined by landscape characteristics that create suitable sites for breeding, feeding, and resting. Considered as urban mosquitoes [8,9], they have been found in vegetated areas, vectors are capable of developing in artificial containers, such as used tires and bird baths, and in standing water such as detention and retention ponds and roadside ditches [12][13][14][15].
Following the Seventieth World Health Assembly held in 2017 in Geneva, the Global vector control response 2017-2030 [16] was developed. Its aim is to reduce the burden of vector-borne disease through key pillars of action that will enable a sustainable vector control locally adapted. Knowing the spatial distribution of vector habitat is crucial to helping public health authorities guide vector control and surveillance efforts. This information could be made available through use of species distribution models (SDM). These models relate species field observations (presence and absence data, when these latter are available) and environmental variables [17]-usually derived from remote sensing datato produce a habitat suitability map of the species of interest. They were used successfully to predict and map the habitat of disease vectors such as Anopheles genus, vectors of malaria [18][19][20]; Aedes genus, vectors of dengue, chikungunya, zika, and yellow fever [21][22][23]; and Triatoma species, vectors of chagas disease [24,25].
In Canada, [26] produced a historical map of Cx. pipiens and Cx. restuans distribution, and [27] for Cx. pipiens at large spatial resolution. However, this knowledge should be updated and [28] highlighted the importance of characterizing the environment on a scale that is consistent with the vector bio-ecology. The purpose of this research, therefore, is to model the habitat suitability distribution of Cx. pipiens-restuans in southern Quebec and eastern Ontario using a fine spatial scale (30 m). Satellite remotely sensed data were used with species distribution modeling to achieve this goal.

Study Area
The study area covers southern Quebec and eastern Ontario (see Figure 1), a surface area of 27,200,500 km 2 . Two major cities in this area are Montreal in the province of Quebec and Toronto in the province of Ontario. They are the two largest cities in Canada with respective populations of 1.7 million and 2.7 million people [29], followed by Ottawa, with 934,243 inhabitants and Québec with 531,902. All these cities are located in the Mixedwood Plains Ecozone [30], which is characterized by flat plains, a mean summer temperature of about 17 °C, a mean winter temperature of −5 °C, and annual precipitation of 720 mm to 1000 mm.

Maxent
Maxent is a machine learning method widely used in species distribution modeling to estimate the habitat suitability index (HSI) for species [31]. This model is based on the principle of maximum entropy, which consists in estimating an unknown probability distribution over the whole study area by finding the distribution that not only maximizes the entropy but also respects constraints imposed by environmental conditions at location where the species is present (see [31,32] for more details). Concretely, the estimated probability distribution assigns to each pixel of the whole study area a probability that is proportional to the probability of presence, interpreted as the HSI. As input, Maxent requires presence data, which is a set of geographical coordinates of the location where the given species was trapped or observed; a background, which is a set of sites reflecting environmental conditions available in the study area [33] and environmental features, which are a set of environmental variables that are relevant to the species.

Entomological Data
Mosquito capture data, seen in Figure 1 and Table 1, were provided by the Public Health Agency of Canada (PHAC) and Public Health Ontario (PHO). Captures were performed using Centers for Disease Control and Prevention light traps (CDC, [34]), Omni-Directional Fay-Prince Trap (OmniDirectional, [35]) and BG-Sentinel ® traps (BGS, Biogents AG, Regensburg, Germany) between July and September. Mosquitoes were morphologically identified with taxonomic keys [36]. The dataset corresponds to 5239 different capture sites from 2002 to 2019, where 82 different species were identified, and only adult presence data was available. To ensure consistency between entomological and environmental data, only data between 2011 and 2019 were used in this study. To represent the known bio-ecology of Cx. pipiens-restuans or to create a proxy for it, several geographical data were chosen. There are the Annual crop inventory (ACI), a land use land cover map at national scale, provided yearly from 2011 to 2018 by Agriculture and Agri-Food Canada (AAFC); the National Hydro Network (NHN) from the Geobase Series provided by Natural Resources Canada (NRCan) that maps hydrographic features such as lakes, reservoirs, rivers, streams, canals, island, obstacles and construction; the Altitude data from the Canadian Digital Elevation Model (CDEM) provided by NRCan; the Wetlands layers from the Canadian Wetlands Inventory (CWI) of Unlimited Ducks Canada (UDC) and Ministère du Développement durable, de l'Environnement, de la Faune et des Parcs Québec and from Ontario Ministry of Natural Resources and Forestry and a Drainage layer, extracted from Soils of Canada, produced by AAFC, which provides the distribution of soil attributes such as drainage, texture of parent material, and kind of material at the national scale. They are summarized in Table A2 in Appendix B.

Land Use Land Cover
The current ACI layer, which is composed of 71 classes, was created in the context of agricultural inventory. Its urban class includes not only paved areas, but also residential areas (mix of vegetation and build-up areas), parks, water bodies, and wetlands. Cx. pipiens-restuans are known as urban mosquitoes that require vegetation to survive, then to obtain a Land use land cover (LULC) map related to these vectors, a classification of the urban areas (the term urban areas is used here for both urban and peri-urban areas) was made. To define the boundaries of urban areas, a buffer of 500 m around the Urban class of the ACI layer and the roads (from the National Roads Network (NRN) from NRCan) was created. We assumed that inhabited areas alongside rural roads were situated at 500 m from the roads. To cover the study area, 25 Landsat-8 OLI/TIRS scenes, with a cloud cover <20%, were selected from the months of July and August in the years from 2014 and 2018. Adjacent image scenes were combined to create a mosaic before the classification process. Based on the quality assessment provided with Landsat scenes, clouds, and cloud shadows were identified and removed for the index calculation. Indexes such as the normal difference vegetation index (NDVI), the normal difference built-up index (NDBI) and the normal difference water index (NDWI) were calculated. A supervised classification was performed on urban areas with a random forest classifier on calculated indices and the first three factorial axes of the Landsat spectral bands (one to seven). Four classes were defined: tree cover, which represents areas with a high cover of trees; grass vegetation/herbaceous, which is areas with grass cover; mixed area, which is a mix of vegetation and buildings and paved area.
Ground truth observations were randomly selected in the urban areas and were based on observations from Google Earth satellite images. The classification was performed with 75% of training data per class, randomly selected, and was tested with the remaining 25% ( Table 2). The resulting LULC (with an overall accuracy of 0.97 and a Kappa of 0.97, see the Appendix A) was aggregated with non-urban classes of the ACI layer and was completed by NHN and wetlands layers. All agriculture classes were aggregated to be Agriculture. Shrubland and Grassland were aggregated to be Shrubland class and all forest classes and tree cover in urban areas to be the class Forest. The final classes of the LULC map are listed in Table 3.

Geographical Derived Data
Refs. [9,10] highlighted the importance of the neighborhood landscape on Cx. pipiensrestuans populations. To describe and characterize the neighborhood, three kinds of landscape metrics were calculated for each pixel of the study area, the distance between each pixel of the study area and the nearest pixel of the given class; the percentage of the given class in a defined window and the diversity, representing the number of different classes in a defined window.
The defined window was 2 by 2 km, in order to reflect the mean flight distance of Cx. pipiens-restuans (1.37 km, [37]). In this window, the longest distance between the center and the edge is 1.4 km (2 × √ 2/2). Distance and percentage layers were produced for all of the LULC classes. Slope was derived from the DEM layer and the drainage layer was rasterized to a spatial resolution of 30 m. All raw and derived geographical data are summarized in Appendix B. Data processing, classification and landscape analysis were done with free and opensource software, GRASS GIS [38] and R [39].

Model Calibration and Evaluation
In this study, to deal with collinearity, only variables with a Pearson correlation < |0.7| were chosen. Where there was high collinearity between two variables, the most ecologically relevant was retained. In total, 21 environmental variables were chosen (see Appendix B). Cx. pipiens-restuans presence data were considered biased because samples occurred mostly in anthropized areas. Ref. [33] highlighted that sampling bias can strongly impact the quality of the model. To correct the effect of the sampling bias, refs. [40,41] proposed filtering the presence sites based on their environmental characteristics, an environmental filtering. A factor analysis of mixed data [42] was performed on environmental data associated with the presence dataset. Next, a cluster analysis was performed, where the number of clusters is set to half of the number of presence sites [40]. One record was retained per cluster, i.e., a total of 1008 unique sites. Filtered presence sites, 10,000 randomly selected background sites, and 21 environmental variables were used as input for Maxent. Hinge and categorical features were selected for the environmental variables and the raw output was selected. In Maxent, the tuning of the regularization parameter permits to reduce overfitting [43]. It is by default set to 1. To select the optimal value, the R package ENMeval [44] was used. A set of values of regularization parameter varied from 1 to 5 with an increment of 0.5 were used to pre-built Maxent models, and a 5-fold cross-validation were used to calculate the Delta Akaike Information Criterion corrected (AICc, [45]), which is the difference between the AICc of a given model and the AICc of the model with the lowest AICc. The regularization value that permitted to have the lowest delta AICc was selected. The final model was fitted using 70% of the presence sites and tested using 30% and was computed using the version 3.4.1 of Maxent. A 10-fold cross-validation was also used to evaluate the prediction performance. The area under curves (AUC) and the continuous Boyce index (CBI, [46]) were calculated. According to [46], the CBI is better adapted to models that use presence-only data than AUC. Its value ranges between −1 and 1, a negative value means that the model is incorrect, a value close to zero means that the model is not different from a random model, and a positive value means that the predictions of the model are consistent with the presence distribution. The extrapolation option in Maxent was not used to avoid the model predicting in environments for which it was not trained. Finally, the contribution of each variable was estimated with a jackknife test and a HS map was produced.

Results
The test AUC was 0.92, the mean AUC from the 10-fold cross-validation was 0.92, and the CBI was 0.97. The HS map for Cx. pipiens-restuans is in Figure 2. The environ-Remote Sens. 2021, 13, 1637 6 of 14 mental variables that contributed the most to the models are the percentage of mixed area (MIX_PER), the percentage of paved area (PA_PER) and the distance to the nearest mixed area (MIX_DIST), with contributions of 53.5, 25.5, and 13.7% respectively. Other variables have a contribution lower than 5% (see Table 4).
which it was not trained. Finally, the contribution of each variable was estimated with a jackknife test and a HS map was produced.

Results
The test AUC was 0.92, the mean AUC from the 10-fold cross-validation was 0.92, and the CBI was 0.97. The HS map for Cx. pipiens-restuans is in Figure 2. The environmental variables that contributed the most to the models are the percentage of mixed area (MIX_PER), the percentage of paved area (PA_PER) and the distance to the nearest mixed area (MIX_DIST), with contributions of 53.5, 25.5, and 13.7% respectively. Other variables have a contribution lower than 5% (see Table 4).    The response curves of the environmental variables are represented in Figure 3. Among all LULC classes (Figure 3k), mixed area, paved area, and herbaceous are related to the highest habitat suitability index (HSI). For the drainage (Figure 3j), a soil poorly or imperfectly drained has a higher HSI than a soil moderately/well or rapidly drained. Figure 3a shows that the HSI increases to reach its highest value when the MIXED_PER is between 45 and 80%, after which the HSI decreases. The HSI is maximal when the MIX_DIST is at 0 and decreases rapidly with MIX_DIST (Figure 3b). The HSI is maximal for a PA_PER between 5 and 12%, after which it decreases (Figure 3c). The response curve of the percentage of herbaceous in 2 × 2 km (HERB_PER) follows the same pattern than the MIX_PER (Figure 3a,d). The HSI increases to reach its highest values when HERB_PER is at 15%, after which it decreases. For the percentage of forest in 2 × 2 km (FOR_PER, Figure 3e), the HSI is maximal when its value is between 3 and 20%, after which it decreases as the value increases. The HSI is maximal when the distance to the nearest forest (FOR_DIST) is close to 0, after which it decreases rapidly (Figure 3f), whereas the HSI increases to reach its maximal at a distance from the nearest shrubland (SHR_DIST) between 3000 m and 5900 m. The HSI increases rapidly for an altitude (ATL) between 10 and 180 m but decreases to reach 0 after an altitude higher than 500 m (Figure 3i). Figure 3h shows that as the distance to the nearest wetlands (WET_DIST) increases, the HSI increases. imperfectly drained has a higher HSI than a soil moderately/well or rapidly drained. Figure 3a shows that the HSI increases to reach its highest value when the MIXED_PER is between 45 and 80%, after which the HSI decreases. The HSI is maximal when the MIX_DIST is at 0 and decreases rapidly with MIX_DIST (Figure 3b). The HSI is maximal for a PA_PER between 5 and 12%, after which it decreases (Figure 3c). The response curve of the percentage of herbaceous in 2 × 2 km (HERB_PER) follows the same pattern than the MIX_PER (Figure 3a,d). The HSI increases to reach its highest values when HERB_PER is at 15%, after which it decreases. For the percentage of forest in 2 × 2 km (FOR_PER, Figure 3e), the HSI is maximal when its value is between 3 and 20%, after which it decreases as the value increases. The HSI is maximal when the distance to the nearest forest (FOR_DIST) is close to 0, after which it decreases rapidly (Figure 3f), whereas the HSI increases to reach its maximal at a distance from the nearest shrubland (SHR_DIST) between 3000 m and 5900 m. The HSI increases rapidly for an altitude (ATL) between 10 and 180 m but decreases to reach 0 after an altitude higher than 500 m (Figure 3i). Figure 3h shows that as the distance to the nearest wetlands (WET_DIST) increases, the HSI increases.

Discussion
The purpose of this study was to map the distribution of Cx. pipiens-restuans at a fine scale in southern Quebec and eastern Ontario. We used Landsat images to characterize the presence of vegetation and paved areas in urban and peri-urban areas and combined with existing data to create a LULC map more adapted to Cx. pipiens-restuans. Landscape metrics Remote Sens. 2021, 13, 1637 8 of 14 layers were derived and used with the geographical coordinates of Cx. pipiens-restuans presence sites to build species distribution models using Maxent. These models have very good prediction performance, with a mean AUC of 0.92, a test AUC of 0.92, and a CBI of 0.97.
The resulting HS map (Figure 2) shows that a high HSI overlaps highly anthropized areas such as mixed areas, herbaceous, and paved areas. The mixed area class, which is a mix of vegetation and buildings, appears to contribute to the highest HSI (see Figure 3k). The herbaceous class represents grass or low vegetation in urban areas and near roads, which is generally adjacent to mixed areas. This is consistent with [8,47,48] that found that Cx. pipiens-restuans are highly associated with urbanized areas with vegetation. Figure 3k shows that forest and shrubland classes have a low HSI, and the response curve for FOR_PER (Figure 3e) shows that areas with a high proportion of forest are less suitable for Cx. pipiens-restuans than those with low proportions of forest. In fact, the HSI reaches its maximum at a FOR_PER between 3 and 20%, after which it decreases. Ref. [49] found that Cx. pipiens was present in forest canopy in their study area, which was a woodlot of 17 ha in Ontario. Given the window of 2 × 2 km used in our study, the surface area corresponding to 3% and 20% of forest is 6 and 40 ha, respectively, which is consistent with the surface area of the woodlot where [49] trapped Cx. pipiens. Our work highlights that only small percentage of forest contributes to a high HSI. An important observation on the HS map and in Figure 4a is that the interface between forest and mixed areas indicates a high HSI. This is linked to the feeding preference of Cx pipiens-restuans for birds, especially the American robin (Turdus migratorius) [5,50], that roosts mostly in trees in urban environments [51]. The study by [52] highlighted that, after the breeding period, American robins leave their nest, implying a diminution of the number of birds in urban areas. This diminution leads Cx. pipiens to feed on humans, implying a greater number of human WNV cases. These studies showed the importance of birds, their breeding sites, and human presence for these vectors. However, the paved areas, which are linked to human presence, are less suitable for Cx. pipiens-restuans than mixed areas. In Figure 4b, the larger the paved area is, the lower the HSI. Likewise, [11]'s study found that even though Cx. pipiens and Cx. restuans are urban mosquitoes, their numbers were very low in highly urbanized areas. While they could be present in paved areas, the lack of resting and feeding sites due to their ornithophilic nature, as mentioned above, make this environment somewhat less suitable than mixed areas. Figures 3k and 4c show that agricultural areas have a low HSI, unlike ref. [9], who found Cx. pipiens in agricultural land. This can be linked to the dataset used to build the models. Captures occurred in an agricultural area located in proximity to a mixed area, which explains how in Figure 4c the pixels corresponding to agriculture have a low HSI, while edges with mixed area have a higher HSI.
Wetlands have a very low HSI and the closer a pixel is to a wetland, the lower its HSI (Figure 3h,k), whereas [53] found Cx. pipiens-restuans in wetlands. However, it is important to note that in their study, they only mention larva presence, whereas in the present study, only adult data was available and very little sampling was done close to wetlands. Further captures should be made in this environment.
In Figure 3j, poorly or imperfectly drained soils have a higher HSI than moderately/well or rapidly drained soils, which is coherent with [9,54] who found that poorly and imperfectly drained can create temporary breeding sites for these vectors. Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 14  Figures 3k and 4c show that agricultural areas have a low HSI, unlike ref. [9], who found Cx. pipiens in agricultural land. This can be linked to the dataset used to build the models. Captures occurred in an agricultural area located in proximity to a mixed area, which explains how in Figure 4c the pixels corresponding to agriculture have a low HSI, while edges with mixed area have a higher HSI.
Wetlands have a very low HSI and the closer a pixel is to a wetland, the lower its HSI (Figure 3h,k), whereas [53] found Cx. pipiens-restuans in wetlands. However, it is important to note that in their study, they only mention larva presence, whereas in the present study, only adult data was available and very little sampling was done close to wetlands. Further captures should be made in this environment.
In Figure 3j, poorly or imperfectly drained soils have a higher HSI than moderately/well or rapidly drained soils, which is coherent with [9,54] who found that poorly and imperfectly drained can create temporary breeding sites for these vectors.
In this study, hydrological features are not suitable for Cx. pipiens-restuans and layers derived from the hydrological classes do not contribute to the model building. Although this is consistent with [55], who found a weak correlation between the proportion of open bodies of water surrounding their traps and Cx. pipiens populations, in our study no capture was made in or near water bodies, then the HSI in these areas should be taken with precaution. In the study by [9], they found a positive relationship between the distance to hydrographic features and Cx. pipiens-restuans populations, but they highlighted that is linked to the increased urban density in their study area. As these vectors can breed in artificial containers or temporary standing water [12,55,56], and because they prefer artificial habitats [13], their presence is related to breeding sites that humans create.
In [26,27] the HS of Cx. pipiens was observed or estimated with climatic variables across Canada. On their maps, the HSI is high on all the current study area, whereas north In this study, hydrological features are not suitable for Cx. pipiens-restuans and layers derived from the hydrological classes do not contribute to the model building. Although this is consistent with [55], who found a weak correlation between the proportion of open bodies of water surrounding their traps and Cx. pipiens populations, in our study no capture was made in or near water bodies, then the HSI in these areas should be taken with precaution. In the study by [9], they found a positive relationship between the distance to hydrographic features and Cx. pipiens-restuans populations, but they highlighted that is linked to the increased urban density in their study area. As these vectors can breed in artificial containers or temporary standing water [12,55,56], and because they prefer artificial habitats [13], their presence is related to breeding sites that humans create.
In [26,27] the HS of Cx. pipiens was observed or estimated with climatic variables across Canada. On their maps, the HSI is high on all the current study area, whereas north is mostly composed of forest and hydrological features (see in Figure A1 of Appendix A), a less favorable habitat for Cx. pipiens-restuans. These differences may be explained by the difference in spatial scale used in the three studies ( [26,27] and this current study) and the fact that, with a finer spatial scale, as the one used here, it is possible to capture local variation of habitat suitability which is more adapted to target locally vector control activities.

Conclusions
This study permits to have a habitat suitability map for Culex pipiens-restuans in southern Quebec and eastern Ontario. It highlights the importance of characterizing the environment not only by using land use and land cover map but also derived layers associated with proximity (distance to a class) or zonal statistics (percentage of the class in a given window) variables. The paper shows the importance of vegetation presence in urban areas for Cx. pipiens-restuans. Finally, this study permits to complete the current knowledge of the spatial distribution of Cx. pipiens-restuans and the habitat suitability map can be used by Quebec and Ontario provinces to target future vector control activities.  Figure A1. Resulted Land use land cover map. Table A2. Environmental data. Derived layers with an asterisk were not used to build the models.