Towards a Semi-Automatic Early Warning System for Vector-Borne Diseases

The emergence and spread of vector-borne diseases (VBDs) is a function of biotic, abiotic and socio-economic drivers of disease while their economic and societal burden depends upon a number of time-varying factors. This work is concerned with the development of an early warning system that can act as a predictive tool for public health preparedness and response. We employ a host-vector model that combines entomological (mosquito data), social (immigration rate, demographic data), environmental (temperature) and geographical data (risk areas). The output consists of appropriate maps depicting suitable risk measures such as the basic reproduction number, R0, and the probability of getting infected by the disease. These tools consist of the backbone of a semi-automatic early warning system tool which can potentially aid the monitoring and control of VBDs in different settings. In addition, it can be used for optimizing the cost-effectiveness of distinct control measures and the integration of open geospatial and climatological data. The R code used to generate the risk indicators and the corresponding spatial maps along with the data is made available.


Introduction
Vector-borne diseases (VBDs) are caused by parasites, viruses or bacteria that are transmitted by a vector [1]. Mosquitoes, ticks, sandflies, triatomine bugs, tsetse flies, fleas, black flies, aquatic snails and lice are known to act as vectors of this kind of disease [2]. The emergence and spread of VBDs in Europe is a function of biotic, abiotic and socio-economic drivers of disease [3]. Public health decision-making generally requires early warning output from frameworks that are based on uncertain information [4,5]. Globalization and climate changes, along with certain socio-demographic determinants can be critical drivers of VBDs. Hence, observing changes in these drivers can help envision, or even predict, fluctuations in many infectious diseases [3,6].
Among all the hematophagous arthropod vectors, mosquitoes (Diptera: Culicidae) are the leading vector of several human diseases. Mosquito-borne diseases (MBDs) include malaria, West Nile virus, chikungunya and Zika virus [7,8]. Blood meals are necessary for mosquito females as the protein source to produce and mature their eggs [9]. This activity allows mosquitoes to transmit several pathogens and parasites, becoming extremely important insects for human health [7]. Taking into account the invasive success of several mosquito species due to worldwide human mobility and trade [10], the development of early warning and control methods seems imperative [11].
Tropical VBDs such as malaria have attracted much attention due to their impact on human health [12,13]. Malaria is, perhaps, the most well-studied VBD in terms of its transmission dynamics and the potential for adaptation. It is known that five species of Plasmodium cause disease in humans [14]. According to the 2019 World malaria report, an estimated 228 million cases of malaria occurred worldwide [15]. Malaria is endemic in more than 100 countries around the world, mainly in sub-Saharan Africa and Asia, and is transmitted through the bite of the infected female Anopheles mosquito.
Considering the importance of vector-borne diseases in human health, it is imperative to work towards creating a suitable framework for an early warning system that would improve our understanding of the connectivity between existing and potential vectorborne risk areas. In this study, we focus upon malaria in Greece, a previously endemic disease that was eradicated in the 1950s by coordinated efforts of the WHO and the local authorities. Sporadic local P. vivax malaria outbreaks during the last decade indicate that targeted vector control remains imperative for preventing re-emergence. We firstly present a host-vector spatially explicit model which integrates entomological, geographical, social and environmental evidence from 12 municipalities of the prefecture of central Greece (see Table A1 in Appendix A for a description of the population and density information of the municipalities in the study), in order to guide the mosquito control efforts. To this end, we focused on the estimation of parameters related to the Anopheles' population dynamics and the corresponding risk measures derived from them. In addition, we presented the basis for an appropriate (semi)-automated, open-source, early warning tool based upon the above estimates of suitable risk indicators. The ultimate goal of this paper in terms of public health would be to use the proposed model, as well as the corresponding spatial mapping of the derived estimates from this model, as an early warning system with meteorological inputs, thus facilitating improved decision making and disease prevention. The R code calculating the risk measures and depicting them on spatial maps of this risk is made freely available on GitHub.

Study Area and Data Collection
The study took place in 12 municipalities of the prefecture of central Greece ( Figure 1). The study area was about 406.000 ha.
Considering mosquito data collection, we used CO 2 traps. Therefore, our data consisted of mosquito female adults, which are the malaria vectors. The traps were set up in 10 areas, consisting of a total of 393 sample-collecting stations in the outskirts of towns and villages of the prefecture of central Greece (see Figure 1 for a visual inspection of all collection places). Breeding sites included canals, rice pads, tanks, etc., most of them previously checked for the presence of mosquitoes using drones. The traps were placed and checked every 10 to 15 days between 6 March 2018 and 29 August 2018. Genus identification was performed using the morphological examination of the mosquito samples. The species Anopheles sacharovi, An. maculipennis, An. superpictus, An. claviger and An. hyrcanus are the most important species in the study region, accounting for approximately 90% of all Anopheles species in the region [11]. We, therefore, suppose that all the positive samplings for Anopheles mosquitoes belong to species that are malaria vectors.
Temperature data were recorded from the meteorological stations of the National Observatory of Athens (NOA). There were 10 meteorological stations of the NOA recording temperature data at the study area. For our modeling, we utilized the average temperature values with a 30-day interval, starting from the first Saturday of each month, between the March and August of 2018. The inverse distance weighting (IDW) interpolation method [16] was additionally used to estimate the average temperatures at the locations where no measurements were available.

The Spatial Predictive Model
In this paper, we present a host-vector model that combines entomological, social, environmental and geographical data to provide estimates on the average infection number due to malaria in central Greece (Table A2). The model has been already presented [11] and here we provide a short summary. In addition to the standard entomological parameters which are used within the well-established Ross-Macdonald mathematical model [17], our model takes into account the potential host population in the central Greece region, related to migrants from regions endemic to malaria [18]. Their prevalence in terms of the P. vivax is determined from the latest World Malaria Report of the WHO [19].
The transmission risk measures obtained by the model are the basic reproduction rate of the disease, namely R 0 , which represents a natural threshold parameter appropriate for disease control since an outbreak may occur only when R 0 > 1. This is indicative of the potential for disease spread should one infective individual starts an outbreak. In addition, we estimate the probability of an individual getting infected, τ = Pr(infection), and the number of expected infections, say E(infections) given by:E(infections) = Pr(infection) × (# of susceptibles). Those two risk measures are conditional upon an outbreak taking off and are only meaningful in the event of introducing infected individuals in the area, as is R 0 .
We denote withR 0 ,τ the corresponding model-based point (typically median) estimates of these measures and we calculate local R 0 estimates for each sample-collecting station using:R where Vec is the vectorial capacity, that is, the expected number of infective mosquito bites that would eventually arise from all the mosquitoes that would bite a single fully infectious person on a single day [20] and is given by: In (1) and (2), m i denotes the numbers of mosquitoes in each station i; α i the biting rate, that is, the percentage of mosquitoes that feed on humans each day; b i the probability a bite produces infection to a human; r i the average daily recovery rate per day; v i the mosquito latent period, that is, the number of days from infection to infectiousness; g i the mosquito mortality rate per day. Especially parameter g i has been shown in previous research [20] to be dependent on the temperature levels, hence we also utilize recorded and/or interpolated temperatures for our calculations. Finally, with c we denote the probability a bite turns a susceptible mosquito to infected, which for our analysis is set to the constant value of 0.5. The parameters α i , b i , r i and v i were sampled from suitable distributions according to the relevant literature [17,20,21], whereas g i changes with the temperature levels [20], which are currently represented by monthly means of temperature [11]. An analytical description of parameters and relevant distributions utilized for the calculations of risk measures is included in Table A3 in Appendix A.
In addition to the entomological part of the proposed model, estimation of the external host component due to the migration is embedded into the risk parameter calculations, by utilizing an exponential kernel function, W ik , (k = 1,2,3) of the form: W ik is used to model the spatial part of the potential hosts, with θ being a weight factor. In (3) d ik denotes the distances from larvae areas, measured during the three periods of potential hosts' monitoring, k. Subsequently, the estimation of the external host component due to the migration is approximated byμ 0i = 3 ∑ k=1 µ 0ik · W ik . This estimated proportion of initially infected hosts,μ 0i , is then multiplied by a pre-specified incidence rate. A deterministic sensitivity analysis was conducted [11] and the overall qualitative picture remained unchanged although the quantitative scale varied somewhat.

Towards a Semi-Automatic Early Warning System Tool
The semi-automatic early warning system tool was developed using the open-access R software environment [22], which is free and compatible with most common programming languages. In particular, R was used for all the steps of the process, including constructing R 0 and other risk measure maps. Similar maps, in the case of other infectious viruses, have been presented elsewhere [23]. A large collection of static maps from various online sources (e.g., Google Maps provided by Google LLC in California, USA and Stamen Maps provided by data visualization and cartography design studio in San Francisco, USA) can be used within R for the effective visualization of the parameters of interest.
The calculations were facilitated using a variety of software packages. Specifically, the dplyr package is used to make data frame manipulation in an efficient way and the ggmap package [24] provides the main methods for accessing and downloading Google and Stamen maps as well as for generating the data graphs of this work. Furthermore, the tools used to construct the graphs presented here can be separated into two main categories by graph form. The first category includes the qmplot function which makes a quick overview of maps along with data points. Herein, the data points are spatial points that reflect the measurements upon both terrain-background based maps (Figures 2-4) and toner-background based maps (Figures A1-A3) and vary in size and in color according to the values of (Ro), (τ) and (E) respectively. The second category includes the stat_density2d function which performs a two-dimensional kernel density estimation with an axis-aligned bivariate normal kernel function (Figures A4-A6). This density function creates a continuous surface by measuring the contribution of each data point on a map area. This contribution is smoothed out from a single point into a region of space surrounding the point. The kernel density estimate evaluated on a grid is given by: where density ϕ is the standard normal distribution and diag σ x 2 , σ y 2 is the bandwidth diagonal matrix which controls the amount and orientation of smoothing induced. The bandwidth plays the role of the covariance matrix of the bivariate normal kernel [25]. The output surface shows where point features are concentrated, by measuring the accumulated intersections of the individual areas. The way of density calculation depends on the bandwidth that uses a default search radius. In addition, the bins parameter is a control parameter that defines the number of contour levels. We used 100 bins for the construction of Figures A4-A6. Code for implementing all the methods in the paper is provided at https://github.com/valadis/EWS_spatial_maps_R_code.git.

Results
In Figures 2-4, the risk maps for the risk parameters of interest are depicted (Table A4 in Appendix A presents descriptive statistics of the estimated risk measures). These maps are based on the coordinates of the data points with the intensity of each risk measure varying with color.
The risk map in Figure 2 depicts the risk as expressed by the median basic reproduction number R 0 for each area, whereas Figure 3 presents the estimated probability of getting infected, τ. Figure 4 shows the number of expected infections E, as estimated by the spatial predictive model.
These visualization maps were created using the ggmap function of ggmap package. Alternative presentations based on other functions such as the qmplot option are also available in the ggmap package. Appendix A includes alternative visualizations of our results, including maps of point estimates using alternative terrain representations (Figures A1-A3) or heat-maps ( Figures A4-A6) and different options may suit different users based upon their needs.
Values of the R 0 above 1 (Figure 2), indicate where the greatest potential for risk is located. The highest risk is primarily located in the area of Lamia, with R 0 reaching values as high as 4. The probability of getting infected from low to high (zero to 0.75) was depicted in the map of Figure 3 and the Lamia region was found to have the highest risk. However, the map also suggests the non-negligible probability of infection (τ estimates between 0.25 and 0.5) in the largely dispersed rural areas of the prefecture. As these rural areas have low populations, the expected number of infections is relatively low as shown in Figure 4, revealing the complementary characteristics of the different risk measures. The highest number of potential infections is concentrated in the wider region of Lamia.
In addition to the risk maps based upon the estimated parameters such as the basic reproduction number R 0 , our approach enables the presentation of the corresponding uncertainty of the estimated parameters (or functions thereof) by way of depicting the associated variability, for example, via the variance or standard deviation. For example, the risk map of malaria transmission based on R 0 ( Figure 2) can naturally be combined with a variability map of the parameter (see Figure 5) in order to provide a more robust tool for the monitoring of transmission of VBDs. This type of combined reporting can be potentially applied to the other measures of risk assessment. Perhaps more importantly, it reveals knowledge gaps since high uncertainty suggests that further sampling is required in those areas in order to reduce this variability.

Discussion
Our study introduces a model-based framework that integrates entomological, geographical, social and environmental evidence in order to examine the potential for malaria resurgence in Greece. This results towards a semi-automatic open-source tool that can be used for the monitoring and early warning of mosquito-borne diseases, such as malaria and West Nile virus. The early warning system takes into account the spatial distribution of risk via suitable mathematical modeling to generate appropriate maps that can adequately describe the spatial variation in risk.
The results suggest that five to six distinct risk areas in the spatial maps with potential for malaria resurgence can be identified by inspecting the generated graphs. Specifically, the areas of higher risk are those close to the municipality of Lamia as expected due to the local rice fields and to a lesser extent in the lowlands around Levadia and Thivae municipalities wherever the obsolete irrigation system is responsible for a large number of Anopheles mosquito breeding sites. Generally, the risk of malaria resurgence is greatly facilitated by the coexisting of people from countries where malaria is endemic and who are engaged in agricultural work in areas where there are Anopheles mosquito breeding habitats such as areas with paddies (e.g., municipality of Lamia) and irrigation canals. These areas can serve as hot spots for the resurgence of malaria.
Considering VBDs as an emerging public health threat worldwide [6,14], we expect our methodology to have a bearing on the evaluation of the epidemic risks of such diseases in a given area. Recently, refugee populations are being hosted throughout Europe, including Greece [26], potentially increasing the risk of vector-borne disease transmission. In addition, several disease vectors in refugee camps in Greece have been recorded [27], indicating a possible risk factor for disease transmission. An early warning system for vector-borne diseases of the sort presented in this paper could be suitable for preventing disease spread.
Our approach could be also used as a tool for the efficient control of mosquito species, indicating time control periods, preventing their expansion and, therefore, the potential for disease transmission. Understanding the mosquito spatial distribution is of importance for public health and a cornerstone for studies aiming to understand their expansion [10,[28][29][30][31]. Studies dealing with species distribution models may set aside important factors that drive these models such as the quality of the training data, as well as critical abiotic factors [10]. Our mosquito sampling method consisted of CO 2 traps. Consequently, our data involved only female Anopheles individuals which serve as vectors of malaria, so that we overcame issues of sex-biased data that may arise by larvae sampling.
Our model accounts, in turn, for temperature fluctuations, which are particularly important in insect performance [32][33][34]. Temperature is the main abiotic factor that determines insect distribution, affecting critical aspects of their life history such as development, survival, reproduction and life span [35][36][37]. This results in a further effect on insect fitness, determining their population dynamics [38][39][40]. Therefore, via accounting for temperature fluctuations our model is tailored towards the precise estimation of mosquito population growth.
Our method has some limitations. It relies on detailed evidence of various types. Data of this kind may or may not be readily available and some sort of approximation is often used. However, such issues are reflected in the uncertainty of the risk measures and the corresponding maps are a natural by-product of the proposed tool. In fact, these maps offer an opportunity since they suggest where additional sampling should occur in order to reduce this uncertainty. In addition, most such data are observational data and do not form part of a randomized controlled experiment. This study represents a typical example, and appropriate counterfactual scenarios are common in the area, including what would have happened had no vector control method been applied. These scenarios are based on established theory though and including the corresponding uncertainty facilitates scientifically honest reporting and leads to additional data collection as described above.

Conclusions
The threat of a rapidly changing planet includes precarious spatial-temporal change dynamics associated with VBDs [41]. For example, disease transmission may vary strongly and unimodally with temperature [6]. This poses new conceptual and practical challenges in relation to sustainable health resilience, and therefore timely control of VBDs may act towards this direction. We expect that our attempt towards a semi-automatic early warning system tool we developed can potentially aid the monitoring and control of VBDs in different settings. In addition, it can be used for optimizing the cost-effectiveness of distinct control measures and the integration of open geospatial and climatological data. Perhaps more importantly, this system can enhance our ability to predict the risk of disease outbreaks in different climatic conditions. The developed methods can be adapted for usage in countries with similar vector-borne disease potential. To this end, the R code producing and depicting the risk indicators is provided in the supplement.

Data Availability Statement:
The data presented in this study are openly available in https://github. com/valadis/EWS_spatial_maps_R_code.git.

Conflicts of Interest:
The authors declare no competing interests.

Appendix A
Alternative representations of risk through spatial maps in R.