Monthly Entomological Inoculation Rate Data for Studying the Seasonality of Malaria Transmission in Africa

A comprehensive literature review was conducted to create a new database of 197 field surveys of monthly malaria Entomological Inoculation Rates (EIR), a metric of malaria transmission intensity. All field studies provide data at a monthly temporal resolution and have a duration of at least one year, so that the seasonality of the disease can be studied. For inclusion, data collection methodologies had to adhere to a specific standard, and the location and timing of the measurements had to be documented. Auxiliary information on the population and hydrological setting was also included. The database includes measurements that cover West and Central Africa and the period from 1945 to 2011, and hence facilitates analysis of interannual transmission variability over broad regions.
Dataset: https://doi.org/10.1594/PANGAEA.892682
Dataset License: CC-BY


Introduction
Despite increasing international efforts to reduce its burden, malaria is still a major health problem and a significant cause of mortality in low-income countries. In 2018 for instance, Africa alone accounted for 93% of the worldwide malaria cases and 94% of the global malaria deaths, of which 67% occurred in children under five years of age, and nearly 85% of the global malaria burden was concentrated in 20 countries in Sub-Saharan Africa and India [1]. To improve this situation and make effective progress towards malaria control and eradication, a good understanding of the characteristics and drivers of malaria seasonality and interannual variability is required. For this, the availability of reliable datasets of a range of malaria indicators that span multiple months and seasons is fundamental. Such datasets would allow the evaluation and development of improved models of malaria transmission, which can then be used to study malaria predictability, eradication or response to changing external forcing.
One key metric of malaria is the count of clinical cases. These data can be weekly but are usually aggregated at a monthly timescale. One issue with such data is their quality, since suspected malaria cases are often not clinically confirmed in the laboratory by slide film or rapid diagnostic test kit, and the percentage and method of confirmation may change over time [2]. Reporting rates may also change with access to health facilities and with the transmission setting, which can be affected by interventions. In addition, such data usually span a relatively short period, covering at most the decade since digital health management information systems started replacing paper-based records [3], which confounds efforts to study the impact of interannual and decadal climate variability. Long series of high-quality, clinically confirmed cases are usually restricted to isolated locations such as highland tea plantations or sentinel clinics [4].
A second metric of malaria can be derived from cross-sectional surveys of populations that determine the proportion of individuals with malaria parasites present in blood samples, referred to as the parasite ratio (or parasite rate, PR). This metric provides a time-integrated picture of the malaria transmission setting, since it identifies all population members infected over a period equivalent to the mean time for parasite clearance [5]. Parasite rate data have been collected over a wide range of locations and span multiple decades, although individual surveys usually cover only short periods. The data have considerable uncertainties, with sample sizes often less than 20 individuals and with frequent false negative results derived from slide analysis of blood samples taken from infected individuals [6]. Efforts have been made to aggregate published literature and country-led surveys into databases at country [7] or continental scales. The Malaria Atlas Project (MAP) provides the benchmark global PR database, with the majority of the survey data made freely available to the research and operational communities [8,9]. The analogue of the PR for the malaria vector is the CircumSporozoite Protein Rate (CSPR), which is also subject to measurement uncertainties [10] and for which data remain sparse [11].
A useful supplement to the PR and case numbers is the Entomological Inoculation Rate (EIR), as it provides a direct measure of the intensity of transmission. The EIR is the number of infective bites per person per unit time, and is usually calculated as the product of the Human Biting Rate (HBR) and the CSPR. The former has often been calculated using Human Landing Catches (HLC), and again is subject to considerable uncertainties. Of particular interest for research purposes are longer-term records of monthly EIR, which allow the seasonal cycle and interannual variability of transmission intensity to be studied. However, the complexities involved in organizing field campaigns to take EIR measurements over an extended period mean that monthly resolved data of this kind are relatively sparse. Surveys at individual locations usually cover a few months, or at most one or two years. Nevertheless, there are now a considerable number of field surveys from the past three decades reported in the open literature, which could collectively describe malaria transmission seasonality and interannual variability and provide a useful supplement to existing malaria databases. This article describes an effort to collect available monthly resolved EIR data for surveys with a duration of at least a year and to make them available as a public resource for malaria research. It first discusses how the monthly EIR data were compiled from different sources and collated into a database for public access on the internet. It then provides some application use cases of the data and discusses the merits and limitations of the archived data.

Compiling Sources of Monthly EIR Data
An all-inclusive literature review was conducted using Google Scholar and PubMed search facilities for articles containing monthly EIR (denoted EIR m hereafter) directly, or measurements of vector related quantities that could be used to derive EIR in Sub-Saharan Africa. Keywords of Hay et al. [12] such as "entomological inoculation rate", "biting rate", "sporozoite index/rate", "human landing catches", "light trap catches", "Pyrethrum spray catches", "malaria transmission", "Anopheles gambiae", "Anopheles funestus", "malaria vectors" and "vectorial capacity" were used. The papers found from the search engines were scrutinized for EIR m data. The papers were also searched for article references with potential EIR m data. The titles of these potential references were re-entered into the search engines and the manuscripts recovered. The search strategy was repeated until no new information was obtained. A list of publications with EIR m data from the online search was then compiled. The search also consulted the entomological and parasitological-related papers database compiled by Ermert et al. [13]. From this database, only articles containing EIR m data that were not recovered from the online search were compiled.

Recording EIR m Data from Articles
The EIR m data from the compiled articles were obtained in the following ways. Where the EIR m data were displayed graphically (plotted), they were digitized using the R package "digitize" [14]. This tool is designed to extract data from graphs whose underlying data are not available: it allows the user to load the plot, calibrate it and extract the data points. Where the EIR m data were presented in tables, they were recorded manually. For articles in which only monthly HBR and the corresponding CSPR were available, EIR m was obtained by multiplying the HBR by the respective monthly CSPR value, as defined by Macdonald [15]. In addition, Olivier J. T. Briët of the Swiss Tropical and Public Health Institute (Swiss TPH), Switzerland, who had also compiled entomological parameters into a database by digitizing them from published articles, shared his data for use in this work. The EIR m data taken from his database were cross-checked against the original articles.
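The product rule used when only HBR and CSPR were reported can be sketched as follows. This is a minimal illustration rather than the processing code used for the database: the function name `monthly_eir` and the sample values are hypothetical, HBR is assumed to be expressed in bites per person per month, and CSPR is assumed to be a fraction between 0 and 1.

```python
from typing import List, Sequence


def monthly_eir(hbr: Sequence[float], cspr: Sequence[float]) -> List[float]:
    """Return monthly EIR as the element-wise product HBR * CSPR.

    hbr  : bites per person per month (assumed unit)
    cspr : sporozoite rate as a fraction in [0, 1] (assumed convention)
    """
    if len(hbr) != len(cspr):
        raise ValueError("HBR and CSPR series must have the same length")
    for rate in cspr:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("CSPR must be a fraction between 0 and 1")
    return [h * c for h, c in zip(hbr, cspr)]


# Hypothetical values for three consecutive months, for illustration only.
example_hbr = [120.0, 300.0, 60.0]
example_cspr = [0.02, 0.05, 0.0]
example_eir = monthly_eir(example_hbr, example_cspr)
```

A month with a zero sporozoite rate yields a zero EIR regardless of the biting rate, which is why sparse sampling matters in settings where infected mosquitoes are rare.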

Inclusion and Exclusion Criteria
The final EIR m database was built by applying the selection criteria explained in Beier et al. [16] and Hay et al. [12]. That is, the EIR m data recorded for each study location had to satisfy all of the following conditions:

1. That the mosquito sampling activity at the location lasted for at least a year;
2. That the mosquitoes were sampled monthly throughout the study period or the transmission season;
3. That the biting rates were estimated from standard methods such as Pyrethrum Spray Catches (PSC), Light Trap Catches (LTC) and HLC;
4. That the proportion of sporozoite-infected mosquitoes was determined using either dissection or Enzyme-Linked Immunosorbent Assay (ELISA) methods;
5. That the study took place at a time when mosquito control operations were not in effect.
EIR m data from locations that satisfied all of the above conditions were retained and formed the final database; otherwise they were excluded from the final database.
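The five conditions above amount to a conjunctive filter, which can be sketched as below. The `Survey` record and its field names are hypothetical and are not the schema of the published database; the sketch only illustrates that a survey is retained when every condition holds.

```python
from dataclasses import dataclass

# Method labels taken from the criteria above; the string spellings are assumptions.
STANDARD_CATCH_METHODS = {"PSC", "LTC", "HLC"}
STANDARD_SPOROZOITE_METHODS = {"dissection", "ELISA"}


@dataclass
class Survey:
    """Hypothetical summary of one field survey extracted from an article."""
    duration_months: int          # length of the mosquito sampling activity
    sampled_monthly: bool         # monthly sampling over the study or transmission season
    catch_method: str             # how biting rates were estimated
    sporozoite_method: str        # how infected mosquitoes were identified
    vector_control_active: bool   # True if control operations were in effect


def meets_inclusion_criteria(survey: Survey) -> bool:
    """Return True only if all five inclusion conditions are satisfied."""
    return (
        survey.duration_months >= 12
        and survey.sampled_monthly
        and survey.catch_method in STANDARD_CATCH_METHODS
        and survey.sporozoite_method in STANDARD_SPOROZOITE_METHODS
        and not survey.vector_control_active
    )


# Illustrative records, not taken from the database.
kept = Survey(14, True, "HLC", "ELISA", False)
dropped = Survey(8, True, "HLC", "ELISA", False)  # shorter than one year
```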

Study Location Information and Classification
Names and geographical coordinates (latitude and longitude) of locations where mosquitoes were sampled for EIR m data were generally obtained from the source articles. In the event that the geographical coordinate of a location was not provided in the source article, the location name was searched in www.bing.com/maps and www.google.com online mapping facilities for the coordinates.
Previous work has shown that malaria transmission in Africa relates strongly to population density, as expressed by Urban (U), Peri-Urban (PU) and Rural (R) settings [17]. For this reason, the database includes a classification of the study locations as either R, PU or U. In a minority of cases, a description of the field location as R, PU or U was included in the source article, and the information was taken directly from it. For the majority of locations, no such description could be found in the source articles. In these instances, the population density of the location was extracted from version 3 of the Gridded Population Density data of the World (GPDWv3, [18]) using the nearest grid point and a linear interpolation/extrapolation from the three time slices of 1990, 1995 and 2000. Using the extracted population density, the location was then classified as R, PU or U based on population density thresholds of 250 km⁻² and 1000 km⁻², as suggested by Hay et al. [19].
Surface hydrology and land cover are also relevant to malaria transmission. Hence, a hydrological classification of each study site was also included in the database. Each location was classified as having No Water Body (N), Ongoing Irrigation (I) or a Permanent Water Body (PWB). Study locations characterized by marshlands, lakes, rivers, streams, dams or swamps were considered PWB areas. Irrigated locations were those with ongoing irrigation activities or double cropping in the area. Locations without water bodies or ongoing irrigation were regarded as Neutral, or No Water Body. The type of hydrology characterizing a study location was obtained directly from the source articles of the digitized EIR m data.
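The density-based classification can be sketched as below. Several details are assumptions made for illustration and are not specified in the text: the estimate is taken as piecewise linear between the 1990, 1995 and 2000 slices, extrapolation continues the nearest segment, negative extrapolated values are clipped to zero, and a density exactly on a threshold is assigned to the denser class. The function names are hypothetical.

```python
from typing import Tuple

SLICE_YEARS = (1990, 1995, 2000)  # time slices named in the text


def density_for_year(year: float, densities: Tuple[float, float, float]) -> float:
    """Estimate population density (people per km^2) for a survey year.

    `densities` holds the nearest-grid-point values for 1990, 1995 and 2000.
    Years outside 1990-2000 are extrapolated along the nearest segment
    (an assumed scheme), and the result is clipped at zero.
    """
    y0, y1, y2 = SLICE_YEARS
    d0, d1, d2 = densities
    if year <= y1:
        xa, xb, da, db = y0, y1, d0, d1
    else:
        xa, xb, da, db = y1, y2, d1, d2
    value = da + (db - da) * (year - xa) / (xb - xa)
    return max(value, 0.0)


def classify_setting(density: float,
                     rural_max: float = 250.0,
                     urban_min: float = 1000.0) -> str:
    """Map a density to R, PU or U using the 250 and 1000 km^-2 thresholds."""
    if density < rural_max:
        return "R"
    if density < urban_min:
        return "PU"
    return "U"


# Illustrative grid-point densities, not real data.
example_density = density_for_year(2005, (100.0, 200.0, 400.0))
example_class = classify_setting(example_density)
```

For surveys far from the 1990 to 2000 window, such as those from the 1940s, any linear extrapolation is a rough estimate, which is one reason the clipping at zero is included in this sketch.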

Data Records
The spatial coverage of the data and the distribution of study locations in Sub-Saharan Africa are displayed in Figure 1. The colored circles indicate the maximum EIR at each location and show that in most cases the EIR does not exceed 50 bites per person per month. The temporal coverage of the database is summarized in Table 1 for four key zones as demarcated in Figure 1, namely: Sahel (lat ≥ 10°, −20° < lon < 40°), Guinea (5° ≤ lat < 10°, −20° < lon < 20°), equatorial West Africa (WA) (lat < 5°, 10° ≤ lon < 20°) and equatorial East Africa (EA) (lat < 5°, 28° < lon < 40°). The table shows that 127 records were found that covered between 1 and 2 years. A similar number of records were found in each of the zones, allowing for their intercomparison. A much smaller number of studies cover a multiple-year period, indicating that studies of interannual variability must resort to the use of multiple sites aggregated over climatic zones. Details of each location record with article references are presented in Tables 2-4. The full data are archived in an online repository at https://doi.org/10.1594/PANGAEA.892682 [20]. The full database includes: the name of the country and site of the survey, the geographical location (longitude and latitude) and elevation of the site, the land use type (urban, peri-urban or rural), the hydrology of the area (permanent water bodies or irrigation activities), the vector species identified at the site, and the starting year and month of the data record. The fields value1 to value12 hold the monthly EIR m records, with value1 corresponding to the starting month and value12 to the final month. The data are available for use by researchers and can be freely accessed from the online repository. However, it is important that users duly reference this paper and the repository documentation [20] in their work.

Application/Case Use of the Data
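As a starting point for working with the records described under Data Records, the sketch below assigns a site to one of the four zones and attaches calendar months to the value1 to value12 fields. It is illustrative only: the function names are hypothetical, the actual column layout of the repository file should be checked against its documentation [20], and the lower bounds of the Guinea and equatorial WA zones are read here as inclusive (5° ≤ lat, 10° ≤ lon), which is an interpretation of the printed inequalities.

```python
from typing import List, Optional, Tuple


def assign_zone(lat: float, lon: float) -> Optional[str]:
    """Return the zone name for a site, or None if it falls outside all four boxes."""
    if lat >= 10 and -20 < lon < 40:
        return "Sahel"
    if 5 <= lat < 10 and -20 < lon < 20:
        return "Guinea"
    if lat < 5 and 10 <= lon < 20:
        return "Equatorial WA"
    if lat < 5 and 28 < lon < 40:
        return "Equatorial EA"
    return None


def label_months(start_year: int, start_month: int,
                 values: List[float]) -> List[Tuple[int, int, float]]:
    """Attach (year, month) to each value, with values[0] at the starting month."""
    labelled = []
    for offset, value in enumerate(values):
        index = (start_month - 1) + offset  # months counted from January of start_year
        labelled.append((start_year + index // 12, index % 12 + 1, value))
    return labelled


# Illustrative record starting in November 1995 (values are made up).
example_series = label_months(1995, 11, [1.0, 2.0, 3.0])
example_zone = assign_zone(13.5, 2.1)
```

Labelling each value with its calendar month makes it straightforward to align records that start in different months before aggregating them by zone.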

EIR Seasonality
A qualitative examination of the data is performed through a cursory analysis of the EIR seasonality, to check whether the broad relationships with precipitation seen in case data at specific sites are also present. For example, previous work in Niger, where rains are seasonally associated with the West African monsoon, shows cases following rainfall with a lag of one to two months, confirming general experience of malaria seasonality in the region, which contrasts with that of Central Africa where intense year-round transmission can be sustained [102]. A preliminary examination of the relationship between rainfall and EIR is made in Figure 2. EIR variability is seen to follow that of rainfall closely in the Sahel, Guinea and equatorial EA zones, with a lag of about 2 months in the former two zones, while a longer lag of 3 months is notable in the equatorial EA zone. The slower response of EIR is expected there due to the cooler temperatures at the higher altitudes of that region. There is little seasonality in the EIR in the equatorial WA zone, where persistent rainy conditions and warm temperatures sustain year-round transmission. These characteristics of the EIR seasonality in the database confirm findings from previous analyses and provide a qualitative evaluation of its reliability. Further detailed analysis of the link between the observed EIR and climate on seasonal and multi-annual timescales will be pursued in a separate article. The rainfall dataset used in the validation is the daily African Rainfall Climatology, version 2 (ARC 2 [103]), a satellite infrared based gridded precipitation product for Africa available from 1983 to date at 0.1° spatial resolution. The temperature data used in the validation are from the European Center for Medium-Range Weather Forecasts Interim Reanalysis (ERAI) [104], available from 1979 to date at a spatial resolution of 0.75°.
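One simple way to quantify the lag between rainfall and EIR is to correlate a 12-month rainfall climatology with the EIR climatology shifted by a range of lags and pick the lag with the highest correlation. The sketch below is not the method used to produce Figure 2; the circular shift, the Pearson correlation, the function names and the synthetic series are all assumptions chosen for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_lag(rain: Sequence[float], eir: Sequence[float], max_lag: int = 4) -> int:
    """Return the lag in months at which rainfall best correlates with later EIR.

    Both inputs are 12-month climatologies, so the shift wraps around the year.
    """
    n = len(rain)
    scores = {}
    for lag in range(max_lag + 1):
        shifted_eir = [eir[(i + lag) % n] for i in range(n)]
        scores[lag] = pearson(rain, shifted_eir)
    return max(scores, key=scores.get)


# Synthetic single-peak rainfall climatology (mm per month), for illustration only.
rain = [0.0, 0.0, 5.0, 20.0, 60.0, 120.0, 200.0, 240.0, 150.0, 40.0, 5.0, 0.0]
# Synthetic EIR built to follow rainfall by two months, scaled arbitrarily.
eir = [0.1 * r for r in (rain[-2:] + rain[:-2])]
estimated_lag = best_lag(rain, eir)
```

With real data the correlation peak is broader than in this synthetic case, so the estimated lag should be read as indicative rather than exact.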

Vector Type
The geographical distributions of the main malaria vectors identified from the publications were also examined and are displayed in Figure 3. The major vectors observed include Anopheles gambiae (AG), Anopheles funestus (AF), Anopheles arabiensis (AA), Anopheles nili (AN) and Anopheles moucheti (AM). The most dominant and sympatric vectors were found to be AG, AF and AA. These vectors are known to be long-lived and to remain stable in the parts of Africa where they occur. Anopheles arabiensis was observed mostly in markedly seasonal rainfall areas such as the Sahel and in dry Savannah zones of East Africa. While the AG vectors were found to adapt to all of the varied ecological settings, AF vectors were mostly confined to relatively humid areas in West and East Africa. Anopheles nili and AM were mostly limited to moist areas around Central and East Africa. The dominant malaria vectors identified corroborate previous works [105-107]. The ecological confinement of the malaria vectors may indicate that climatic factors influence their choice of habitat. The preference of AA for sunlit breeding sites with limited vegetation may explain its restriction to drier Savannah areas [105,106]. The sympatric association of the observed malaria vectors is also supported by existing literature. The AG vectors, for instance, mostly live in sympatry with AA and AF, with all of them sustaining the perennial inoculation of the malaria parasite [54,108]. In such sympatry, AG and AF mostly dominate other vectors year-round, with peak populations in the rainy and dry seasons respectively [109,110].

Discussion
The dataset is based on a literature review of field surveys and thus the data is secondary and cross-validation of original samples is not possible. Nevertheless, to ensure that selected surveys were inter-comparable, strict control was made on the methods used in the field studies (see Section 2.3 in the methods above). To guard against errors that may occur during the manual procedures of the digitization process, all database entries were subjected to blind confirmation by a second individual who referred only to the original reference to ensure all EIR values, location coordinates, and population/hydrological classifications were correctly registered.
It is worth noting that the study relied on EIR m data that could be obtained through the online search and contacts with researchers. It does not, therefore, claim to have identified all the EIR m data available in Sub-Saharan Africa. Moreover, the EIR m estimates were obtained from WHO-recommended standard mosquito sampling techniques, namely HLC, PSC and LTC. These sampling techniques are not standardized [111]; hence, estimates of HBR from each method differ and may not represent the exact individual exposure levels in a study area [112]. The study also acknowledges that the time series of the EIR m data are spatially and temporally limited (see Table 1), since they were unavailable for many settings (see Figure 1). The entomological surveys appear to be concentrated at locations where malaria is prevalent. Future estimates of EIR m should focus on areas with scarce data (see Figure 1) to improve the spatial homogeneity of the EIR m data distribution. The spatial and temporal limitations of the data are due to the fact that the mosquito sampling methods are both labor and capital intensive. For this reason, the HBR and CSPR estimations are usually not conducted on each day of the month but are mostly limited to just one or two days in the month. An average daily value is then determined and scaled up for the month by multiplying it by the number of days in the month. In areas where mosquitoes are rare or rarely infected, limiting the mosquito sampling to just a few days in a month is a disadvantage. It is also worth noting that the EIR m estimates are subject to the skills of the mosquito collectors, their attractiveness to mosquitoes and instrumental errors [113]. These biases and the uncertainties associated with the digitization process may have an impact on the accuracy of the EIR m data.
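The scaling from a few sampled days to a monthly value, as described above, can be sketched as follows. The function name and the sample values are hypothetical, and the quantity being scaled is assumed to be a daily rate such as bites or infective bites per person per day.

```python
import calendar
from typing import Sequence


def scale_to_month(daily_samples: Sequence[float], year: int, month: int) -> float:
    """Average the sampled daily values and multiply by the days in the month."""
    if not daily_samples:
        raise ValueError("at least one sampled day is required")
    days_in_month = calendar.monthrange(year, month)[1]
    daily_mean = sum(daily_samples) / len(daily_samples)
    return daily_mean * days_in_month


# Two hypothetical sampling nights in February 2001 (a 28-day month).
example_monthly = scale_to_month([0.2, 0.4], 2001, 2)
```

Because the monthly value rests on only one or two nights, a single night with zero infected mosquitoes can halve or eliminate the estimate, which illustrates the sampling sensitivity noted in the text.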
Despite the data collection uncertainties, the results consolidate evidence of the usefulness of the archived EIR m data for research purposes. The data can inform our understanding of how climate and environment may have an influence on the intensity of seasonal malaria transmission, clinical disease and human mortality risks as well as on malaria vector biology. The data are useful for evaluation, validation and improvement of seasonal malaria outcomes simulated by weather-driven dynamical malaria models in Africa. The data can also serve as a supplement to previous works that have described patterns of clinical malaria and morbidity in Sub-Saharan Africa. Information from the data can support decision makers to design robust frameworks for combating malaria. For instance, the data can inform our understanding of perennial and markedly seasonal malaria transmission settings. This knowledge can serve as a guide in the implementation of control measures such as malaria chemo-prevention in children and pregnant women in such malaria transmission settings [114]. It can also help in spatial targeting of control techniques and resource allocation optimization especially in areas where the diversity of the climate and environment may result in seasonal malaria transmission heterogeneity.

Acknowledgments: Sincere gratitude to Katholischer Akademischer Ausländer-Dienst (KAAD) and the University of Cologne, Germany for their support during the PhD studies.

Conflicts of Interest:
The authors declare no conflict of interest.