An Evaluation of the OpenWeatherMap API versus INMET Using Weather Data from Two Brazilian Cities: Recife and Campina Grande

Abstract: Certain weather conditions are associated with increased populations of various mosquito species. To predict the burden of mosquito populations in the Global South, it is imperative to integrate weather-related risk factors into predictive models. Many online open-source weather platforms provide historical, current and forecast weather data that can be used for general predictions, and these electronic sources serve as an alternative source of weather data when physical weather stations are inaccessible (or inactive). Before using data from such online sources, it is important to assess their accuracy against a baseline measure. In this paper, we therefore evaluated the accuracy and suitability of weather forecasts of two parameters, namely temperature and humidity, from the OpenWeatherMap API (an online weather platform) by comparing them with actual measurements collected from the Brazilian weather stations (INMET). The evaluation focused on two Brazilian cities, namely Recife and Campina Grande. The intention is to prepare an early warning model that will harness data from the OpenWeatherMap API for mosquito prediction.


Background and Summary
Certain meteorological parameters are risk factors for increased mosquito presence and thus need to be integrated into an early warning detection system. Statistical forecasts of meteorological indicators such as temperature, humidity and atmospheric pressure, in near- or real-time, are made by collecting as much data as possible about the existing state of the atmosphere. This information is often applied in many avenues of research to understand the impact of weather on ecosystems and on environmental and anthropogenic processes [1][2][3][4][5]. It can be pulled from digital sources such as automatic weather stations, satellite and remote sensing, and open-source weather channels that are available online via Application Programming Interfaces (APIs) [6,7].
From an epidemiological perspective, with a regional focus on the Global South where most tropical illnesses are endemic, these meteorological indicators play an immense role in neglected tropical diseases as risk factors for the spread of parasitic illnesses such as soil-transmitted helminths, Onchocerciasis, Lymphatic Filariasis and Schistosomiasis [8][9][10]. These weather parameters also play a substantial role in vector- or mosquito-borne diseases and are prominent risk factors for illnesses such as Malaria, West Nile Virus and Yellow Fever. These weather factors act in two ways: (1) they create an environment or climate that is conducive and habitable for such vectors to thrive (i.e., environmental suitability) [11,12]; and (2) they allow vectors to breed in dwellings inhabited by humans and animals, thereby posing a threat to humans [13,14].
In the context of mosquito surveillance and monitoring in vulnerable human populations in Brazil, one of the most sought-after tools is an early warning detection system that predicts increases in mosquito population, supporting the efforts of environmental health officers to pre-emptively combat the burden of mosquito-borne arboviruses. The models developed for such tasks are usually either mechanistic (i.e., compartmental or differential equation) models, or spatial or spatiotemporal models for inferential purposes, which allow for adjustment for other risk factors, temporality and spatial structure. Such an early warning system can be integrated into a smartphone and web application. To this end, researchers from University College London's Centre for Digital Public Health and Emergencies (London, UK) are working together with Brazilian academics and stakeholders from the Federal University of Pernambuco (Recife, State of Pernambuco, Brazil), Federal University of Campina Grande (Campina Grande, State of Paraiba, Brazil), University of São Paulo (State of São Paulo, Brazil) and Brazilian environmental agencies to design and develop such an application [15][16][17][18][19][20], to be piloted in two major cities in Northeast Brazil that were hit hard by the 2015/16 Zika virus epidemic, namely Recife [21] and Campina Grande [22]. In this context, weather-related risk factors are being selected for integration into the application (which contains the GPS coordinates of properties as well as property-level physical and environmental characteristics and indicators of socioeconomic deprivation and anthropogenic activities collected by agents) to make spatial predictions of current and future lead-time occurrence of mosquitoes.
The meteorological parameters obtained from local weather stations of the Brazilian National Institute of Meteorology (INMET) provide historical and current weather data for inclusion in geostatistical models (e.g., spatiotemporal Bayesian or generalised additive models (GAMs)) for early warning detection of mosquito populations [23,24]. Weather forecast-based sources, such as the OpenWeatherMap API, also provide large amounts of data for the current time (analysis) and for future days based on forecasts generated in real time [25]. Both sources, i.e., INMET and the OpenWeatherMap API, are being considered for integration into the smartphone and web application that will be developed. Before doing so, we must evaluate the accuracy and suitability of the OpenWeatherMap API resource against the INMET database, since the former is a recent resource.
Data 2022, 7, 106
The next section describes the methodology for assessing the accuracy of a selected set of meteorological parameters (namely, mean temperature and relative humidity only) by comparing the OpenWeatherMap API with INMET. We first assess how representative the two sources are of each other by examining the residual difference between the current INMET observations and those of the OpenWeatherMap API. A second analysis examines the forecast errors within the OpenWeatherMap API by comparing the 1-day, 2-day, 3-day and 4-day forecasts against the current measurements (i.e., 0-day), so as to detect any evidence of forecast drift with lead time. Both analyses take into consideration two broad seasons (i.e., Rainy and Dry). The analysis uses data from two cities, namely Recife and Campina Grande. We have restricted the analysis to these two cities because they are the study areas we are currently engaged with: we are conducting collaborative research with the environmental health agencies in both cities to combat mosquito-borne arboviruses, and we intend to develop an early warning system for mosquito population outbreaks. Hence, we are channelling various data streams, especially weather-related variables. We therefore compare how similar OpenWeatherMap is to a baseline (i.e., INMET); the former provides lead-time forecasts while the latter provides historical measurements, and both can in turn be harnessed for such early warning predictions.

Methods
In this section, we describe in detail how the datasets were extracted from two electronic meteorological sources, the OpenWeatherMap API (or OpenWeather Global Services) and the Brazilian National Institute of Meteorology (in Portuguese: Instituto Nacional de Meteorologia, INMET), for the two cities in Northeast Brazil. We used data from 1 May 2020 to 28 March 2021 (the period for which data were available at the time we carried out the API extraction from OpenWeatherMap). For Recife, March, April, May, June, July and August were defined as the rainy season, while the remaining months were defined as the dry season [26]. Campina Grande, on the other hand, experiences a shorter rainy period during March, April, May and June (the remaining months are dry) [27].
The OpenWeatherMap API is an online meteorological service that provides weather data, including forecasts and current analysis data, to researchers and developers of web-based services and mobile applications. As data sources, it harnesses meteorological broadcast services, particularly raw data from airport weather stations, as well as data from radar stations and other official weather stations [25]. The OpenWeatherMap API processes all the data using machine learning to enhance the numerical weather prediction models provided by several data sources, e.g., NOAA, Met Office, ECMWF and Environment Canada (see https://openweathermap.org/technology (accessed on 18 November 2021)). The development of the OpenWeatherMap API was inspired by the OpenStreetMap and Wikipedia platforms, which make information freely available to everyone; the OpenWeatherMap API typically uses OpenStreetMap to visualise its spatially referenced weather predictions.
It also provides an API with JSON endpoints for making free and unlimited calls, updated every hour, to obtain current values for weather indicators together with 3-hourly forecast values stretching up to five days ahead (i.e., observed (now) measures accompanied by consecutive 3-hourly lead-time predictions of what these weather variables will be, up to five days after the current observation), and we have compiled these data from the OpenWeatherMap API. Note that the current and forecasted values are updated and released every hour. For our research, which is situated in Recife and Campina Grande, we automated the process via RStudio and Python to extract the current and 3-hourly weather information for the following parameters: temperature, relative humidity, pressure, cloud cover and weather description, and to store it automatically in MongoDB using the API address: http://api.openweathermap.org/data/2.5/forecast?id=[ID]&APPID=[KEY] (accessed on 21 July 2022).
Given the API key provided by the OpenWeatherMap API services, and setting the station IDs to 3390760 (longitude: −34.8811 and latitude: −8.0539) and 3403642 (longitude: −35.8811 and latitude: −7.2306) in the above address, we are able to extract the weather parameters for Recife and Campina Grande, respectively, and continuously compile the records into MongoDB. The datasets from the OpenWeatherMap API were downloaded as JSON, which has a nested structure (i.e., 3-hourly weather estimates nested within city-level information) (Figure 1).
The INMET database comprises an extended network of automatic and conventional weather stations operating since 1961, and it is the most recognised source [7,23,24]. As of May 2000, there are 584 stations in operation throughout the country, and each station may provide hourly, daily and/or monthly values for the measured meteorological parameters of temperature (average, minimum and maximum), relative humidity, pressure and many more (Figure 2). We downloaded the full available data for the INMET automatic weather stations located in Recife and Campina Grande, with station IDs A301 (longitude: −34.959239 and latitude: −8.05928) and A313 (longitude: −35.904831 and latitude: −7.225574), respectively, using the following link: https://tempo.inmet.gov.br/TabelaEstacoes/# (accessed on 21 July 2022).
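The OpenWeatherMap extraction step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the helper names are hypothetical, the JSON field names (`city`, `list`, `dt_txt`, `main`, `clouds`, `weather`) are an assumption about the shape of the forecast response, and the sample payload holds made-up values so that the example runs without a network call or an API key. Only the endpoint and the two city IDs come from the text.

```python
from urllib.parse import urlencode

# Endpoint and city IDs are taken from the text; all other names are illustrative.
BASE_URL = "http://api.openweathermap.org/data/2.5/forecast"
CITY_IDS = {"Recife": 3390760, "Campina Grande": 3403642}


def build_forecast_url(city_id, api_key):
    """Return the forecast URL for one city (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'id': city_id, 'APPID': api_key})}"


def flatten_forecast(payload):
    """Flatten 3-hourly entries nested within city-level information into rows.

    The field names below are an assumption about the response structure.
    """
    city = payload["city"]
    rows = []
    for entry in payload["list"]:
        weather = entry.get("weather") or [{}]
        rows.append({
            "city_id": city["id"],
            "city_name": city["name"],
            "timestamp_utc": entry["dt_txt"],
            "temperature": entry["main"]["temp"],
            "humidity": entry["main"]["humidity"],
            "pressure": entry["main"]["pressure"],
            "cloud_cover": entry.get("clouds", {}).get("all"),
            "description": weather[0].get("description"),
        })
    return rows


# Hand-made payload with made-up values, shaped like the assumed response.
sample_payload = {
    "city": {"id": CITY_IDS["Recife"], "name": "Recife"},
    "list": [
        {"dt_txt": "2020-05-01 00:00:00",
         "main": {"temp": 299.2, "humidity": 84, "pressure": 1012},
         "clouds": {"all": 40},
         "weather": [{"description": "scattered clouds"}]},
        {"dt_txt": "2020-05-01 03:00:00",
         "main": {"temp": 298.7, "humidity": 88, "pressure": 1011},
         "clouds": {"all": 75},
         "weather": [{"description": "light rain"}]},
    ],
}

rows = flatten_forecast(sample_payload)
for row in rows:
    print(row["timestamp_utc"], row["temperature"], row["humidity"])
```

Each flat row could then be written to a document store such as MongoDB, with one record per city and time stamp; that storage step is omitted here.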

Statistical Analysis
In this work, we are mostly interested in analysing two weather parameters, temperature and humidity, as these variables are the most important predictors of mosquito presence, which we intend to use in future work; the analysis in this paper is therefore limited to these two variables. To reiterate, we begin by examining how representative the two sources are of each other by assessing the residual difference in two steps. First, we compare the OpenWeatherMap API current measurements with the baseline measurements from INMET day by day (and by time stamp), stratified by season type (i.e., dry or wet). Here, we measure the mean difference between samples and use a two-sample unpaired t-test to determine whether the two sources are significantly different from each other. A p-value below 0.05 indicates that the mean difference is not zero and, thus, that OpenWeatherMap differs from INMET.
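As an illustration of the two-sample unpaired comparison, the sketch below computes a t statistic and degrees of freedom using only the Python standard library. Two points are assumptions rather than the authors' stated method: it uses the Welch (unequal variance) form of the test, since the text does not say whether variances were pooled, and it approximates the two-sided p-value with a standard normal distribution, which is only reasonable for large samples. A statistics package (for example `t.test` in R) would give the exact t-distribution p-value. The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Welch two-sample unpaired t statistic, degrees of freedom and approximate p.

    Assumption: unequal variances (Welch). The p-value uses a normal
    approximation, so it is only indicative for small samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, dof, p_approx


# Made-up humidity readings (%) for one city and season, for illustration only.
owm_humidity = [78.0, 82.0, 85.0, 80.0, 88.0, 90.0, 84.0, 79.0]
inmet_humidity = [70.0, 75.0, 77.0, 72.0, 81.0, 83.0, 76.0, 71.0]

t_stat, dof, p_approx = welch_t_test(owm_humidity, inmet_humidity)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, approximate p = {p_approx:.4f}")
```

The approximate p-value is then compared against the 0.05 threshold separately for each city, season and weather parameter.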
A second analysis focuses on comparing OpenWeatherMap API's future forecasts made for 1-day, 2-day, 3-day and 4-day lead times against its own current baseline measurement for 0-day (so as to detect any evidence of forecast drift with the lead-times) using Taylor diagrams without stratification.
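The quantities summarised in a Taylor diagram can be sketched as follows for one lead time compared against the day-0 baseline. This is a minimal illustration, not the authors' R code. It assumes the forecast and baseline series are already aligned on the same valid time stamps, and it uses the centred root-mean-square difference (means removed), which is the quantity a Taylor diagram displays; whether the paper's reported RMSE is centred is an assumption. The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev


def taylor_statistics(forecast, reference):
    """Standard deviations, correlation and centred RMS difference of two series."""
    if len(forecast) != len(reference) or len(forecast) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(forecast)
    f_mean, r_mean = fmean(forecast), fmean(reference)
    # Population (1/N) standard deviations; both must be non-zero.
    sd_f, sd_r = pstdev(forecast), pstdev(reference)
    covariance = sum((f - f_mean) * (r - r_mean)
                     for f, r in zip(forecast, reference)) / n
    correlation = covariance / (sd_f * sd_r)
    centred_rmse = sqrt(sum(((f - f_mean) - (r - r_mean)) ** 2
                            for f, r in zip(forecast, reference)) / n)
    return {"sd_forecast": sd_f, "sd_reference": sd_r,
            "correlation": correlation, "centred_rmse": centred_rmse}


# Made-up humidity values (%): day-0 baseline and the matching 1-day-ahead forecast.
day0 = [80.0, 85.0, 78.0, 90.0, 88.0, 76.0]
lead1 = [82.0, 83.0, 80.0, 86.0, 91.0, 79.0]

stats = taylor_statistics(lead1, day0)
print({name: round(value, 3) for name, value in stats.items()})
```

The three statistics are tied together by the relation E'^2 = sd_f^2 + sd_r^2 - 2 * sd_f * sd_r * R, which is what allows a single point on the diagram to encode all of them.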
To reiterate, in order to account for broader variations caused by seasonality, the analyses outlined in step one are stratified by "Wet" and "Dry" season. Step one involves calculating the difference between the current measurements of the OpenWeatherMap API and the baseline measurements of INMET, summarising this information in a series of time-series residual plots, and then computing the overall mean differences and standard deviations by season type (i.e., "Wet" and "Dry") for each city. The analysis is performed at the day-by-day and time stamp level (on a 6-h interval instead of a 3-h interval, chosen for ease of data management; e.g., 1 May 2020 00:00:00 UTC, 1 May 2020 06:00:00 UTC, 1 May 2020 12:00:00 UTC and so on) and stratified by season type, i.e., "dry" and "wet", to account for any effect modification [4]. Normality of these differences was checked through visual inspection of histograms and with the Kolmogorov-Smirnov test (where a p-value greater than 0.05 indicates that the distribution does not deviate from a normal distribution). Further analysis was conducted using Taylor diagrams to compare each lead-time forecast from the OpenWeatherMap API against INMET and to ensure that the results are consistent. These diagrams display the root mean squared error (RMSE), the correlation coefficient and the standard deviation. All statistical analyses and visualisations were carried out in RStudio Desktop (version 2022.07.0+548 for Mac).
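The seasonal residual summary in step one can be sketched in Python as below. This is a hedged illustration rather than the authors' implementation: the wet-season months follow the definitions given in the Methods, the residual is taken as OpenWeatherMap minus INMET (the sign convention is an assumption), only time stamps on the 6-hourly grid present in both sources are used, and the function names and input values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Wet-season months per city, as defined in the Methods; other months are dry.
WET_MONTHS = {
    "Recife": {3, 4, 5, 6, 7, 8},
    "Campina Grande": {3, 4, 5, 6},
}


def season_of(city, timestamp):
    """Return 'wet' or 'dry' for a city and a UTC time stamp."""
    return "wet" if timestamp.month in WET_MONTHS[city] else "dry"


def residual_summary(city, owm, inmet):
    """Mean and SD of (OpenWeatherMap - INMET) residuals by season.

    `owm` and `inmet` map datetime -> value. Only time stamps present in
    both sources and falling on the 6-hourly grid are compared.
    """
    by_season = defaultdict(list)
    for ts in sorted(owm.keys() & inmet.keys()):
        if ts.hour % 6 == 0 and ts.minute == 0 and ts.second == 0:
            by_season[season_of(city, ts)].append(owm[ts] - inmet[ts])
    return {
        season: {
            "n": len(residuals),
            "mean_diff": mean(residuals),
            "sd_diff": stdev(residuals) if len(residuals) > 1 else float("nan"),
        }
        for season, residuals in by_season.items()
    }


# Made-up temperature readings (degrees Celsius), for illustration only.
owm_temp = {
    datetime(2020, 5, 1, 0): 26.0,
    datetime(2020, 5, 1, 3): 27.0,   # off the 6-hourly grid, ignored
    datetime(2020, 5, 1, 6): 25.0,
    datetime(2020, 10, 1, 0): 28.5,
    datetime(2020, 10, 1, 6): 29.5,
}
inmet_temp = {
    datetime(2020, 5, 1, 0): 25.0,
    datetime(2020, 5, 1, 3): 20.0,
    datetime(2020, 5, 1, 6): 26.0,
    datetime(2020, 10, 1, 0): 28.0,
    datetime(2020, 10, 1, 6): 28.0,
}

summary = residual_summary("Recife", owm_temp, inmet_temp)
print(summary)
```

The per-season residual lists are also what would be inspected with histograms and a normality test before the mean differences are compared.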

Technical Validation
Summary statistics for the mean difference and its variability between OpenWeatherMap and INMET (i.e., the reference database), by city and season, are presented in Table 1. The stratified plots show the point-to-point residual differences between the two sources; they give an overarching description of the temporal patterns for temperature and humidity and of how the databases differ from each other (i.e., values close to zero indicate that the readings are the same) (Figures 3 and 4). The differences are marginal for temperature, with p-values above 0.05; conversely, the results indicate a significant difference for humidity, where the mean difference and variability between the OpenWeatherMap API and INMET are quite large, with SD exceeding ±7 and most p-values below 0.01 (Table 1). Figures 5 and 6, together with the results in Table 2, provide a graphical (and statistical) summary of how closely the lead-time forecasts ("what it will be") predicted by the OpenWeatherMap API match its own baseline data. Consider the output of Figure 6C: the lead-time predictions (versus day-0) for humidity in Campina Grande are poor, as the RMSE increases from ±7.42 to ±8.15 and the correlation decreases from 0.76 to 0.70.
Table 2. Standard deviation, correlation coefficient and RMSE differences derived from the Taylor diagram analysis comparing the 1-day (purple), 2-day (green), 3-day (orange) and 4-day (blue) forecasts against the baseline measurements (day-0) of the OpenWeatherMap API.

[Table 2 body not recoverable from the source. Surviving labels: "Temperature a" with columns "Wet Season" and "Dry Season"; row label "Day-1 (Purple)"; city groups Recife and Campina Grande, with cross-references to Figure 6A, 6B, 6C and 6D.]

Figure 5.
[Start of caption missing in the source.] Recife is represented by panels (A,B), which show analysis for temperature stratified wet and dry, respectively; Campina Grande is represented by panels (C,D), which show analysis for temperature stratified wet and dry, respectively. The x-y axes are standard deviation, arc represents correlation between the lead time prediction and baseline, and circular contours are the root-mean-squared error differences.

Figure 6.
Taylor diagrams illustrating the statistical comparison of the 1-day (purple), 2-day (green), 3-day (orange) and 4-day (blue) forecasts against the baseline measurements of the OpenWeatherMap API (i.e., day-0). Recife is represented by panels (A,B), which show the analysis for humidity in the wet and dry seasons, respectively; Campina Grande is represented by panels (C,D), which show the analysis for humidity in the wet and dry seasons, respectively. The x and y axes show the standard deviation, the arc represents the correlation between the lead-time prediction and the baseline, and the circular contours are the root-mean-squared error differences.

Usage Notes
The two datasets described here, (1) INMET, which provides observed historical and current/near real-time weather measurements, and (2) the OpenWeatherMap API, which provides up to five days of future weather predictions, can be linked to city- and/or neighbourhood-level mosquito-borne arbovirus surveillance data to investigate the temporal patterns of mosquito populations in Recife and Campina Grande. It should be noted that the accuracy evaluation presented in this manuscript is specifically focussed on these two Brazilian cities; the results are therefore strictly representative of Recife and Campina Grande (and not applicable to other cities).
With that in mind, the overarching purpose of drawing weather-related information (in particular temperature and relative humidity) from INMET and the OpenWeatherMap API is to integrate it into a predictive model for making current and lead-time temporal predictions of mosquito population at a city or neighbourhood scale. INMET is an established source of weather data and is considered the reference standard against which to compare the OpenWeatherMap API. In terms of accuracy, the overall residual difference for temperature between the two data sources, regardless of season type, is marginal, which indicates that temperature readings from the OpenWeatherMap API are similar to those automatically recorded by INMET. However, the OpenWeatherMap API's data for humidity differ substantially. Our examination showed that temperature from OpenWeatherMap has the smallest residual mean difference and variability (see Table 1), while the opposite holds for humidity. When carrying out the day-by-day comparison, however, we find the forecasted readings in the OpenWeatherMap API to be substantially different from the baseline measures. With this in mind, using the OpenWeatherMap API for modelling and making predictions about mosquito population is a viable option for temperature; for humidity, however, users should be wary and examine the error distribution to see how far OpenWeatherMap API forecasts deviate from the actual observed baseline weather data.
This paper limits the comparison to only two resources because, at the time, the authors were considering the OpenWeatherMap API for making lead-time predictions of mosquito populations [15][16][17][18][19][20], with a specific focus on two cities. An interesting possibility would be to extend the analysis to forecasts from other global weather models that might perform better than the OpenWeatherMap API; examples include the Global Forecast System (GFS), one of many models produced by the US Government's National Centers for Environmental Prediction (NCEP). On a final note, a small caution is raised when considering the use of relative humidity for mosquito prediction. Relative humidity is a function of both water vapour and temperature; hence, it should be used in a statistical model as a "standalone" predictor, not alongside temperature as an adjustment, to avoid potential issues such as multicollinearity [28].
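The dependence of relative humidity on temperature can be made concrete with a short sketch. It is not part of the paper's analysis. It assumes a Magnus-type approximation for saturation vapour pressure over water, with commonly quoted coefficients (6.1094 hPa, 17.625, 243.04 °C) that do not come from the source; the function names and the 80% starting point are hypothetical. Holding the amount of water vapour fixed, relative humidity falls as temperature rises, which is why the two variables tend to be correlated.

```python
from math import exp


def saturation_vapour_pressure_hpa(temp_c):
    """Magnus-type approximation over water (coefficients are an assumption)."""
    return 6.1094 * exp(17.625 * temp_c / (temp_c + 243.04))


def relative_humidity(vapour_pressure_hpa, temp_c):
    """Relative humidity (%) = 100 * actual / saturation vapour pressure."""
    return 100.0 * vapour_pressure_hpa / saturation_vapour_pressure_hpa(temp_c)


# Fix the water vapour content at the level giving 80% relative humidity at 25 C,
# then warm the air without adding or removing moisture.
vapour_pressure = 0.8 * saturation_vapour_pressure_hpa(25.0)
for temp_c in (25.0, 28.0, 31.0):
    rh = relative_humidity(vapour_pressure, temp_c)
    print(f"{temp_c:.0f} C -> relative humidity {rh:.1f}%")
```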

Data Availability Statement:
The dataset used to produce the results for this paper is prepared and available through figshare (https://figshare.com/s/08449337eb8194848c72, accessed on 21 July 2022). Download is not restricted, and usage is regulated by the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/, accessed on 21 July 2022). Hence, the original sources, i.e., the OpenWeatherMap API (https://openweathermap.org/, accessed on 21 July 2022) and the Brazilian National Institute of Meteorology (INMET) (http://portal.inmet.gov.br/, accessed on 21 July 2022), must be given full attribution.
[...] opportunity to thank the Space and Aeronautics Research Institution (National Center for Satellite Technology, King Abdul-Aziz City for Science and Technology, Saudi Arabia) for their support in funding the PhD research conducted by author A.A. in the UK.

Conflicts of Interest:
The authors declare no conflict of interest.