Climate: An R Package to Access Free In-Situ Meteorological and Hydrological Datasets for Environmental Assessment

Abstract: Freely available and reliable meteorological datasets are in high demand in many scientific and business applications. However, the structure of publicly available databases is often difficult to follow, especially for users who only deal with this kind of dataset on occasion. The “climate” R package aims to fill this gap with an easy-to-use interface for downloading global meteorological data in a fast and consistent way. The package provides access to different sources of in-situ meteorological data, including the Ogimet website, atmospheric vertical soundings gathered at the University of Wyoming’s webpage, and hydrological and meteorological measurements collected by the Institute of Meteorology and Water Management - National Research Institute (i.e., the Polish Met Office). This article also provides a quick overview of the key functionalities available within the climate R package, and gives examples of an efficient and tidy workflow for meteorological data within the R-based environment. The automation procedures included in the package allow one to download data in a user-defined time resolution (from hourly to annual), for a user-defined time span, and for a specified group of stations or countries. The package also contains metadata, including a list of available stations, their geospatial information, and measurement descriptions with their units. Finally, the obtained datasets can be processed in R or exported to external tools (e.g., spreadsheets or GIS software).


Introduction
Meteorological conditions are key factors in many areas of human activity such as agriculture, transport, power engineering, insurance and risk assessment [1], industrial and marketing planning [2], tourism, sport, mass events [3,4], national security, and many more where atmospheric conditions may have a direct or indirect impact [5-8]. Besides the financial and safety relevance of meteorological and hydrological datasets [9], this kind of information is very often crucial for reliably answering a scientific problem [10], and the answer relies heavily on the quality of the meteorological dataset used.
National meteorological agencies collect in-situ measurements of the highest quality according to the standards of the World Meteorological Organization (WMO). They are simultaneously responsible for maintaining and sharing their archived databases. A significant part of the meteorological data is available for free from the global exchange of surface synoptic observations (SYNOP), meteorological information used by aircraft pilots (METAR), or upper air sounding (TEMP) reports. of its origin in a tidy tabular form [20] that is suitable for various visualization and processing applications. Abbreviations of the variables are specified according to the WMO standards and were added to the package documentation. The relevant dictionaries attached to the climate package can be read with the imgw_meteo_abbrev or imgw_hydro_abbrev commands. The package also contains a database that clarifies the variables' metadata and the geographical coordinates of each station's location. Thanks to this feature, users can directly use the output data in geospatial analysis using the R programming language [21] or external GIS software.

Methods and Materials
The climate package is distributed under the MIT license. However, users are obliged to follow the regulations provided on the respective webpages, as the package only provides an interface to the official repositories. The stable version of the climate package is available at the Comprehensive R Archive Network (CRAN), while its development version is hosted on the GitHub platform at http://rclimate.ml (mirrored to: https://github.com/bczernecki/climate), where third-party users can contribute to its further development.

Installation and User Guide
The climate package can be installed and run on any modern computer with the R environment version 3.1 or higher. The package was tested on a wide range of Windows instances and several Linux and Mac OS X distributions, and passed numerous tests before being published in the CRAN repository. The authors also deliberately avoided using external libraries in order to reduce possible dependencies or installation issues. The stable version of the climate package hosted on the official CRAN repository can be installed with install.packages("climate") and activated with library(climate). The development version is hosted on the GitHub platform at https://github.com/bczernecki/climate, where all instructions for installing and using the package are provided. Additionally, users are encouraged to contribute, leave feedback, or suggest their own ideas for further improvements that may be added in future releases.

Datasets
Archived data stored at (1) www.ogimet.com, (2) the University of Wyoming's atmospheric sounding database, and (3) the official IMGW-PIB repository constitute the primary sources of data in the climate package (Figure 1).
The historical sounding observations (i.e., upper air data from the University of Wyoming's repository) are not available on the Ogimet website. Therefore, this capability was added to the climate package due to the high demand for this kind of information among the severe weather community, where it is commonly used for analyzing thermodynamic and kinematic atmospheric parameters [22,23]. This is also crucial information for identifying the atmospheric processes responsible for air quality problems [24]. The measurement interval is in most cases 12 hours (i.e., at 00 and 12 UTC, occasionally at some stations also at 06 and 18 UTC), and the data are usually available a few hours after the beginning of the measurements. The sounding (also known as "rawinsonde") data have 11 columns representing the instantaneous measurement of the atmospheric vertical profile for a single station and time.
The IMGW-PIB (i.e., Polish hydro-meteorological) dataset contains measurements back to the 1950s, and the database is continually being updated, usually on a monthly basis. The meteorological data in the repository are divided, according to the hierarchy of stations, into (1) synoptic, (2) climatological, and (3) precipitation data. The synoptic and climatological data are available at (1) hourly, (2) daily, and (3) monthly time intervals. The precipitation stations have no measurements at an hourly interval. The synoptic data are the most extensive and contain over 100 meteorological parameters. The climatological data describe four essential meteorological components: air temperature [°C], wind speed [m·s⁻¹], relative humidity [%], and cloudiness [oktas].
The precipitation data consist of the amount of precipitation with a description of the phenomena or surface precipitation type (i.e., rain, snow, snow cover height). Due to the relatively broad range of parameters obtainable for the meteorological data, the authors decided to include a "vocabulary" that contains column names (i.e., meteorological parameters) in (1) short, (2) more descriptive, or (3) original (Polish) forms.
The hydrological data in the IMGW-PIB repository contain (1) daily, (2) monthly, and (3) semi-annual/annual measurements. All hydrological data use the hydrological year, which begins on November 1st and ends on October 31st. Regardless of the temporal resolution, the hydrological data contain measurements of the maximum, mean, and minimum for the following: water flow [m³·s⁻¹], water temperature [°C], and water level [cm]. Additionally, the daily dataset includes characteristics of the ice and overgrowth phenomena observed at the station. As with the meteorological dataset, a user can decide whether to add an extra description to the column names.
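Because the hydrological year does not coincide with the calendar year, it can help to tag each date with its hydrological year before aggregating. The following is a minimal base R sketch, not part of the climate package. The helper name hydro_year is hypothetical, and labelling the hydrological year by the calendar year in which it ends is an assumption.

```r
# Assign each date to a hydrological year (1 November to 31 October).
# Assumption: the year is labelled by the calendar year in which it ends.
hydro_year <- function(dates) {
  dates <- as.Date(dates)
  year  <- as.integer(format(dates, "%Y"))
  month <- as.integer(format(dates, "%m"))
  # November and December already belong to the next hydrological year
  year + as.integer(month >= 11L)
}

example_years <- hydro_year(c("2018-10-31", "2018-11-01", "2019-03-15"))
print(example_years)
```

Under the stated assumption, 31 October 2018 still falls in hydrological year 2018, while 1 November 2018 and 15 March 2019 both fall in hydrological year 2019.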

Core Functionality of the Climate R Package
The climate package currently consists of 21 functions, with ten of them visible to the end user (Table 1). Three of them are intended for downloading meteorological data, one for hydrological data, and four are auxiliary functions that improve legibility and data exploration capabilities. Despite the relatively large number of functions that might be used, there are four main functions, called meteo_ogimet, sounding_wyoming, meteo_imgw and hydro_imgw, that are generic wrappers for the other functions. They allow for simplified downloading of any requested data in a convenient way. All available functions are documented on the package website and in the built-in R help system, where example code is also provided.

Ogimet Meteorological Data
The generic function for downloading decoded SYNOP reports from the Ogimet repository requires defining a set of arguments according to the schema provided below for the most generic meteo_ogimet function.
meteo_ogimet(interval, date, coords, station, precip_split)
where:
• interval - temporal resolution of the data ("hourly", "daily") (argument not valid for: ogimet_hourly and ogimet_daily functions)
• date - start and finish dates (e.g., date = c("2018-05-01", "2018-07-01")); character or Date class object
• coords - logical argument (TRUE or FALSE); if TRUE coordinates are added
• station - WMO ID of meteorological station(s); character or numeric vector
• precip_split - whether to split precipitation fields into 6/12/24 h numeric fields (logical value: TRUE (default) or FALSE); valid only for an hourly time step

Sounding Data
The proposed solution is based on the decoded TEMP sounding (radiosonde) reports hosted on the University of Wyoming (http://weather.uwyo.edu) server. It contains archived data for all upper air profiling stations working globally in the WMO network. The syntax for downloading a single sounding is as follows:
sounding_wyoming(wmo_id, yy, mm, dd, hh)
This function requires a few numeric arguments: the WMO ID of the station (wmo_id) and the year (yy), month (mm), day (dd), and hour (hh) of the sounding. The returned object is a list of two data frames. The first consists of measurements in a tabular form for 11 meteorological elements, while the second consists of metadata and the most fundamental thermodynamic and atmospheric instability indices.
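A single request can be sketched as follows. This is only an illustration under stated assumptions: the station ID 12120 and the date are example values, and the names request, profile and indices are hypothetical. The download itself is attempted only when the climate package is installed, and it needs an internet connection.

```r
# One sounding request: WMO station 12120 (assumed example), 4 April 2019, 00 UTC.
request <- list(wmo_id = 12120, yy = 2019, mm = 4, dd = 4, hh = 0)

# Run the download only if the climate package is available.
if (requireNamespace("climate", quietly = TRUE)) {
  sounding <- tryCatch(
    do.call(climate::sounding_wyoming, request),
    error = function(e) NULL  # e.g., no network access or no data for this time
  )
  if (!is.null(sounding)) {
    profile <- sounding[[1]]  # measurements along the vertical profile
    indices <- sounding[[2]]  # metadata and instability indices
    print(head(profile))
  }
}
```

Keeping the arguments in a named list makes it straightforward to loop over several dates or stations by modifying single elements of the request.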

IMGW-PIB Meteorological Data
An extended range of meteorological near-surface measurements can usually be obtained from the repositories of regional met offices. The publicly available Polish historical meteorological dataset comprises two sections: meteorological and actinometric data. Each of these sections is divided into subsections depending on the observational interval. The actinometric data were not implemented in the climate package due to ongoing changes to the data storage, and they will be added after the final format is determined.
The climate package contains an interface to the Polish IMGW-PIB dataset, which can be downloaded with a syntax very similar to that used for the global dataset described previously. The schema shown below describes the use of the most generic meteo_imgw function and contains all arguments that can be used to define the requested data.
meteo_imgw(interval, rank, year, status, coords, station, col_names)
where:
• interval - temporal resolution of the data ("hourly", "daily", "monthly")
• rank - type of the stations to be downloaded ("synop", "climate", or "precip")
• year - vector of years (e.g., 1966:2000)
• status - logical argument (TRUE or FALSE); for removing the status of the measurements
• coords - logical argument (TRUE or FALSE); if TRUE coordinates are added
• station - vector of stations; it can be the ID of a station (numeric) or the name of a station (capital letters)
• col_names - three types of column names are possible: "short" (default) - values with shortened names, "full" - full English description, "polish" - original names in the dataset
It is also worth noting that most of the arguments have predefined default values to support less experienced users. For example, if the station argument is not given, then all available datasets (here: data for all stations) are automatically downloaded. Only the interval, rank and year arguments are mandatory. If any of them is not defined, the user is given a hint on the correct syntax.
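The schema above can be illustrated with a short request for monthly synoptic data. This is a sketch rather than a listing from the package documentation: the years are arbitrary, and the station name "ŁEBA" is an assumption about how that station is listed in the repository. The download is attempted only when the climate package is installed, and it needs an internet connection.

```r
# Mandatory arguments (interval, rank, year) plus a few optional ones.
request <- list(
  interval  = "monthly",
  rank      = "synop",
  year      = 2018:2019,
  coords    = TRUE,      # add station coordinates to the output
  station   = "ŁEBA",    # assumed station name, written in capital letters
  col_names = "full"     # full English column descriptions
)

# Run the download only if the climate package is available.
if (requireNamespace("climate", quietly = TRUE)) {
  monthly <- tryCatch(
    do.call(climate::meteo_imgw, request),
    error = function(e) NULL  # e.g., no network access
  )
  if (!is.null(monthly)) print(head(monthly))
}
```

Dropping the station element from the list would, according to the description above, request the data for all available stations.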

IMGW-PIB Hydrological Data
The hydrological data are available in daily, monthly, and semiannual/annual temporal resolutions. The definition of the arguments in hydro_imgw is analogous to that described previously for the meteorological data, with the syntax described below:
hydro_imgw(interval, year, coords, value, station, col_names)
where:
• interval - temporal resolution of the data ("daily", "monthly", "semiannual_and_annual")
• year - vector of years (e.g., 1966:2000)
• coords - logical argument (TRUE or FALSE); if TRUE coordinates are added
• value - type of data: state ("H"), flow ("Q"), or temperature ("T")
• station - vector of stations; it can be the ID of a station (numeric) or the name of a station (capital letters)
• col_names - three types of column names are possible: "short" (default) - values with shortened names, "full" - full English description, "polish" - original names in the dataset

Results
The purpose of this section is to show the capabilities of the created R package. The following subsections provide examples for types of analyses that can be performed using the climate R package together with other R packages available on CRAN.

Ogimet Meteorological Data-Use Case
The meteorological dataset use case provided below was based on hourly data from the Ogimet repository for the defined time frame, i.e., 2018/01/01 - 2018/12/31, for the location of Svalbard Lufthavn. The meteo_ogimet command allowed us to download 8761 observations for 22 variables (Listing 1). The dplyr and openair packages [25] were used to analyze and visualize part of the downloaded results. After aggregating the data by wind direction (the "ddd" column, Listing 2), converting directions into angles given in degrees, and reformatting the date classes, it was possible to align the data with the format required by external packages and plot the seasonal wind roses (Figure 2).
Listing 1. Example of the data download using the climate package.
library(climate)
df <- meteo_ogimet(interval = "hourly", date = c("2018-01-01", "2018-12-31"), station = "01008")
#> [1]
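The conversion step described above (compass labels to angles, plus a seasonal grouping) can be sketched in base R on a small synthetic table. This is not the original Listing 2. It assumes that the "ddd" column holds 16-point compass labels with "CAL" for calm conditions, and the wind speed column "ws" and all values are hypothetical.

```r
# Synthetic stand-in for a few hourly records (values are made up).
obs <- data.frame(
  date = as.POSIXct(c("2018-01-15 06:00", "2018-04-15 12:00",
                      "2018-07-15 18:00", "2018-10-15 00:00"), tz = "UTC"),
  ddd  = c("N", "ESE", "CAL", "SW"),  # assumed compass labels
  ws   = c(5.1, 3.4, 0.0, 7.8),       # hypothetical wind speed, m/s
  stringsAsFactors = FALSE
)

# Lookup table: each compass label is 22.5 degrees from the previous one.
compass <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")
angles <- setNames(seq(0, 337.5, by = 22.5), compass)

# Labels outside the table (such as "CAL") become NA.
obs$wd <- unname(angles[obs$ddd])

# Meteorological season derived from the month, for seasonal grouping.
month <- as.integer(format(obs$date, "%m"))
season_by_month <- c("winter", "winter", "spring", "spring", "spring",
                     "summer", "summer", "summer",
                     "autumn", "autumn", "autumn", "winter")
obs$season <- season_by_month[month]
print(obs)
```

A table with numeric speed and direction columns named "ws" and "wd" matches the default column names expected by the windRose function of the openair package, which is why these names were chosen here.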

Searching for the Nearest Stations
The user can also use the climate package without knowing the station's WMO ID. The nearest synoptic stations can be found with the nearest_ogimet_stations function (Listing 3). It requires the user to provide a pair of geographical coordinates that point to the centroid of the area of investigation. The user can also specify how many of the nearest meteorological stations should be found. The result is a data frame with the stations' metadata and their distance to the given coordinates. Additionally, a simple map can be added with the argument add_map = TRUE. Example results and the corresponding code are given below (Figure 3).
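The idea behind such a search, ranking stations by their distance to a point, can be illustrated in base R. The sketch below is not the package's implementation, whose distance method is not described here. It uses the haversine great-circle formula as an assumed stand-in, and the station table, its IDs and its coordinates are hypothetical.

```r
# Great-circle distance (km) between points given in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Hypothetical station table; IDs and coordinates are illustrative only.
stations <- data.frame(
  wmo_id = c("A", "B", "C"),
  lon    = c(17.0, 21.0, 19.0),
  lat    = c(52.0, 52.0, 50.0),
  stringsAsFactors = FALSE
)

# Centre of the area of interest and the number of stations to keep.
point <- c(lon = 17.0, lat = 52.5)
no_of_stations <- 2

stations$distance_km <- haversine_km(point[["lon"]], point[["lat"]],
                                     stations$lon, stations$lat)
nearest <- head(stations[order(stations$distance_km), ], no_of_stations)
print(nearest)
```

Sorting the whole table is adequate for a few thousand stations. For much larger tables, a spatial index would be a more efficient choice.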

Sounding Data-Use Case
Downloading data for a single vertical profile of the atmosphere requires providing the date, hour, and the station's WMO ID (Listing 4). The chosen use case shows an atmospheric sounding launched at 00 UTC on 4 April 2019 in Łeba, Poland (Figure 4). The data frame returned from the measurements allowed us to plot temperature and humidity profiles on a Skew-T diagram generated with the RadioSonde package [26]. It showed a strong thermal inversion up to 800-850 m a.g.l., which may strongly impact the air quality conditions in the near-surface layer [24]. The metadata and thermodynamic calculations stored in the second element of the returned list were omitted on purpose, as no severe weather parameters related to atmospheric convection were detected.

IMGW-PIB-Use Case
Another use case shows the possibilities of the climate package when coupled with the GIS and statistical capabilities of the R programming language (Figure 5). The downloaded data comprised 30 years of monthly mean air temperatures derived from the main meteorological stations in Poland. Because of missing or suspicious values, some data were excluded, e.g., stations whose location changed during the analyzed period or records with a monthly mean air temperature of 0°C during the summer season. The next step was to create a function for calculating the slope coefficient of the linear regression model, which was later applied to the whole dataset. The obtained results (Listing 5) were then transformed into a spatial object using the sf package [27] and visualized in the form of a map using the tmap package [28]. The created vector layer can later be saved in any GIS format supported by the sf package, which interfaces between R and the Geospatial Data Abstraction Library (GDAL) drivers. One of the major advantages of using the R programming language is being able to keep everything in one environment instead of the typical situation where three different tools are applied for (1) data preprocessing, (2) statistical analysis, and (3) spatial data visualization. Such an approach makes it possible to significantly reduce the time required for the entire research and to focus more on the obtained results. However, the user must be aware that the provided tool is only an interface for downloading the data, and that the obtained results may inherit errors from the source repositories.
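The trend calculation described above can be sketched in base R. This is not the original Listing 5. The data are synthetic, with one value per year for two hypothetical stations, and the column name "t2m" and the helper name trend_slope are assumptions made for illustration.

```r
# Synthetic temperatures with a known linear trend for two stations.
years <- 1989:2018
temps <- data.frame(
  station = rep(c("A", "B"), each = length(years)),
  year    = rep(years, times = 2),
  t2m     = c(8 + 0.03 * (years - 1989), 7 + 0.05 * (years - 1989)),
  stringsAsFactors = FALSE
)

# Slope of the least-squares regression line (degrees C per year).
trend_slope <- function(value, year) {
  ok <- !is.na(value) & !is.na(year)
  if (sum(ok) < 3) return(NA_real_)  # too few points for a meaningful fit
  unname(coef(lm(value[ok] ~ year[ok]))[2])
}

# Apply the function to each station separately.
slopes <- sapply(split(temps, temps$station),
                 function(d) trend_slope(d$t2m, d$year))
print(slopes)
```

Since the synthetic series contain no noise, the fitted slopes reproduce the trends used to build them (0.03 and 0.05 degrees C per year). With real data, the same function would be applied after the quality filtering described above.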

Conclusions
The climate R package allows users to obtain historical and up-to-date meteorological information from both the near-surface and upper parts of the atmosphere. Data downloaded with climate make it possible to apply atmospheric data collected according to the WMO standards in an intuitive and fully automated way. The package is designed to be user-friendly and is aimed primarily at environmental scientists who want to obtain hydrological or meteorological data for research purposes in a convenient and programmable way within the R programming language. The usefulness and simplicity of the proposed solution can be especially valuable for many non-atmospheric scientists struggling with the typically sophisticated and time-consuming mechanisms for accessing in-situ atmospheric data in a ready-to-use structure. The climate package saves time in the typical data flow of data science projects, where a significant amount of time is spent on data preparation, while the core part of the computation is usually an order of magnitude shorter than data cleaning and preprocessing [29].
As a future improvement, it is planned to extend the climate R package with new local repositories so that researchers in more countries can conduct interdisciplinary research on meteorological data using a single tool, which can be targeted at a local scale in combination with global meteorological information. Also, new products (e.g., actinometric data in Poland) will be included once the IMGW-PIB repository reaches a mature form.