Preliminary Data Validation and Reconstruction of Temperature and Precipitation in Central Italy

: This study provides a unique procedure for validating and reconstructing temperature and precipitation data. Although developed from data in Middle Italy, the validation method is intended to be universal, subject to appropriate calibration according to the climate zones analysed. This research is an attempt to create shared applicative procedures that are most of the time only theorized or included in some software without a clear deﬁnition of the methods. The purpose is to detect most types of errors according to the procedures for data validation prescribed by the World Meteorological Organization, deﬁning practical operations for each of the ﬁve types of data controls: gross error checking, internal consistency check, tolerance test, temporal consistency, and spatial consistency. Temperature and precipitation data over the period 1931–2014 were investigated. The outcomes of this process have led to the removal of 375 records (0.02%) of temperature data from 40 weather stations and 1286 records (1.67%) of precipitation data from 118 weather stations, and 171 data points reconstructed. In conclusion, this work contributes to the development of standardized methodologies to validate climate data and provides an innovative procedure to reconstruct missing data in the absence of reliable reference time series.


Introduction
Climate analysis is taking on an increasingly central role in the life of mankind. Climate has a great impact on many environmental issues and requires reliable, as well as complete, data. The procedure for deleting possible errors from the data is called validation, while the completion of missing data in a time series is called reconstruction. In this context, the aim of the present study is to define a practical method of data validation and reconstruction that, in the future, could be automated by software. The issue of validation and reconstruction of missing data has been analysed by computer since the 1950s [1]. A growing awareness of the need for more accurate and truthful analyses led the scientific community to considerable development in this field. On the one hand, studies have been focused on the identification of the different types of errors [2], while, on the other hand, the goal has been the reconstruction of missing data. The quality control and climate data processing methods are developed and standardised through the work of the World Meteorological Organization (WMO), which has been active on this theme since the early1960s, publishing important reports (for example, [3]) and adopting the most relevant advances in this theme. The study of quality control is very complex and has gone through a constant refinement of techniques. Temporal consistency of observations and attributing flags to data took hold in the 1990s [4]. Almost simultaneously, other important concepts, such as duplicate profile and range checks, were introduced [5]. Subsequently, spatial consistency and the detection of false positives took a leading role [6]. Furthermore, there were many efforts to start a possible standardization of quality control rules with the increased number of automatic weather stations, and investigations of high-low range limit, rate of change limit, or persistence analyses [7]. To date, the increasing development of computer technology has generated automated systems for the analysis of meteorological and climatological data. Some investigations have considered data in real-time at hourly or semi-hourly scale (for example, [8][9][10]) in order to detect the error immediately, while other studies of automatic analyses for quality control were based on daily data [11]. There are objective difficulties in quality control through daily precipitation data because of the spatial discontinuity of the variable. However some studies have obtained good results [12][13][14][15]. Moreover, some software developed in the 'R' environment not only check the quality of the data, but also calculate extreme climate indices [16]. In this context the WMO has the aim of summarizing the latest improvement in atmospheric science by creating standards for the international community, and identifies some quality control procedures [17]. These quality control procedures are based on five different tests [18] that analyse spatial, temporal, and absolute relationships of climate time series. For data reconstruction, the studies have been developed more recently, due to software that allows spatial interpolations [19]. In particular, geostatistics have played a key role in the reconstruction of climate data, with extensive use of neural networks [20] and kriging methods [21]. Thus, the present study aims to contribute to quality control by providing an operational procedure, starting from WMO prescriptions. A system based on five different tests for validation of daily and monthly climate data has been adopted. Quality controls are planned through a procedure that differs for temperature and precipitation because of their inner diversity in data range and variability. This research is innovative because it emphasizes relations between neighbouring weather stations, in order to detect errors in the data, even if they belong to different climates, as in this case. Moreover, this analysis implements a method to reconstruct missing data in the absence of a reliable reference time series. This method does not take into account validation of weather station data to obtain the average ratios with the raw ones to reconstruct missing data, but it interpolates many values of temperature and precipitation of the weather stations surrounding the missing one.

Geographical Boundaries of the Analysis
The study site is located between the Adriatic Sea and the Appennine Mountains ( Figure 1) in the province of Macerata (Marche, Middle Italy) and some of the surrounding territories. The elevation gradient ranges from sea level on the Adriatic coast to 2233 m asl (above sea level) (the Porche Mountain). This difference in altitude makes quality control of climate data very difficult and requires a method to compare mountain weather stations. This area is characterized by heterogeneous environments. On the basis of the classification of Köppen-Geiger [22] it is possible to identify three main climate zones [23]: 'Cs' (C-temperate climate with s-dry summer) in the coastal area and its surroundings, 'Cf' (f-humid) until 1400 m, and above this elevation up to the highest peak the climate type is 'H' (high altitude climates).
The wide diversity of climate conditions in Macerata province means that it is increasingly difficult to perform data validation tests common to all the weather stations because quality controls should work for different types of climate.

Climate Data
The climate data have been supplied by the 'Annali Idrologici' (Hydrological Yearbooks http://www.acq.isprambiente.it/annalipdf/), the 'Dipartimento della Protezione Civile', Regione Marche' (Dept. of Civil Protection http://www.protezionecivile.gov.it/jcms/en/home.wp), the 'Centro Funzionale dell'Umbria' (Functional Center of Umbria http://www.cfumbria.it/), and the Agenzia Servizi Settore Agroalimentare delle Marche (Agency for Agro-food Sector Services of the Marche Region http://www.assam.marche.it/en/). The data cover the years from 1931 to 2014, however, the analysis is divided into three standard periods of 30 years: 1931-1960; 1961-1990; and 1991-2014. The division into periods allows a good continuity of weather stations that must have at least 15 years of continuous data to be part of the analysis. The total number of weather stations is 40 for temperature data and 118 for precipitation data (Table 1). Their numbers have changed during the period of analysis (1931-2014), due to changes of instruments or removal of weather stations. The instruments were initially mechanical, above all in the period when the data were recorded by 'Annali Idrologici'. Since the 1990s, almost all weather stations have been automated with an integrated wireless telemetry system. Finally, mean daily values of temperature were calculated from hourly and half-hourly data at each station when possible, only if at least 75% of the data in a given day were available. For precipitation, the monthly data value is considered only if all daily observations in a month are available. If these conditions regarding temperature and precipitation are not satisfied, the data are considered missing. For temperature, daily data were analysed because this variable shows a gradual distribution in the environment, i.e., it follows Tobler's Law [24] with gradients typical of each area; daily precipitation, on the other hand, is often not correlated with nearby rain gauges, due to atmospheric dynamics, although on a monthly scale the correlation returns.

Data Analysis
The analysis was performed by using the spreadsheet and GIS (Geographic Information Systems) software. A spreadsheet was used to carry out the sequence of controls and GIS was used for data reconstruction by applying geostatistical methods. Concerning data reconstruction, each candidate weather station was reconstructed with some neighbouring ones. The clustering of the sample was primarily investigated with the "average nearest neighbour" tool, which returned a good result with a p-value higher than 95% [25]: where d i is the distance between feature i and its nearest feature, n corresponds to the total number of features, and A is the total study area. Subsequently, the data have been evaluated through a Voronoi diagram based on clustering, with altitude as an attribute, in order to identify the similarity between a candidate weather station and surrounding neighbours [26]. The Empirical Bayesian Kriging method is a geostatistical method which has been used for interpolation, reconstructing the missing data at the exact co-ordinates of the candidate weather station.
The control procedure is more complicated than the reconstruction one and required that values be ranked on the basis of some quality control flags (QC). For example, missing datum (QC = −1), correct or verified datum (QC = 0), datum under investigation (QC = 1), datum removed after the analysis (QC = 2), and datum reconstructed through interpolation or by estimating the errors of digitization (QC = 3).
There are five main tests both for temperature and precipitation: 1.
Spatial consistency 'Gross error checking' was performed for both temperature and precipitation in the same way; each daily or monthly data outside the established threshold was deleted. At the end of this part of the analysis of only two QC values are allowed: 0 or 2 (if it is not possible to solve the error by using the metadata of the source). The threshold was analysed in order to check for both digitizing errors and values exceeding the measurement range for a sensor problem. The accepted range is from +50 • C to −40 • C for daily temperature [27], while 2000 mm is the limit in monthly precipitation and it represents the maximum annual amount of precipitation in Marche Region. In these data there are no gross errors.
The 'internal consistency check' is a type of control that assesses the consistency of climate data. For example, when temperature has a maximum value lower than minimum one is an error of consistency, and when there is a negative rainfall value. Any values outside these ranges were removed when it was not possible to correct them through the metadata analysis. The internal consistency check, in the same way of gross error checking, led to corrected or deleted data (QC flag 0 or 2).
Before applying the remaining three tests, the normality of data distribution was assessed in order to choose the most suitable statistical instrument for each parameter (temperature, precipitation). The Gaussian distribution was verified in all the weather stations by using statistical indicators of normality as: • 'QQ plot' performed with ArcGis to evaluate graphically the normality of data distribution [28]; • The 'Kolmogorov-Smirnov test', set with a confidence interval of 95% [29]; • Calculation of skewness; if skewness values are between 2 and −2 the distribution of values is considered 'normal' [30].
The tolerance test was applied to check each weather station on the basis of its historical time series. The test investigates the upper and lower thresholds that are 3σ ± µ (where σ is standard deviation of the time series, and µ is the mean of the time series) for daily temperature (maximum, mean, minimum, and difference between maximum and minimum) and monthly precipitation. Moreover, the months with 0 mm of precipitation were further investigated, because the method detects them as lower values even though 0 mm can be a real value in summer months. It can be concluded that the tolerance test defines 'data under investigation' (QC = 1) and 'correct data' (QC = 0). Subsequently, the data under investigation were analysed in more detail by applying the following controls. They were tested by spatial consistency, which takes into account the neighbouring weather stations to identify if there are at least two of them that exceed the threshold of 2σ, as this would provide a clear indication of the suitability of data. The data were previously analysed by using the "Nearest Neighbour" tool to analyse their distribution (if random or cluster) with an interval of confidence (p-value) above 95%. Instead, the Voronoi map, with altitude as attribute, was used to group similar weather stations. Spatial consistency of temperatures take into account daily data, while for precipitation monthly and annual ones. Precipitation was analysed to an annual scale because it is easier to highlight the differences between neighbouring stations before the monthly analysis. The formula below was used to set the threshold [31]: where µ is the mean of five neighbouring weather stations, σ is the standard deviation of them, and n is the number of samples. Precipitation and temperature data outside the established range were assigned a VC = 1 (data under investigation) after they were analysed in the temporal consistency test. Temporal consistency differs between temperature and precipitation because of the difference of data in the continuity of temperature and precipitation. Temperatures were analysed for persistence by removal (QC = 2) of the values that occur for more than one following day unless it was confirmed by at least two neighbouring weather stations, with a difference lower than 0.2 • C between contiguous days; while for precipitation there is persistence if the same value to one decimal place occurred for more than one following day without the need to investigate any neighbouring weather stations. The maximum difference between contiguous days was analysed by applying a mean of all differences between the maximum and minimum values for the entire duration of the data time series. Thus, the limits were calculated by using the median of variations and summed or subtracted to three times the standard deviation (µ ± 3σ), in order to verify if the investigated weather station exceed the established thresholds [32].
Temporal consistency of precipitation is composed of two main points: 1.
The rain gauges that show QC = 1 after the spatial consistency because of very low precipitation were analysed through a test composed by the calculation of the squared correlation coefficient (R 2 ) [33]: R 2 was calculated for the investigated rain gauge and the most similar one differentiated in four cases: • R 2 > 0.7: the rain gauge value is accepted for all months only if it is above its minimum limit as calculated by the time series, for at least 9 out of 12 months; • R 2 < 0.7: the months below the lower threshold of the time series are removed only if at least 9 out of 12 months are above this limit; • If there are less than nine months above the lower limit but the value of R 2 is greater than 0.7; it is necessary to calculate the median of each month and of each year in the five nearby rain gauges in the lifetime of the investigated one and subtract 1.5 times the standard deviation, thus obtaining another threshold value. When the rain recording station shows three years or more below the lower threshold the whole year is deleted, otherwise it is accepted completely without removing any months; • When there are less than nine months above the minimum limit and R 2 < 0.7 the whole suspect year is deleted.

2.
The rain gauges, which had a QC = 1 after the spatial consistency analysis due to the exceeding of 3σ threshold for annual values, required use of a procedure slightly different from the gauges with very little precipitation. The monthly data of the weather station under investigation were analysed with its historical time series and accepted if they were lower than 2σ + µ (QC = 0), investigated if they were between 2σ + µ and 3σ + µ (QC = 1), or removed if they were above 3σ + µ (QC = 3). The suspect rainfall stations with at least 10 years of observations and no more than 20, were analysed in comparison with the neighbouring stations through the following procedure: • If the similarity is greater than 0.7 (R 2 ), the rain gauges would remain for all the months if they are below the threshold value for at least 9 out of 12 months. If the threshold value is above the limit for more than four months, it should be compared with five nearby rain gauges. This comparison allowed calculation of a median that should be multiplied by two times the standard deviation: Th.Max.Neigh.pt = Me + 2σ. When the record exceeded this limit for more than three months the whole year is removed (QC = 3): otherwise, only the months above the threshold would be deleted (QC = 3); • When R 2 was <0.7, the records were deleted for all the values above the set limit if at least 9 out of 12 months were below the limit (Th.Max.Neigh.pt = Me + 2σ); however, if there were four months above the limit, data were removed for the whole year.
After the temporal consistency check was completed, it was necessary to assess again the spatial consistency by taking into consideration the monthly data (previously this procedure was based on annual values) in order to have a scaling up and achieve a higher accuracy. The same method of three standard deviations above/below the mean was used to remove the data out of the threshold (QC = 3): the data inside this were accepted (QC = 0). Finally, it is necessary to specify that any data are accepted (QC = 0) if an extreme climatic event was historically documented in the metadata, and only three errors solved in this way. The complex procedure adopted is summarized in the mind-map graph (Figure 2).


When R 2 was < 0.7, the records were deleted for all the values above the set limit if at least 9 out of 12 months were below the limit (Th.Max.Neigh.pt=Me+2σ); however, if there were four months above the limit, data were removed for the whole year.
After the temporal consistency check was completed, it was necessary to assess again the spatial consistency by taking into consideration the monthly data (previously this procedure was based on annual values) in order to have a scaling up and achieve a higher accuracy. The same method of three standard deviations above/below the mean was used to remove the data out of the threshold (QC=3): the data inside this were accepted (QC=0). Finally, it is necessary to specify that any data are accepted (QC=0) if an extreme climatic event was historically documented in the metadata, and only three errors solved in this way. The complex procedure adopted is summarized in the mind-map graph (Figure 2

Reconstruction of Missing Data
The reconstruction of missing data (VC = −1) was analysed on the basis of 10 day intervals for temperature and on monthly intervals for precipitation. The procedure of reconstruction of missing data was divided into two phases [34]: the investigation of the difference between reference and candidate time series [35]; 2.
the reconstruction of data through the addition of the difference to the reference time series in order to reconstruct the candidate one [36].
The method of the reconstruction of data can be classified as indirect. As there is no reference time series that could be considered reliable with reasonable certainty, the reconstruction has been created with at least five neighbouring weather stations as reference time series through the comparison of three statistical techniques with GIS software: Z(s 0 ) = predicted value. N = number of neighboring point used to predictẐ(s 0 ). λ i = weight assigned to each point considered for the prediction. It depends from the distance of each point toẐ(s 0 ). Z i (s i ) = observed value in the location (s i ).
d i0 = distance between predicted and measured location. p = reduction factor of the weight of each data in function of the increasing distance from the predicted location.

•
Empirical Bayesian Kriging(EBK) allows an automatic estimate of the semivariogram through GIS software. It is possible to set the number of attempts, 1000 in this case with 60 points in each subset and an overlap factor equal to 1 (empirically demonstrated assessing the greatest minimization of the error). This method is very convenient when the data are non-stationary and with a great extension in the territory, because it uses a local model and, with 1000 attempts, it is possible to obtain the best fit for each value [38]. • ordinary co-kriging method [37]: β k = a vector of parameters for the k-th type of variable with the following assumptions: Y 1 (s 0 ) = a smooth second order stationary process whose range of autocorrelation is detectable with an empirical semivariogram or covariance. η 1 (s 0 ) = a smooth second order stationary process whose variogram range is so close to zero that it is shorter than all practical distances between real and predicted data.
Co-kriging in geostatistical analysis is obtained from the linear predictor: c k = Cov(z k , S 1 (s 0 )) It's the covariance of z k that it's the vector observed to the location S 1 (s 0 ). m = resolution of the matrix between the two Lagrange multipliers. X = matrix of regression. Replacing λ gives:σ if (this condition shows evidence that the ordinary co-kriging can be seen as a particular case of the universal co-kriging): v = nugget effect, it is composed of microscale variations added to the measurement error (this tool measuring the starting point of the semivariogram is far from the origin of the axis that is the point of null error). π = this coefficient multiplied by v allows the definition of σ 2 . Empirical Bayesian Kriging was chosen compared to the IDW and co-kriging based on altitude, which is the most correlated topographical parameter [39]. EBK was chosen compared to IDW because of a lower statistical error, while it was chosen for different reasons in comparison with co-kriging. In fact, EBK gives worse results than co-kriging, even if it was much faster in the application ( Table 2). The Empirical Bayesian Kriging function was used to calculate the reference time series of both precipitation and temperature. It takes up to the maximum 10 neighbouring weather stations and the simulations were set to 1000. The values of the reference time series were calculated through the interpolation of the neighbouring values, in the same point of the candidate one for each interval of sampling (10 days temperature and monthly precipitations) [40]. This reconstructed reference time series was subtracted from the candidate one for each value of temperature or precipitation in the period of study. Thus, the resulting values were averaged to identify a mean difference between reference and candidate time series for the period of study in each interval of sampling. Lastly, the difference between reference and candidate time series was subtracted from the reference one to predict the values of the candidate in the time intervals where data are missing.

Results
Gross error checking and internal consistency checking detected 75 erroneous data points for temperature and 200 for precipitation. Some of these were typographical errors which have been corrected: thus, only 47 temperature and 152 precipitation data points have been removed ( Table 3). The tolerance test has detected several errors in the data, even if in this test there is the chance to have QC = 1 (data under investigation) QC = 0 (correct data). The same codes are detected from the first spatial consistency and the temporal consistency. Finally, with the last spatial consistency the codes are QC = 2 or QC = 0, in order to know if the data under investigation should be deleted or accepted (Figure 3).  Therefore, it is useful to assess (Table 4) how many false positive and real positive results were detected in the analysis. Some data after the tolerance test and temporal consistency have been   Therefore, it is useful to assess (Table 4) how many false positive and real positive results were detected in the analysis. Some data after the tolerance test and temporal consistency have been placed under investigation, although most of the QC = 2 have been detected after the spatial consistency. The outcome of this analysis is the elimination of 375 records from 1,821,054 (0.02%) temperature data points and 1286 out of 77,021 (1.67%) precipitation data points during the period 1931-2014 in the province of Macerata. Table 5 shows the distribution of temperatures and precipitation in each standard period with the higher amount of incorrect data in the last period, although the most recent period lacks sixyears of data to complete the new reference standard period, as prescribed by the WMO (1991-2020). However, whilst this augmentation of incorrect data in the last period could be caused by the greater number of weather stations, it could also be due to some weather stations being affected by systematic errors for several years. The increase of incorrect data with the number of rain gauges has also been observed. Furthermore, one of the most important goals is represented by the reconstruction of 112 data points for temperature and 59 for precipitation. In this case, after the definition of the reference weather stations for each candidate one, the Empirical Bayesian Kriging (EBK) process was carried out. The EBK obtained good results after the cross-validation with a test dataset needed to compare the measured value with the predicted one. The difference between the predicted data value and the measured one at the location of the candidate weather station, analysed with statistical operators (mean error (ME), root mean square error (RMSE), average standardized error (ASE), mean standardized error (MSE), root mean square error standardized (RMSSE)) allowed an estimation of the goodness of the interpolation [41]. Temperature and precipitation are both well interpolated by the EBK (Table 6), although the temperature result is definitely better than for precipitation because daily data temperature have been tested, instead of monthly ones for precipitation.

Conclusions
This procedure may contribute a standard way to validate and reconstruct climate data. The WMO prescribes some procedures for quality control without specific sequences and operational processes.
However, if a standard procedure for each climate or geographical condition was established it should be possible to produce more reliable data for climate analysis. Instead, the data reconstruction can be considered as a standard process that can be used in each region without calibration, provided that an appropriate proximity of weather stations is available. In this case, on the basis of root mean square error observations, the presence of at least five weather stations within a distance of 10 km from the reconstructed one for precipitation, and 20 km for temperature, can be considered adequate. A limit of the quality control method is that it can be applied only in regions with temperate climate, as the thresholds used to analyse the data take into account the variability of typical temperate zones. However, this procedure can be a useful tool to validate data under different climate patterns after an accurate calibration. It is also important to note that spatial consistency analysis can adequately assess the values of mountain weather stations. In fact, the percentage of data with QC = 2 is the same for all weather stations and for mountain weather stations as far as temperatures are concerned. For precipitation, the percentage with code QC = 2 is clearly increasing, probably due to strong winds thatdo not allow a correct calculation of the rain, which is always underestimated. In fact, the precipitation values of mountain weather stations are, in some cases, lower than those of the hills and this may be a point to investigate further. In conclusion, these procedures are indispensable for climate and for all sciences in which data can be affected by errors, to obtain an analysis of proven accuracy.