Article

Preliminary Data Validation and Reconstruction of Temperature and Precipitation in Central Italy

1 School of Sciences and Technologies, University of Camerino, 62032 Camerino, Italy
2 Department of Earth Sciences, Sapienza University of Rome, 00185 Rome, Italy
3 Department of Agriculture, Health and Environment, Natural Resources Institute, University of Greenwich at Medway, Chatham, Kent ME4 4TB, UK
4 CREA, Research Centre for Forest and Wood, 52100 Arezzo, Italy
* Author to whom correspondence should be addressed.
Geosciences 2018, 8(6), 202; https://doi.org/10.3390/geosciences8060202
Submission received: 27 March 2018 / Revised: 30 May 2018 / Accepted: 1 June 2018 / Published: 3 June 2018

Abstract

This study provides a unique procedure for validating and reconstructing temperature and precipitation data. Although developed from data in central Italy, the validation method is intended to be universal, subject to appropriate calibration for the climate zones analysed. This research is an attempt to create shared applicative procedures that are, most of the time, only theorized or embedded in software without a clear definition of the methods. The purpose is to detect most types of errors according to the procedures for data validation prescribed by the World Meteorological Organization, defining practical operations for each of the five types of data controls: gross error checking, internal consistency check, tolerance test, temporal consistency, and spatial consistency. Temperature and precipitation data over the period 1931–2014 were investigated. The outcomes of this process have led to the removal of 375 records (0.02%) of temperature data from 40 weather stations and 1286 records (1.67%) of precipitation data from 118 weather stations, and the reconstruction of 171 data points. In conclusion, this work contributes to the development of standardized methodologies to validate climate data and provides an innovative procedure to reconstruct missing data in the absence of reliable reference time series.

1. Introduction

Climate analysis is taking on an increasingly central role in the life of mankind. Climate has a great impact on many environmental issues and requires reliable, as well as complete, data. The procedure for detecting and removing possible errors from the data is called validation, while the completion of missing data in a time series is called reconstruction. In this context, the aim of the present study is to define a practical method of data validation and reconstruction that, in the future, could be automated by software. The issue of validation and reconstruction of missing data has been analysed by computer since the 1950s [1]. A growing awareness of the need for more accurate and truthful analyses has driven considerable development in this field. On the one hand, studies have focused on the identification of the different types of errors [2], while, on the other hand, the goal has been the reconstruction of missing data. Quality control and climate data processing methods are developed and standardised through the work of the World Meteorological Organization (WMO), which has been active on this theme since the early 1960s, publishing important reports (for example, [3]) and adopting the most relevant advances in the field. The study of quality control is very complex and has gone through a constant refinement of techniques. Temporal consistency of observations and the attribution of flags to data took hold in the 1990s [4]. Almost simultaneously, other important concepts, such as duplicate profile and range checks, were introduced [5]. Subsequently, spatial consistency and the detection of false positives took a leading role [6]. Furthermore, with the increased number of automatic weather stations, there were many efforts to start a possible standardization of quality control rules, with investigations of high-low range limits, rate-of-change limits, and persistence analyses [7].
To date, the increasing development of computer technology has generated automated systems for the analysis of meteorological and climatological data. Some investigations have considered data in real time at hourly or semi-hourly scale (for example, [8,9,10]) in order to detect errors immediately, while other studies of automatic analyses for quality control were based on daily data [11]. There are objective difficulties in quality control of daily precipitation data because of the spatial discontinuity of the variable. However, some studies have obtained good results [12,13,14,15]. Moreover, some software packages developed in the ‘R’ environment not only check the quality of the data, but also calculate extreme climate indices [16]. In this context, the WMO aims to summarize the latest advances in atmospheric science by creating standards for the international community, and identifies some quality control procedures [17]. These quality control procedures are based on five different tests [18] that analyse the spatial, temporal, and absolute relationships of climate time series. Studies of data reconstruction have developed more recently, thanks to software that allows spatial interpolation [19]. In particular, geostatistics has played a key role in the reconstruction of climate data, with extensive use of neural networks [20] and kriging methods [21]. Thus, the present study aims to contribute to quality control by providing an operational procedure, starting from WMO prescriptions. A system based on five different tests for validation of daily and monthly climate data has been adopted. Quality controls are planned through a procedure that differs for temperature and precipitation because of their inherent diversity in data range and variability. This research is innovative because it emphasizes relations between neighbouring weather stations in order to detect errors in the data, even if the stations belong to different climates, as in this case.
Moreover, this analysis implements a method to reconstruct missing data in the absence of a reliable reference time series. This method does not take into account validation of weather station data to obtain the average ratios with the raw ones to reconstruct missing data, but it interpolates many values of temperature and precipitation of the weather stations surrounding the missing one.

Geographical Boundaries of the Analysis

The study site is located between the Adriatic Sea and the Apennine Mountains (Figure 1) in the province of Macerata (Marche, central Italy) and some of the surrounding territories. The elevation gradient ranges from sea level on the Adriatic coast to 2233 m asl (above sea level) (Mount Porche). This difference in altitude makes quality control of climate data very difficult and requires a method to compare mountain weather stations.
This area is characterized by heterogeneous environments. On the basis of the classification of Köppen–Geiger [22] it is possible to identify three main climate zones [23]: ‘Cs’ (C-temperate climate with s-dry summer) in the coastal area and its surroundings, ‘Cf’ (f-humid) until 1400 m, and above this elevation up to the highest peak the climate type is ‘H’ (high altitude climates).
The wide diversity of climate conditions in Macerata province means that it is increasingly difficult to perform data validation tests common to all the weather stations because quality controls should work for different types of climate.

2. Methodology

2.1. Climate Data

The climate data have been supplied by the ‘Annali Idrologici’ (Hydrological Yearbooks http://www.acq.isprambiente.it/annalipdf/), the ‘Dipartimento della Protezione Civile’, Regione Marche (Dept. of Civil Protection http://www.protezionecivile.gov.it/jcms/en/home.wp), the ‘Centro Funzionale dell’Umbria’ (Functional Center of Umbria http://www.cfumbria.it/), and the Agenzia Servizi Settore Agroalimentare delle Marche (Agency for Agro-food Sector Services of the Marche Region http://www.assam.marche.it/en/). The data cover the years from 1931 to 2014; however, the analysis is divided into three standard reference periods: 1931–1960, 1961–1990, and 1991–2014. The division into periods allows good continuity of weather stations, which must have at least 15 years of continuous data to be included in the analysis. The total number of weather stations is 40 for temperature data and 118 for precipitation data (Table 1). Their numbers have changed during the period of analysis (1931–2014), due to changes of instruments or removal of weather stations. The instruments were initially mechanical, especially in the period when the data were recorded in the ‘Annali Idrologici’. Since the 1990s, almost all weather stations have been automated with an integrated wireless telemetry system. Finally, mean daily values of temperature were calculated from hourly and half-hourly data at each station when possible, and only if at least 75% of the data in a given day were available. For precipitation, the monthly data value is considered only if all daily observations in a month are available. If these conditions regarding temperature and precipitation are not satisfied, the data are considered missing.
For temperature, daily data were analysed because this variable shows a gradual distribution in the environment, i.e., it follows Tobler’s Law [24] with gradients typical of each area; daily precipitation, on the other hand, is often not correlated with nearby rain gauges, due to atmospheric dynamics, although at a monthly scale the correlation is recovered.

2.2. Data Analysis

The analysis was performed using spreadsheet and GIS (Geographic Information System) software. A spreadsheet was used to carry out the sequence of controls, and GIS was used for data reconstruction by applying geostatistical methods. Concerning data reconstruction, each candidate weather station was reconstructed from some neighbouring ones. The clustering of the sample was primarily investigated with the ‘Average Nearest Neighbour’ tool, which returned a good result at a confidence level above 95% [25]:
$$\mathrm{ANN} = \frac{\bar{D}_O}{\bar{D}_E}$$
$$\bar{D}_O = \frac{\sum_{i=1}^{n} d_i}{n}$$
$$\bar{D}_E = \frac{0.5}{\sqrt{n/A}}$$
where $d_i$ is the distance between feature $i$ and its nearest feature, $n$ is the total number of features, and $A$ is the total study area.
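The ANN ratio above can be sketched in code. The station coordinates and study area below are purely illustrative, not the data of the study.

```python
# Sketch of the average nearest neighbour (ANN) ratio: the observed mean
# nearest-neighbour distance divided by the expected one for a random
# pattern. ANN < 1 suggests clustering, ANN > 1 a dispersed pattern.
import math

def ann_ratio(points, area):
    """points: list of (x, y); area: total study area in squared coordinate units."""
    n = len(points)
    # Observed mean distance D_O: average distance to the closest other point.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        d = min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
        nearest.append(d)
    d_obs = sum(nearest) / n
    # Expected mean distance D_E for a random pattern: 0.5 / sqrt(n / A).
    d_exp = 0.5 / math.sqrt(n / area)
    return d_obs / d_exp

# Two tight pairs plus one isolated point over a 100 km² area -> clustered.
stations = [(2.0, 3.0), (2.5, 3.2), (8.0, 9.0), (7.5, 8.6), (5.0, 5.0)]
print(round(ann_ratio(stations, area=100.0), 3))
```

An ANN well below 1, as here, indicates the clustered pattern that the Voronoi grouping described next is designed to handle.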
Subsequently, the data have been evaluated through a Voronoi diagram based on clustering, with altitude as an attribute, in order to identify the similarity between a candidate weather station and surrounding neighbours [26]. The Empirical Bayesian Kriging method is a geostatistical method which has been used for interpolation, reconstructing the missing data at the exact co-ordinates of the candidate weather station.
The control procedure is more complicated than the reconstruction one and required that values be flagged with quality control (QC) codes: missing datum (QC = −1); correct or verified datum (QC = 0); datum under investigation (QC = 1); datum removed after the analysis (QC = 2); and datum reconstructed through interpolation or by estimating digitization errors (QC = 3).
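The flag scheme above can be expressed as a small enum. This is a sketch; the original work tracked these codes in spreadsheet columns rather than code.

```python
# The five QC flags used throughout the validation procedure.
from enum import IntEnum

class QC(IntEnum):
    MISSING = -1        # missing datum
    VERIFIED = 0        # correct or verified datum
    SUSPECT = 1         # datum under investigation
    REMOVED = 2         # datum removed after the analysis
    RECONSTRUCTED = 3   # reconstructed by interpolation or digitization fix

print(QC.SUSPECT.value)  # → 1
```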
There are five main tests both for temperature and precipitation:
  • Gross error checking
  • Internal consistency check
  • Tolerance test
  • Temporal consistency
  • Spatial consistency
‘Gross error checking’ was performed for both temperature and precipitation in the same way: each daily or monthly datum outside the established threshold was deleted. At the end of this part of the analysis, only two QC values are allowed: 0, or 2 (if it is not possible to solve the error by using the metadata of the source). The threshold was analysed in order to check for both digitizing errors and values exceeding the measurement range because of a sensor problem. The accepted range is from −40 °C to +50 °C for daily temperature [27], while 2000 mm is the limit for monthly precipitation, which corresponds to the maximum annual amount of precipitation in the Marche Region. No gross errors were found in these data.
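The gross error check amounts to a fixed physical range test, sketched below with the thresholds quoted above; the sample values are illustrative.

```python
# Gross error check: flag any value outside the fixed physical range
# (−40..+50 °C for daily temperature, 0..2000 mm for monthly precipitation).
TEMP_RANGE = (-40.0, 50.0)   # °C, daily
RAIN_RANGE = (0.0, 2000.0)   # mm, monthly

def gross_error_check(value, lo, hi):
    """Return QC flag: 0 if in range, 2 (removed) otherwise."""
    return 0 if lo <= value <= hi else 2

temps = [12.4, 55.0, -3.1]   # 55.0 would be a digitizing or sensor error
flags = [gross_error_check(t, *TEMP_RANGE) for t in temps]
print(flags)  # → [0, 2, 0]
```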
The ‘internal consistency check’ assesses the coherence of climate data: for example, a maximum temperature lower than the minimum one, or a negative rainfall value, is a consistency error. Any such values were removed when it was not possible to correct them through the metadata analysis. The internal consistency check, like the gross error checking, led to corrected or deleted data (QC flag 0 or 2).
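The two consistency conditions named above can be sketched directly:

```python
# Internal consistency check: max temperature must not be below the
# minimum, and rainfall cannot be negative.
def temp_consistent(t_max, t_min):
    """QC flag: 0 if consistent, 2 (removed) otherwise."""
    return 0 if t_max >= t_min else 2

def rain_consistent(p_mm):
    return 0 if p_mm >= 0.0 else 2

print(temp_consistent(14.2, 6.1), temp_consistent(3.0, 8.0), rain_consistent(-1.2))
# → 0 2 2
```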
Before applying the remaining three tests, the normality of the data distribution was assessed in order to choose the most suitable statistical instrument for each parameter (temperature, precipitation). The Gaussian distribution was verified for all the weather stations by using statistical indicators of normality such as:
  • ‘QQ plot’ performed with ArcGis to evaluate graphically the normality of data distribution [28];
  • The ‘Kolmogorov-Smirnov test’, set with a confidence interval of 95% [29];
  • Calculation of skewness; if skewness values are between −2 and 2 the distribution of values is considered ‘normal’ [30].
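The skewness screen in the last bullet can be sketched in pure Python (a real analysis would also run the Kolmogorov–Smirnov test, e.g. with `scipy.stats.kstest`; the sample values below are illustrative).

```python
# Population skewness and the ±2 normality screen described above.
import math

def skewness(xs):
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return sum((x - mu) ** 3 for x in xs) / (n * sd ** 3)

def roughly_normal(xs):
    return -2.0 <= skewness(xs) <= 2.0

sample = [10.1, 11.3, 9.8, 10.6, 10.0, 11.0, 9.5, 10.4]  # e.g. daily means, °C
print(roughly_normal(sample))  # → True
```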
The tolerance test was applied to check each weather station on the basis of its historical time series. The test investigates upper and lower thresholds of µ ± 3σ (where σ is the standard deviation of the time series and µ is its mean) for daily temperature (maximum, mean, minimum, and the difference between maximum and minimum) and monthly precipitation. Moreover, months with 0 mm of precipitation were investigated further, because the method flags them as low outliers even though 0 mm can be a real value in summer months. The tolerance test thus defines ‘data under investigation’ (QC = 1) and ‘correct data’ (QC = 0). Subsequently, the data under investigation were analysed in more detail by applying the following controls. They were tested for spatial consistency, which takes the neighbouring weather stations into account to identify whether at least two of them exceed the 2σ threshold, as this would provide a clear indication of the suitability of the data. The data had previously been analysed with the ‘Average Nearest Neighbour’ tool to assess their distribution (random or clustered) at a confidence level above 95%, while the Voronoi map, with altitude as attribute, was used to group similar weather stations. Spatial consistency for temperature uses daily data, while for precipitation monthly and annual data are used. Precipitation was first analysed at the annual scale because differences between neighbouring stations are easier to highlight there than in the monthly analysis. The formula below was used to set the threshold [31]:
$$\mathrm{Th} = \mu \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
where µ is the mean of the five neighbouring weather stations, σ is their standard deviation, and n is the number of samples.
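The threshold formula above can be sketched as follows; the annual precipitation values are illustrative.

```python
# Spatial consistency band: Th = µ ± 1.96 σ/√n over the neighbouring stations.
import math

def spatial_threshold(neighbour_values):
    n = len(neighbour_values)
    mu = sum(neighbour_values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in neighbour_values) / n)
    half = 1.96 * sd / math.sqrt(n)
    return mu - half, mu + half

# Five neighbours' annual precipitation (mm); a candidate inside the band passes.
lo, hi = spatial_threshold([812.0, 790.0, 805.0, 820.0, 798.0])
print(lo < 804.0 < hi)  # → True
```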
Precipitation and temperature data outside the established range were assigned QC = 1 (data under investigation) and were then analysed in the temporal consistency test. Temporal consistency differs between temperature and precipitation because of the difference in data continuity between the two variables. Temperatures were analysed for persistence by removing (QC = 2) values that recur on consecutive days, unless confirmed by at least two neighbouring weather stations, with a difference lower than 0.2 °C between contiguous days; for precipitation, persistence is detected if the same value, to one decimal place, occurs on consecutive days, without the need to investigate neighbouring weather stations. The maximum difference between contiguous days was analysed by averaging all differences between the maximum and minimum values over the entire duration of the data time series. The limits were then calculated from the median of the variations plus or minus three times the standard deviation (µ ± 3σ), in order to verify whether the investigated weather station exceeds the established thresholds [32].
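The temperature persistence screen can be sketched as below. This is a simplified sketch: the neighbouring-station confirmation step is omitted, and the series values are illustrative.

```python
# Persistence screening: flag days whose value repeats the previous day's
# within the 0.2 °C tolerance mentioned above.
def persistent_runs(series, tol=0.2):
    """Return indices of days whose value repeats the previous one within tol."""
    return [i for i in range(1, len(series))
            if abs(series[i] - series[i - 1]) < tol]

daily_means = [14.1, 14.1, 14.1, 15.3, 12.8]  # °C
print(persistent_runs(daily_means))  # → [1, 2]
```

Days flagged this way would receive QC = 1 pending confirmation, or QC = 2 if no neighbouring station shows the same behaviour.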
Temporal consistency of precipitation is composed of two main points:
1.
The rain gauges that show QC = 1 after the spatial consistency check because of very low precipitation were analysed through a test based on the squared correlation coefficient (R²) [33]:
$$r(x,y) = \frac{\sum{(x-\bar{x})(y-\bar{y})}}{\sqrt{\sum{(x-\bar{x})^2}\,\sum{(y-\bar{y})^2}}}$$
R² was calculated between the investigated rain gauge and the most similar one, distinguishing four cases:
  • R² > 0.7: the rain gauge values are accepted for all months only if they are above the minimum limit calculated from the time series for at least 9 out of 12 months;
  • R² < 0.7: the months below the lower threshold of the time series are removed only if at least 9 out of 12 months are above this limit;
  • If there are fewer than nine months above the lower limit but R² is greater than 0.7, it is necessary to calculate the median of each month and of each year in the five nearby rain gauges over the lifetime of the investigated one and subtract 1.5 times the standard deviation, thus obtaining another threshold value. When the rain gauge shows three or more years below this lower threshold, the whole year is deleted; otherwise it is accepted completely, without removing any months;
  • When there are fewer than nine months above the minimum limit and R² < 0.7, the whole suspect year is deleted.
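The four-case rule above can be condensed into a decision function. This is a simplified sketch under stated assumptions: the function name and arguments are hypothetical, and the third case's year-by-year neighbour comparison is reduced to a precomputed count.

```python
# Decision rule for a suspect low-precipitation gauge, given its R² with
# the most similar neighbour and the number of months above the gauge's
# own lower threshold.
def low_precip_decision(r2, months_above_lower, years_below_neighbour_th=0):
    if months_above_lower >= 9:
        # Cases 1 and 2: at least 9 of 12 months above the lower limit.
        return "accept all months" if r2 > 0.7 else "remove months below threshold"
    if r2 > 0.7:
        # Case 3: fall back to the neighbour-based threshold (median − 1.5σ).
        return "delete whole year" if years_below_neighbour_th >= 3 else "accept year"
    return "delete whole year"  # Case 4

print(low_precip_decision(0.85, 10))  # → accept all months
print(low_precip_decision(0.50, 7))   # → delete whole year
```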
2.
The rain gauges that had QC = 1 after the spatial consistency analysis, due to exceeding the 3σ threshold for annual values, required a procedure slightly different from that used for the gauges with very little precipitation. The monthly data of the weather station under investigation were analysed against its historical time series and accepted if lower than µ + 2σ (QC = 0), investigated if between µ + 2σ and µ + 3σ (QC = 1), or removed if above µ + 3σ (QC = 2). The suspect rain gauges with at least 10 years of observations, and no more than 20, were compared with the neighbouring stations through the following procedure:
  • If the similarity is greater than 0.7 (R²), the rain gauge values are retained for all months if they are below the threshold value for at least 9 out of 12 months. If the value is above the limit for more than four months, the gauge should be compared with five nearby rain gauges. This comparison allowed the calculation of a median to which twice the standard deviation was added: Th.Max.Neigh.pt = Me + 2σ. When the record exceeded this limit for more than three months, the whole year was removed (QC = 2); otherwise, only the months above the threshold were deleted (QC = 2);
  • When R² < 0.7, the records above the set limit (Th.Max.Neigh.pt = Me + 2σ) were deleted if at least 9 out of 12 months were below the limit; however, if there were four months above the limit, data were removed for the whole year.
After the temporal consistency check was completed, it was necessary to reassess the spatial consistency, this time considering monthly data (previously this procedure was based on annual values), in order to scale up and achieve higher accuracy. The same method of three standard deviations above/below the mean was used to remove the data outside the threshold (QC = 2); the data inside it were accepted (QC = 0). Finally, it should be specified that data are accepted (QC = 0) if an extreme climatic event was historically documented in the metadata; only three errors were resolved in this way. The complete procedure is summarized in the mind-map graph (Figure 2).

2.3. Reconstruction of Missing Data

The reconstruction of missing data (QC = −1) was performed on 10-day intervals for temperature and on monthly intervals for precipitation. The procedure was divided into two phases [34]:
  • the investigation of the difference between reference and candidate time series [35];
  • the reconstruction of data through the addition of the difference to the reference time series in order to reconstruct the candidate one [36].
The method of the reconstruction of data can be classified as indirect. As there is no reference time series that could be considered reliable with reasonable certainty, the reconstruction has been created with at least five neighbouring weather stations as reference time series through the comparison of three statistical techniques with GIS software:
  • inverse distance weighted (IDW) [37]:
$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i\, Z(s_i)$$
where:
$\hat{Z}(s_0)$ = predicted value;
$N$ = number of neighbouring points used to predict $\hat{Z}(s_0)$;
$\lambda_i$ = weight assigned to each point used in the prediction; it depends on the distance of each point from $s_0$;
$Z(s_i)$ = observed value at location $s_i$.
The weights are given by:
$$\lambda_i = \frac{d_{i0}^{-p}}{\sum_{i=1}^{N} d_{i0}^{-p}}$$
where $d_{i0}$ is the distance between the prediction location and measured location $i$, and $p$ is the power parameter that reduces the weight of each datum as its distance from the prediction location increases.
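The IDW formula can be sketched directly from the definitions above; coordinates and values below are illustrative.

```python
# Inverse distance weighting: weights are d^(−p), normalized over the
# N neighbours, applied to the observed values.
import math

def idw(neighbours, target, p=2.0):
    """neighbours: list of ((x, y), value); target: (x, y) to predict at."""
    weights, values = [], []
    for (x, y), z in neighbours:
        d = math.hypot(x - target[0], y - target[1])
        weights.append(d ** -p)   # closer points weigh more
        values.append(z)
    total = sum(weights)
    return sum(w / total * z for w, z in zip(weights, values))

obs = [((0.0, 0.0), 10.0), ((2.0, 0.0), 14.0), ((0.0, 2.0), 12.0)]
print(round(idw(obs, (1.0, 1.0)), 2))  # equidistant neighbours → plain mean, 12.0
```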
  • Empirical Bayesian Kriging (EBK) allows an automatic estimate of the semivariogram through GIS software. It is possible to set the number of simulations (1000 in this case), with 60 points in each subset and an overlap factor equal to 1 (settings empirically shown to minimize the error). This method is very convenient when the data are non-stationary and spread over a large area, because it uses a local model and, with 1000 simulations, it is possible to obtain the best fit for each value [38].
  • ordinary co-kriging method [37]:
$$S_1(s_0) = [x_1(s_0)]^{\top}\beta_1 + Y_1(s_0) + \eta_1(s_0)$$
where:
$\beta_k$ = vector of parameters for the $k$-th type of variable;
$Y_1(s_0)$ = smooth second-order stationary process whose range of autocorrelation is detectable with an empirical semivariogram or covariance;
$\eta_1(s_0)$ = smooth second-order stationary process whose variogram range is so close to zero that it is shorter than all practical distances between real and predicted data.
In geostatistical analysis, co-kriging is obtained from the linear predictor
$$\hat{S}_1(s_0) = \lambda_1^{\top} z_1 + \lambda_2^{\top} z_2$$
with weights
$$\lambda = \Sigma_{z}^{-1}(c + Xm)$$
where:
$c_k = \mathrm{Cov}(z_k, S_1(s_0))$ is the covariance between the observed vector $z_k$ and $S_1(s_0)$;
$m$ = vector of Lagrange multipliers;
$X$ = regression matrix.
Substituting $\lambda$ gives the prediction variance
$$\hat{\sigma}^2_{S_1}(s_0) = C_{y11}(0) + (1 - \pi_1)v_1 - \lambda^{\top}(c + Xm)$$
Ordinary co-kriging can be seen as a particular case of universal co-kriging when
$$X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
where:
$v$ = nugget effect, composed of microscale variation plus measurement error (it measures how far the starting point of the semivariogram lies from the origin of the axes, which represents the point of zero error);
$\pi$ = coefficient that, multiplied by $v$, defines $\sigma^2$.
Empirical Bayesian Kriging was compared with IDW and with co-kriging based on altitude, which is the topographical parameter most correlated with both variables [39]. EBK was preferred to IDW because of its lower statistical error, while it was preferred to co-kriging for a different reason: although EBK gives slightly worse results than co-kriging, it is much faster to apply (Table 2).
The Empirical Bayesian Kriging function was used to calculate the reference time series of both precipitation and temperature. It uses up to a maximum of 10 neighbouring weather stations, and the number of simulations was set to 1000. The values of the reference time series were calculated by interpolating the neighbouring values at the exact location of the candidate station for each sampling interval (10-day for temperature and monthly for precipitation) [40].
This reconstructed reference time series was subtracted from the candidate one for each value of temperature or precipitation in the period of study. Thus, the resulting values were averaged to identify a mean difference between reference and candidate time series for the period of study in each interval of sampling. Lastly, the difference between reference and candidate time series was subtracted from the reference one to predict the values of the candidate in the time intervals where data are missing.
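The difference-based reconstruction described above can be sketched in miniature. The series below are illustrative; in the actual procedure the reference series comes from the EBK interpolation of the neighbouring stations.

```python
# Reconstruction: average the (reference − candidate) difference over the
# overlapping period, then fill gaps as reference − mean difference.
def reconstruct(candidate, reference):
    """candidate: list with None where data are missing; reference: same length."""
    diffs = [r - c for c, r in zip(candidate, reference) if c is not None]
    mean_diff = sum(diffs) / len(diffs)
    return [r - mean_diff if c is None else c
            for c, r in zip(candidate, reference)]

cand = [10.0, None, 12.0, None]   # candidate station, two gaps
ref  = [11.0, 12.5, 13.0, 14.0]   # interpolated reference series
print(reconstruct(cand, ref))  # → [10.0, 11.5, 12.0, 13.0]
```

Here the reference runs systematically 1.0 above the candidate over the overlap, so each gap is filled with the reference value shifted down by that mean bias.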

3. Results

Gross error checking and internal consistency checking detected 75 erroneous data points for temperature and 200 for precipitation. Some of these were typographical errors which have been corrected; thus, only 47 temperature and 152 precipitation data points were removed (Table 3). The tolerance test detected several errors in the data, even though at this stage the only possible outcomes are QC = 1 (data under investigation) or QC = 0 (correct data). The same codes result from the first spatial consistency check and the temporal consistency check. Finally, with the last spatial consistency check the codes are QC = 2 or QC = 0, establishing whether the data under investigation should be deleted or accepted (Figure 3).
Therefore, it is useful to assess (Table 4) how many false positives and true positives were detected in the analysis. Some data were placed under investigation after the tolerance test and the temporal consistency check, although most of the QC = 2 flags were assigned after the spatial consistency check.
The outcome of this analysis is the elimination of 375 out of 1,821,054 temperature records (0.02%) and 1286 out of 77,021 precipitation records (1.67%) during the period 1931–2014 in the province of Macerata. Table 5 shows the distribution of temperature and precipitation errors in each standard period, with the highest amount of incorrect data in the last period, although the most recent period lacks six years of data to complete the new reference standard period prescribed by the WMO (1991–2020).
However, whilst this augmentation of incorrect data in the last period could be caused by the greater number of weather stations, it could also be due to some weather stations being affected by systematic errors for several years. The increase of incorrect data with the number of rain gauges has also been observed. Furthermore, one of the most important goals is represented by the reconstruction of 112 data points for temperature and 59 for precipitation. In this case, after the definition of the reference weather stations for each candidate one, the Empirical Bayesian Kriging (EBK) process was carried out. The EBK obtained good results after the cross-validation with a test dataset needed to compare the measured value with the predicted one. The difference between the predicted data value and the measured one at the location of the candidate weather station, analysed with statistical operators (mean error (ME), root mean square error (RMSE), average standardized error (ASE), mean standardized error (MSE), root mean square error standardized (RMSSE)) allowed an estimation of the goodness of the interpolation [41]. Temperature and precipitation are both well interpolated by the EBK (Table 6), although the temperature result is definitely better than for precipitation because daily data temperature have been tested, instead of monthly ones for precipitation.

4. Conclusions

This procedure may contribute to a standard way to validate and reconstruct climate data. The WMO prescribes some procedures for quality control without specific sequences and operational processes. However, if a standard procedure for each climate or geographical condition were established, it would be possible to produce more reliable data for climate analysis. The data reconstruction, instead, can be considered a standard process usable in any region without calibration, provided that an appropriate proximity of weather stations is available. In this case, on the basis of root mean square error observations, the presence of at least five weather stations within a distance of 10 km from the reconstructed one for precipitation, and 20 km for temperature, can be considered adequate. A limit of the quality control method is that it can be applied only in regions with a temperate climate, as the thresholds used to analyse the data reflect the variability typical of temperate zones. However, this procedure can be a useful tool to validate data under different climate patterns after an accurate calibration. It is also important to note that the spatial consistency analysis can adequately assess the values of mountain weather stations. In fact, the percentage of data with QC = 2 is the same for all weather stations and for mountain weather stations as far as temperatures are concerned. For precipitation, the percentage with code QC = 2 is clearly higher, probably due to strong winds that do not allow a correct measurement of the rain, which is always underestimated. In fact, the precipitation values of mountain weather stations are, in some cases, lower than those of the hills, and this may be a point to investigate further. In conclusion, these procedures are indispensable for climate science, and for all sciences in which data can be affected by errors, in order to obtain analyses of proven accuracy.

Author Contributions

M.G., M.B., P.B. and F.D'. analyzed the data. M.G. and M.B. conceived and designed the experiments. M.G. and F.D'. wrote the paper. P.B. checked the language.

Acknowledgments

No funds of any kind have been received for this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cressman, G.P. An operational objective analysis system. Mon. Weather Rev. 1959, 87, 367–374.
  2. Zahumenský, I. Guidelines on Quality Control Procedures for Data from Automatic Weather Stations; WMO (World Meteorological Organization): Geneva, Switzerland, 2004.
  3. Filippov, V.V. Quality Control Procedures for Meteorological Data; Tech. Rep. 26; WMO (World Meteorological Organization): Geneva, Switzerland, 1968.
  4. Eischeid, J.K.; Bruce Baker, C.; Karl, T.R.; Diaz, H.F. The Quality Control of Long-Term Climatological Data Using Objective Data Analysis. J. Appl. Meteorol. 1995, 34, 2787–2795.
  5. Boyer, T.; Levitus, S. Quality Control and Processing of Historical Oceanographic Temperature, Salinity, and Oxygen Data; NOAA Technical Report NESDIS 81: Washington, DC, USA, 1994.
  6. Peterson, T.C.; Vose, R.; Schmoyer, R.; Razuvaëv, V. Global historical climatology network (GHCN) quality control of monthly temperature data. Int. J. Climatol. 1998, 18, 1169–1179.
  7. Meek, D.W.; Hatfield, J.L. Data quality checking for single station meteorological databases. Agric. For. Meteorol. 1994, 36, 85–109.
  8. Cheng, A.R.; Lee, T.H.; Ku, H.I.; Chen, Y.W. Quality Control Program for Real-Time Hourly Temperature Observation in Taiwan. J. Atmos. Ocean. Technol. 2016, 33, 953–976.
  9. Qi, Y.; Martinaitis, S.; Zhang, J.; Cocks, S. A Real-Time Automated Quality Control of Hourly Rain Gauge Data Based on Multiple Sensors in MRMS System. J. Hydrometeor. 2016, 17, 1675–1691.
  10. Svensson, P.; Björnsson, H.; Samuli, A.; Andresen, L.; Bergholt, L.; Tveito, O.E.; Agersten, S.; Pettersson, O.; Vejen, F. Quality Control of Meteorological Observations. Available online: https://www.researchgate.net/publication/238738578_Quality_Control_of_Meteorological_Observations_Description_of_potential_HQC_systems (accessed on 3 June 2018).
  11. Boulanger, J.P.; Aizpuru, J.; Leggieri, L.; Marino, M. A procedure for automated quality control and homogenization of historical daily temperature and precipitation data (APACH): Part 1: Quality control and application to the Argentine weather service stations. Clim. Chang. 2010, 98, 471–491.
  12. Acquaotta, F.; Fratianni, S.; Venema, V. Assessment of parallel precipitation measurements networks in Piedmont, Italy. Int. J. Climatol. 2016, 36, 3963–3974.
  13. Mekis, E.; Vincent, L. An overview of the second generation adjusted daily precipitation dataset for trend analysis in Canada. Atmos. Ocean 2011, 2, 163–177.
  14. Sciuto, G.; Bonaccorso, B.; Cancelliere, A.; Rossi, G. Probabilistic quality control of daily temperature data. Int. J. Climatol. 2013, 33, 1211–1227.
  15. Wang, X.; Chen, H.; Wu, Y.; Feng, Y.; Pu, Q. New techniques for the detection and adjustment of shifts in daily precipitation data series. J. Appl. Meteorol. Climatol. 2010, 49, 2416–2436.
  16. Alexander, L.; Yang, H.; Perkins, S. ClimPACT—Indices and Software User Manual. In Guide to Climatological Practices; WMO (World Meteorological Organization): Geneva, Switzerland, 2009. Available online: http://www.wmo.int/pages/prog/wcp/ccl/opace/opace4/meetings/documents/ETCRSCI_software_documentation_v2a.doc (accessed on 3 June 2018).
  17. Aguilar, E.; Auer, I.; Brunet, M.; Peterson, T.C.; Wieringa, J. Guidance on Metadata and Homogenization; WMO (World Meteorological Organization): Geneva, Switzerland, 2003.
  18. Jeffrey, S.J.; Carter, J.O.; Moodie, K.B.; Beswick, A.R. Using spatial interpolation to construct a comprehensive archive of Australian climate data. Environ. Model Softw. 2001, 16, 309–330.
  19. Coulibaly, P.; Evora, N.D. Comparison of neural network methods for infilling missing daily weather records. J. Hydrol. 2007, 341, 27–41.
  20. Eccel, E.; Cau, P.; Ranzi, R. Data reconstruction and homogenization for reducing uncertainties in high-resolution climate analysis in Alpine regions. Theor. Appl. Climatol. 2012, 110, 345–358.
  21. Mitchell, A. The ESRI Guide to GIS Analysis, Volume 2: Spatial Measurements and Statistics; ESRI Press: Redlands, CA, USA, 2005.
  22. Kolahdouzan, M.; Shahabi, C. Voronoi-based k nearest neighbor search for spatial network databases. In Proceedings of the Thirtieth International Conference on Very Large Data Bases-Volume 30; VLDB Endowment: San Jose, CA, USA, 2004; pp. 840–851.
  23. Köppen, W. Versuch einer Klassifikation der Klimate, vorzugsweise nach ihren Beziehungen zur Pflanzenwelt. Geogr. Zeitschr. 1900, 6, 593–611.
  24. Geiger, R. Landolt-Börnstein—Zahlenwerte und Funktionen aus Physik, Chemie, Astronomie, Geophysik und Technik; alte Serie Vol. 3; der Klimate nach, C.K., Köppen, W., Eds.; Springer: Berlin, Germany, 1954; pp. 603–607.
  25. Fratianni, S.; Acquaotta, F. Landscapes and Landforms of Italy; Marchetti, M., Soldati, M., Eds.; Springer: Berlin, Germany, 2017; pp. 29–38.
  26. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46, 234–240.
  27. Grykałowska, A.; Kowal, A.; Szmyrka-Grzebyk, A. The basics of calibration procedure and estimation of uncertainty budget for meteorological temperature sensors. Meteorol. Appl. 2015, 22, 867–872.
  28. Wilk, M.B.; Gnanadesikan, R. Probability plotting methods for the analysis of data. Biometrika 1968, 55, 1–17.
  29. Lilliefors, H.W. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J. Am. Stat. Assoc. 1967, 62, 399–402.
  30. Kim, H.-Y. Statistical notes for clinical researchers: Assessing normal distribution (2) using skewness and kurtosis. Restor. Dent. Endod. 2013, 38, 52–54.
  31. Hackshaw, A. Statistical Formulae for Calculating Some 95% Confidence Intervals. In A Concise Guide to Clinical Trials; Wiley-Blackwell: West Sussex, UK, 2007; pp. 205–207.
  32. Omar, M.H. Statistical Process Control Charts for Measuring and Monitoring Temporal Consistency of Ratings. J. Educ. Meas. 2010, 47, 18–35.
  33. Schönwiese, C.D. Praktische Methoden für Meteorologen und Geowissenschaftler; Schweizerbart Science Publishers: Stuttgart, Germany, 2006; pp. 232–234.
  34. Bono, E.; Noto, L.; La Loggia, G. Tecniche di analisi spaziale per la ricostruzione delle serie storiche di dati climatici [Spatial analysis techniques for the reconstruction of historical climate data series]. In Atti del Convegno 9a Conferenza Nazionale ASITA; CINECA IRIS: Catania, Italy, 2005.
  35. Easterling, D.R.; Peterson, T.C. A new method for detecting undocumented discontinuities in climatological time series. Int. J. Climatol. 1995, 15, 369–377.
  36. Kim, J.-W.; Pachepsky, Y.A. Reconstructing missing daily precipitation data using regression trees and artificial neural networks for SWAT stream flow simulation. J. Hydrol. 2010, 394, 305–314.
  37. Johnston, K.; VerHoef, J.M.; Krivoruchko, K.; Lucas, N. Appendix A. In Using ArcGIS Geostatistical Analyst; ESRI: Redlands, CA, USA, 2001; pp. 247–273.
  38. Krivoruchko, K. Empirical Bayesian Kriging; ESRI: Redlands, CA, USA, 2012.
  39. Gentilucci, M.; Bisci, C.; Burt, P.; Fazzini, M.; Vaccaro, C. Interpolation of Rainfall Through Polynomial Regression in the Marche Region (Central Italy). In Lecture Notes in Geoinformation and Cartography; Mansourian, A., Pilesjö, P., Harrie, L., van Lammeren, R., Eds.; Springer: Cham, Switzerland, 2018.
  40. Vicente-Serrano, S.M.; Beguería, S.; López-Moreno, J.I.; García-Vera, M.A.; Stepanek, P. A complete daily precipitation database for northeast Spain: Reconstruction, quality control, and homogeneity. Int. J. Climatol. 2010, 30, 1146–1163.
  41. Robinson, T.P.; Metternicht, G. Testing the performance of spatial interpolation techniques for mapping soil properties. Comput. Electron. Agric. 2006, 50, 97–108.
Figure 1. Area of study (weather stations outside the study area not shown), province of Macerata, Central Italy.
Figure 2. Mind-map of the data validation procedure.
Figure 3. An example of spatial consistency QC = 2 for Pollenza OGSM in the period 1961–1990.
Table 1. Weather stations for precipitation and temperature (St. N. = station number; PDA = period of data availability; Sensor = whether the station records precipitation only (P) or both precipitation and temperature (P-T)).
St. N. | PDA | Sensor | Weather Station | Lat. | Long. | Altitude (m)
1 | 1931–2007 | P | Acquasanta | 42°46′ | 13°25′ | 392
2 | 1931–2014 | P | Amandola | 42°59′ | 13°22′ | 550
3 | 1931–2012 | P | Amatrice | 42°38′ | 13°17′ | 954
4 | 1951–2014 | P | Ancona Baraccola | 43°34′ | 13°31′ | 37
5 | 1931–2014 | P | Ancona Torrette | 43°36′ | 13°27′ | 6
6 | 1931–2009 | P | Apiro | 43°23′ | 13°8′ | 516
7 | 2009–2014 | P-T | Apiro 2 | 43°25′ | 13°5′ | 270
8 | 1931–1956 | P | Appennino | 42°59′ | 13°5′ | 798
9 | 1931–1976 | P | Appignano | 43°22′ | 13°21′ | 199
10 | 1999–2014 | P | Appignano 2 | 43°22′ | 13°20′ | 195
11 | 1931–2014 | P | Arquata del Tronto | 42°46′ | 13°18′ | 720
12 | 1931–2013 | P | Ascoli Piceno | 42°51′ | 13°36′ | 136
13 | 1931–2006 | P | Bolognola Paese | 42°59′ | 13°14′ | 1070
14 | 1967–2014 | P-T | Bolognola Pintura RT201 | 43°00′ | 13°14′ | 1352
15 | 1931–1950 | P | Caldarola | 43°8′ | 13°13′ | 314
16 | 1931–1996 | P | Camerino | 43°8′ | 13°4′ | 664
17 | 1999–2014 | P-T | Camerino 2 | 43°8′ | 13°4′ | 581
18 | 1931–2014 | P | Campodiegoli | 43°18′ | 12°49′ | 507
19 | 1931–2007 | P | Capo il Colle | 42°50′ | 13°28′ | 539
20 | 1931–2007 | P | Capodacqua | 42°44′ | 13°14′ | 817
21 | 1931–2014 | P | Case San Giovanni | 43°23′ | 13°2′ | 620
22 | 1999–2014 | P-T | Castelraimondo | 43°13′ | 13°2′ | 410
23 | 1931–1963 | P | Castelraimondo | 43°13′ | 13°2′ | 307
24 | 1931–1963 | P | Chiaravalle | 43°36′ | 13°20′ | 25
25 | 1931–2008 | P-T | Cingoli | 43°22′ | 13°13′ | 631
26 | 1999–2014 | P-T | Cingoli 2 | 43°25′ | 13°10′ | 494
27 | 1999–2014 | P-T | Cingoli 3 | 42°23′ | 13°15′ | 265
28 | 1997–2014 | P | Civitanova Marche OGSM | 43°17′ | 13°44′ | 10
29 | 1931–2009 | P | Civitella del Tronto | 42°46′ | 13°40′ | 589
30 | 1931–1976 | P | Corridonia | 43°15′ | 13°30′ | 255
31 | 1951–2014 | P | Croce di Casale | 42°55′ | 13°26′ | 657
32 | 1931–2007 | P | Cupramontana | 43°27′ | 13°7′ | 506
33 | 1934–2007 | P | Diga di Carassai | 43°2′ | 13°41′ | 130
34 | 1967–2006 | P | Diga di Talvacchia | 42°47′ | 13°31′ | 515
35 | 1931–1951 | P | Dignano | 43°1′ | 12°56′ | 873
36 | 1931–1976 | P | Elcito | 43°19′ | 13°5′ | 824
37 | 1999–2014 | P-T | Esanatoglia | 43°15′ | 12°56′ | 608
38 | 1931–2008 | P-T | Fabriano RM1810 | 43°20′ | 12°54′ | 357
39 | 1964–1989 | P | Falconara Aeroporto | 43°38′ | 13°22′ | 9
40 | 1933–2007 | P-T | Fermo RM2220 | 43°10′ | 13°43′ | 280
41 | 1931–2007 | P | Filottrano | 43°26′ | 13°21′ | 270
42 | 1999–2014 | P | Fiastra | 43°02′ | 13°16′ | 747
43 | 1931–2007 | P | Fiume di Fiastra | 43°2′ | 13°10′ | 618
44 | 1931–2014 | P | Gelagna Alta | 43°5′ | 13°0′ | 711
45 | 1931–1989 | P | Grottazzolina | 43°6′ | 13°36′ | 200
46 | 1931–1949 | P-T | Gualdo Tadino | 43°14′ | 12°47′ | 535
47 | 1931–2007 | P-T | Jesi | 43°31′ | 13°15′ | 96
48 | 1932–2008 | P | Loreto RM1940 | 43°26′ | 13°36′ | 127
49 | 1932–2008 | P-T | Lornano | 43°17′ | 13°25′ | 232
50 | 1931–2007 | P | Loro Piceno | 43°10′ | 13°25′ | 435
51 | 1970–2014 | P-T | Macerata OGSM | 43°18′ | 13°25′ | 303
52 | 1999–2014 | P-T | Macerata Montalbano | 43°18′ | 13°25′ | 294
53 | 1999–2014 | P-T | Macerata 3 | 43°14′ | 13°24′ | 146
54 | 1999–2014 | P-T | Matelica | 43°18′ | 13°0′ | 325
55 | 1951–2013 | P-T | Moie | 43°30′ | 13°8′ | 110
56 | 1999–2014 | P-T | Monte Bove Sud | 42°55′ | 13°11′ | 1917
57 | 1931–2007 | P | Montecarotto | 43°31′ | 13°4′ | 388
58 | 1931–2006 | P | Montecassiano | 43°22′ | 13°26′ | 215
59 | 1999–2014 | P-T | Montecavallo | 42°59′ | 12°59′ | 960
60 | 1999–2014 | P-T | Montecosaro | 43°17′ | 13°38′ | 45
61 | 1999–2014 | P | Montecosaro 2 | 43°17′ | 13°38′ | 50
62 | 1931–1951 | P-T | Montefano | 43°24′ | 13°26′ | 242
63 | 1999–2014 | P | Montefano 2 | 43°25′ | 13°27′ | 144
64 | 1999–2014 | P | Montelupone | 43°22′ | 13°35′ | 29
65 | 1931–2007 | P-T | Montemonaco RM2230 | 42°54′ | 13°19′ | 987
66 | 1999–2014 | P-T | Monteprata | 42°54′ | 13°13′ | 1813
67 | 1931–2007 | P | Monterubbiano | 43°5′ | 13°43′ | 463
68 | 1931–2014 | P | Morrovalle | 43°19′ | 13°35′ | 246
69 | 1999–2014 | P-T | Muccia | 43°4′ | 13°4′ | 430
70 | 1931–2014 | P-T | Nocera Umbra | 43°07′ | 12°47′ | 535
71 | 1931–2014 | P-T | Norcia | 42°48′ | 13°06′ | 691
72 | 1931–2012 | P | Osimo città RM1920 | 43°29′ | 13°29′ | 265
73 | 1931–2007 | P | Pedaso | 43°6′ | 13°51′ | 4
74 | 1932–2002 | P | Petriolo | 43°13′ | 13°28′ | 271
75 | 1931–2007 | P | Pié del Sasso | 42°59′ | 13°0′ | 711
76 | 1931–2013 | P | Pievebovigliana | 43°3′ | 13°5′ | 451
77 | 1931–2013 | P | Pioraco RM1970 | 43°11′ | 12°59′ | 441
78 | 1931–1957 | P-T | Poggio Sorifa | 43°9′ | 12°52′ | 552
79 | 1970–2014 | P | Pollenza OGSM | 43°15′ | 13°24′ | 158
80 | 1999–2014 | P-T | Pollenza 2 | 43°16′ | 13°19′ | 170
81 | 1999–2014 | P-T | Porto Recanati | 43°25′ | 13°40′ | 0
82 | 1936–2007 | P-T | Porto Sant’Elpidio RM2160 | 43°15′ | 13°46′ | 3
83 | 1931–1984 | P | Preci | 42°53′ | 13°02′ | 907
84 | 1935–1991 | P | Ragnola | 42°55′ | 13°53′ | 10
85 | 1975–2014 | P | Recanati OGSM ITIS | 43°25′ | 13°32′ | 243
86 | 1931–2006 | P | Recanati RM2020 | 43°24′ | 13°33′ | 235
87 | 1931–2014 | P | Ripatransone | 43°00′ | 13°46′ | 494
88 | 1931–1946 | P | San Gregorio di Camerino | 43°9′ | 13°0′ | 754
89 | 1931–1961 | P | San Maroto | 43°5′ | 13°8′ | 555
90 | 1953–2002 | P | San Martino | 42°44′ | 13°27′ | 783
91 | 1931–1963 | P | San Severino Marche RM1998 | 43°14′ | 13°11′ | 344
92 | 1964–1984 | P | San Severino OGSM | 43°15′ | 13°14′ | 180
93 | 1931–1989 | P | Sant’Angelo in Pontano RM2150 | 43°6′ | 13°24′ | 473
94 | 1999–2014 | P-T | Sant’Angelo in Pontano 2 | 43°6′ | 13°23′ | 373
95 | 1931–2008 | P | Santa Maria di Pieca | 43°4′ | 13°17′ | 467
96 | 1931–2007 | P-T | Sarnano | 43°2′ | 13°18′ | 539
97 | 1999–2014 | P | Sassotetto | 43°1′ | 13°14′ | 1365
98 | 1931–2014 | P | Sassoferrato | 43°26′ | 12°52′ | 312
99 | 1950–1987 | P | Sellano | 42°53′ | 12°55′ | 604
100 | 1933–2000 | P | Serralta RM2000 | 43°19′ | 13°11′ | 546
101 | 1938–1976 | P-T | Serrapetrona | 43°11′ | 13°11′ | 450
102 | 1999–2014 | P | Serrapetrona 2 | 43°11′ | 13°13′ | 437
103 | 1931–2008 | P | Serravalle di Chienti RM2030 | 43°4′ | 12°57′ | 647
104 | 1999–2014 | P-T | Serravalle di Chienti 2 | 43°0′ | 12°54′ | 925
105 | 1932–2014 | P-T | Servigliano RM2190 | 43°5′ | 13°30′ | 215
106 | 1931–2008 | P | Sorti | 43°7′ | 12°57′ | 672
107 | 1931–2008 | P | Tolentino RM2090 | 43°12′ | 13°17′ | 244
108 | 1999–2014 | P-T | Tolentino 2 | 43°14′ | 13°23′ | 183
109 | 1998–2014 | P | Tolentino 3 | 43°13′ | 13°17′ | 224
110 | 1931–1964 | P-T | Treia | 43°17′ | 13°18′ | 230
111 | 1999–2014 | P | Treia 2 | 43°18′ | 13°18′ | 342
112 | 1931–1951 | P | Urbisaglia | 43°12′ | 13°22′ | 311
113 | 1931–1979 | P | Ussita | 42°57′ | 13°08′ | 744
114 | 2000–2014 | P-T | Ussita 2 | 42°57′ | 13°08′ | 749
115 | 2000–2014 | P | Villa Potenza | 43°20′ | 13°26′ | 133
116 | 1931–2007 | P | Ville Santa Lucia | 43°11′ | 12°51′ | 664
117 | 1931–1971 | P | Visso | 42°56′ | 13°05′ | 607
118 | 1999–2014 | P-T | Visso 2 | 43°0′ | 13°07′ | 978
Table 2. Example of comparison between three interpolation methods for the reconstruction of daily temperatures.
Statistic | IDW | EBK | Co-Kriging
Regression function | 0.6221x + 6.8366 | 0.6813x + 5.7113 | 0.9400x + 1.2166
Mean | 0.0119 | 0.0311 | 0.0566
Root-mean-square | 1.6870 | 1.6429 | 1.2465
Mean standardized | – | −0.0002 | 0.0237
Root-mean-square standardized | – | 0.9514 | 0.9890
Average standard error | – | 1.7366 | 1.5278
Table 3. Temperature and precipitation data removed after the last spatial consistency check (example, 1931–1960); QC = 2 T = data removed from temperature; QC = 2 P = data removed from precipitation.
Weather StationQC = 2 TQC = 2 PWeather StationQC = 2 TQC = 2 P
Amandola 1Nocera Umbra 1014
Apiro 2111Norcia111
Appennino 1Osimo città RM1920 3
Arquata del Tronto 2Petriolo 1
Bolognola Paese Pievebovigliana 2
Bolognola Pintura RT2011 Pioraco 2
Camerino7 Pollenza OGSM 4
Cingoli9 Pollenza 23
Cingoli 2112Sant’Angelo in Pontano 1
Civitanova Marche OGSM 2Recanati 1
Dignano3 Sarnano 1
Fabriano RM1810312S. Severino M. RM1998 1
Fermo511Sellano 5
Gualdo Tadino1 Serravalle di C. RM2030 1
Jesi94Serravalle di C. 2 2
Loro Piceno 1Servigliano RM219012
Lornano101Sorti 1
Matelica3 Tolentino OGSM 4
M. Bove Sud RT20751Tolentino 24
Montecassiano 25Urbisaglia 6
Montefano 2Ussita 251
Montemonaco61Ville Santa Lucia 1
Monteprata RT20681Visso 4
Muccia ST266 Visso 29
Table 4. Summary of temperature and precipitation data removed after each consistency check.
Check | QC = 0 T | QC = 1 T | QC = 2 T | QC = 0 P | QC = 1 P | QC = 2 P
Gross error | 1,821,039 | – | 15 | 76,981 | – | 40
Internal consistency | 1,821,007 | – | 32 | 76,869 | – | 112
Tolerance test | 1,820,925 | 82 | – | 76,662 | 207 | –
Temporal consistency | 1,820,767 | 240 | – | 76,489 | 380 | –
Spatial consistency | 1,820,679 | – | 328 | 75,735 | – | 1134
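The QC = 0/1/2 counts in Table 4 come from running the WMO-style checks in sequence and flagging each record as valid (0), suspect (1), or rejected (2). A simplified sketch of the first three checks follows; the thresholds and records are hypothetical, not the limits calibrated in the paper:

```python
# Cascading QC checks assigning flags QC = 0 (valid),
# QC = 1 (suspect), QC = 2 (rejected). All values illustrative.

def gross_error(rec):
    # Reject physically implausible values, e.g. Tmax outside -30..50 degC.
    return 2 if not (-30.0 <= rec["tmax"] <= 50.0) else 0

def internal_consistency(rec):
    # Reject records where the daily maximum is below the minimum.
    return 2 if rec["tmax"] < rec["tmin"] else 0

def tolerance_test(rec, mean, sd, k=4.0):
    # Flag as suspect any value more than k standard deviations from the mean.
    return 1 if abs(rec["tmax"] - mean) > k * sd else 0

records = [
    {"tmax": 31.2, "tmin": 18.4},
    {"tmax": 55.0, "tmin": 20.0},  # fails the gross error check
    {"tmax": 10.0, "tmin": 14.0},  # fails internal consistency
    {"tmax": 29.8, "tmin": 17.1},
]

counts = {0: 0, 1: 0, 2: 0}
for rec in records:
    # `or` falls through to the next check while the flag is 0 (valid).
    flag = gross_error(rec) or internal_consistency(rec) \
        or tolerance_test(rec, mean=25.0, sd=5.0)
    counts[flag] += 1

print(counts)
```

In the paper's pipeline the temporal and spatial consistency checks follow the same pattern, each applied only to records that passed the previous stage.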
Table 5. Data deleted for each WMO standard period.
 | 1931–1960 | 1961–1990 | 1991–2014
Deleted temperature data | 78 (0.017%) | 137 (0.023%) | 160 (0.021%)
Deleted precipitation data | 351 (1.52%) | 363 (1.34%) | 572 (1.89%)
Table 6. Average results for goodness of reconstruction: statistical operators.
 | ME | RMSE | ASE | MSE | RMSSE
Temperature | −0.15 | 1.46 | 1.62 | −0.021 | 0.97
Precipitation | −0.22 | 9.77 | 10.01 | −0.013 | 0.98
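The operators in Table 6 (ME, RMSE, ASE, MSE, RMSSE) are standard geostatistical cross-validation diagnostics. Assuming a list of (observed, predicted, prediction standard error) triples from cross-validation (the numbers below are illustrative, not the paper's data), they can be computed as:

```python
import math

# Hypothetical cross-validation output: (observed, predicted, standard_error).
cv = [
    (12.0, 11.7, 1.1),
    (10.5, 10.9, 1.3),
    (14.2, 13.8, 1.2),
    (9.8, 10.1, 1.0),
]

n = len(cv)
errors = [p - o for o, p, _ in cv]
std_errors = [(p - o) / se for o, p, se in cv]

me = sum(errors) / n                                    # mean error
rmse = math.sqrt(sum(e * e for e in errors) / n)        # root-mean-square error
ase = sum(se for _, _, se in cv) / n                    # average standard error
mse = sum(std_errors) / n                               # mean standardized error
rmsse = math.sqrt(sum(z * z for z in std_errors) / n)   # root-mean-square standardized error

print(me, rmse, ase, mse, rmsse)
```

An RMSSE close to 1 (as in Table 6, 0.97–0.98) indicates that the model's standard errors are consistent with the actual prediction errors, while ASE close to RMSE suggests the uncertainty is neither over- nor under-estimated.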


MDPI and ACS Style

Gentilucci, M.; Barbieri, M.; Burt, P.; D’Aprile, F. Preliminary Data Validation and Reconstruction of Temperature and Precipitation in Central Italy. Geosciences 2018, 8, 202. https://doi.org/10.3390/geosciences8060202

