Filling Gaps in Hourly Air Temperature Data Using Debiased ERA 5 Data

Missing data in hourly and daily temperature data series is a common problem in long-term data series and many observational networks. Agricultural and environmental models and climate-related tools can be used only if weather data series are complete. To support user communities, a technique for gap filling is developed based on the debiasing of ERA5 reanalysis data, the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalyses of the global climate. The debiasing procedure includes in situ measured temperature. The methodology is tested for different landscapes, latitudes, and altitudes, including tropical and midlatitudes. An evaluation of results in terms of root mean square error (RMSE) obtained using hourly and daily data is provided. The study shows very low average RMSE for all gap lengths ranging from 1.1 ◦C (Montecristo, Italy) to 1.9 ◦C (Gumpenstein, Austria).


Introduction
Gaps in measured meteorological data are a common problem in measurement networks and experimental campaigns.If data are measured using an automated weather station (AWS), the most frequent causes of data gaps are related to data transfer, data logging and/or sensor malfunctioning, exceptional equipment maintenance, or the removal of erroneous or unreasonable recorded data.If the purpose of a measurement network is not only to monitor microclimate but to provide input data for assessment studies and models in, for example, agriculture, hydrology, or urban modelling, then complete data series are necessary.
There are many methods for filling gaps in near-surface air temperature data (up to 2 m) based on different statistical techniques that use historical data, in situ measurements, and objective analysis for the spatial interpolation of data [1].A comprehensive overview of such methods can be found in, for example, Henn et al. [2] and Vuichard and Papale [3].Thesegap-filling methods can be separated into two groups: spatial [4][5][6][7][8][9][10][11][12][13] and temporal [3,[14][15][16][17][18][19][20].Spatial gap-filling for temperature data is often based on interpolation techniques such as inverse-distance weighting (IDW) [4], kriging [6][7][8], multiple regressions [9], and thin-plate splines [10].Gap-filling techniques that rely on the correct representation of the local observed lapse rate require a certain number of measuring stations to demonstrate skill [5,[11][12][13].On the other hand, temporal gap-filling methods rely on the autocorrelation of the meteorological time series.Claridge and Chen [14] used polynomial fitting and simple linear interpolation.In another study [15], missing data was reconstructed using a linear combination of forecasts and hindcasts.Methods such as empirical orthogonal functions (EOF), which employ a singular value decomposition algorithm, combine spatial and temporal interpolation [16,17], however also strongly depend on the number of surrounding stations and the size of the data gap [2].The latest of these methods [3] also offers a comprehensive technique for filling weather data gaps in continuous data measurements at FLUXNET sites with the ERA-Interim global atmospheric reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF).
This study tests a procedure for filling data gaps based on ERA5 reanalysis and bias correction performed using in situ measurements.The method's performance is evaluated by systematically removing available temperature data, producing gaps of different lengths, and then evaluating the technique's ability to reconstruct the gaps.Additionally, the performance is measured in terms of the root-mean-square error (RMSE) as a function of day of the year (DOY) and gap length.
The main objective of this paper is to present methods and tools used for gap filling and estimate uncertainties in gap-filled data and offer a step-by-step procedure that can be applied by professionals who are not necessarily meteorologists but have certain IT and data management knowledge.After a presentation of the data sets used, the developed method for gap filling, and the performance measurement results, the Supplementary Materials offer a more detailed presentation and visualization of the debiasing technique.

ERA5 Reanalysis Data
Reanalysis is a scientific method that aims to provide weather and climate conditions at regular intervals over long time periods.This method is based on a combination of data assimilation and numerical models.The data assimilation method is a powerful mathematical technique that combines millions of irregularly placed observations from different sources and the equations governing the numerical model, to provide an estimate of the state of the atmosphere-surface system.Available historical observations are assimilated providing the initial state for the forecast model run-an analysis which is a best fit of the numerical model to the available data.Reanalysis outputs are meteorological fields in a regular grid with reasonable temporal resolution during long time periods that can provide insight into climate conditions.It should be stressed that a combination of possible errors and approximations in observation and models can introduce variability and biases into reanalysis output.Therefore, even if a given model ideally simulates atmospheric processes, the interpolation technique produces deviations between calculated and real meteorological conditions.Therefore, while using reanalysis data, it is important to keep in mind that the global observation network is sparser: (a) over water than over land; and (b) over the tropics than over high-and midlatitudes.These features have two important consequences: (1) less observational points are used to produce data for the NWP model grid points, thus reducing the representativeness of NWP; and (2) due to the lack of data, inadequate parameterization of physical processes in NWP leads to errors [21], which are more emphasized over the sea and in the tropics than over land at high-and midlatitudes.
Atmospheric flow is strongly influenced by different features on the Earth's surface (e.g., mountains, vegetation, and buildings) which act as sources of friction and flow obstacles and affect energy and water balances.In NWP models, the mean orography or vegetation type is representative of orography/vegetation types with scales equal to or larger than the horizontal resolution of the model; therefore, mean orography/vegetation is directly considered in the basic model equations.Processes related to orography/vegetation with scales below the grid scales (sub-grid scale) are subject to special sub-grid parameterization modules, which can be a source of uncertainties in NWP model outputs, i.e., reanalysis climatology [22,23].For example, Lim et al. [24] stressed that NCEP-NCAR reanalysis surface data do not assimilate surface observations over land, and are therefore not sensitive to land properties [24].Over more than three decades, sophisticated data sources (satellite observations and atmospheric soundings) have been introduced in atmospheric reanalysis to improve the final product.However, the tropics and seas are domains where a lower quality of data is expected, while the sub-grid parameterization of orography and vegetation remains a possible source of uncertainties.
ERA5 is the fifth generation of the ECMWF's atmospheric reanalyses of the global climate [25].ERA5 was produced using 4D-Var data assimilation in CY41R2 of the ECMWF's Integrated Forecast System (IFS), with 137 hybrid sigma/pressure model levels in the vertical, with the top level at 0.01 hPa.CY41R2 was launched in 2016 and allowed a substantial improvement in horizontal resolution to 9 km, e.g., up to 904 million numerical points.
The assimilation of weather observations into the forecasting system was also improved in CY41R2.The 4D-Var data assimilation uses 12-h assimilation windows from 0900 to 2100 UTC and 2100 to 0900 UTC (the following day) based on a 10-member ensemble at a horizontal resolution of 62 km.Observations are organized in half-hourly timeslots.Combining vast sources of historical observations with numerical models and advanced data assimilation systems, ERA5 provides hourly global atmospheric data estimates at a grid horizontal resolution of 30 km.Data produced in ERA5 are most valuable as a substitute for observational data where such data is limited or unavailable.

Measured Weather Data
To test the proposed gap-filling method, we used time series of hourly temperature data for five weather stations in different landscapes: lowland, mountain region, desert, and island (Table 1).The origin of ERA5 data suggest that high uncertainties can be expected while dealing with:

•
Canopy measurements due to the impact of plants on micrometeorological conditions that cannot be identified by ERA5 due to the resolution of the land use map and the model itself;

•
Mountain regions due to difficulties resulting from the distribution of orography over grid elements and the representation of mountains in the model used to produce ERA5 data;

•
Islands due to the coupling of sea surface processes and atmospheric processes and, in the case of small islands, their "visibility" on a 30 km resolution grid, which is highly questionable; and • Desert oases, due to the position of these oases in tropical and subtropical areas where the meteorological measurement network is sparse.Lowland datasets were measured within the AWS network of the System of the Forecasting and Warning Service of Plant Protection of Serbia (PIS) (Novi Sad, Serbia).This AWS network was designed and set up to provide information about micrometeorological conditions in plant canopies, orchards, and vineyards to analyze the environmental conditions in which host plants and harmful organisms develop.The AWSs were equipped with sensors for the measurement of air and soil temperature, air relative humidity, soil moisture, precipitation amount, and leaf wetness duration.The temperature sensors were installed in fruit orchards, commonly at the mid-crown level (a height of approximately 1.2 m).All selected locations were in orchards situated in the Vojvodina (Northern Serbia) region, which is a part of the Pannonian lowlands (Figure 1).The data used in the study were measured during the 2014-2017 growing season (1 March-31 October).
Meteorology and Geodynamics) (Vienna, Austria).The Gumpenstein weather station is located in the Austrian province of Styria at an altitude of 700 m a.s.l.(Figure 2).The temperature measurements were made at a height of 2m in a standard meteorological shelter.Selected data series were measured during the 2014−2017 period.For the purpose of the study, the 01 May−30 November 2017 period was selected.The desert dataset was measured in the el-Bahariya oasis in Egypt in the framework of the MosqDyn project AWS network.This AWS network was established and set up to monitor micrometeorological conditions and their suitability for the abundance and activity of local mosquito populations.The AWS was equipped with sensors for air temperature, precipitation, and relative humidity measurements.El-Bahariya is a depression and oasis in the Western Desert of Egypt located in the Giza Governorate.The oasis is situated 370 km southwest of Cairo.The valley hosts many guava, mango, and date groves and has a roughly oval shape covering an area of 2000 km².The sensors were installed at a height of 2 m in the northern part of the oasis (Figure 3).
Island datasets were measured at the islands of Montecristo and Pianosa, Italy (Figures 4 and 5) in the framework of the Environmental Modelling and Monitoring Laboratory for Sustainable Development (LAMMA) consortium activities [26].Weather stations were located on stony elevated terrain that is highly exposed to atmospheric circulations and performed the following measurements: air temperature, humidity, wind direction, wind speed, precipitation, and the intensity of incoming solar radiation.Data used in this study were measured during the 1 May-15 November 2016 period.The desert dataset was measured in the el-Bahariya oasis in Egypt in the framework of the MosqDyn project AWS network.This AWS network was established and set up to monitor micrometeorological conditions and their suitability for the abundance and activity of local mosquito populations.The AWS was equipped with sensors for air temperature, precipitation, and relative humidity measurements.El-Bahariya is a depression and oasis in the Western Desert of Egypt located in the Giza Governorate.The oasis is situated 370 km southwest of Cairo.The valley hosts many guava, mango, and date groves and has a roughly oval shape covering an area of 2000 km 2 .The sensors were installed at a height of 2 m in the northern part of the oasis (Figure 3).Island datasets were measured at the islands of Montecristo and Pianosa, Italy (Figures 4  and 5) in the framework of the Environmental Modelling and Monitoring Laboratory for Sustainable Development (LAMMA) consortium activities [26].Weather stations were located on stony elevated terrain that is highly exposed to atmospheric circulations and performed the following measurements: air temperature, humidity, wind direction, wind speed, precipitation, and the intensity of incoming solar radiation.Data used in this study were measured during the 1 May-15 November 2016 period.

Gap Filling Method Description
The designed gap filling method is based on the following assumptions:

Gap Filling Method Description
The designed gap filling method is based on the following assumptions:

Gap Filling Method Description
The designed gap filling method is based on the following assumptions:

•
In the case of temperature data, due to the high autocorrelation of the time series, important information is contained within the time series before and after the gap [27].Therefore, a portion of the time series that is not missing is used for the bias correction of ERA5 data; • It is important to limit the amount of data used for the gap filling method (learning period) to avoid seasonal changes in temperature data; however, it is also important to use enough data to be able to exploit the high autocorrelation in the time series.The determination coefficient R2 was used to validate that enough data was present in the fitting process; • Diurnal temperature biases [23] are recognized as a possible problem and new technique for filtering the learning data was applied to improve this methodology.

•
To efficiently fill gaps in large datasets, a simple, fast, and reliable approach of linear regression was selected.
To fill the gaps in the time series, it was necessary to find the beginning of the gap and the gap length (shown in Figure 6a by the black left-to-right arrow).Gap filling was performed step by step by applying the debiasing technique for each missing observation in the gap period.Older missing observations were filled first.We proceeded step by step toward the gap end.During this process already filled values in the gap also influence the gap-filling process for later missing observations in the same gap.

•
Diurnal temperature biases [23] are recognized as a possible problem and new technique for filtering the learning data was applied to improve this methodology.

•
To efficiently fill gaps in large datasets, a simple, fast, and reliable approach of linear regression was selected.
To fill the gaps in the time series, it was necessary to find the beginning of the gap and the gap length (shown in Figure 6a by the black left-to-right arrow).Gap filling was performed step by step by applying the debiasing technique for each missing observation in the gap period.Older missing observations were filled first.We proceeded step by step toward the gap end.During this process already filled values in the gap also influence the gap-filling process for later missing observations in the same gap.
To fill one missing observation in the gap, it was necessary to determine the relationship between the existing observations and the reanalysis for a certain period of time before the missing observation (learning period shown in Figure 6a by the orange left-to-right arrow).The learning period used in this study was 30 days and observations were hourly.The length of the learning period was chosen empirically so that it would be short enough not to be influenced by seasonal changes but long enough to contain an appropriate amount of data for linear regression.
Due to uncertainties in radiation, surface, and Planetary Boundary Layer (PBL) physics schemes in numerical models, diurnal variations in temperature biases can occur.To avoid this problem, a ± 3-h filter was applied to the time series.This way, data from the same part of the day were used.After reducingthe influence of the diurnal bias changes on the debiasing process with the filter, we calculated the linear regression coefficients (Figure 6b) and applied them to the reanalysis values for the missing observation time step.Thus, we reconstructed one-time step in the gap period.Linear regression was chosen as it is a computationally efficient and fast process that satisfies the requirements of this methodology.To fill one missing observation in the gap, it was necessary to determine the relationship between the existing observations and the reanalysis for a certain period of time before the missing observation (learning period shown in Figure 6a by the orange left-to-right arrow).The learning period used in this study was 30 days and observations were hourly.The length of the learning period was chosen empirically so that it would be short enough not to be influenced by seasonal changes but long enough to contain an appropriate amount of data for linear regression.
Due to uncertainties in radiation, surface, and Planetary Boundary Layer (PBL) physics schemes in numerical models, diurnal variations in temperature biases can occur.To avoid this problem, a ± 3-h filter was applied to the time series.This way, data from the same part of the day were used.After reducingthe influence of the diurnal bias changes on the debiasing process with the filter, we calculated the linear regression coefficients (Figure 6b) and applied them to the reanalysis values for the missing observation time step.Thus, we reconstructed one-time step in the gap period.Linear regression was chosen as it is a computationally efficient and fast process that satisfies the requirements of this methodology.

Results
The described methodology was applied to all data sets for different gap lengths and seasons.To assess the effects of the bias correction, the bias score (B), expanded bias uncertainty (U(B)), and root mean square error (RMSE) were calculated for both ERA5-only (RMSE ERA5 ) and ERA5-debiased (RMSE DEB )temperature gaps.Bias (B) is the systematic difference between a model predicted value (P) and a measured reference (M): B = P − M. The expanded uncertainty U(B) is calculated as: where k is the coverage factor and u c (B) is the combined bias uncertainty which is calculated using the first-order uncertainty propagation formula in the following form [28]: where u(P) and u(M) are the standard uncertainties of the predicted and measured values and PCC(P,M) is the Pearson's correlation coefficient (PCC) which is calculated as the determination coefficient for all gap filling steps and for all experiment locations.More details about the expanded bias uncertainty (U(B)) calculation can be found in Appendix A, and plots of B and U(B), for all selected locations and gaps can be found in Appendix B. D'Agostino and Pearson's normality test [29,30] were used to estimate whether the dataset obeys a standard distribution, as the convergence factor of 1.96 is commonly used for normal Gaussian distributions or near-Gaussian distributions for the 95% confidence interval(CI).Since the tests showed that the data is not normally distributed, and for randomly selected parts of the dataset the skewness was above 1/3 and the kurtosis parameter values were negative, we could not calculate a variable coverage factor based on the distribution of each gap, and had to treat the distribution as non-normal and unknown.Therefore, the coverage factor was calculated from Chebyshev's inequality giving k = 4.472 for the 95% CI [31].
Site-specific analysis was performed using RMSE DEB values, the differences between RMSE DEB and RMSE ERA5 and the standard deviation of the observed data used for bias correction (Figures 7-12 and 14-16) as well as the bias and bias uncertainties calculated for each location (Figures A1-A5).

Lowland Data Sets
The RMSE DEB values varied between 0.5 and 2 • C, with commonly high values for up to 5-day gap lengths.Although the blue-dominated plot in the right panel of Figure 7 indicates that the debiased dataset produced lower RMSE values than the ERA5-only dataset, some red areas associated with gaps that were better filled by ERA5-only data are present in the figure.A possible source of this effect is indicated by the standard deviation of the observed data (Figure 8), as periods with relatively high standard deviations (more than 5 • C) can be identified beginning in March (DOY 50-65) and from the end of May until the end of June (DOY 142-182).During both indicated periods, positive bias (Figure A1) indicates that both data sets overestimate measurements; however, it is more pronounced for ERA5-only data.During the growing season (DOY 92-275), the highest bias uncertainties are obtained in the middle of the season when temperatures are highest as well as the exchange of energy and water vapour fluxes between canopy and atmosphere.An air temperature sensor was installed at a height corresponding to the middle of the crown, apple growth was an important factor affecting the variation in the measured air temperature; specifically, when the first leaves appeared (in March), they strongly affected the energy balance of the canopy by increasing the latent heat flux and energy absorbed and emitted by the leaves.Consequently, this affected the canopy air temperature and its variation over the period of intensive plant area development.When crowns formed (in May), the weekly or monthly variation in air temperature caused synoptic-scale atmospheric processes.In that period, intensive or sudden ("squall") winds could instantaneously affect air temperature by increasing air canopy mixing, increasing the variation and standard deviation of the measured air temperature at the hourly scale.Since wind speed measurements were not performed at this location, we can suppose that this is the cause of the increase (albeit impermanent) in the standard deviation during the DOY 142-182 period for gap filling.

Mountain Data Sets
The superiority of the debiasing technique was expected in the case of Gumpenstein station.Figure 2 and Table 1 clearly show that the ERA5 point for this location was 380 m higher in the Alps, which is the result of the sub grid scales of the flat areas where the Gumpenstein AWS is located and the applied parameterization of sub grid-scale processes.Therefore, significant temperature differences between the observations and reanalysis were expected.The obtained bias for ERA5-only data (Figure A2) is more than 6 °C higher than for ERA5-debias data, while bias uncertainties for the debiased data set are much lower for all gap lengths.An interesting increase of bias uncertainty during the cold part of the year can be connected with late and early snowfall in mountains, which can affect temperature measurements and the accuracy of model simulations.High RMSEDEB at DOY 150 (Figure 9, left panel) is partly result of high standard deviation of observed data (Figure 10) which were used for bias correction.Differences in RMSE decreased towards winter (Figure 9), since the vertical temperature gradient in the Alps decreases from April to December [32].Exceptions to this trend occurred in DOY 320-321 and DOY 327-329, when heavy snowing affected measurements and

Mountain Data Sets
The superiority of the debiasing technique was expected in the case of Gumpenstein station.Figure 2 and Table 1 clearly show that the ERA5 point for this location was 380 m higher in the Alps, which is the result of the sub grid scales of the flat areas where the Gumpenstein AWS is located and the applied parameterization of sub grid-scale processes.Therefore, significant temperature differences between the observations and reanalysis were expected.The obtained bias for ERA5-only data (Figure A2) is more than 6 °C higher than for ERA5-debias data, while bias uncertainties for the debiased data set are much lower for all gap lengths.An interesting increase of bias uncertainty during the cold part of the year can be connected with late and early snowfall in mountains, which can affect

Mountain Data Sets
The superiority of the debiasing technique was expected in the case of Gumpenstein station.Figure 2 and Table 1 clearly show that the ERA5 point for this location was 380 m higher in the Alps, which is the result of the sub grid scales of the flat areas where the Gumpenstein AWS is located and the applied parameterization of sub grid-scale processes.Therefore, significant temperature differences between the observations and reanalysis were expected.The obtained bias for ERA5-only data (Figure A2) is more than 6 • C higher than for ERA5-debias data, while bias uncertainties for the debiased data set are much lower for all gap lengths.An interesting increase of bias uncertainty during the cold part of the year can be connected with late and early snowfall in mountains, which can affect temperature measurements and the accuracy of model simulations.High RMSE DEB at DOY 150 (Figure 9, left panel) is partly result of high standard deviation of observed data (Figure 10) which were used for bias correction.Differences in RMSE decreased towards winter (Figure 9), since the vertical temperature gradient in the Alps decreases from April to December [32].Exceptions to this trend occurred in DOY 320-321 and DOY 327-329, when heavy snowing affected measurements and increased the standard deviation of measured data over 2017.

Desert Data Sets
During the prevailing part of the period of interest, the applied bias correction gap-filing technique provided better results than the ERA5-only data (Figure 11) with significantly lower bias and bias uncertainty for all gap widths (Figure A3).However, there were some exceptions to this tendency, which deserve our attention.The desert landscape surrounding the Bahariya oasis AWS is very uniform (see Figure 3), with neither mountains nor significant vegetation to serve as a source of uncertainties in the ERA5 reanalysis.However, the constant inflow of desert air can produce significant variation in air temperature at an hourly time scale.Figure 12 clearly shows that a high standard deviation of the bias correction data for gaps between DOY 120 and DOY 172

Desert Data Sets
During the prevailing part of the period of interest, the applied bias correction gap-filing technique provided better results than the ERA5-only data (Figure 11) with significantly lower bias and bias uncertainty for all gap widths (Figure A3).However, there were some exceptions to this tendency, which deserve our attention.The desert landscape surrounding the Bahariya oasis AWS is

Desert Data Sets
During the prevailing part of the period of interest, the applied bias correction gap-filing technique provided better results than the ERA5-only data (Figure 11) with significantly lower bias and bias uncertainty for all gap widths (Figure A3).However, there were some exceptions to this tendency, which deserve our attention.The desert landscape surrounding the Bahariya oasis AWS is very uniform (see Figure 3), with neither mountains nor significant vegetation to serve as a source of uncertainties in the ERA5 reanalysis.However, the constant inflow of desert air can produce significant variation in air temperature at an hourly time scale.Figure 12 clearly shows that a high standard deviation of the bias correction data for gaps between DOY 120 and DOY 172 corresponded to a high daily variation in air temperature (Figure 13).During September (DOY 250-275) and at the end of the year (after DOY 300), the variability in the daily amplitude was significant, ranging from 7.4 • C to 22.6 • C.This variability affected the bias correction procedure, particularly during the DOY 216-228 period when ERA5 obviously better reproduced the observed air temperature.During this period and the small time window in September, the bias uncertainty of ERA5-only data was lower than that of the debiased data set.

Island Data Sets
Due to the fact that the heat capacity of water is much higher than that of land, the air temperature variation above the sea surface was always much lower than that above land in the study period.Additionally, the daily and annual variation in air temperature above coastal areas and islands was much lower than above far-inland territories.Therefore, it is not surprising that there was a low and uniform (at the annual level) standard deviation in the observed data used for gap filling (Figure 14) for both Montecristo (2.3-2.8 • C) and Pianosa (2.5-2.7 • C) islands.With respect to the ERA5 mean orography, grid cells in which both islands were located were represented by water (Figures 4 and 5), implying a low variation of reanalysis temperature.The last two circumstances demonstrate the high efficacy of the debiasing technique (Figures 15 and 16, left panels) and low differences between RMSE DEB and RMSE ERA5 (Figures 15 and 16, right panels).According to the results presented in Figures A4 and A5, the debiased data set slightly overestimates measurements in the spring and underestimates them in autumn.In the case of ERA5-only data, the situation is exactly the opposite.However, in both locations, bias uncertainty is lower, for all gaps, for debiased data sets than for ERA5-only data sets.

Island Data Sets
Due to the fact that the heat capacity of water is much higher than that of land, the air temperature variation above the sea surface was always much lower than that above land in the study period.Additionally, the daily and annual variation in air temperature above coastal areas and islands was much lower than above far-inland territories.Therefore, it is not surprising that there was a low and uniform (at the annual level) standard deviation in the observed data used for gap filling (Figure 14) for both Montecristo (2.3-2.8 °C) and Pianosa (2.5-2.7 °C) islands.With respect to the ERA5 mean orography, grid cells in which both islands were located were represented by water (Figures 4 and 5), implying a low variation of reanalysis temperature.The last two circumstances demonstrate the high efficacy of the debiasing technique (Figures 15 and 16, left panels) and low differences between RMSEDEB and RMSEERA5 (Figures 15 and 16, right panels).According to the results presented in Figures A4 and A5, the debiased data set slightly overestimates measurements in the spring and underestimates them in autumn.In the case of ERA5-only data, the situation is exactly the opposite.However, in both locations, bias uncertainty is lower, for all gaps, for debiased data sets than for ERA5-only data sets.

Discussion
The gap-filling methodology test results indicate that the presented ERA5-debiasing technique has different efficacies that vary with respect to location, gap width, and the standard deviation of the observed data used for bias correction.However, high values of the Pearson's r coefficient and p-values below 0.05 (Table A1) for all locations and gap lengths indicate a strong linear relationship between ERA5 and the observed data used for gap filling and debiasing.
Overall, mean bias uncertainty and mean RMSE for all gap widths and all DOY are provided in Table 2.It is evident that the new technique significantly decreases bias, bias uncertainty, and root mean square error.The only exception is for Montecristo, for which ERA5 already had a small bias, and therefore the new technique wasnot able to improve the results here.
Particularly important is the propagation of standard uncertainties of input data (predicted and measured air temperature) (Table A2, Appendix A) towards gap bias.The comparison of the standard uncertainty of input data (Table A2) and mean bias uncertainty (Table 2) indicates that U(B) for debiased data is much closer to the uncertainty of input data than in the case of ERA5-only data.

Discussion
The gap-filling methodology test results indicate that the presented ERA5-debiasing technique has different efficacies that vary with respect to location, gap width, and the standard deviation of the observed data used for bias correction.However, high values of the Pearson's r coefficient and p-values below 0.05 (Table A1) for all locations and gap lengths indicate a strong linear relationship between ERA5 and the observed data used for gap filling and debiasing.
Overall, mean bias uncertainty and mean RMSE for all gap widths and all DOY are provided in Table 2.It is evident that the new technique significantly decreases bias, bias uncertainty, and root mean square error.The only exception is for Montecristo, for which ERA5 already had a small bias, and therefore the new technique wasnot able to improve the results here.
Particularly important is the propagation of standard uncertainties of input data (predicted and measured air temperature) (Table A2, Appendix A) towards gap bias.The comparison of the standard uncertainty of input data (Table A2) and mean bias uncertainty (Table 2) indicates that U(B) for debiased data is much closer to the uncertainty of input data than in the case of ERA5-only data.The lowest average RMSE DEB was obtained for island locations (Montecristo = 1.06 • C, Pianosa = 1.26 • C), and the highest average RMSE DEB was obtained for mountain regions (Gumpenstein = 1.88 • C), as was expected according to the elaborated ERA5 reanalysis performances and standard deviations of the observations.However, the high RMSE DEB in the case of small gaps has drawn our attention.For all examined locations, we identified strong correlations (r > 0.7) between the standard deviation of the observed data used for linear regression and RMSE DEB (Figure 17).The Pearson's product-moment correlation coefficient is the test statistic that was used to measure this relationship together with the standardized criteria suggested by Evans [33] for estimating the strength of association.More specifically, we obtained high negative correlations for high RMSE DEB values and high positive correlations for low RMSE DEB values.At a seasonal scale, the correlation among curves was between −0.61 (Montecristo) and −0.60 (Kikinda), indicating a medium-intensity but stable correlation for all locations (Bahariya = −0.56,Gumpenstein = −0.37,Pianosa = −0.27).
We performed a detailed analysis of the measured data used for linear regression and the calculated data for gap filling with respect to the scale of the gap to identify the cause of such systematic behavior.The obtained results lead to the conclusion that RMSE DEB decreases when the standard deviation of the observed data increases with the gap width since for larger standard deviations, the interval of the observed values used for debiasing is broader.Accordingly, in further applications of this technique, it will be important to consider that the standard deviations of measured air temperatures are a result of meteorological and other environmental (the presence and development of plants, high mountains, deserts, and large water bodies) conditions, however the variation in RMSE DEB with standard deviation is the result of the applied calculation methodology.The lowest average RMSEDEB was obtained for island locations (Montecristo = 1.06 °C, Pianosa = 1.26 °C), and the highest average RMSEDEB was obtained for mountain regions (Gumpenstein = 1.88 °C), as was expected according to the elaborated ERA5 reanalysis performances and standard deviations of the observations.However, the high RMSEDEB in the case of small gaps has drawn our attention.For all examined locations, we identified strong correlations (r > 0.7) between the standard deviation of the observed data used for linear regression and RMSEDEB (Figure 17).The Pearson's product-moment correlation coefficient is the test statistic that was used to measure this relationship together with the standardized criteria suggested by Evans [33] for estimating the strength of association.More specifically, we obtained high negative correlations for high RMSEDEB values and high positive correlations for low RMSEDEB values.At a seasonal scale, the correlation among curves was between −0.61 (Montecristo) and −0.60 (Kikinda), indicating a medium-intensity but stable correlation for all locations (Bahariya = −0.56,Gumpenstein = −0.37,Pianosa = −0.27).
We performed a detailed analysis of the measured data used for linear regression and the calculated data for gap filling with respect to the scale of the gap to identify the cause of such systematic behavior.The obtained results lead to the conclusion that RMSEDEB decreases when the standard deviation of the observed data increases with the gap width since for larger standard deviations, the interval of the observed values used for debiasing is broader.Accordingly, in further applications of this technique, it will be important to consider that the standard deviations of measured air temperatures are a result of meteorological and other environmental (the presence and development of plants, high mountains, deserts, and large water bodies) conditions, however the variation in RMSEDEB with standard deviation is the result of the applied calculation methodology.In some exceptional situations, such as in Bahariya (DOY 216-228), the ERA5 deviation from measurements was reduced on a daily basis, producing a much better gap filling performance with the ERA5-only dataset.This could be linked to the observed seasonal shift in the variance and amplitude of the diurnal temperature range.The amplitude was high throughout the year, which is typical of a hot desert climate [34].However, the variance in the diurnal range was much higher for autumn and spring than for other seasons, leading to a significantly smaller bias for ERA5 during the JJA June-August period.Similar behavior has been observed for ERA-Interim and ERA-40 reanalyses [35].
Finally, we discuss the merit of the presented gap-filling technique in performing daily data series.Even though the goal of this study is to develop and test the technique for filling gaps in hourly temperature data series, the analysis is fully justified since users demands are commonly related to daily weather data.At all locations and for all gap scales, the maximum RMSEDEB calculated using daily data (Table 3) is smaller in comparison to results obtained using hourly data.A lower RMSEDEB of daily temperature time series for different gap lengths and locations can be noticed in Figures A6-A10 (left panels).However, differences between RMSEDEB and RMSEERA5 are less pronounced for daily data except in the case of Gumpenstein (Table 3), where the maximum deviation for daily data is the highest.Almost the same results for (RMSEDEB-RMSEERA5) are obtained In some exceptional situations, such as in Bahariya (DOY 216-228), the ERA5 deviation from measurements was reduced on a daily basis, producing a much better gap filling performance with the ERA5-only dataset.This could be linked to the observed seasonal shift in the variance and amplitude of the diurnal temperature range.The amplitude was high throughout the year, which is typical of a hot desert climate [34].However, the variance in the diurnal range was much higher for autumn and spring than for other seasons, leading to a significantly smaller bias for ERA5 during the JJA June-August period.Similar behavior has been observed for ERA-Interim and ERA-40 reanalyses [35].
Finally, we discuss the merit of the presented gap-filling technique in performing daily data series.Even though the goal of this study is to develop and test the technique for filling gaps in hourly temperature data series, the analysis is fully justified since users demands are commonly related to daily weather data.At all locations and for all gap scales, the maximum RMSE DEB calculated using daily data (Table 3) is smaller in comparison to results obtained using hourly data.A lower RMSE DEB of daily temperature time series for different gap lengths and locations can be noticed in Figures A6-A10 (left panels).However, differences between RMSE DEB and RMSE ERA5 are less pronounced for daily data except in the case of Gumpenstein (Table 3), where the maximum deviation for daily data is the highest.Almost the same results for (RMSE DEB -RMSE ERA5 ) are obtained using hourly and daily gap-filling data (Figure A7, right panel), which clearly indicates that there is systematic deviation of ERA5 data from measurements.Average RMSE DEB values obtained using hourly temperatures (Kikinda = 1.3 • C; Gumpenstein = 1.9 • C; Bahariya = 1.5 • C; Montecristo = 1.1 • C and Pianosa = 1.4 • C) are in the same range (1.5-2.0 • C) as those obtained by Chen et al. [32] for missing data values for daily temperature for the United States using a self-calibrating data quality control and which are described as "impressively low average".
There are many fully statistical self-calibrated estimation methods (for example, Chen et al. [36]) as well as spatial, temporal, and spatio-temporal methods [2] that work very well.However, extreme weather events and high daily amplitudes of air temperature will become the new climate norm for many places in the world in the future.It is difficult for statistical techniques to infer data far from observations points.Therefore, we developed and tested gap-filling methods, that include dynamical and statistical concepts.

Conclusions
The presented technique and testing results offer a new methodology for gap filling based on a combined dynamical-statistical methodology that differs from the common statistical methods widely reported in the literature.Results obtained indicate that in the case when measured data are not available, ERA5 data can be used for temperature gap filling.However, in cases when the ERA5 grid point significantly deviates from the observation point, particularly on high mountains, debiasing of ERA5 data based on observations is necessary in order to obtain useful temperature data.Testing in different landscapes (from desert to mountains) and world regions (from subtropical latitudes to midlatitudes) addresses region-specific uncertainties that should be taken into account in further applications of this technique.The next step will be testing the presented concept for soil temperature and air humidity data gaps.
The goal of this study is to enhance climate change and weather data-related studies which rely on long term data series.The complete procedure of technique application andprogram installation and running will be posted on the H2020 SERBIA FOR EXCELL project web page (www.serbiaforexcell.com)until the end of 2018.
Author Contributions: M.L. performed all coding and a major part of calculations, maps, plots, and error analysis, and contributed to writing; B.L.; analyzed requirements of AWS network gap filling, wrote a major part of the paper, and provided funds for open access publishing; L.D. designed solutions and contributed to calculation and writing; M.P. provided data for Bahariya station, contributed to calculation and writing.All authors contributed to the results discussion.

Figure 1 .
Figure 1.Altitude (m a.s.l.) and position of the Kikinda (Serbia) automated weather station (AWS).The mountain dataset was measured at the Gumpenstein synoptic weather station within the network of the Zentralanstalt für Meteorologie und Geodynamik (ZAMG; Central Institute for Meteorology and Geodynamics) (Vienna, Austria).The Gumpenstein weather station is located in the Austrian province of Styria at an altitude of 700 m a.s.l.(Figure 2).The temperature measurements were made at a height of 2m in a standard meteorological shelter.Selected data series were measured during the 2014−2017 period.For the purpose of the study, the 01 May−30 November 2017 period was selected.Atmosphere 2019, 10, x FOR PEER REVIEW 5 of 25

Figure 6 .
Figure 6.(a) Time series with a gap in temperature observations.The blue line represents observations, the dashed blue line represents hidden observations, the red line represents ERA5 values for the nearest grid point, and the green line represents the result of the debiasing process; Linear regression of learning data for one-time step in the gap following the standard equation: = + .

Figure 6 .
Figure 6.(a) Time series with a gap in temperature observations.The blue line represents observations, the dashed blue line represents hidden observations, the red line represents ERA5 values for the nearest grid point, and the green line represents the result of the debiasing process; (b) Linear regression of learning data for one-time step in the gap following the standard equation: OBS = k × ERA 5 + n.

Figure 8 .
Figure 8.Standard deviation of the observed data which are used for bias correction for DOY 60-273 in 2014 in Kikinda (Serbia).

Figure 8 .
Figure 8.Standard deviation of the observed data which are used for bias correction for DOY 60-273 in 2014 in Kikinda (Serbia).

Figure 8 .
Figure 8.Standard deviation of the observed data which are used for bias correction for DOY 60-273 in 2014 in Kikinda (Serbia).

Figure 10 .
Figure 10.Standard deviation of observed data which were used for bias correction for DOY 121-334 in 2017 in Gumpenstein (Austria).

Figure 10 .
Figure 10.Standard deviation of observed data which were used for bias correction for DOY 121-334 in 2017 in Gumpenstein (Austria).

Figure 10 .
Figure 10.Standard deviation of observed data which were used for bias correction for DOY 121-334 in 2017 in Gumpenstein (Austria).

Figure 12 .
Figure 12.Standard deviation of observed data which are used for bias correction for DOY 121-334 in 2017 in Bahariya (Egypt).

Figure 12 .
Figure 12.Standard deviation of observed data which are used for bias correction for DOY 121-334 in 2017 in Bahariya (Egypt).

Figure 12 .
Figure 12.Standard deviation of observed data which are used for bias correction for DOY 121-334 in 2017 in Bahariya (Egypt).

Figure 12 .
Figure 12.Standard deviation of observed data which are used for bias correction for DOY 121-334 in 2017 in Bahariya (Egypt).

Figure 14 .
Figure 14.Standard deviation of the observed data which were used for bias correction for the islands of Montecristo (left panel) and Pianosa (right panel) for the DOY 122-320 in 2016.

Figure 14 .
Figure 14.Standard deviation of the observed data which were used for bias correction for the islands of Montecristo (left panel) and Pianosa (right panel) for the DOY 122-320 in 2016.

Figure 14 .
Figure 14.Standard deviation of the observed data which were used for bias correction for the islands of Montecristo (left panel) and Pianosa (right panel) for the DOY 122-320 in 2016.

Figure 17 .
Figure 17.RMSEDEB for gap width = 1 and the correlation coefficient between the standard deviation of the observed data used for linear regression and RMSEDEB for Kikinda (Serbia; top left), Bahariya (Egypt; top right), Gumpenstein (Austria; middle), Montecristo (Italy; bottom left) and Pianosa (Italy; bottom right).

Figure 17 .
Figure 17.RMSE DEB for gap width = 1 and the correlation coefficient between the standard deviation of the observed data used for linear regression and RMSE DEB for Kikinda (Serbia; top left), Bahariya (Egypt; top right), Gumpenstein (Austria; middle), Montecristo (Italy; bottom left) and Pianosa (Italy; bottom right).

Funding:
The research work described in this paper was realized as a part of the project 'Studying climate change and its influence on the environment: impacts, adaptation and mitigation' (III43007) funded by Ministry of Education, Science and Technological Development of the Republic of Serbia.This paper is supported by the SERBIA FOR EXCELL project which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no.691998.

Table 1 .
Locations used in the study (* see surface on ERA5).

Table 2 .
Mean bias, mean bias uncertainty, and mean RMSE for all gap widths in selected locations.

Table 2 .
Mean bias, mean bias uncertainty, and mean RMSE for all gap widths in selected locations.

Table 3 .
Maximum absolute values of (RMSE DEB -RMSE ERA5 ) difference and RMSE DEB calculated using hourly and daily data for gap filling.