Gap Filling and Quality Control Applied to Meteorological Variables Measured in the Northeast Region of Brazil

Abstract: In this work, we used the MICE (Multivariate Imputation by Chained Equations) technique to impute missing daily data for six meteorological variables (precipitation, temperature, relative humidity, atmospheric pressure, wind speed and insolation) from 96 stations located in the northeast region of Brazil (NEB) for the period from 1961 to 2014. We then applied tests from a quality control system (QCS) developed for the detection, correction and possible replacement of suspicious data. Together, the gap filling technique and the QCS addressed two of the biggest problems found in time series of daily data measured at meteorological stations: generating plausible values for each variable of interest to remedy the absence of observations, and detecting and properly correcting suspicious values arising from observations.


Introduction
Observational meteorological data are basic elements for climatological analyses [1]. In Brazil, despite the inherent relevance of such observations, the amount of data of this kind has been suffering a significant reduction over the years [2], with several manual weather stations (MWS) being permanently closed, becoming inoperative or functioning precariously, compromising the quality and continuity of the meteorological records.
This situation hinders more detailed, observation-based climate analysis and forces the use of reanalysis products, which constitute a synthetic database reconstructed by calibrating a regional climate model to observed historical conditions, in a grid format, with statistical properties, such as means and variances, that are very similar to those of the observations [3]. However, although such products are useful for analysing long-term climate trends and studying future climate change scenarios [4], they are not reliable in terms of extreme events [5].
To overcome such an issue, especially in the case of precipitation data, several methods have been proposed to create gridded products that provide a standardization of the properties of the variable across a spatial field, thus surmounting problems related to sparse and non-uniform rainfall coverage. Some of those methods are based on exploring the surface observations to the maximum [6][7][8][9]; others are based on combining observed rainfall data with estimates from remote sensing [10][11][12][13][14][15][16][17].
In spite of all these efforts, a fine-grained surface observation database is essential, whether to aid in composing such grids or to correct and validate the approaches derived from remote sensing techniques. A surface database is not just about collecting and storing observations; in order to achieve a desirable level of confidence, it needs to undergo three basic steps of processing: quality control checks, gap filling and homogenization. With respect to the daily data, for which accessibility is still very restricted, particularly rigorous Quality Control Systems (QCS) are essential [18,19].
A QCS must combine people and machines. Although QCS software is designed to provide a list of suspicious data, the final decision on each flagged value should be made by qualified personnel. A QCS should not be expected to detect a suspicious value and automatically remove it from the series; each case must be very carefully assessed in order to avoid losing real extreme values in the time series [20,21]. Therefore, the QCS needs to be based on a variety of consistency tests and provide understandable graphical outputs, as well as summaries containing the list of suspicious data, in order to facilitate decision-making and prevent the rejection of good observations [22,23].
Another problem faced by those who work with time series of meteorological variables is the amount of missing data. Studies on trends and extremes that are based on climate indices recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI) of the World Meteorological Organization (WMO), developed by [24], require flawless daily data [5,25]. In this regard, the aim of this study was to present the results of the application of a QCS and a gap filling technique to time series of meteorological variables collected by manual weather stations in the northeast region of Brazil (NEB) in the period of 1961-2014, corresponding to 54 years of daily data, which were assessed here based on this timescale, as well as on 10-day and monthly averages or accumulated values (depending on the variable). The NEB, which is the most populated dryland region on the planet [26], has a singular biome associated with its semi-arid climate, the Caatinga, and is extremely vulnerable to climatic extremes, especially droughts [27,28]. To fill gaps in the daily data, the technique known as Multivariate Imputation by Chained Equations (MICE) was used, a multiple imputation method that has a number of advantages over other methods for dealing with missing data in time series [29][30][31][32].

Area of Study and Data
The NEB encloses an area of approximately 1.56 million km², which is equal to 18% of the Brazilian territory, encompassing nine of its states (Figure 1): Alagoas (AL), Piauí (PI), Maranhão (MA), Ceará (CE), Rio Grande do Norte (RN), Paraíba (PB), Pernambuco (PE), Sergipe (SE) and Bahia (BA). The Caatinga, which covers more than 750,000 km² of the NEB, exhibits, primarily, xerophytic, arboreal, prickly and deciduous properties. The east coast of the NEB, between RN and BA, hosts an exuberant tropical forest called the Mata Atlântica. The transition zone, of mixed vegetation, between the Mata Atlântica and the Caatinga, is called Agreste. In the westernmost portion, covering most of MA, the predominant vegetation is the Amazon forest, whereas in southern MA and PI, as well as in western BA, the Cerrado vegetation (the Brazilian Savannah) prevails [27,28].
The data come from 96 manual weather stations spread over the NEB, representing part of the National Institute of Meteorology (INMET) station network (Figure 1, blue dots). The period of each series spans from 1 January 1961 to 31 December 2014 [5]. Gap filling and QCS were applied to precipitation (mm), temperature (°C), relative humidity (%), wind speed (m/s), station-level atmospheric pressure (hPa) and insolation (hours).

Figure 1. [...] and South America (bottom right). In the left panel, the blue points represent INMET's weather stations. The climate subregions that are referred to throughout this study are graphically presented in the left panel encompassed by rectangles with the following colour associations: northern NEB, red; northwestern NEB, pink; northeastern NEB, yellow; eastern NEB, blue; southern NEB, green; southwestern NEB, black. Extracted from [28].

Filling in Missing Data
The application of a methodology to fill the gaps in the daily data took place before the evaluation performed by the QCS. The large number of observed gaps was addressed using the MICE technique. There are several fields of the natural sciences in which the MICE technique has been successfully applied [30], including, most notably, biostatistics. In this research, it was adapted to missing data in the time series of meteorological variables across the NEB, following the methodology of [33].
For each variable, MICE creates multiple complete datasets based on a variety of methods, which may include linear regressions, logistic regressions, multinomial log-linear models or Poisson regressions. These models have in common the ability to impute the missing data from the known observed values and their relationships with each dataset. Several forecasts are thus created for each missing value, and the one that produces the least uncertainty and fewest errors, when compared with the observed data, is adopted [34].
Of the two modern approaches to multivariate imputation, i.e., joint modelling and fully conditional specification, the MICE technique is of the latter kind [30]. MICE is implemented in the R language and is made available as a package. A basic description of its approach is presented below.
Let $Y_j$ ($j = 1, \ldots, p$) be one of $p$ incomplete variables, where $Y = (Y_1, \ldots, Y_p)$. The observed and missing parts of $Y_j$ are denoted by $Y_j^{\mathrm{obs}}$ and $Y_j^{\mathrm{mis}}$, respectively, so that $Y^{\mathrm{obs}} = (Y_1^{\mathrm{obs}}, \ldots, Y_p^{\mathrm{obs}})$ and $Y^{\mathrm{mis}} = (Y_1^{\mathrm{mis}}, \ldots, Y_p^{\mathrm{mis}})$ represent the observed and missing data in $Y$. The number of imputations is $m \geq 1$. The $h$-th imputed dataset is denoted by $Y^{(h)}$, where $h = 1, \ldots, m$. Now, let $Y_{-j} = (Y_1, \ldots, Y_{j-1}, Y_{j+1}, \ldots, Y_p)$ be the collection of the $p - 1$ variables in $Y$ other than $Y_j$. Let $Q$ be the quantity of interest to be estimated from the data. In practice, $Q$ is often a multivariate vector and, more generally, may represent any model of interest. Figure 2 illustrates the three main steps in multiple imputation: imputation, analysis and pooling. The software stores the results of each step in specific classes, called mids, mira and mipo, explained in detail below.
The leftmost side of Figure 2 indicates that the analysis starts with an observed, incomplete dataset $Y^{\mathrm{obs}}$. The problem is that it is not possible to estimate $Q$ from $Y^{\mathrm{obs}}$ without making unrealistic assumptions about the unobserved data. Therefore, several imputed versions of the dataset are generated by replacing each missing value with plausible values, consistent with the nature of the variable, drawn from a distribution specifically modelled for that missing entry.
In MICE, this task is performed by the mice() function. Figure 2 depicts $m = 3$ imputed datasets, $Y^{(1)}, Y^{(2)}, Y^{(3)}$. The three imputed datasets are identical in their observed entries but differ in the imputed values. The magnitude of these differences reflects the uncertainty about the values to be imputed. The second step is to estimate $Q$ in each imputed dataset, just as in a flawless dataset. This becomes easy, since all the sets are complete. The model applied to $Y^{(1)}, \ldots, Y^{(m)}$ is generally identical, but the estimates $\hat{Q}^{(1)}, \ldots, \hat{Q}^{(m)}$ differ from each other because the imputed values differ.
The third and last step is to pool the $m$ estimates $\hat{Q}^{(1)}, \ldots, \hat{Q}^{(m)}$ into a single estimate $\bar{Q}$ and to estimate its variance. For quantities $Q$ that are normally distributed, it is possible to calculate the mean of $\hat{Q}^{(1)}, \ldots, \hat{Q}^{(m)}$ and to combine the within-imputation and between-imputation variances, according to the method described in [35]. Ideally, this methodology is applied to a column of data containing gaps alongside columns of similar data without missing values, called predictors, as the relationship established between the datasets tends to improve the estimates of the values imputed into the column with missing data [33].
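The pooling step can be illustrated with a minimal sketch of the standard combining rules for multiple imputation (commonly known as Rubin's rules). This is not the code of the MICE package: the function name `pool_rubin` and the example numbers are hypothetical, and the sketch assumes a scalar, approximately normal quantity with one point estimate and one variance per imputed dataset.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m point estimates and their variances into one estimate and its variance."""
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(within_variances)    # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total


# Hypothetical estimates of a monthly mean from m = 3 imputed datasets
q_bar, total_var = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

The between-imputation term grows when the imputed datasets disagree, so the pooled variance reflects the uncertainty introduced by the missing data.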
In the adaptation of the MICE technique to fill gaps in precipitation, the series of the gridded precipitation analysis from the Climate Prediction Center of the National Oceanic and Atmospheric Administration (CPC/NOAA), which is based on the optimal interpolation method of [37] and has a spatial resolution of 0.5° × 0.5°, were used as sets of predictors [8,36]. For the other variables, series of gridded analyses provided by the NCEP/NCAR (National Centers for Environmental Prediction/National Center for Atmospheric Research), with 1.0° × 1.0° resolution, were used as predictor variables [3]. In order to avoid the influence of dry periods on rainy periods of the year, and vice versa, we applied MICE to independent files organized month by month, that is, to 12 independent files, from January to December, for each variable.
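As an illustration of the month-by-month organization described above, the following sketch groups daily records by calendar month so that each group could be imputed separately. The function name `split_by_month` and the sample records are hypothetical; the study used 12 separate files per variable, whereas this sketch keeps the groups in memory.

```python
from collections import defaultdict
from datetime import date


def split_by_month(records):
    """Group (date, value) records into buckets keyed by calendar month (1-12)."""
    buckets = defaultdict(list)
    for day, value in records:
        buckets[day.month].append((day, value))
    return dict(buckets)


# Hypothetical daily precipitation records; None marks a gap
sample = [
    (date(1961, 1, 1), 0.0),
    (date(1961, 1, 2), None),
    (date(1961, 2, 1), 3.2),
    (date(1962, 1, 1), 5.0),
]
by_month = split_by_month(sample)
```

All January days from every year end up in the same group, so values from the dry season never serve as reference for the rainy season, and vice versa.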
In the time series, missing data are coded as "NA". We used the default number of multiple imputations (m = 5) of the MICE package, version 2.12, for the free statistical software R. The imputations are generated according to the default method, which is, for numerical data, the PMM (Predictive Mean Matching) method. Using precipitation data to exemplify the procedure, the original incomplete series of a weather station is placed side by side with data from the four grid points closest to the station (Table 1).

Table 1. Precipitation series with missing data, represented by NA, from the Ouricuri station (WMO code: 82753), Pernambuco (a), and after the gaps have been filled (imputed values in red; b). The original data of the station (Orig column) are followed by the nearest gridded series, which constitute the set of predictors (G-01, G-02, G-03 and G-04).

After imputing the missing data, at least 5 years without gaps were identified in the original series of observed data. Gaps were then artificially generated for those years and the method was applied again, in order to compare the original observed data with the imputed values that replaced them. This allowed us to assess the skill of the method. Statistics such as correlations and the root mean square error (RMSE) were calculated for these verification periods to validate the methodology. These verification metrics were computed for three timescales, namely, daily, 10-day and monthly (the latter two consisted of averages or accumulated values over the period, depending on the variable).
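To give an intuition for predictive mean matching, the sketch below fills gaps in a station series using a single predictor series. It is a deliberately simplified illustration, not the algorithm as implemented in the MICE package: it uses one predictor instead of four grid points, fits an ordinary least squares line, and omits the random draw of regression parameters that a full PMM performs. The names `fit_line`, `pmm_impute`, `station` and `grid_mean`, and all values, are hypothetical.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def pmm_impute(target, predictor, donors=3, rng=None):
    """Fill None entries of target by matching predicted means to observed days."""
    rng = rng or random.Random(0)
    obs = [i for i, v in enumerate(target) if v is not None]
    a, b = fit_line([predictor[i] for i in obs], [target[i] for i in obs])
    predicted = [a + b * x for x in predictor]
    filled = list(target)
    for i, v in enumerate(target):
        if v is None:
            # Observed days whose predicted value is closest to this day's prediction
            nearest = sorted(obs, key=lambda j: abs(predicted[j] - predicted[i]))[:donors]
            filled[i] = target[rng.choice(nearest)]  # borrow a real observed value
    return filled


# Hypothetical station series with gaps and a gridded predictor series
station = [0.0, 5.2, None, 12.1, 0.0, None, 3.3]
grid_mean = [0.1, 4.8, 6.0, 11.0, 0.0, 0.2, 2.9]
completed = pmm_impute(station, grid_mean)
```

Because every imputed value is copied from an observed day, the filled series cannot contain physically impossible values such as negative precipitation, which is one reason PMM suits variables with skewed distributions.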
The RMSE was selected as the measure of MICE skill because, among other advantages, it expresses the accuracy of the numerical results as error values in the same units as the analysed variables, that is, millimetres for precipitation, degrees Celsius for temperature, percentage for relative humidity, hectopascals for atmospheric pressure, metres per second for average wind speed and hours for insolation. The daily scale is important for analysing climate extreme indices [38][39][40][41][42][43]; the 10-day scale is important for agrometeorological studies, as this is the time step used in many crop growth simulation models [44][45][46]; and the monthly scale is essential for studies that analyse the influence of modes of variability on climate dynamics and for research on seasonal and subseasonal climate forecasts [47][48][49][50][51].
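The verification metrics can be sketched as follows. The helper names (`rmse`, `pearson`, `aggregate`) and the sample values are hypothetical, and the aggregation uses consecutive fixed-length blocks as a simplifying assumption, whereas 10-day and monthly periods in practice follow the calendar.

```python
from math import sqrt
from statistics import mean


def rmse(observed, imputed):
    """Root mean square error, in the same units as the variable."""
    return sqrt(mean([(o - i) ** 2 for o, i in zip(observed, imputed)]))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def aggregate(values, window, how):
    """Collapse daily values into consecutive blocks of `window` days."""
    blocks = [values[i:i + window] for i in range(0, len(values) - window + 1, window)]
    return [sum(b) if how == "sum" else mean(b) for b in blocks]


# Hypothetical withheld observations and the values imputed in their place
obs = [0.0, 2.0, 4.0, 6.0]
imp = [1.0, 2.0, 3.0, 6.0]
daily_rmse = rmse(obs, imp)
summed_rmse = rmse(aggregate(obs, 2, "sum"), aggregate(imp, 2, "sum"))
averaged_rmse = rmse(aggregate(obs, 2, "mean"), aggregate(imp, 2, "mean"))
r = pearson(obs, imp)
```

In this toy case the RMSE of the summed blocks is larger than the daily RMSE, while the RMSE of the averaged blocks is smaller, which mirrors the contrast between accumulated and averaged variables.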

QCS
Some QCS techniques have a strong stochastic component that can lead to a high probability of rejecting good observations [20,21,37]. The QCS used here is based on a series of consistency tests, in order to reduce the stochastic dependence and allow the decision-maker to accept or reject data considered doubtful, based on easy-to-understand graphical outputs and reports containing the list of suspicious data.
Consistency tests are an important set of checks for possible errors, as they are expected to explore the temporal and spatial interrelationship of climatological data. The three main kinds of consistency checks are internal, temporal and spatial:
• Internal consistency tests express the physical relationships among different climatological elements. In some cases, they are logical tests based on the following premise: if a certain element lies in a given interval, another must also lie in a corresponding interval [52].
• Temporal consistency tests are based on the persistence over time of climatological elements. The selected change thresholds depend on the variable in question, the period of the year and the climatic region to which the elements of the time series belong [53].
• Spatial consistency tests explore the smooth spatial variation of climatological variables. Generally, this type of test involves the estimation of a certain element based on neighbouring observations in the same climatic region [54]. The accepted limit of differences will depend on the type of variable, the climatic region and the distance between the stations. Therefore, the effectiveness of this type of test will depend on the availability of neighbouring stations [55].
The QCS used in this study is based on a series of tests, called Test Groups. The flowchart shown in Figure 3 details the step-by-step procedure through which all the datasets were analysed. Every column with information was carefully studied, from the column that contains the station's WMO identification code, to the columns that contain the data collection dates and the respective values of the meteorological variables.
An important step is the creation of a file called metadata, which contains the station's basic information: the WMO identification code; the station's name; its longitude, latitude and altitude; the country and state to which it belongs; the start and end dates of its operations; the institution to which it belongs and the type of station, whether manual or automatic. In the QCS pre-processing step, this information contained in the metadata was read and served as the basis for some of the general tests shown in Figure 3.
In the final step of verifying and correcting doubtful data, in order to abbreviate the verification process, a routine identified cases of doubtful data that occurred in more than one test. If a given value was classified as suspicious in more than one of the QCS tests, then it was considered as incorrect data and did not need to be evaluated by the specialist, being summarily sent to a correction process.
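The triage rule described above can be sketched as a count of how many tests flagged each value. The function name `triage`, the test labels and the dates are hypothetical, and the sketch assumes each test reports its suspicious values as a list of dates.

```python
from collections import Counter


def triage(flags_by_test):
    """Split flagged dates into automatic corrections and cases for expert review."""
    # Count each date once per test, even if a test lists it more than once
    counts = Counter(d for dates in flags_by_test.values() for d in set(dates))
    auto = sorted(d for d, n in counts.items() if n > 1)
    review = sorted(d for d, n in counts.items() if n == 1)
    return auto, review


# Hypothetical output of three QCS tests for one variable
flags = {
    "fixed_limits": ["1975-03-02", "1980-07-11"],
    "jumps": ["1975-03-02"],
    "repeats": ["1991-01-05"],
}
auto, review = triage(flags)
```

Values flagged by a single test remain on the list for the specialist, which keeps real extremes from being discarded on the evidence of one check alone.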

Gap Filling
A high percentage of missing data was observed for the analysed variables. In individual terms, the station with the highest number of gaps had 54% missing data, and the lowest percentage found among the stations was 17%. In relation to a specific variable, the highest prevalence of gaps was found to be 62% and the lowest was 13%. Overall, the average percentage of missing data across all stations was 38%.
Gaps can occur in any segment of the series, so the imputation algorithm assigns "plausible" synthetic values according to the reference data, the predictors. In this way, MICE keeps the original data in the series, only filling the gaps. The filling is processed on the daily scale; then 5 years of observed data are randomly chosen and have their data removed, and MICE is applied again, imputing values that are later compared with the real observations. This is the validation process. For the filling performed on the daily scale, the correlations ranged from 0.4 to 0.8 for temperature and the RMSE ranged from 0.9 to 1.9 °C; relative humidity showed correlations from 0.5 to 0.8 and an RMSE from 6.7 to 14.6%; atmospheric pressure exhibited correlations from 0.3 to 0.8 and an RMSE from 1 to 5 hPa; the average daily wind speed showed correlations between 0.2 and 0.7 and an RMSE between 0.8 and 1.9 m/s; for precipitation, correlations ranged from 0.5 to 0.9 and the RMSE ranged from 4 to 12 mm; insolation presented correlations from 0.1 to 0.7 and the RMSE ranged from 3 to 5 h.

Figure 4, in its upper panel, shows the daily (4a), 10-day (4b) and monthly (4c) correlations between the observed and imputed rainfall with respect to the aforementioned validation process. Correlations tended to increase with the accumulation interval [56]. The lower panel exhibits the RMSE, which also increased gradually for precipitation, ranging from 4 to 17 mm for daily values, from 12 to 37 mm for 10-day accumulated precipitation and from 22 to 72 mm for the monthly accumulated values. This behaviour is expected for precipitation, due to its accumulative nature, with similar results being observed in [4]. For all analysed variables, correlations increased with larger accumulation intervals, with the opposite being expected for RMSE [57].
This was observed for all variables whose 10-day and monthly values are the averages of these periods, the only exception being precipitation, for which the 10-day and monthly values are the result of the sum of the daily precipitations over these intervals. In this specific case, the RMSE rose in a proportional fashion, building from the highest observed values of the daily rainfall accumulation.
The results of the application of the same statistical measures (correlation and RMSE) to temperature are shown in Figure 5. Correlations gradually increased from the daily to the monthly timescales, exceeding 0.9 in some areas in central-south BA. Unlike precipitation, for which errors were expected to grow proportionally to the period of accumulation, a reduction in the errors was expected for temperature. This is noticeable in Figure 5d-f, in which there are areas with a maximum RMSE of almost 2 °C on the daily scale, dropping to 1.2 and 1.4 °C on the 10-day and monthly scales, with a minimum RMSE around 0.5 °C.

For relative humidity (Figure 6), a similar behaviour to that of temperature was observed for both the correlations (Figure 6a-c) and the RMSE (Figure 6d-f). The area exhibiting the highest correlations is mostly located in the northern NEB, associated with the lowest RMSE values. In the centre-south of BA, the lowest correlations were observed and, in the west of BA, the highest RMSE values occurred, reaching up to 15% for the daily scale.

For atmospheric pressure (Figure 7), moderate correlations were observed on the daily scale (Figure 7a [...]

In Figure 9, we show the results of the validation procedure applied to daily insolation data (Figure 9a [...]

A flawless and reliable observed database is essential for assessing the accuracy of climate change scenario model estimates, such as verifying monthly average temperature estimates [58], allowing for more accurate regionalized data using different methods of calibration [59]. The successful results obtained here in validation tests corroborate previous findings that regarded MICE as a robust gap filling technique with potential for application to different climatic variables.
In recent years, the use of MICE as an efficient alternative for imputing missing data has been growing: [60] used the technique to impute missing solar radiation data under different atmospheric conditions, and [61] used MICE for multivariate imputation and prediction of missing wind speed data on the decadal scale, whereas [32] showed that filling gaps in daily precipitation data from homogeneous climatic regions in Brazil with MICE was superior to kriging and ordinary cokriging methods [62]. The technique is widely used by healthcare professionals and researchers; in this context, [31] demonstrated its relevance and evolution in terms of available programs and software, which have facilitated its use, for example, by researchers in the field of psychiatry. At a time of continuous worldwide interest in past, present and future climatic conditions, our results, by demonstrating the effectiveness of the technique, support its wider use in climate science.

Figure 10 shows an example of filling gaps in daily rainfall data for a station located in the interior of the state of CE. The time series is divided into five 10-year groups, from 1961 to 2010. The original data distribution is shown on the left, with the missing values highlighted in red along the horizontal axis, whereas the right-hand panel exhibits the entire series composed of the preserved original data with the addition of imputed values from the filling technique. This is a good example of one of the main problems that arise during the analysis of a time series: long periods without any kind of information, as in the first 2 years of the series, 1961 and 1962, as well as between 1971 and 1972, and the long period with no observations from 1985 to 1994. Sparse short gaps are observed in other segments of the series.
Obtaining complete series, after a reliable and validated filling scheme, is essential for, among other kinds of research, analysing the temporal evolution of extreme indices. The study of [5] used these complete series to verify the trends and tipping points associated with extreme precipitation and temperature indices in the NEB, filling a gap left by studies that, because of excessive missing data, had to resort to fewer time series for analysing extremes in the NEB or were limited to specific areas of the region [33,39,40,41,43,63,64].

Originally missing data are indicated by the red colour, while blue implies observed data and violet indicates imputed data.

QCS
After filling in the gaps, we present examples of results concerning the implementation of a QCS, in order to provide, for the NEB, daily data that are reliable and unburdened by missing information. The QCS is based on a series of tests, or Test Groups, following the flowchart presented in Figure 3. The examples are for a station bearing the WMO code of 82979, located in a municipality in the state of BA called Remanso, with the following geographic coordinates: −9.63° latitude, −42.10° longitude and an altitude of 400.5 m. The average percentage of filled gaps for this station was 30%.
The first test, belonging to the group of general tests, checks whether all the data from a station are associated with a single identification code (WMO code); that is, it certifies that there was no contamination by data from another station.

The second group of tests, called fixed limit tests, is applied to precipitation, temperature, relative humidity and insolation. This test establishes lower and upper limits for reasonable values, generally chosen according to historical records relative to a reference climate normal. In this case, we used the climate normals of Brazil for the period of 1961-1990 [65,66]. For this station, the limits for temperature were 10 °C and 41 °C (the minimum historical value recorded for daily minimum temperature and the maximum recorded value for daily maximum temperature, respectively). For this test, some errors were detected (Table 2) for atmospheric pressure (fixed limits of 955-975 hPa) and for insolation (fixed limits of 0-12 h).

Variable limit tests, which form the third group of tests, identify "extreme" residuals relative to a seasonal cycle fitted to the variable under consideration. Two percentile thresholds are defined (lower and upper), for example, 0.01 (i.e., the 1st percentile) and 0.99 (i.e., the 99th percentile). Extreme percentiles can be calculated for each month or for the whole time series. Residuals below the lower percentile or above the upper percentile are considered extremes. Taking the maximum temperature data from the Remanso station as an example, we found that, for the month of January, the 1st and 99th percentiles are 25.7 °C and 36.6 °C, respectively. If we take all the values of the series, with no monthly distinction, the 1st and 99th percentiles correspond to 26.3 °C and 37.2 °C, respectively.
Therefore, values that exceed these thresholds will be considered extreme and doubtful, and will be reported in a printed output to be analysed manually, as shown in Table 3.
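The fixed and variable limit tests can be sketched as two small filters. This is an illustrative sketch only: the function names and the sample series are hypothetical, the percentile helper uses linear interpolation (other definitions give slightly different thresholds), and the variable limits are computed directly on the values of one group, such as a single calendar month, rather than on residuals from a fitted seasonal cycle.

```python
def percentile(values, q):
    """Percentile by linear interpolation, with q between 0 and 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def fixed_limit_flags(values, lower, upper):
    """Indices of values outside the fixed physical or historical limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def variable_limit_flags(values, q_low=0.01, q_high=0.99):
    """Indices of values below the lower or above the upper percentile."""
    low, high = percentile(values, q_low), percentile(values, q_high)
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical daily maximum temperatures (°C) with two implausible entries
tmax = [30.0, 31.5, 29.8, 45.0, 32.1, 9.0, 30.7]
fixed = fixed_limit_flags(tmax, 10.0, 41.0)   # limits quoted for the Remanso station
variable = variable_limit_flags(tmax)
```

With percentile thresholds taken from the sample itself, the most extreme values are always reported, which is why the output is a list for manual analysis rather than a list of confirmed errors.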
The result of this test is also presented graphically as a box plot. Figure 11 shows these results for relative humidity, maximum temperature, average temperature and minimum temperature, where the red dots refer to values that exceeded these variable limits based on the 1st and 99th percentiles.

For precipitation, the 95%, 97.5% and 99% percentiles of each month were used as limits to identify potential extremes of daily rainfall. These percentiles were calculated using the parameters of the gamma distribution fitted for each month of the year.

The fourth group of tests, called temporal continuity tests, investigates two very common problems in daily datasets from manual weather stations: sequences of repeated values, and unjustified, extremely large day-to-day discrepancies, or "jumps", in the data. The repeated-value check looks for sequences at least as long as a defined limit, for example, 3 days. To detect jumps or extreme discontinuities between values of a variable on consecutive days, a series was assembled with all the daily differences: the difference between 2 January 1961 and 1 January 1961, then the difference between 3 January 1961 and 2 January 1961, and so on. From this series of differences, the percentiles (e.g., 95%, 99% and 99.5%) of the absolute values of the differences were calculated and used as limits to define extreme jumps. If the absolute value of a difference (that is, regardless of its sign) was greater than the percentile used as a threshold, the jump was identified as extreme. In this study, the 99.5% percentile was used as the threshold.
The output of this test is printed and plotted, comprising the day, the variable and the doubtful value. Figure 12 shows these results, in two graphical forms, for relative humidity, maximum temperature, average temperature and minimum temperature, where the dots and/or red lines correspond to values that did not respect the limit of the 99.5% percentile of the differences and were therefore characterized as suspicious data.

Figure 12. Results of the moving limits test for the 1% and 99% percentiles for relative humidity, and maximum, average and minimum temperatures. Red dots indicate values that exceeded these thresholds.
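The two temporal continuity checks can be sketched as follows. The function names and the sample series are hypothetical, and the percentile helper uses linear interpolation as an assumption. For precipitation, long runs of zeros are natural in dry periods, so a repeated-value check would presumably need to treat zeros separately; that refinement is not shown.

```python
def percentile(values, q):
    """Percentile by linear interpolation, with q between 0 and 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def repeated_runs(values, min_length=3):
    """Return (start_index, length) for each run of identical consecutive values."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                runs.append((start, i - start))
            start = i
    return runs


def extreme_jumps(values, q=0.995):
    """Return indices whose absolute change from the previous day exceeds the threshold."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    if not diffs:
        return []
    threshold = percentile(diffs, q)
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]


# Hypothetical daily relative humidity (%) with a repeated run and a spike
rh = [60, 61, 61, 61, 61, 63, 95, 64, 62]
runs = repeated_runs(rh)    # four consecutive days at 61%
jumps = extreme_jumps(rh)   # the day on which the series leaps to 95%
```

The jump check reports the day on which the large change arrives; the return to normal values on the following day is a smaller difference in this example and falls just under the threshold.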
In the fifth group of tests, which comprises consistency tests between variables, the consistency is checked among the minimum, average and maximum temperatures. This test starts from the condition that the minimum temperatures must not be higher than the maximum temperatures, and that the average temperatures must lie between the daily minimum and maximum temperatures. Both printed and graphical outputs are generated. Suspicious data are analysed and then either corrected or rejected; once the database has incorporated the correction or a newly imputed value, the test is rerun until all values satisfy the test conditions. Figure 13 shows the expected ideal condition: the left panel shows that all the maximum temperature values were higher than the average temperature, and the right panel shows that all average temperature values in the series were higher than the minimum temperature values.

Figure 13. Results of the first consistency test between variables. The figure on the left shows that no average temperature value was higher than the daily maximum temperature, and the figure on the right shows that no average temperature value was lower than the daily minimum temperature. If any of the conditions failed due to suspicious data, these values would be identified as red dots to the left or to the right of the diagonal lines in the graphs.
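The ordering condition of this test can be written as a single chained comparison. The function name and the sample values below are hypothetical.

```python
def temperature_consistency_flags(tmin, tmean, tmax):
    """Return indices of days that violate tmin <= tmean <= tmax."""
    return [
        i
        for i, (lo, mid, hi) in enumerate(zip(tmin, tmean, tmax))
        if not lo <= mid <= hi
    ]


# Hypothetical daily temperatures (°C); days 1 and 2 are inconsistent
tmin = [20.1, 21.0, 22.5]
tmean = [25.0, 20.5, 26.0]
tmax = [31.2, 30.8, 25.0]
inconsistent = temperature_consistency_flags(tmin, tmean, tmax)
```

A flagged day indicates that at least one of the three values is wrong, but not which one, so the decision still depends on the other tests and on the specialist.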
The daily average temperature obtained for a weather station is called the compensated average temperature, as estimated by the following equation:

$$T_M = \frac{T_{12} + 2\,T_{24} + T_X + T_N}{5}$$
where $T_M$ is the average daily compensated temperature, $T_{12}$ is the temperature observed at 12 h UTC, $T_{24}$ is the temperature observed at 24 h UTC, $T_X$ is the daily maximum temperature and $T_N$ is the daily minimum temperature.
Therefore, after checking with the first test, another way to identify any problems in the daily data relative to average temperatures is to compare the daily value directly with the daily average between the maximum temperature and the minimum temperature. To run this test, a series of differences between the station's daily average temperature and the average of the daily maximum and minimum temperatures is assembled. From this series of differences, the 99% percentile is calculated, which is the maximum tolerance threshold for the difference between the daily average temperature and the daily average between the maximum and minimum temperatures. The output of this test is in both printed and graphical forms. Figure 14 shows the results of this test.

Figure 14. Result of the consistency test for the average daily temperature. The red dots represent values for which the difference between the daily average temperature and the average between maximum and minimum temperatures exceeded the 99% percentile.
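The compensated average and the second consistency test can be sketched together. The function names and the sample values are hypothetical; the sketch assumes that the differences are taken in absolute value and that the percentile is computed by linear interpolation, neither of which is specified in the text.

```python
def percentile(values, q):
    """Percentile by linear interpolation, with q between 0 and 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def compensated_mean(t12, t24, tmax, tmin):
    """Compensated daily average temperature from two readings and the extremes."""
    return (t12 + 2 * t24 + tmax + tmin) / 5


def mean_temperature_flags(tmean, tmax, tmin, q=0.99):
    """Indices where tmean departs from (tmax + tmin) / 2 by more than the threshold."""
    diffs = [abs(m - (x + n) / 2) for m, x, n in zip(tmean, tmax, tmin)]
    threshold = percentile(diffs, q)
    return [i for i, d in enumerate(diffs) if d > threshold]


# Hypothetical readings (°C) for a single day
example_mean = compensated_mean(26.0, 24.0, 32.0, 20.0)

# Hypothetical daily series; day 3 has a suspicious average temperature
tmean = [25.0, 26.0, 24.5, 31.0, 25.5]
tmax = [31.0, 32.0, 30.0, 32.0, 31.0]
tmin = [20.0, 21.0, 19.0, 20.0, 20.0]
suspicious = mean_temperature_flags(tmean, tmax, tmin)
```

The evening reading carries double weight in the compensated average, so the average normally sits close to, but not exactly at, the midpoint of the daily extremes; only unusually large departures are reported.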

Conclusions
In this study, the potential of the MICE technique to fill gaps in daily data from time series of meteorological variables collected from multiple MWS over the NEB was presented. The completed data were validated against observations, through correlations and RMSE, on three timescales: daily, 10-day and monthly. For all the variables, correlations increased with the number of days over which the accumulated values (of precipitation) or the average (for the other variables) were calculated.
Precipitation, followed by temperature, relative humidity and atmospheric pressure, were the variables for which the highest correlations were observed among all the compared temporal scales. On the daily scale, wind speed presented moderate correlations, and insolation showed weak correlations. However, the increase in correlations was significant, for all variables, in the 10-day and monthly comparisons. As expected for an accumulative variable, the errors increased with the period of accumulated precipitation, whereas, for the other variables, the errors became gradually smaller on the 10-day and monthly scales.
The QCS is composed of strict criteria (specific tests) for identifying, in an automated manner, doubtful imputed data, although it allows adjustments by expert users, according to their knowledge of the local climate. Suspicious data can be kept if other tests and verifications allow them, such as in the case of spatial consistency tests that facilitate the comparison of similar suspicious occurrences at nearby stations. Otherwise, if more than one test indicates data inconsistency and the spatial analysis does not indicate close similarities, the doubtful value can be eliminated from the series, which will undergo as many filling procedures as necessary until a plausible synthetic value can occupy the place of the rejected data.
These results showed the efficiency of the technique for filling time series of meteorological variables, as well as that of the QCS. In the case of precipitation and temperature, both the filled and control/comparison datasets from this research were successfully used in studies analysing climate extreme indices [5] and for statistical downscaling of regionalized climate change scenarios [4,45,59]. In the field of seasonal and subseasonal climate forecasting, these series will compose a database of surface observations for the calibration and verification of the Brazilian Global Atmospheric Model (BAM) [67], which is the atmospheric module of the Brazilian Earth System Model (BESM), aiming to achieve a hybrid dynamical-statistical coupling with the observed surface data and to adjust the BAM's seasonal forecasts for the NEB.