Subdaily Rainfall Estimation through Daily Rainfall Downscaling Using Random Forests in Spain

Subdaily rainfall data, though essential for applications in many fields, is not as readily available as daily rainfall data. In this work, regression approaches that use atmospheric data and daily rainfall statistics as predictors are evaluated to downscale daily-to-subdaily rainfall statistics on more than 700 hourly rain gauges in Spain. We propose a new approach based on machine learning techniques that improves the downscaling skill of previous methodologies. Results are grouped by climate types (following the Köppen–Geiger classification) to investigate possible missing explanatory variables in the analysis. The methodology is then used to improve the ability of Poisson cluster models to simulate hourly rainfall series that mimic the statistical behavior of the observed ones. This approach can be applied for the study of extreme events and for daily-to-subdaily precipitation disaggregation in any location of Spain where daily rainfall data are available.


Introduction
Rainfall data at hourly, or even smaller, temporal scales are essential for applications in many fields, such as urban hydrology, infrastructure design, and risk assessment [1][2][3][4][5][6].Nevertheless, hourly, or higher resolution rainfall data tend to have lower temporal and spatial coverage due to their expensive instrumental cost and technical limitations [7].
In addition to the information provided by pluviometers, there are other sources of information derived from satellites [8], numerical models [9], or radars [10], which also provide rainfall data at temporal aggregations shorter than daily.However, these sources of information tend to have large biases [11] and should not be directly interpreted as point records, since the results are from spatial averaging over fixed areas (pixels) [12].
Due to this limitation, several disaggregation methods have been proposed to generate higher time resolution data from coarse time resolution pluviometer records [7,[13][14][15][16][17][18].The purpose of such techniques is to simulate a range of possible disaggregation scenarios that could occur at finer time-scales, under the assumption that some important statistical features of the precipitation process would be preserved [19].Major disaggregation methods include: (1) Poisson cluster models based on point process theory [20,21]; (2) random cascade models based in scale-invariance theory [22]; and (3) nonparametric resampling models based on methods of fragments (MOF) [23].
The performance of these methods was evaluated for daily-to-hourly precipitation disaggregation in some studies [7,24].The results of these studies indicate that generally, all models reproduce appropriately the average rainfall and the proportion of dry intervals.However, while MOF and Poisson cluster models preserve variance, extreme rainfall values, and wet and dry spells, random cascade models present serious problems doing this.Nevertheless, it is difficult to draw a general conclusion on which method is the best, since rainfall properties significantly differ from one climate to another.
It is important to note that while random cascade and MOF models are used exclusively as "disaggregators", Poisson cluster models may also be used as synthetic generators able to simulate unrecorded subdaily precipitation patterns.This makes Poisson cluster models an interesting tool for extreme studies, constraining the uncertainty associated with the length of the precipitation series [25].Another important difference between these models is that, while random cascade and MOF require the entire precipitation series for the disaggregation process, Poisson cluster models only require some relations between daily and subdaily rainfall statistics, which makes them more advantageous in several applications such as climate change studies [26].
The main constraint of these studies and methods is that generally, they are evaluated including subdaily data from the gauge itself in the calibration process.However, in most hydrological and hydraulic projects, only daily data are available.To overcome this limitation, some authors proposed alternative approaches.Cowpertwait et al. [27] presented a methodology based on determining regressions between daily and subdaily statistics, where the latter exist, and using these regressions to inform the fitting procedure of the Poisson cluster model where subdaily data are not available.They realized that the relations found between daily and subdaily statistics differed from the northern to the southern United Kingdom (U.K.).
Marani and Zanetti [28] followed a different approach that uses scaling laws, analytical expressions that encapsulate the change that statistical properties of a variable undergo when studied at different scales [29,30].They applied scaling laws to derive subdaily rainfall statistics from daily observations, which then inform the Poisson cluster models.Their analysis showed that the best fitting analytical forms, as well as their parameters, were highly dependent on the spatial location and climate of the study area, thus limiting the generalization of the results.
Beuchat et al. [31] extended the work by Marani and Zanetti [28] to more than 200 stations distributed along the United States of America (USA), Switzerland, and the U.K. and showed that there is not a unique scaling law that best fits observations in any location.Indeed, Beuchat et al. [31] obtained better results following an approach similar to the one presented by Cowpertwait et al. [27], in which regressions inform the Poisson cluster models.In these regressions, atmospheric variables and daily rainfall statistics are predictors and subdaily rainfall statistics predictands.In spite of improving the results, Beuchat et al. [31] concluded that additional analysis was required to extend their results to climate zones not covered by their study.
The main contribution of this paper is to develop a robust methodology to generate hourly rainfall series in those locations of Spain where only daily rainfall data are available.To do so, we validate and extend the methodology proposed by Beuchat et al. [31] covering the wide range of climates present in Spain [32].The methodology, based on predicting subdaily rainfall statistics using atmospheric data and daily rainfall series as predictors, is calibrated and validated with more than 700 rainfall stations with hourly records.Machine learning techniques are used to improve the skill of the regressors.Results are grouped by climate types (following the Köppen-Geiger classification) to investigate possible missing explanatory variables in the analysis.Finally, the predicted subdaily rainfall statistics are used to feed Poisson cluster models, in fifteen case studies across Spain, in order to improve their skill to simulate hourly rainfall series.
The present study concludes that when Poisson cluster models are fitted using the statistics derived with our methodology, they are able to generate synthetic rainfall series that better reproduce the observed hourly statistics and the frequency of extreme events, compared to those only fitted with daily rainfall records.

Study Area and Information Sources
Spain has been selected as the study area for two main reasons: first, because it covers a wide range of climate types (see Figure 1) with different rainfall characteristics in each of them [32]; this climatic diversity makes Spain an interesting location to test the skill of downscaling methods; and second, because there is a very dense hourly rain-gauge network.This makes it possible to validate the methods over a homogeneous dataset that covers a wide range of climate types.The climatic diversity in Spain includes semi-arid climates in the Mediterranean region, dominated by convective processes that induce extreme floods; temperate climates affected by frequent Atlantic storms inducing frontal precipitation; continental climates, in the interior of the Peninsula, with strong seasonality and strong rainfall events; and even unique climates like the one found in the Canary Islands [33][34][35], highly influenced by the abrupt local orography, the Canary ocean current, and their proximity to the African continent.
Spain contains four of the main five climatic groups of Köppen-Geiger's climate classification [36]: B (dry), C (temperate), D (continental), and E (polar).The classification for Spain has been developed by the Spanish National Climate Authority [32] with observations covering the period 1971-2000 (see Figure 1).The only group missing is Group A (tropical), which has no representation in the territory.Climatic subgroups Csc and ET, although present in small areas of the territory, around Mount Teide, and in the highest parts of the Pyrenees, do not contain any rainfall gauges with hourly data; therefore, the methods cannot be validated in these two subgroups.Similarly, the mountain climates Dsb, Dsc, Dfb, and Dfc contain very few stations (between one and three), and thus, will be combined into a unique D climate group for the analysis.
Rainfall data have been obtained from the organizations listed in Table 1.These gauges were quality controlled and selected on the basis of different quality criteria.As a first filter, we selected those stations with at least ten or more full years during the period 1990-2015.We defined a full year as one that has more than 24 days without missing values in every month of that year.We tested the selected stations for homogeneity, both absolute and relative [37], resulting in a total of 735 hourly gauges.Rainfall statistics used in the present article are available at https://osf.io/2vhn9/?view_ only=7fb00bb7356443a39743520ec1b80a56. Figure 1 shows the location of the rainfall stations used in the study, as well as the spatial distribution of Köppen-Geiger climate types in Spain.The left-hand panel shows the spatial distribution of the 735 rainfall stations with hourly records (black stars) used to calibrate and validate the methodology proposed in this work.The right-hand panel shows the spatial distribution of Köppen-Geiger climate types as developed by AEMET [32].
In addition to rainfall data, the analyzed methodology requires atmospheric variables as predictors in order to improve the performance of the regression models.Atmospheric variables inform regression models to the relation between rainfall statistics at different temporal scales in each area of Spain.
Atmospheric predictors have been obtained from two different reanalysis databases to evaluate the sensitivity of the methods: (1) the NCEP reanalysis database [48], with a spatial resolution of 2.5 • degrees and monthly temporal resolution, and (2), the NCEP Climate Forecast System Reanalysis (CFSR) database [49], developed by the National Oceanic and Atmospheric Administration (NOAA), with a spatial resolution of 0.312 • degrees and hourly temporal resolution.Monthly averages and variances of surface air temperature (TAS), relative humidity at the 850 hPa level (HUR), sea-level pressure (SLP), zonal wind speed component at the 500-hPa level (Uwnd), and meridional wind speed component at the 500-hPa level (Vwnd), have been used to feed the regression models.

Beuchat's Model
The methodology followed in the present article is based on the model proposed by Beuchat et al. [31].This model relies on regressions of rainfall statistics between daily and subdaily data.Monthly atmospheric predictors from reanalysis and elevation were used as external covariates.Regression relations were developed for variance, skewness, and proportion of dry intervals.A brief description of the method proposed by Beuchat et al. [31] is given below.
A regression model was adjusted for each set of statistics and temporal aggregations (1 h, 3 h, 6 h, and 12 h).Supradaily statistics at 1 day, 2 days, and 3 days of temporal aggregation were used as predictors.This processing would yield 12 individual regression models (three statistics times four temporal aggregations).Mean, variance, and skewness were log-transformed to ensure predictions always provided positive results.The proportion of dry intervals was logit-transformed to ensure it always lied in the interval [0, 1].To cope with seasonal non-stationarities, statistics were calculated for each month of the year.Thus, before estimating each set of statistics, each time series was split into 12 monthly subseries.Regressions were computed using multivariate adaptive regression splines (MARS) [50,51].MARS combines predictors in a basis of linear splines, which allows discovering breakpoints in the relations between predictors and predictands.The fitting procedure was carried out stepwisely; each step introduced a new degree of interaction between predictors.During the first step, no interaction between predictors was allowed.The generalized cross-validation procedure (GCV) [52] was applied to retain the most significant predictors.During the second step, second order interactions were allowed between predictors, and the GCV was used again to retain the most significant terms.The procedure was repeated with third and higher order terms until new terms were completely discarded by the GCV procedure.In contrast to the other statistics, lag-one autocorrelations (ρ h ) were derived from the MARS model for variance, making use of the following two equations: where Y 1 h and Y 2 h are the rainfall quantities for two adjacent, non-overlapping h-hour intervals.

RFB Model
We propose some changes over the model presented in Beuchat et al. [31] to improve the skill of the downscaling technique and also in order to simplify the model for future application.
The first change was the use of machine learning techniques for the regression.Machine learning techniques are able to find non-linear relations between predictors and predictands; they do not need external mathematical expressions [53]; and they do not require a multi-step fitting procedure to ensure high accuracies.Random forests (RF) [54] were used as the machine learning regressor.RF has been used and evaluated in several water resources studies [55][56][57].An advantage of RF is that it does not require any log or logit transformation of the input to constrain the predicted results, simplifying the application of the method.Two hyperparameters were analyzed for optimization: N estimators, the total number of trees in the forest, and min samples leaf, the minimum number of samples required to be at a leaf node.
The second change refers to the way the lag-one autocorrelation coefficient (ρ 1 or ACF-lag1) is computed.Beuchat et al. [31] proposed an indirect computation making use of the predictions for two different temporal aggregations to derive the autocorrelation coefficient (see Equations ( 1) and ( 2)).Instead, we propose using the same approach followed for other statistics and fit a regression model between the lag-one autocorrelation coefficient and the supradaily statistics and atmospheric data.We also propose extending the work to the transition probabilities, i.e., the probability of having two consecutive wet or dry periods of duration T(φ WW T and φ DD T ).These two variables are useful, together with the lag-one autocorrelation coefficient, to inform synthetic models of the temporal structure of rainfall and also to reproduce antecedent conditions (e.g., initial detention storage levels).The combined use of the three statistics helps to construct more robust synthetic rainfall models [58].In fact, Cowpertwait et al. [58] found that when autocorrelations were excluded from the fitting procedure and the transition probabilities used, synthetic series simulated by Poisson cluster models matched the historical lag-one autocorrelations for most of the months and levels of aggregation.
The third change affects the selection of predictors.In order to keep the model as parsimonious as possible, the number of predictors considered in the analysis was limited to the minimal set, which resulted in accurate predictions for each subdaily statistic through the use of the Akaike information criterion [59].
The fourth change refers to the atmospheric reanalysis database used to provide the atmospheric predictors.As mentioned in Section 2, the CFSR database was used instead of the NCEP one since it provides higher spatio-temporal resolution.
From now on, we will refer with B (Beuchat's model) to the model presented in Beuchat et al. [31] and with RFB (random forests-based model) to the refined model presented in this article.

Model Evaluation
To evaluate the predictive skill of the different models, a 10-fold cross-validation was applied to the fitting procedure.Ten-fold cross-validation [60] divides the complete dataset into 10 portions of approximately the same size.The model was trained with nine of the portions (training set), and its skill was assessed on the hold-out portion; the one not used for training (testing set).This procedure was repeated 10 times, holding out each time one of the portions, and the average value of the skills was reported.
The model's skill to predict the different statistics was evaluated through the coefficient of determination R 2 , defined as: where is the sum of squares of residuals, f i being the model prediction for observation y i ; and where SS tot is the total sum of squares SS tot = ∑ i (y i − ȳ) 2 , which is proportional to the variance of the data.The coefficient of determination will be one (R 2 = 1) for a perfect model, zero (R 2 = 0) for a model no better than predicting the average every time, and can be negative (R 2 < 0) as models may be arbitrarily worse.The predictive skill of the models was defined as the average R 2 value resulting from the 10 cross-validation.Figure 2 depicts a graphical scheme of the methodology used in this study for downscaling daily rainfall.The original hourly rainfall database was aggregated into the daily scale.Daily rainfall statistics are computed as required to fit regression relations between daily statistics and hourly statistics.Monthly atmospheric data were also used as a predictor to improve the skill of the regressor.Predicted hourly statistics were compared with the observed ones and the predictive skill of the model assessed.

Synthetic Rainfall Generation
The rainfall generator proposed here is an extension of the Neyman-Scott rectangular pulse model (NSRPM) [61,62] that essentially allows the superposition of several independent point processes [63].
The origins of a storm follows a Poisson process of rate λ.Associated with each storm origin is a random number (υ) of rain cells, for which the waiting time after the storm origin of each rain cell is exponentially distributed with parameter β.The duration and the intensity of each rain cell are exponentially distributed with parameters and χ (scale parameter), respectively.The intensity of each rain cell is constant throughout its duration.The total intensity at any instant is the sum of the intensities due to all active cells at that instant.It is assumed that the duration, intensity, and waiting time after the storm origin of any rain cell are independent of other rain cells [15].
In the version used for this analysis, two independent superposed point processes were used to allow for different storm types (e.g., convective and stratiform rain), which leads to a model with 10 parameters (λ 1 , λ 2 , υ 1 , υ 2 , β 1 , β 2 , 1 , 2 , χ 1 , and χ 2 ).Different parameterizations for each calendar month provide an annual cycle of rainfall properties.Statistical properties of the model were derived by Cowpertwait [61], Cowpertwait [64], Leonard et al. [62], and Rodriguez-Iturbe and Eagleson [20].Besides, the model could also be easily extended in the space dimension if rainfall spatial correlation were known and required for the study area [63].

Model Comparison
In this section, we compare the results of the downscaling analysis carried out for models B and RFB, described in Section 3.All available hourly rainfall gauges (735 series; see Figure 1) were used to fit and evaluate the predictive skill of the models by cross-validation.
Results presented in Table 2 show that the RFB model improves the prediction of the B model in most climates for the 1-h temporal level.Only for BWh and BSh climates, the prediction given by B is better for the 1-h variance.These climates correspond to desert and arid climates, where we find higher values of TAS in the testing than in the training set.In these cases, while the B model, based on the MARS technique, is able to predict values larger than observed (extrapolate), RFB model is not.This is due to the fact that MARS is a parametric model, whereas random forests is an ensembling one.The limited skill of the B model to predict the ACF-lag1 1-h is due to the way in which lag-one autocorrelation coefficients are derived.As seen in Equations ( 1) and ( 2), instead of applying specific regressions for this variable, this method draws from previously-predicted quantities, and so increases the uncertainty of the prediction.
For larger aggregations periods (results for 12-h rainfall are presented in Table 2), results for variances, skewnesses, and probability of a dry interval (Pdry) improve remarkably with respect to the 1-h results in both models.Indeed, R 2 values around 0.9 were systematically obtained for the temporal aggregation of 12 h.However, the prediction of the ACF-lag1 deteriorated for larger aggregation periods.
Table 3 shows the predictors and the predictands selected for the B and RFB models.It is important to remark that the RFB model includes only a subset of the predictors originally used in Beuchat et al. [31].As demonstrated in Table 2, even with this reduction in the number of predictors, the RFB model improves the overall results obtained with the original B model for all statistics.Some of the predictors used in Beuchat et al. [31] explain a negligible percentage of the variance of our data; hence, we have removed them to avoid overfitting.The predictors that were ultimately selected were those whose removal significantly degraded model performance.Average surface air temperature (TAS) was the only large-scale atmospheric variable selected as a predictor except for the ACF-lag1 1-h statistics, for which surface air temperature variance (σ 2 TAS ), relative air humidity (HUR), and elevation of the station were also selected.
Table 3. Predictors used (first column of the table) by models B and RFB (heading row) for the prediction of each statistic (subheading row).Predictors are: average daily precipitation (µ 24 ), daily precipitation variance (σ 2  24 ), probability of a dry day (φ 24 ), probability of two consecutive dry days (φ 48 ), daily rainfall skewness (γ 24 ), lag-1 daily autocorrelation coefficient (ρ 1  24 ), probability of two consecutive wet days (φ WW 24 ), probability of two consecutive dry days (φ DD 24 ), average surface air temperature (TAS), surface air temperature variance (σ 2 TAS ), relative air humidity (HUR), and elevation of the station.Predictands are: rainfall variance (σ T ), probability of a dry interval (φ T ), rainfall skewness (γ T ), rainfall lag-1 autocorrelation coefficient (ρ 1 T ), probability of two adjacent wet intervals (φ WW T ), and probability of two adjacent dry intervals (φ DD T ) at the T should it be italics?(in hours) aggregation scale.A colored cell indicates that a predictor has been used to predict a given predictand for the specific model (B: Beuchat's model, in red; RFB: random forests-based model, in green).

Performance Analysis of RFB
Figure 3 shows the prediction obtained by the RFB model for variance, proportion of dry intervals (Pdry), skewness coefficient, and autocorrelation lag-one (ACF-lag1) for the 1-h temporal level.Figure 3 shows the scatter plots of the observed versus predicted values and the corresponding error distribution; the associated performances in terms of R 2 for 1-h and 12-h variance, Pdry, skewness, and ACF-lag1 are available in Table 2.The scatter plot and the error distribution show that the results for variance and Pdry were satisfying for all time scales with average R 2 values around 0.83 and 0.96, respectively (see Table 2).For skewness and ACF-lag1, the model was less accurate with R 2 values around 0.73 and 0.61.Nevertheless, error distributions were unbiased with most of the mass concentrated around zero for all the statistics.
The large dispersion existing in the observed values of skewness (with values between zero and 125) compared to the other statistics makes its prediction more difficult, explaining partially its lower accuracy.The skill of the models to predict skewness was significantly worse for climate D (see Table 2), mountainous areas of high altitude (>1000 m).There, local atmospheric conditions were not properly captured by large-scale atmospheric variables [65].Indeed, we realized that TAS was not correlated to the 1-h skewness for climate D, a reason why the ability of the models to predict 1-h skewness in this climate was hindered.
Predictions of 1-h ACF-lag1 were significantly less accurate than those of the other three statistics.The reason was that supradaily rainfall predictors have practically no importance in the prediction of 1-h ACF-lag1, unlike in the case of the other three statistics (see Table 3).In fact, ACF-lag1 one-day resolution was not selected as part of the predictors.In contrast to the skewness, the worst results for the 1-h ACF-lag1 correspond to climates BWh, BWk, BSh, and BSk (desert and semi-arid climates), which appear in small areas in the southeast of the provinces of Almería, Murcia, and Alicante, as well as in some areas of the Canary Islands.These climates, affected by extreme convective storms, show values of 1-h ACF-lag1 very close to zero, conditions under which random forests fails to classify in a single group.
The lack of accuracy on the 1-h ACF-lag1 predictions led us to explore another way to capture the temporal structure of rainfall.Then, the RFB model was used in the same way to predict the transition probabilities (from a dry interval to a dry interval, φ DD 1h , and from a wet interval to a wet interval, φ WW 1h ).As shown in Figure 4, most of the error distribution of the φ DD 1h lied within the ±1% error interval, whereas it spanned between ±50% error in the case of φ WW 1h .The relative abundance of two consecutive dry periods, in comparison with two consecutive wet periods (mostly due to the intermittency of rainfall) was likely the reason for the different performance.However, the results in terms of R 2 for the transition probabilities, with values around 0.97 and 0.77, were significantly better than those reached for the ACF-lag1 (0.61).Table 4 shows the values of R 2 for the 1-h and 12-h φ DD 1h and φ WW 1h .In contrast to the ACF-lag1, the predictive skill for the transition probabilities increased at larger temporal aggregations.As mentioned in the Methodology Section, we carried out a sensibility analysis to assess the effect in the results of improving the spatial resolution of the atmospheric reanalysis data.The CFSR database was used instead of the NCEP, since it provides a better spatial resolution of the atmospheric variables.Despite this improvement, results were not significantly affected.This could suggest that either average atmospheric conditions were sufficient to explain the regression between daily-to-subdaily rainfall statistics or the CFSR data resolution (0.312 • degrees) was not able to capture local atmospheric conditions in Spain.
Additional analysis were carried out to evaluate the robustness of the method.The first analysis referred to the total amount of stations used during fitting.An increasing number of hourly rainfall stations was selected in an iterative process and the overall performance computed.When the number of stations was above 25% of the total number of stations available (around 200), the results converged in terms of R 2 , and only minor changes on the error distribution were observed.Secondly, all the stations pertaining to a climate were removed from the database, and the model was used to predict hourly statistics for the missing climate; the process was repeated for each climate type.In this case, the model performance was severely degraded, indicating that the model skill to extrapolate hourly statistics to unobserved climates is limited.
An important point to highlight is that not even fitting the model for our dataset could we obtain coefficients of determination as high as the ones reported in Beuchat et al. [31] for the 340 gauges scattered throughout Switzerland, the U.K., and USA.An overall more homogeneous set of stations could explain this performance difference, although we would have expected a more heterogeneous dataset to have been used in Beuchat's model fitting, as the spatial coverage is larger.However, the gauges selected in Beuchat et al. [31] rarely exceeded the skewness value of 60, while in Spain, this value is higher for a large percentage of the stations.Therefore, the strong climate variability, both in space and time, found in the Spain dataset and the incorporation of more extreme rainfall regimes could explain the decrease in performance of the regressions.

Performance of Simulated Rainfall
In this subsection, we investigated how RFB-predicted statistics can be used in conjunction with Poisson cluster models to simulate synthetic hourly rainfall series in Spain.
Fifteen locations across Spain were used as case studies (see Figure 5) to illustrate the ability of NSRPMs, fed by RFB-predicted statistics, to generate synthetic rainfall hourly series in Spain.The gauges with larger hourly rainfall records were selected covering all climate types.As in Beuchat et al. [31], we assessed the ability of NSRPMs to reproduce observed hourly series characteristics under three scenarios.The "exact scenario" corresponds to the situation where NSRPM is fitted on observed daily and subdaily statistics.In the "target scenario", observed subdaily statistics are replaced by the RFB-predicted statistics.In the "simple scenario", NSRPM is fitted only on observed daily rainfall statistics.The performance of these three scenarios allows us to assess the loss of performance when the observed subdaily rainfall statistics is replaced by those predicted by RF technique, and also when subdaily rainfall data are not used in the fitting procedure.

Málaga Granada Almería
Fitting the NSRPM involves minimizing an objective function, which integrates a weighted sum of rainfall statistics at different temporal levels of aggregation.Table 5 shows the set of rainfall statistics and associated weights selected for fitting the models.In the case of the "simple scenario", only the weights and statistics at a daily (1 d) temporal aggregation were used.One NSRPM was calibrated for each study case and scenario, except for the "simple scenario", for which 10 NSRPMs were calibrated since the resulting subdaily statistics might differ significantly from one calibration to another.One thousand years of continuous synthetic rainfall records were simulated for each calibration in order to ensure the stability of the results.

Mean Variance Skewness Proportion of Dry Intervals
Lag-1 Correlation Figures 6 and 7 show the 1-h rainfall performance of the NSRPM simulations.Figure 6 shows the performance for variance, Pdry, and skewness, while Figure 7 that of ACF-lag1, φ DD , and φ WW .Black color corresponds to the "exact scenario"; the line and squares represent, respectively the observed and simulated 1-h statistics.Red color corresponds to the "target scenario"; the dashed-line and squares represent, respectively, the RFB-predicted and the simulated 1-h statistics.The blue-hatched area shows the range of statistics for the calibration in the "simple scenario".Each row corresponds to a case study.One-hour mean rainfall results are not shown in the study since the NSRPMs reproduce almost perfectly the observed values under the three scenarios.
As we can see in Figures 6 and 7, NSRPMs were flexible enough to simulate the observed and the RFB-predicted values of variance, Pdry, and skewness; except in some specific cases such as Almería in Month 5 and Barcelona in Month 6.In these cases, the squares (simulated statistics from the NSRPMs) fall far away from the lines (observed and predicted statistics).The temporal dependence terms (ACF-lag1, φ DD , and φ WW ), however, agree less accurately.φ WW was not correctly captured by NSRPMs for several months in Granada, Huesca, and Barcelona.Predicted φ DD matched observations with errors contained within ±2%.A perfect fit of the Pdry and the transition probabilities was never obtained in any of the case studies because rainfall values below a specific threshold were deemed null.
Comparing the observed statistics (black lines) with those simulated by the "target scenario" (red squares) and "simple scenario" (blue-hatched area), we concluded that the "target scenario" performed significantly better.The "simple scenario" underestimated variance and skewness in most cases (León, Albacete, Zaragoza, Huesca, Baleares, Cantabria); we will see later how this affects the generation of extreme values (see Figure 8).The "simple scenario" underestimated Pdry as well, which also showed great dispersion between the 10 calibrations.This was because NSRPMs did not keep the observed dry/wet spells at shorter temporal aggregations when not fed with subdaily statistics; the calibration process found solutions with a smaller number of storms (λ) of longer duration (β), leading to series with longer values of ACF-lag1 and φ WW than the observed.
Interesting results were also found when the empirical intensity-frequency curves derived from observed and simulated scenarios were compared (see Figure 8).Black dots represent exceedance probability values of the observed rainfall series.Black and red lines correspond to the exceedance probability of "exact scenario" and "target scenario", respectively.The blue-colored area represents the range of exceedance probability values between the 10 calibrations in the "simple scenario".Each panel corresponds to a case study.
Figure 8 shows that the results obtained by the "exact scenario" reproduced the observed values adequately.The same happens for the "target scenario", except for Barcelona and Mallorca, where the intensity of the values with a exceedance probability less than 0.001% (10 −5 ) was underestimated.In contrast to the results provided by the "exact scenario" and the "target scenario", the "simple scenario" failed to reproduce the observed exceedance probability in most of the cases.Average results of the 10-calibrations underestimated the observed values except for Málaga and La Coruña.In fact, the maximum simulated value in the "simple scenario" rarely exceeded the maximum observed value, which seems unlikely since the observed series had a maximum of 20 years of records, while the simulated series comprised 1000 years.Black color corresponds to the "exact scenario"; the line and squares represent, respectively, the observed and simulated 1-h statistics.Red color corresponds to the "target scenario"; the dashed-line and squares represent, respectively, the RFB-predicted and the simulated 1-h statistics.The blue-hatched area shows the range of statistics for the calibration in the "simple scenario".Each row corresponds to a case study.One-hour rainfall performance of the NSRPM simulation of ACF-lag1, φ DD and φ WW .Black color corresponds to the "exact scenario"; the line and squares represent, respectively, the observed and simulated 1-h statistics.Red color corresponds to the "target scenario"; the dashed-line and squares represent, respectively, the RFB-predicted and the simulated 1h-statistics.The blue-hatched area shows the range of statistics for the calibration in the "simple scenario".Each row corresponds to a case study.

Discussion and Conclusions
In this work, we present a robust methodology to generate hourly rainfall series in those locations of Spain where only daily data are available.For this, we have extended the methodology proposed by Beuchat et al. [31] to more than 700 hourly rainfall series covering the whole Spanish territory.The methodology aims at predicting subdaily rainfall statistics based on daily rainfall records and external predictors (climate variables from reanalysis datasets and elevation).The predictive skill of the methodology is evaluated by cross-validation.
Predicted rainfall statistics are: variance (σ 2 ), skewness (γ), proportion of dry intervals (φ), lag-one autocorrelation (ρ), and transition probability of dry-dry and wet-wet intervals (φ DD and φ WW ) for temporal aggregations of 1 h, 2 h, 3 h, 6 h, and 12 h.Results can be used in conjunction with Poisson cluster models to simulate synthetic hourly rainfall series for peninsular Spain, the Balearic Islands, and the Canary Islands.
Some changes over the model presented in Beuchat et al. [31] were proposed in this paper that result in the following improvements: (1) the use of random forests-based models improves the results providing a less biased distribution of relative errors and higher values of R 2 ; (2) the large amount of predictors are reduced to the minimum, avoiding overfitting and limiting the uncertainty of the model in case it was updated with future climate conditions (e.g., for climate change studies); (3) the proposed strategy of directly modeling lag-one autocorrelation with a regression relation, instead of indirectly through intermediate variables, has also improved the overall performance of the original model.The ease of use of machine learning techniques, together with their improved prediction skill, may make them preferable to more traditional methods like MARS.However, special care needs to be taken when the random forests approach is used to extrapolate outside the range of climate for which it was calibrated, since it is an ensembling learning method.
Fifteen case studies covering the different climate types were retained to illustrate the ability of the Poisson cluster models, fed by RFB-predicted statistics, to generate synthetic rainfall hourly series in Spain.They reproduced significantly better the observed hourly statistics than those informed only with daily rainfall statistics.The results obtained with those models calibrated only with daily rainfall data extremely underestimated the observed 1-h variance and skewness, predicted lower values of Pdry than observed, and showed values significantly higher than observed for the ACF-lag1 and φ WW .As our article shows, all these limitations have a direct impact on the highest rainfall values of the distribution, simulating values much lower than those observed.In addition, statistics obtained from the simulated hourly rainfall series when Poisson cluster models are fitted only with daily rainfall data show great dispersion depending on the set of parameters found by the models in each calibration.
The results are restricted to single-site applications (i.e., to small catchments).However, the model could also be easily extended to incorporate the space dimension if rainfall spatial correlation at a one-day level of aggregation were known in the study catchment.This model, known as STNSRP (spatio-temporal Neyman-Scott rectangular pulses), would allow simulating hourly rainfall series at ungauged sites by predicting the scale parameter from geographic variables (e.g., elevation).
The present study demonstrates that machine learning techniques are able to improve the skill of linear regression in order to predict subdaily rainfall statistics in Spain.The use of random forests instead of MARS results in less complex regression models with a smaller number of predictors.
The result of our work may be especially interesting for the study of extreme events, to daily-to-subdaily precipitation disaggregation, to calibrate and validate numerical models at hourly resolution, and to generate synthetic time series that properly capture rainfall structure at small scales in areas of Spain where hourly rainfall observations do not exist.

Figure 2 .
Figure 2. Scheme of the methodology.First, hourly rainfall data are aggregated into daily data.Second, supradaily and subdaily rainfall statistics are estimated.Third, monthly atmospheric series from reanalysis data over the period of observation of the corresponding hourly rainfall series are extracted in each gauge location.Then, monthly statistics (mean and variance) are estimated from atmospheric predictors.Daily and supradaily rainfall statistics together with monthly atmospheric statistics and elevation are used as a predictors.The regressor is trained to fit the observed statistics of hourly rainfall.SLP, sea-level pressure; MARS, multivariate adaptive regression splines.

Figure 3 .
Figure 3. Performance evaluation of the random forests-based model (RFB) for variance, proportion of dry intervals (Pdry), skewness coefficient, and autocorrelation lag-one (ACF-lag1) for 1-h intervals.The first and third columns show the scatter plots of the observed versus downscaled (predicted) values; colors indicate climate type; dashed black line correspond to the linear regression.The second and fourth columns show the relative error (%) distributions through a kernel density estimator representation (err = (Obs 2 − Pred 2 )/Obs 2 ).

Figure 4 .
Figure 4. Performance evaluation of the random forests-based model (RFB) for the transition probabilities dry-dry and wet-wet for 1-h intervals.Results are shown in the same way as defined in Figure 3.

Figure 5 .
Figure 5. Location of the hourly rainfall gauges used to perform the simulation of the rainfall by the Neyman-Scott rectangular pulse model (NSRPM).

Figure 6 .
Figure 6.One-hour rainfall performance of the NSRPM simulation of variance, Pdry, and skewness.Black color corresponds to the "exact scenario"; the line and squares represent, respectively, the observed and simulated 1-h statistics.Red color corresponds to the "target scenario"; the dashed-line and squares represent, respectively, the RFB-predicted and the simulated 1-h statistics.The blue-hatched area shows the range of statistics for the calibration in the "simple scenario".Each row corresponds to a case study.

Figure 7 .
Figure 7.One-hour rainfall performance of the NSRPM simulation of ACF-lag1, φ DD and φ WW .Black color corresponds to the "exact scenario"; the line and squares represent, respectively, the observed and simulated 1-h statistics.Red color corresponds to the "target scenario"; the dashed-line and squares represent, respectively, the RFB-predicted and the simulated 1h-statistics.The blue-hatched area shows the range of statistics for the calibration in the "simple scenario".Each row corresponds to a case study.

Figure 8 .
Figure8.Empirical intensity-frequency curves derived from observed and simulated rainfall at 1-h aggregation.Black dots represent exceedance probability values of the observed rainfall series.Black and red lines correspond to the exceedance probability of "exact scenario" and "target scenario", respectively.The blue-colored area represents the range of exceedance probability values between the 10 calibrations in the "simple scenario".Each panel corresponds to a case study.

Table 1 .
Organizations providing rainfall data for the study.

Table 2 .
Coefficient of determination (R 2 ) computed through 10-fold cross-validation for the 1-h and 12-h variance, probability of a dry interval (Pdry), skewness, and lag-1 autocorrelation coefficient for Beuchat's model (B) and for the random forests-based model (RFB) proposed in the present paper.Bold letters highlight the best performing model for every climate and statistic.

Table 4 .
Coefficient of determination (R 2 ) computed through 10-fold cross-validation for the 1-h and 12-h variance for the random forests-based model (RFB) proposed in the present paper.

Table 5 .
Set of statistics used to fit NSRPM and associated weights.d and h indicate daily and hourly level of aggregation, respectively.