A Nonparametric Stochastic Approach for Disaggregation of Daily to Hourly Rainfall Using 3-Day Rainfall Patterns

Abstract: As infrastructure and populations are highly concentrated in megacities, urban flood management has become a significant issue because of the potentially severe loss of life and property. In megacities, rainfall from the catchment must be discharged through stormwater pipe networks in which the travel time is less than one hour because of the high impervious rate. For a more accurate calculation of runoff from an urban catchment, hourly or even sub-hourly (minute-scale) rainfall data must be applied. However, the available data often fail to meet the hydrologic system requirements. Many studies have been conducted to disaggregate time-series data while preserving the distributional statistics of the observed data. The K-nearest neighbor resampling (KNNR) method is a useful application of the nonparametric disaggregation technique. However, it is not easy to apply to the disaggregation of daily rainfall data into hourly data while preserving statistical properties and boundary continuity. Therefore, in this study, three-day rainfall patterns were proposed to improve the reproduction of statistics. Disaggregated rainfall was resampled only from a group having the same three-day rainfall pattern. To show the applicability of the proposed disaggregation method, probability distributions and L-moment statistics were compared. The proposed KNNR method with three-day rainfall patterns better reproduced the characteristics of rainfall events, such as event duration, inter-event time, and total amount of rainfall per event. To calculate runoff from an urban catchment, rainfall event characteristics are more important than the hourly rainfall depth itself. Therefore, the proposed stochastic disaggregation method is useful for hydrologic analysis, particularly in rainfall disaggregation.


Introduction
Because urban catchments are paved, rain must be discharged through stormwater pipe networks. Although the travel time depends on the geometry of the catchment, in many cases it is less than one hour because of the high impervious rate. Therefore, accurate runoff calculation for urban catchments has become a complex task requiring specific data, such as sub-hourly rainfall data and a geographical representation of the catchment. To calculate runoff at an hourly time step, hourly rainfall data must be observed. However, historical hourly rainfall data are not always available; therefore, disaggregation of the available data to hourly rainfall is required to improve runoff model accuracy in megacities.
Many studies have been conducted to disaggregate historical annual flow data into monthly or daily flow data while preserving statistical properties [1][2][3][4][5][6][7]. One method [6] disaggregates annual streamflow data into monthly data using the fractional noise approximation method. In this method, after the back transformation, the sum of the monthly flow data with seasonal variability equals the annual flow data. However, these methods become complicated to apply as the dimensions increase. Therefore, various researchers have turned to non-parametric methods that are easier to apply. These non-parametric methods are data-driven, and assumptions about the additivity property and the probability density function do not need to be defined in advance.
The kernel density estimation method [8] and the K-nearest neighbor resampling (KNNR) method [9] are the most popular non-parametric methods. They have been widely applied to simulate the conditional probability density function. However, the kernel density estimation method [8] has the potential disadvantages of inefficiency and boundary discrepancy in high-dimensional problems [10,11]. Boundary discrepancy refers to the violation of flow continuity across the disaggregation time boundaries [10]. As discussed in [10], for example, if a year is a drought year, the flowrate at the end of the year follows a drought pattern. However, the first flowrate of the next year might not follow that pattern, because there is no correlation between the last value in a year and the first value in the next year. When the year changes, the flowrate could jump from a drought level to a wet-year level, which is not realistic. Therefore, it is especially important to preserve boundary continuity.
A computationally faster K-nearest neighbor-based time resampling approach (KNNR) [12] replaces the kernel density estimation [9] to avoid the computational inefficiency and the boundary issues. This technique was applied to disaggregate yearly flow into monthly flow [11]. A combination of the KNNR method and an optimization technique was also applied to disaggregate monthly flow data into daily flow [13]. However, the optimization technique required a long computation time to preserve statistical characteristics and continuity [14].
In the field of daily rainfall generation, stochastic resampling methods have been successfully applied in many studies because continuity is not required [15][16][17][18][19]. It could therefore be assumed that boundary continuity is not required in disaggregated hourly rainfall either. However, a key requirement in stochastic rainfall simulation remains that the generated sequences resemble the observed rainfall. Unlike the monthly flowrate, hourly rainfall does not have periodicity. When the annual flowrate is disaggregated into monthly values, it is first determined whether the annual flowrate belongs to a dry or a wet year. Then, the periodicity of the monthly flowrate in a dry or wet year is referenced in the disaggregation procedure.
However, when daily rainfall is disaggregated into hourly rainfall, it is difficult to determine the distribution of hourly rainfall within a day because there is no typical periodicity. Even when the amount of daily rainfall is the same, a typical distribution of hourly rainfall cannot be determined; thus, the statistical properties of the observed hourly rainfall cannot be preserved.
In this study, three-day rainfall patterns were proposed to preserve the statistical properties of hourly rainfall and its continuity across the day boundaries. In the disaggregation procedure, the distribution of hourly rainfall is taken only from the group of daily rainfall data having the same pattern. Daily rainfall data from four stations across South Korea were stochastically disaggregated into hourly rainfall using the modified KNNR method with three-day rainfall patterns. The statistical properties of the observed hourly rainfall and of the hourly rainfall disaggregated with and without three-day rainfall patterns were compared.

Stochastic Disaggregation Using the KNNR Method
The basic concept of nonparametric space-time disaggregation in [7] is derived from the conditional probability function, $f(X_d \mid Z)$, of hourly rainfall $X_d$ given the daily rainfall $Z$. The sum of the 24 hourly rainfall values ($X_1, X_2, \ldots, X_d$, $d = 24$) should be equal to the daily rainfall $Z$.
To disaggregate daily rainfall data into hourly data, a set of n historical hourly rainfall data must be expressed in the form of a matrix $X$ ($d \times n$) as follows:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{d,1} & x_{d,2} & \cdots & x_{d,n} \end{bmatrix} \tag{1}$$
In addition, the daily rainfall data vector, $Z$ ($1 \times n$), must be defined. We use the Gram-Schmidt process and its inverse process to disaggregate the Z value. The Gram-Schmidt process orthonormalizes a set of vectors and is used here to construct the rotation matrix R, which is obtained using the method described by [8]. Using the rotation matrix R, $Y$ ($d \times n$) can be obtained as $Y = R X$.
Using matrix R, matrix X is rotated to matrix Y, and the last row of the transformed matrix Y is set to $Y_d = Z/\sqrt{d}$. K nearest neighbors among the daily rainfall data having the same three-day rainfall pattern were selected using the weight $W(k)$ in Equation (2):
$$W(k) = \frac{1/k}{\sum_{i=1}^{K} 1/i} \tag{2}$$
where $k = 1, 2, \ldots, K$, $K = \sqrt{N}$, and N is the number of sample data. The disaggregation is completed by back transformation to the original space using $x = R^{T} Y$, where $x$ ($d \times 1$) is the disaggregated hourly rainfall vector.
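The rotation, neighbor selection, and back transformation described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`rotation_matrix`, `knnr_weights`, `disaggregate_day`) and the `group` argument are hypothetical, R is built by Gram-Schmidt so that its last row is (1/√d, ..., 1/√d), the weight is assumed to take the common KNNR form W(k) = (1/k)/Σ(1/i), and K is taken as the rounded square root of the number of candidate days.

```python
import numpy as np

def rotation_matrix(d):
    """Orthonormal d x d matrix whose last row is (1/sqrt(d), ..., 1/sqrt(d))."""
    basis = [np.ones(d) / np.sqrt(d)]
    for j in range(d - 1):
        v = np.zeros(d)
        v[j] = 1.0
        for b in basis:                      # Gram-Schmidt: remove components along earlier vectors
            v -= (v @ b) * b
        basis.append(v / np.linalg.norm(v))
    return np.vstack(basis[1:] + [basis[0]])  # put the constant vector in the last row

def knnr_weights(K):
    """Assumed weight form: W(k) proportional to 1/k, normalized to sum to one."""
    w = 1.0 / np.arange(1, K + 1)
    return w / w.sum()

def disaggregate_day(X, z_target, rng, group=None):
    """Disaggregate one daily total z_target using historical hourly data X (d x n).

    group is an optional boolean mask (hypothetical argument) marking the historical
    days that share the target day's three-day pattern; it must select at least one day.
    """
    d, n = X.shape
    Z = X.sum(axis=0)                        # historical daily totals
    R = rotation_matrix(d)
    Y = R @ X                                # last row of Y equals Z / sqrt(d)
    candidates = np.arange(n) if group is None else np.flatnonzero(group)
    K = max(1, int(round(np.sqrt(candidates.size))))
    nearest = candidates[np.argsort(np.abs(Z[candidates] - z_target))[:K]]
    pick = rng.choice(nearest, p=knnr_weights(K))
    y = Y[:, pick].copy()
    y[-1] = z_target / np.sqrt(d)            # impose the target daily total
    return R.T @ y                           # back transformation to hourly values

# Usage with synthetic data (illustrative only)
rng = np.random.default_rng(0)
X_hist = rng.gamma(0.5, 2.0, size=(24, 200))
x_hourly = disaggregate_day(X_hist, z_target=30.0, rng=rng)
print(x_hourly.sum())  # matches z_target up to floating-point rounding
```

Because the last row of R is constant, the back-transformed vector always sums to the target daily total, which is the additivity property the rotation is designed to guarantee. Individual hourly values can still be negative, which is why a correction step is needed afterwards.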

Proposed Three-day Rainfall Patterns
As described previously, stochastically generated rainfall data should reproduce the statistics of the observed rainfall data. Therefore, when daily rainfall data are disaggregated into hourly rainfall, statistical properties such as the mean and standard deviation should be preserved. However, the modified KNNR in [9] cannot be applied directly to disaggregate daily rainfall into hourly rainfall because it does not guarantee that the statistical properties will be reproduced. Therefore, the hourly rainfall distribution or some other information about hourly rainfall depth is required.
A preliminary analysis of rainfall data in South Korea showed that the duration of a rainfall event is often longer than a day. Therefore, depending on the rainfall depth in the previous and subsequent days, the distribution of hourly rainfall should be different. For this reason, a three-day rainfall pattern was introduced in this study. Six rainfall patterns were proposed in [20,21] to disaggregate hourly rainfall data into 20-min data. In addition to these six rainfall patterns, a seventh rainfall pattern is newly proposed in this study.
Figure 1 presents the seven rainfall patterns. With the rainfall of day D in the middle, rainfall patterns "decrease" from the previous day (D − 1) to the subsequent day (D + 1) in types 1, 3a, and 4a. Rainfall patterns "increase" from the previous day (D − 1) to the subsequent day (D + 1) in types 2, 3b, and 4b. Lastly, type 5, which is newly proposed in this study, has "equal" amounts of rainfall in the previous (D − 1) and subsequent (D + 1) days.
From a different perspective, with the day D rainfall in the center, "between" patterns decrease (type 1) or increase (type 2) from the previous day (D − 1) to the subsequent day (D + 1). In "lower" patterns, rainfall on day D is smaller than the previous and subsequent rainfall (types 3a and 3b). Lastly, in "higher" patterns, rainfall on day D is higher than the previous and subsequent rainfall (types 4a, 4b, and 5).
If the rainfall duration is longer than two days, any day falls into one of six rainfall patterns: types 1, 2, 3a, 3b, 4a, and 4b. Even if the rainfall duration is more than four days, information about the three-day rainfall pattern is enough to disaggregate a target daily rainfall value. However, a rainfall event lasting only one day has to be defined separately because it follows neither an increasing nor a decreasing pattern. Since it is unusual to have the same nonzero rainfall depth in the previous and subsequent days, type 5 (equal) usually represents a rainfall event whose duration is less than a day.
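The pattern definitions above translate into a small classification routine. The sketch below is only an illustration: the function name `classify_three_day` is hypothetical, and the handling of cases the text does not cover (a dry target day, or ties such as equal rainfall on day D and a neighboring day) is an assumption, returned here as `None`.

```python
def classify_three_day(prev, cur, nxt):
    """Classify day D from the daily rainfall of days D-1, D, and D+1.

    Returns '1', '2', '3a', '3b', '4a', '4b', '5', or None when the case is
    not covered by the pattern definitions (assumption of this sketch).
    """
    if cur <= 0:
        return None                      # no rainfall to disaggregate
    if prev > cur > nxt:
        return "1"                       # between, decreasing
    if prev < cur < nxt:
        return "2"                       # between, increasing
    if cur < prev and cur < nxt:         # lower patterns
        if prev > nxt:
            return "3a"
        if prev < nxt:
            return "3b"
        return None                      # equal neighbors: not defined in the text
    if cur > prev and cur > nxt:         # higher patterns
        if prev > nxt:
            return "4a"
        if prev < nxt:
            return "4b"
        return "5"                       # equal neighbors, e.g. an isolated wet day
    return None                          # ties between day D and a neighbor

# Usage: label every interior day of a daily series (values in mm, illustrative)
daily = [0.0, 12.0, 0.0, 3.0, 20.0, 8.0, 0.0]
labels = [classify_three_day(daily[i - 1], daily[i], daily[i + 1])
          for i in range(1, len(daily) - 1)]
print(labels)  # ['5', None, '2', '4b', '1']
```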
By introducing a three-day rainfall pattern, continuity across boundaries might also be preserved. When hourly rainfall increases at the end of a day, the first hourly rainfall of the next day is likely to increase as well because the rainfall patterns are considered. Figure 2 describes the disaggregation process. Hourly and daily rainfall data were collected from the four observatory stations, and matrix X was composed using the observed hourly and daily rainfall data. The rotation matrix R was obtained using the method described in [8] and applied to calculate Y. The historical daily rainfall data were categorized into seven rainfall patterns (Figure 3), and K nearest neighbors were selected from the corresponding rainfall pattern (group). Using the K nearest neighbors, daily rainfall was disaggregated into hourly rainfall using three-day rainfall patterns. The procedure can be repeated as many times as required.
When the hourly rainfall is back-transformed, a small number of negative values (less than 0.4%) can be generated [9]. As mentioned in [9], these negative values did not influence the data statistics. In this study, negative values also occurred (0.7%). The sum of the negative values was distributed over the positive hourly rainfall data in proportion to rainfall depth, which removed the influence of the negative values.
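One way to read this correction step is as a proportional rescaling: negative hours are set to zero and the positive hours are scaled so that the daily total is unchanged. The sketch below follows that reading; the function name `redistribute_negatives` is hypothetical, and returning all zeros when the day has no positive total is an assumption for a case the text does not discuss.

```python
def redistribute_negatives(hourly):
    """Remove negative hourly values while preserving the daily total.

    Adding the (negative) sum of the negative values to each positive value in
    proportion to its depth is equivalent to scaling the positive values by
    total / positive_sum.
    """
    total = sum(hourly)
    positive_sum = sum(v for v in hourly if v > 0)
    if positive_sum <= 0 or total <= 0:
        return [0.0] * len(hourly)       # assumption: nothing sensible to redistribute
    scale = total / positive_sum
    return [v * scale if v > 0 else 0.0 for v in hourly]

# Usage (illustrative values in mm)
corrected = redistribute_negatives([4.0, -1.0, 6.0, 0.0])
print(corrected)  # approximately [3.6, 0.0, 5.4, 0.0]; the total of 9.0 is preserved
```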

Applied Data
In this study, four representative locations (Seoul, Incheon, Busan, and Jeju) observed by the Korea Meteorological Administration (KMA) were selected. These four locations represent the rainfall patterns of the inland, coastal, and southernmost island regions of the Republic of Korea (South Korea). Figure 3a shows the locations of the four observatories. Hourly rainfall data from these four observatories during the rainy season (April to October) from 1961 to 2017 (57 years) were collected.

For hydrologic purposes, the hourly rainfall data were divided into rainfall events for runoff calculation. Rainfall events were separated from the observed hourly rainfall data using an inter-event time definition (IETD) of 11 h, which is the value reported for urban areas in South Korea [22]. At each station, more than 2000 rainfall events were separated (around 35 events per year). In Figure 3b-e, the observed daily rainfall is categorized into the seven rainfall patterns. At all four locations, the lower patterns (types 3a and 3b) have the smallest number of cases. The other five patterns have a similar number of cases, around 400-500. In particular, the newly proposed rainfall pattern (type 5) has many cases at all four locations, which shows that one-day rainfall events are also frequent in South Korea. Rainfall events with three-day pattern types other than type 5 are considered to involve more than two days of rainfall. The average event duration in South Korea is 2-3 days, depending on the station.
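Event separation with an IETD can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names are hypothetical, a dry gap of at least `ietd` hours is assumed to start a new event (whether the threshold is inclusive is not stated in the text), and event duration is counted from the first to the last wet hour of the event.

```python
def split_events(hourly, ietd=11):
    """Return a list of (start, end) hour indices (inclusive) for each rainfall event."""
    events = []
    start = end = None
    for t, depth in enumerate(hourly):
        if depth <= 0:
            continue                              # dry hour
        if start is not None and t - end - 1 < ietd:
            end = t                               # dry gap shorter than IETD: same event
        else:
            if start is not None:
                events.append((start, end))       # close the previous event
            start = end = t
    if start is not None:
        events.append((start, end))
    return events

def event_statistics(hourly, ietd=11):
    """Event duration (h), inter-event time (h), and total depth of each event."""
    events = split_events(hourly, ietd)
    ed = [e - s + 1 for s, e in events]
    tre = [sum(hourly[s:e + 1]) for s, e in events]
    iet = [events[i + 1][0] - events[i][1] - 1 for i in range(len(events) - 1)]
    return ed, iet, tre

# Usage: two events separated by exactly 11 dry hours (illustrative values in mm)
series = [0, 1, 2, 0, 0, 3] + [0] * 11 + [4]
print(event_statistics(series))  # ([5, 1], [11], [6, 4])
```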

Comparison of Statistical Properties of Hourly Rainfall with/without Three-day Rainfall Patterns
Daily rainfall at the Seoul, Incheon, Busan, and Jeju observatories was disaggregated stochastically using the KNNR method with and without three-day rainfall patterns. Disaggregation was implemented 10,000 times, and the results are summarized in Figure 4.
First, the Kolmogorov-Smirnov test (K-S test) was implemented to compare the probability distributions of statistical properties of the observed and simulated hourly data. Four statistical properties were tested: hourly rainfall depth (h), rainfall event duration (ED), inter-event time (IET), and total rainfall depth of a rainfall event (TRE). The results are presented in Figure 4 as box plots of the p-values of the 10,000 simulated cases. Under the null hypothesis that the probability distributions of a statistical property (h, ED, IET, or TRE) from the observed and simulated hourly rainfall are the same, p-values (significance probabilities) were calculated and compared with significance levels of 5% and 1%. If the p-value is lower than 0.05 (significance level of 5%), the null hypothesis is rejected, meaning that the probability distributions of the property differ and the two data sets (original and disaggregated) are regarded as dissimilar. If the p-value is greater than 0.05, the null hypothesis is not rejected, and the two data sets may have the same probability distribution and statistical properties.
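The comparison of one statistical property between observed and simulated data can be illustrated with a self-contained two-sample K-S routine. The sketch below is an assumption-laden illustration, not the procedure used in the study: the function name `ks_two_sample` is hypothetical, and the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction, which is only approximate for data with many ties such as hourly rainfall. In practice, a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
import math
from bisect import bisect_right

def ks_two_sample(a, b):
    """Two-sample K-S statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # Largest gap between the two empirical distribution functions
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    ne = n * m / (n + m)                               # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:
        return d, 1.0                                  # series is numerically 1 here
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Usage: reject the null hypothesis of equal distributions when p < 0.05
observed_ed = [1, 2, 2, 3, 5, 8, 13, 21]               # illustrative event durations (h)
simulated_ed = [1, 2, 3, 3, 5, 9, 12, 20]
d_stat, p_value = ks_two_sample(observed_ed, simulated_ed)
print("reject" if p_value < 0.05 else "not rejected")
```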
These four statistical properties (h, ED, IET, and TRE) were selected to analyze the hourly rainfall data because they are commonly used to calculate runoff from urban catchments. In megacities, surfaces are paved with asphalt and rainwater is discharged through stormwater pipe networks, which are designed for rainfall with a return period of 10 to 30 years. It is important to know the characteristics of rainfall events, such as rainfall event duration (ED), inter-event time (IET), and total rainfall depth of a rainfall event (TRE). Therefore, not only the hourly rainfall depth but also rainfall event information is critical for analyzing the characteristics of runoff from urban catchments.
In Figure 4, when three-day rainfall patterns were not applied (left in Figure 4), the null hypotheses for IET and TRE were rejected at many locations at the 5% significance level. This means that the probability distributions of IET and TRE from the original and disaggregated hourly rainfall data are not the same. The p-values for ED without three-day rainfall patterns are larger than the 5% significance level. In contrast, the p-values of ED, IET, and TRE increase at all locations when three-day rainfall patterns are applied (right in Figure 4). Therefore, the characteristics of rainfall events such as ED, IET, and TRE become more similar to those of the original hourly rainfall when three-day rainfall patterns are considered.
The null hypothesis for hourly rainfall depth is not rejected for the disaggregated hourly rainfall either with or without three-day rainfall patterns. Therefore, regardless of whether three-day rainfall patterns are considered, the probability distributions of hourly rainfall depth are similar at the 5% significance level. It seems that disaggregation works properly for the individual hourly values, whereas the characteristics of rainfall events are more difficult to conserve. The characteristics of rainfall events are related to the boundary issue as well.
Figures 5 and 6 show the probability distributions of the original hourly rainfall, the KNNR method without three-day rainfall patterns, and the proposed KNNR method considering three-day rainfall patterns in Seoul (Figure 5) and Busan (Figure 6). At both the Seoul and Busan stations, the hourly rainfall distribution of the proposed KNNR method considering three-day rainfall patterns is more similar to that of the original hourly data. In particular, types 1, 2, 4a, 4b, and 5 show much better performance than the KNNR method without rainfall patterns. Therefore, the characteristics of rainfall events such as ED, IET, and TRE were preserved better by the proposed KNNR method with three-day rainfall patterns. Moreover, the last-hour rainfall of the previous day and the first-hour rainfall of the subsequent day were also connected more realistically. Therefore, disaggregated hourly rainfall data with three-day rainfall patterns are more appropriate for hydrologic calculation in urban areas.
According to Figure 7, regardless of whether three-day rainfall patterns were considered, the L-moment statistics (L-mean, L-scale, L-skewness, and L-kurtosis) were similar in both cases. All four L-moment statistics for hourly rainfall depth (h) in both cases are higher than average. Therefore, three-day rainfall patterns do not affect the disaggregated results for hourly rainfall depth. However, the L-moment statistics for the total amount of a rainfall event (TRE) are improved and describe the original hourly rainfall statistics. Therefore, the characteristics of rainfall events are preserved better when three-day rainfall patterns are considered.
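For reference, sample L-moment statistics of the kind compared in Figure 7 can be computed from probability-weighted moments. The sketch below uses the standard unbiased estimators; the function name `sample_l_moments` is hypothetical, and it assumes at least four values that are not all identical (otherwise the L-scale is zero and the ratios are undefined).

```python
def sample_l_moments(values):
    """Return (L-mean, L-scale, L-skewness, L-kurtosis) of a sample."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed")
    b0 = b1 = b2 = b3 = 0.0
    for i, xi in enumerate(x):                    # i = rank - 1 in the sorted sample
        b0 += xi
        b1 += xi * i / (n - 1)
        b2 += xi * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += xi * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0                                       # L-mean
    l2 = 2 * b1 - b0                              # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2               # ratios give L-skewness and L-kurtosis

# Usage: total rainfall depth of a few events (illustrative values in mm)
l_mean, l_scale, l_skew, l_kurt = sample_l_moments([2.5, 4.0, 7.5, 12.0, 30.0, 55.0])
print(l_mean, l_scale, l_skew, l_kurt)
```

Computing these four values for the observed series and for each disaggregated realization gives the quantities that are compared between the two KNNR variants.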

Conclusions
Daily rainfall was stochastically disaggregated using the KNNR method considering three-day rainfall patterns. In the field of rainfall disaggregation, it has been difficult to preserve statistical properties and cross-boundary continuity, and it has sometimes been assumed that boundary continuity does not need to be preserved. However, disaggregated hourly rainfall data should have statistical properties similar to those of the observed data, and cross-boundary continuity must be preserved when the event duration is longer than a day. A total of seven three-day rainfall patterns were proposed after analyzing daily rainfall patterns in South Korea. In the proposed KNNR method, hourly rainfall was stochastically resampled only from a group with the same three-day rainfall pattern. The proposed method was applied to 52 years of daily rainfall data observed at the Seoul, Incheon, Busan, and Jeju weather stations in South Korea. The probability distributions of hourly rainfall depth (h), event duration (ED), inter-event time (IET), and total rainfall depth (TRE) of the original and disaggregated data were compared at significance levels of 5% and 1%. L-moment statistics were also compared. As a result, the characteristics of rainfall events, such as event duration, inter-event time, and total rainfall depth, were reproduced successfully when three-day rainfall patterns were considered. Therefore, the disaggregated hourly rainfall data could be applied in hydrologic models to calculate runoff from urban catchments for which hourly or sub-hourly rainfall data are required. Moreover, boundary continuity in the hourly rainfall distribution was preserved more realistically when three-day rainfall patterns were considered. These results can be used for the extension of insufficient hourly rainfall data or the temporal disaggregation of climate change scenarios.
Author Contributions: H.P. contributed to conceptualization, methodology, data collection, software, and validation. G.C. contributed to conceptualization, methodology, writing, and validation. All authors have read and agreed to the published version of the manuscript.