A Novel Approach for Delineation of Homogeneous Rainfall Regions for Water Sensitive Urban Design—A Case Study in Southeast Queensland

Abstract: The delineation of homogeneous regions is primarily based on long-term overall rainfall characteristics and therefore does not necessarily consider the homogeneity of event-based rainfall characteristics. However, event-based rainfall characteristics, including antecedent dry days, rainfall intensity, total rainfall and total duration of rainfall events, are critical for Water Sensitive Urban Design (WSUD). Accordingly, this study presents a novel approach to objectively identify homogeneous rainfall regions based on event-based rainfall characteristics. This approach uses cluster analysis and Hosking–Wallis heterogeneity tests collectively to test the homogeneity of event-based rainfall characteristics. A case study conducted for southeast Queensland (SEQ), Australia, is also presented in this article. This study compares the results of the modified approach against the results of the conventional approach for the delineation of homogeneous regions. The results showed that the entire SEQ could be treated as a homogeneous rainfall region based on the conventional approach. In contrast, based on the modified approach, the coast and the inland of SEQ were identified as separate homogeneous regions. Further, antecedent dry days and rainfall intensity were recognized as the deciding rainfall characteristics in the delineation of homogeneous rainfall regions.


Introduction
The critical features of Water Sensitive Urban Design (WSUD) are estimated by taking historical rainfall records of a representative rainfall station into account. Therefore, selection of an appropriate rainfall station that best represents the characteristics of regional rainfall is important for the design of WSUD. Typically, rainfall stations are selected by identifying potential homogeneous rainfall regions. A homogeneous rainfall region represents a region that has statistically similar rainfall everywhere in the region over a long period of time [1].
There is a range of methods available to identify homogeneous rainfall regions. These methods can be broadly classified into four categories, namely, geographical convenience, subjective partitioning, objective partitioning and multivariate analysis [1]. Geographical convenience refers to the demarcation of possible homogeneous regions based on administrative boundaries or on major geographical and physical groupings. This approach is essentially arbitrary and often considered misleading. Subjective partitioning refers to the demarcation of possible homogeneous regions by inspection of the rainfall characteristics and established prior knowledge about the study area [2]. Objective partitioning involves identifying a group of similar meteorological stations by minimizing a within-group heterogeneity criterion. Typical heterogeneity criteria used by researchers include within-group variation of sample coefficients of variation, within-group variation of sample L-skewness and likelihood-ratio statistics [1,3–5]. Multivariate analysis, in particular cluster analysis (CA) and principal component analysis (PCA), is widely used across research studies to group meteorological stations with similar observations [4,6–8]. In this regard, a multivariate data matrix that includes critical variables defining the characteristics of the meteorological stations within the study area is used in the analysis so that the stations are grouped based on the similarities among them.
Subjective partitioning approaches are commonly used to identify homogeneous rainfall regions in the context of WSUD in Australia. For example, the Brisbane City Council and Moreton Bay Waterways [9] suggest the demarcation of homogeneous rainfall regions for southeast Queensland purely based on the established knowledge of the spatial variation of mean annual rainfall and the number of rain days per year. Similarly, the Northern Territory Department of Planning and Infrastructure [10] recommends single rainfall station data for Darwin, Australia, based on the assumption that the rainfall characteristics across the Darwin region are uniform. There are also instances where the nearest rainfall station has been used to study different aspects of WSUD [11–13].
In addition to their inherent subjectivity, the existing approaches rely on overall rainfall characteristics such as average annual rainfall, average number of wet days per year or average number of dry days per year [9]. However, the delineation of homogeneous rainfall regions based on these variables may be misleading for WSUD [11,13]. In the context of WSUD, event-based rainfall characteristics are more important than the overall rainfall characteristics. Rainfall event characteristics such as antecedent dry days, rainfall intensity, total rainfall and rainfall duration are directly related to stormwater quality and quantity. Stormwater quantity is directly linked to the total rainfall depth, duration and rainfall intensities [13–17]. Stormwater quality is primarily influenced by the pollutant processes, namely, build-up and wash-off. Build-up is primarily a function of antecedent dry days [18], and wash-off is a function of rainfall intensity [18,19]. In addition, concentrations of different pollutants are found to be associated with different rainfall characteristics. For example, Wang et al. [20] found that the nitrogen concentration in stormwater runoff is inversely proportional to the rainfall duration. Lee and Bang [21] suggested that the concentrations of suspended solids and chemical oxygen demand (COD) are proportional to the rainfall duration. In addition, in the context of climate change, the event-based rainfall characteristics are expected to change in most parts of the world while the average rainfall conditions remain more or less the same.
Therefore, it is critical to incorporate event-based rainfall characteristics in defining the homogeneous regions and thereby selecting the representative rainfall station for the design of WSUD. This paper presents a novel approach to objectively demarcate the rainfall homogeneous regions based on event-based rainfall characteristics.

Study Area
Southeast Queensland was selected as the study area. Southeast Queensland has an extensive meteorological station network with measurements taken in daily, 3 h, 30 min and 1 min (pluviography) formats. For this study, pluviographic rainfall data for 17 stations were collected from the Weather Station Directory of the Bureau of Meteorology (BoM), Australia, for the period between 2011 and 2015 primarily based on the availability of data. The stations used for this analysis and their geographical information are presented in Table 1.

Event Separation
Individual events were separated, and variables such as antecedent dry days, maximum rainfall intensity, total rainfall depth and the rainfall duration of each event were determined. These event-based rainfall characteristics are critical rainfall characteristics that influence stormwater quality and quantity.
For example, pollutant build-up is primarily influenced by the antecedent dry days [11,13–15], while the pollutant wash-off is influenced by the rainfall intensity [11,13] or total rainfall depth [22], which are the two processes that ultimately determine the stormwater quality. In addition, the total rainfall depth and duration of a rainfall event are directly related to the quantity of stormwater generated from a catchment [13,17]. Therefore, a demarcation of homogeneous regions based on the event-based rainfall characteristics such as antecedent dry days, rainfall intensity, total rainfall depth and rainfall duration is more appropriate. The following criteria were considered in separating individual rainfall events:
• An event was considered independent only if the consecutive event was separated by at least 3 h antecedent duration. Otherwise, those events were treated as a single event. There is no guideline or literature available to suggest an accurate value for minimum antecedent dry period to consider two consecutive events as independent events. Therefore, 3 h was selected as a reasonable value based on previous experience and expert advice.
• An event that constituted less than 1 mm total rainfall for a period greater than 1 h was not considered as a storm event and not considered for the analysis.
• The maximum rainfall intensity (in mm/h) of the events was estimated by calculating the moving total of 1 h rainfall throughout the rainfall duration.
• Any event having data entries of false quality based on quality classifications of BoM was discarded from the study.
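The separation criteria listed above can be illustrated with a minimal Python sketch. It is not the code used in the study: the record layout (minute offset, depth in mm), the 1 min recording step, and all function and constant names are assumptions made for illustration. The second criterion is read literally (an event is dropped only when it totals less than 1 mm and lasts longer than 1 h), the antecedent dry period is measured from the end of the preceding wet spell whether or not that spell was retained, and the BoM quality-flag filter is omitted.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, float]  # (minute since start of record, depth in mm)

STEP_MIN = 1          # assumed recording step of the pluviograph data (min)
MIN_GAP_MIN = 3 * 60  # minimum dry period between independent events (3 h)
MIN_DEPTH_MM = 1.0    # depth threshold of the second criterion
LONG_SPELL_MIN = 60   # duration threshold of the second criterion (1 h)
WINDOW_MIN = 60       # moving window for the maximum intensity (1 h)


def max_hourly_total(group: List[Record]) -> float:
    """Largest rainfall total within any 60 min window of one event.

    A 1 h total in mm is numerically equal to an intensity in mm/h.
    The maximum is always reached by a window starting at a wet record.
    """
    best = 0.0
    for start, _ in group:
        total = sum(d for t, d in group if start <= t < start + WINDOW_MIN)
        best = max(best, total)
    return best


def separate_events(records: List[Record]) -> List[Dict[str, Optional[float]]]:
    """Split rainfall records into independent events and summarise each."""
    wet = sorted((t, d) for t, d in records if d > 0.0)

    # Group wet records; a dry gap of at least 3 h starts a new event.
    groups: List[List[Record]] = []
    for t, d in wet:
        if groups and t - (groups[-1][-1][0] + STEP_MIN) < MIN_GAP_MIN:
            groups[-1].append((t, d))
        else:
            groups.append([(t, d)])

    events: List[Dict[str, Optional[float]]] = []
    prev_end: Optional[int] = None
    for group in groups:
        start = group[0][0]
        end = group[-1][0] + STEP_MIN
        duration = end - start
        total = sum(d for _, d in group)
        # Dry period since the previous wet spell, in days (None for the first).
        dry_days = None if prev_end is None else (start - prev_end) / 1440.0
        prev_end = end
        # Second criterion: skip spells with < 1 mm spread over more than 1 h.
        if total < MIN_DEPTH_MM and duration > LONG_SPELL_MIN:
            continue
        events.append({
            "antecedent_dry_days": dry_days,
            "max_intensity_mm_h": max_hourly_total(group),
            "total_mm": total,
            "duration_min": duration,
        })
    return events
```

Each returned dictionary holds the four event-based characteristics used in the rest of the analysis, so the per-station samples can be built by collecting one key across all events of a station.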

Cluster Analysis
A cluster is a group of objects with similar attributes, in which the objects within the cluster are more similar to each other than to objects in any other cluster. The greater the similarity of objects within a cluster and the greater the difference between clusters, the more distinct and reliable the clustering.
Many studies have used cluster analysis to identify homogeneous rainfall regions, e.g., Terassi et al. [23], Lyra et al. [24], Goyal et al. [25] and Oliveira-Junior et al. [26]. However, the approaches and the algorithms used to perform the cluster analysis are inherently different. Among the available methods, k-means clustering and hierarchical clustering are the most widely used [27].

k-Means Clustering
The number of clusters in k-means clustering is user-defined and referred to as k. Once k is defined, k random centroids are selected, and each object is assigned to the closest centroid to form clusters. The closest centroid is the one with the smallest Euclidean distance to the object, as given by:

$$\mathrm{dist}(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2} \tag{1}$$

where $x = (x_1, x_2, x_3, \ldots, x_n)$ and $c = (c_1, c_2, c_3, \ldots, c_n)$ denote the objects and the centroids of the clusters, and $\mathrm{dist}(x, c)$ denotes the Euclidean distance between x and c. Then, the centroids of the clusters are recalculated and used as the new centroids. The objects are then assigned to the closest new centroids. This process is repeated until the centroids remain unchanged. The quality of the k-means clustering can be expressed by an objective function that is determined by the proximities of the objects to the cluster centroids. The Sum of Squared Error (SSE), also referred to as scatter, is the most common index used to measure the quality of the clustering. SSE sums the squared Euclidean distance of each object to its closest centroid, and can be given by:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \mathrm{dist}(c_i, x)^2 \tag{2}$$

where $C_i$ denotes the i-th cluster, $c_i$ denotes its centroid and x denotes the objects assigned to that particular centroid.
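The assignment and update steps described above, together with the SSE, can be sketched in a few lines of plain Python. This is an illustrative implementation only, not the R 'stats' routine used later in the study; the function name `kmeans`, the `seed` argument and the iteration cap are assumptions introduced here.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[Point], List[List[Point]], float]:
    """Basic k-means: returns (centroids, clusters, SSE)."""
    rng = random.Random(seed)
    # Pick k of the objects at random as the initial centroids.
    centroids: List[Point] = rng.sample(list(points), k)
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step: each object joins its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids unchanged: converged
            break
        centroids = new_centroids
    # SSE: squared distance of every object to the centroid of its cluster.
    sse = sum(math.dist(p, centroids[i]) ** 2
              for i, c in enumerate(clusters) for p in c)
    return centroids, clusters, sse
```

Because the initial centroids are random, different seeds can give different partitions on less clearly separated data; comparing the SSE across several runs is one common way to choose among them.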

Hierarchical Clustering
Hierarchical clustering initially considers every single object as an individual cluster and successively merges the closest pair of clusters based on the Euclidean distance, as given in Equation (1), until they form a single cluster. This formation is typically presented in a dendrogram, which displays all sub-clusters and the order in which they merge. The proximity among the clusters is defined by three different approaches, namely, single link, complete link and group average. Single link defines the proximity of two clusters as the Euclidean distance between their two closest points, while complete link defines the proximity based on the two farthest points of the clusters. The group average uses the average pairwise Euclidean distance (over all pairs of objects from the two clusters) to measure the proximity of clusters.
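The three proximity definitions and the successive merging can be made concrete with a short Python sketch. It is a naive illustration under assumed names (`single_link`, `complete_link`, `group_average`, `agglomerate`), not the routine used in the study, and it recomputes all pairwise proximities at every step rather than updating them efficiently.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]


def single_link(a: Cluster, b: Cluster) -> float:
    """Distance between the two closest points of the clusters."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_link(a: Cluster, b: Cluster) -> float:
    """Distance between the two farthest points of the clusters."""
    return max(math.dist(p, q) for p, q in product(a, b))


def group_average(a: Cluster, b: Cluster) -> float:
    """Average distance over all pairs of points from the two clusters."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def agglomerate(points: Sequence[Point],
                proximity: Callable[[Cluster, Cluster], float]
                ) -> List[Tuple[Cluster, Cluster, float]]:
    """Merge the closest pair of clusters until one cluster remains.

    Returns the merges in order as (cluster_a, cluster_b, height), which is
    the information a dendrogram displays.
    """
    clusters: List[Cluster] = [[p] for p in points]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: proximity(clusters[ij[0]], clusters[ij[1]]))
        height = proximity(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges
```

Passing a different proximity function changes only the merge heights and, on less regular data, the merge order; the merging loop itself is the same for all three linkages.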

Hosking-Wallis Heterogeneity Test
The Hosking-Wallis heterogeneity test is commonly applied for identifying homogeneous rainfall regions in regional rainfall frequency analysis [28–30]. The Hosking-Wallis heterogeneity test was adopted in this research to objectively test the rainfall homogeneity of a region. This test primarily compares the variations in the L-moment ratios between the meteorological stations of the actual regions and a set of artificially created homogeneous regions (based on the average rainfall characteristics of the region).
In an ideal situation, every station in a homogeneous region should have the same L-moment ratios. However, in practice, the L-moment ratios differ between meteorological stations due to differences in their observations. Nevertheless, the stations can be reasonably treated as homogeneous if the differences in their L-moment ratios are statistically insignificant [1].
Accordingly, the Hosking-Wallis heterogeneity test estimates the degree of heterogeneity of a group of meteorological stations using a set of statistical indexes called H indexes. H indexes compare the between-station dispersion of L-moment ratios for a group of stations with what would be expected for an artificially developed homogeneous region. The artificial homogeneous region is developed by repeated simulations that generate synthetic rainfall data with the same record lengths as the actual meteorological stations based on the regional average L-moments. The regional weighted averages of L-CV, $t^{R}$, L-skewness, $t_3^{R}$, and L-kurtosis, $t_4^{R}$, in the Hosking-Wallis heterogeneity test are calculated as:

$$t^{R} = \frac{\sum_{j=1}^{N} n_j\, t^{(j)}}{\sum_{j=1}^{N} n_j} \tag{3}$$

$$t_3^{R} = \frac{\sum_{j=1}^{N} n_j\, t_3^{(j)}}{\sum_{j=1}^{N} n_j} \tag{4}$$

$$t_4^{R} = \frac{\sum_{j=1}^{N} n_j\, t_4^{(j)}}{\sum_{j=1}^{N} n_j} \tag{5}$$

where N denotes the number of stations in the region, $n_j$ denotes the record length of station j, and $t^{(j)}$, $t_3^{(j)}$ and $t_4^{(j)}$ denote the sample L-CV, L-skewness and L-kurtosis of station j.
In order to measure the heterogeneity of the meteorological stations, the Hosking-Wallis heterogeneity test suggests three dispersion measures, namely, $V_1$, $V_2$ and $V_3$, based on the L-coefficient of variation; the L-coefficient of variation and L-skewness; and the L-skewness and L-kurtosis, respectively, as given by:

$$V_1 = \left\{ \frac{\sum_{j=1}^{N} n_j \left(t^{(j)} - t^{R}\right)^2}{\sum_{j=1}^{N} n_j} \right\}^{1/2} \tag{6}$$

$$V_2 = \frac{\sum_{j=1}^{N} n_j \left\{ \left(t^{(j)} - t^{R}\right)^2 + \left(t_3^{(j)} - t_3^{R}\right)^2 \right\}^{1/2}}{\sum_{j=1}^{N} n_j} \tag{7}$$

$$V_3 = \frac{\sum_{j=1}^{N} n_j \left\{ \left(t_3^{(j)} - t_3^{R}\right)^2 + \left(t_4^{(j)} - t_4^{R}\right)^2 \right\}^{1/2}}{\sum_{j=1}^{N} n_j} \tag{8}$$

In order to simulate the dispersion measurements for the artificial homogeneous region, a four-parameter kappa distribution is fitted to the regional average L-moment ratios (1, $t^{R}$, $t_3^{R}$ and $t_4^{R}$) to simulate $N_{sim}$ realizations of artificial homogeneous regions (with the same record lengths as the actual meteorological stations). The mean $\mu_v$ and the standard deviation $\sigma_v$ of the dispersion measurements of the artificially simulated homogeneous regions are calculated. Then, the dispersions of the actual and simulated homogeneous regions are compared using the statistical index $H_i$ (for i = 1, 2 and 3) as given by:

$$H_i = \frac{V_i - \mu_v}{\sigma_v} \tag{9}$$

where $\mu_v$ and $\sigma_v$ are those of the corresponding dispersion measure $V_i$. Accordingly, three statistical indexes, $H_1$, $H_2$ and $H_3$, are calculated based on the corresponding dispersion measures $V_1$, $V_2$ and $V_3$. The region is declared acceptably homogeneous if $H_i < 1$, possibly heterogeneous if $1 \le H_i < 2$ and definitely heterogeneous if $H_i \ge 2$ [1].
Furthermore, Hosking and Wallis [1] suggested that the dispersion index $H_1$ alone can be used to identify homogeneous regions, as it has higher discriminatory power than the other indexes. However, many studies use all three dispersion indexes or at least the first two to identify homogeneous regions [31,32]. Accordingly, in this study, the first two indexes, $H_1$ and $H_2$, were used to assess the degree of homogeneity of the region. The dispersion indexes were generated using the R package 'homtests' [32].
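To make the quantities in this section concrete, the following Python sketch computes sample L-moment ratios from unbiased probability-weighted moments, the record-length-weighted regional average, the dispersion measures, and the H index with its classification thresholds. It is a hedged illustration and not the 'homtests' implementation: all function names are hypothetical, and the kappa-distribution fitting and simulation step is not implemented, so `v_sim` is assumed to be a list of dispersion values already obtained from simulated homogeneous regions.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def l_moment_ratios(data: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (l1, L-CV t, L-skewness t3, L-kurtosis t4); needs >= 4 values."""
    x = sorted(data)
    n = len(x)
    # Unbiased probability-weighted moments; j is the zero-based rank.
    b0 = sum(x) / n
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n
    b2 = sum(j * (j - 1) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    b3 = sum(j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3)) * x[j]
             for j in range(n)) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2 / l1, l3 / l2, l4 / l2


def regional_average(values: Sequence[float], lengths: Sequence[int]) -> float:
    """Average of a station statistic weighted by record length n_j."""
    return sum(n * v for n, v in zip(lengths, values)) / sum(lengths)


def v1(t: Sequence[float], lengths: Sequence[int]) -> float:
    """Weighted standard deviation of the at-site L-CV values."""
    t_r = regional_average(t, lengths)
    return math.sqrt(sum(n * (tj - t_r) ** 2 for n, tj in zip(lengths, t))
                     / sum(lengths))


def v_pair(a: Sequence[float], b: Sequence[float], lengths: Sequence[int]) -> float:
    """Weighted mean distance from the regional average in the (a, b) plane.

    Use (t, t3) for V2 and (t3, t4) for V3.
    """
    a_r, b_r = regional_average(a, lengths), regional_average(b, lengths)
    return sum(n * math.hypot(aj - a_r, bj - b_r)
               for n, aj, bj in zip(lengths, a, b)) / sum(lengths)


def h_index(v_obs: float, v_sim: List[float]) -> float:
    """Standardise the observed dispersion against the simulated ones."""
    return (v_obs - mean(v_sim)) / stdev(v_sim)


def classify(h: float) -> str:
    """Thresholds quoted in the text for the H indexes."""
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"
```

For example, `classify` applied to any H value below one returns "acceptably homogeneous", which is the outcome a region must reach for both indexes used in this study.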

Results and Discussion
The delineation of homogeneous regions in southeast Queensland was carried out using the conventional approach and the modified approach specifically designed to suit the requirements for WSUD, considering event-based rainfall characteristics.

Conventional Approach
In the conventional approach, the rainfall data of a set of the selected meteorological stations are tested for homogeneity purely based on the probabilistic characteristics of the continuous rainfall records. The method does not necessarily consider probabilistic characteristics of individual rainfall events.
Firstly, we performed the Hosking-Wallis heterogeneity test for the selected 17 stations in SEQ. The statistical indexes of heterogeneity, $H_1$ and $H_2$, were calculated from 500 simulated realizations using the R package 'homtests' [32]. The results suggested that the entire region could be treated as homogeneous, with $H_1$ = 0.6483 and $H_2$ = 0.9142. The code executed to run the package is provided in Appendix A.

Modified Approach
Although the overall rainfall characteristics of the entire SEQ were homogeneous, the event-based rainfall characteristics of the region may differ. For example, an event with intense rainfall over a short period and an event with less intense rainfall over a longer period may result in similar overall (average) rainfall characteristics. However, these events can produce completely different stormwater quality and quantity scenarios. Therefore, it was important to consider the homogeneity of rainfall stations based on event-based rainfall characteristics when selecting the representative meteorological stations for WSUD. Table 2 presents the outcomes of the Hosking-Wallis heterogeneity tests performed for individual rainfall characteristics. The results suggest that the study area was potentially heterogeneous based on the $H_1$ and $H_2$ values. Based on these two dispersion indexes, antecedent dry days showed a higher level of heterogeneity than the other event-based rainfall characteristics, and the maximum rainfall intensity was found to be potentially heterogeneous across SEQ. In contrast, total rainfall and rainfall duration were homogeneous across SEQ. Overall, these results suggest that the entire SEQ cannot be considered homogeneous for all event-based rainfall characteristics. Accordingly, regions identified as homogeneous by the continuous-rainfall approach may not necessarily be homogeneous based on individual rainfall characteristics. Because antecedent dry days and maximum rainfall intensity showed heterogeneity among the rainfall stations while the total duration and the total rainfall of the events did not, it can be concluded that antecedent dry days and maximum rainfall intensity have higher spatial variation and thus should be the deciding rainfall characteristics in the delineation of homogeneous regions.
Accordingly, the next step was to identify all potential homogeneous regions inside SEQ and to assess the degree of homogeneity of the identified potential homogeneous regions (using the Hosking-Wallis heterogeneity test). This step was performed using cluster analysis. Agglomerative hierarchical cluster analysis and k-means cluster analysis were performed using the R package 'stats' [33]. The parameters used for the analysis were the 3rd quartile (Q3) values of antecedent dry days, maximum rainfall intensity, total rainfall and duration of the individual events of each station, as given in Table 3. The 3rd quartile was used because the average or the median of the rainfall characteristics at the meteorological stations considered were expected to be similar and therefore might not separate into discrete clusters. In contrast, selecting a higher quartile value may result in too many unrealistic clusters. Our repeated analysis suggested that the use of the 3rd quartile values provides the most appropriate outcomes. The analysis suggested three potential clusters based on the dendrogram shown in Figure 1. Accordingly, Cluster 1 comprised Stations 7, 9 and 11, Cluster 2 comprised Stations 3, 12, 14, 16 and 17, and Cluster 3 comprised Stations 1, 5, 6, 8, 10, 13 and 15. However, the geographical locations of the meteorological stations of the clusters, as presented in Figure 2, suggested that the meteorological stations of Cluster 1 and Cluster 2 were located in close proximity to the SEQ coast and the meteorological stations of Cluster 3 were located inland. Furthermore, Cluster 1 and Cluster 2 stations did not show a clear geographical separation.
Furthermore, scatter plots produced to examine the meteorological stations of the three clusters based on the considered rainfall characteristics are presented in Figure 3. It can be observed from Figure 3 that the rainfall stations of Cluster 3 are clearly distinct from the rainfall stations of Cluster 1 and Cluster 2. In contrast, the rainfall stations of Cluster 1 and Cluster 2 showed no clear separation. These results suggest that Cluster 1 and Cluster 2 can be treated as a single cluster representing coastal SEQ while Cluster 3 represents inland SEQ. Accordingly, two potential homogeneous regions were identified within the study area, namely, Coastal-SEQ and Inland-SEQ. The degree of homogeneity of Coastal-SEQ and Inland-SEQ was evaluated by performing the Hosking-Wallis heterogeneity tests using event-based rainfall characteristics. The summary of the results is presented in Table 4. As shown in Table 4, the dispersion indexes $H_1$ and $H_2$ were found to be less than one for all the rainfall characteristics for both Coastal-SEQ and Inland-SEQ. Therefore, Coastal-SEQ and Inland-SEQ were identified as two separate homogeneous regions within SEQ based on the event-based rainfall characteristics.

Conclusions
The entire southeast Queensland can be treated as a homogeneous region based on the conventional (continuous-rainfall) approach. However, based on individual rainfall characteristics such as antecedent dry days, maximum rainfall intensity, total rainfall and duration of the rainfall events, there were two separate homogeneous regions identified.
This implies that although the characteristics of the continuous rainfall data between stations were statistically similar, the event-based characteristics can differ significantly among stations. Therefore, the conventional approach to delineating homogeneous rainfall regions may be misleading in the context of Water Sensitive Urban Design. The approach proposed in this study tests homogeneity directly on the event-based characteristics that govern stormwater quality and quantity, and is therefore better suited to this application.
The outcomes of this study are also broadly relevant to any rainfall homogeneity assessment. The most common use of homogeneity testing is regional frequency estimation. Many regional frequency analyses are based on the conventional Hosking and Wallis [1] regionalization approach, including the revised Intensity-Frequency-Duration (IFD) estimates for Australia [34], where stations were grouped with the assumption that all the stations in a homogeneous region have a similar probability distribution with a single scaling factor. However, as discussed, the conventional approach does not check the homogeneity based on individual rainfall characteristics. Therefore, the method adopted in this study can be effectively used in identifying homogeneous regions in regional frequency analysis, especially for studies that depend on the event-based characteristics of rainfall such as Water Sensitive Urban Design.
The proposed approach for identifying homogeneous rainfall regions is also more suitable for studies related to climate change. This is because studies of future rainfall trends commonly suggest that, although few changes are expected in the overall rainfall characteristics, the rainfall patterns and event characteristics are expected to change significantly in the future. Therefore, it will be more appropriate to apply the approach presented in this study in defining homogeneous rainfall regions in the context of climate change.