Spatial Characteristics of Precipitation in the Greater Sydney Metropolitan Area as Revealed by the Daily Precipitation Concentration Index

: In this study; the spatial distribution of the Daily Precipitation Concentration Index (DPCI) has been analyzed inside the Greater Sydney Metropolitan Area (GSMA). Accordingly, the rainfall database from the Australian Bureau of Meteorology archive was utilized after comprehen ‐ sive quality control. The compiled data contains a set of 41 rainfall stations indicating consistent daily precipitation series from 1950 to 2015. In the analysis of the DPCI across GSMA the techniques of Moran’s Spatial Autocorrelation has been applied. In addition, a cross - covariance method was applied to assess the spatial interdependency between vector ‐ based datasets after performing an Ordinary Kriging interpolation. The results identify four well ‐ recognized intense rainfall develop ‐ ment zones: the south coast and topographic areas of the Illawarra district characterized by Tasman Sea coastal regions with DPCI values ranging from 0.61 to 0.63, the western highlands of the Blue Mountains, with values between 0.60 and 0.62, the inland regions, with lowest rainfall concentra ‐ tions between 0.55 and 0.59, and lastly the districts located inside the GSMA with DPCI ranging 0.60 to 0.61. Such spatial distribution has revealed the rainstorm and severe thunderstorm activity in the area. This study applies the present models to identify the nature and mechanisms underlying the distribution of torrential rains over space within the metropolis of Sydney, and to monitor any changes in the spatial pattern under the warming climate.


Introduction
The awareness of the importance of the spatial and temporal distribution of precipitation is important not only from a meteorological viewpoint, but also for its importance in different fields such as agriculture, hydrology, water resources and flood control. Estimation of the spatial and temporal distribution of precipitation is a complex undertaking, particularly in cases where detailed information concerning the impact of topography and land−use impacts on the prevailing atmospheric circulation is not quite available, such as the situation over southeastern Australia [1].
The concentration index (CI) is one of the indices that can be applied to characterize the temporal concentration of precipitation followed by spatial analysis [2]. A CI analysis makes it possible to characterize different spatial scales, which is of interest due to its effects on geo−hydrological processes and the analysis of erosion and soil loss [3]. Applying this type of analysis, interest is not only focused on climate but also on the effect of heavy rainfall on other areas of the environment and society [4,5]. The CI method was already applied in many different parts of the world [6][7][8][9][10][11][12][13][14][15][16][17][18]. While we focus on the most commonly applied index here, there are other indices that indicate different aspects of

Climatology of the Study Area
The GSMA, which is located on the southeast coast of Australia in New South Wales and lies in the western part of the Tasman Sea (Figure 1), includes a highly populated area of approximately 3.8 million in population. The study area is bounded in the north by 33°30′ S latitude, extending to 150°30′ E longitude in the west, and to the southeast at 34°30′ latitude and 151°30′ longitude. The region is bowl-shaped with a low plain in the middle which is effectively walled in on three sides by hills. In general, the Sydney region enjoys a temperate climate and commonly the broad-scale wind pattern is westerly in the winter, and easterly in the summer. The climate of this region arises from a complex interaction of broad-scale, regional and local controls [23].
Rainfall over the GSMA may occur throughout the year but is highest between March and June. Also, precipitation is slightly higher during the first half of the year when easterly winds dominate (February-June), and lower in the second half; mainly from July to September. Rainfall can occur throughout the year with variation concerning altitude and distance from the coast, with wetter areas being closer to the coast or in higher altitudes. Due to the low predictability of rain as well as the well-known impacts from climate drivers to the region [24], the wettest and driest months change annually. Within the study area and surrounds, annual rainfall varies from around 700 mm to 1400 mm. More climatological information of the study area can be found in [25] and [26]. On the regional scale, rainfall in the GSMA is influenced by the synoptic weather systems in the region, such as fronts originated from the Southern Ocean, east coast lows, subtropical cyclones and ex-tropical cyclone remnants migrating to the higher latitudes.
More locally, the GMSA is also known as one of the hotspots for severe thunderstorms in Australia [26]. As can be seen in the following analysis, the torrential rain from severe thunderstorms contribute significantly to the spatial characteristics of CI in the area.

Data
Daily rainfall data for forty−one (41) weather stations have been extracted from the Australian Bureau of Meteorology (BoM) online archives. Recording periods varied in duration for each station, but many data are available from 1950 to 2015. Rainfall data from the BoM has already been quality controlled with confirmation of the extremes with local reports and that observations from nearby stations do not disagree with each other. We have further verified such agreement among the stations after we downloaded the data. Most of the stations have complete data series in the study period-even the station with the least recorded data over the whole period has over 90% of coverage. The provided daily rainfall data, presenting relatively uniform coverage throughout the study area, have been carefully entered in a particular GIS database. Rainfall station characteristics are shown in Table A1 (Appendix A) and their spatial distribution is mapped in Figure 2. The only series with less than 10% missing data on an annual scale were used to calculate the DPCI indexes.
To reach the main aims of the study, four interconnected techniques of Concentration Index, Moran's Spatial Autocorrelation, Ordinary Kriging interpolation, and a cross-covariance method were applied to assess the spatial dependence (covariance) between vector-based datasets. The last method was applied to reveal several of the inherent spatial inter-dependencies among dissimilar variables.  Table A1 (Appendix A). Topography in the area can refer to the DEM in

The Daily Precipitation Concentration Index (DPCI)
The DPCI, proposed by [27], is examined in this study. An example is illustrated for the Albion Park station data; this station recorded the highest rain during the 67 years from 1950 to 2015 (see Table A2 in Appendix A). In computing the DPCI, only observed daily precipitation values more than 1 mm were considered. The DPCI method was applied to the data based on the fact that the contribution of daily rainfall events to the total amount is generally well described by a negative exponential distribution [28].
The DPCI in this study consists of aggregating daily precipitation into increasing 10−mm categories and determining the relative impact of the different classes by analyzing the relative contribution (as a percentage) of the accumulated precipitation, Y, as a function of the accumulated percentage of occurrence frequency (X). Previous work such as [11] showed that such function can be based on Equation (1). . (1) where a and b are constants that can be determined through the least−squares method.
where N is the number of classes. After determining the two constants a and b, the integral of the exponential curve (so−called Lorenz curve) between 0 and 100 shows the area S ( Figure 3), which is given by: Based on S, the area S' compressed by the exponential curve, the equidistribution line and X = 100 is apparently the difference between 5000 (half of the total area) and the value of S:

5000
(5) Applying Equation (5) the DPCI value for each rainfall station is then a fraction of S´ to the lower surface of the triangle bounded by the equidistribution line.
Examples of the empirical curves or "concentration curves" of Y versus X for Albion Parks and Wombeyan stations are presented in Figure 3. The annual DPCI values for these two stations are 0.62 and 0.54, respectively (see Table 1). By definition, the value of the DPCI is always a number between 0 and 1, and geometrically it represents the percentage of the triangle area between the line Y = X and the exponential curve. The DPCI is virtually equal to 0 when the contribution of each category of precipitation to the total is the same, and equal to 1 when precipitation falls into one category only and the exponential curve becomes the straight line Y = 0. Exponential curves of this type were calculated for all meteorological stations across the GSMA. As an example, different stages of calculating the above−mentioned parameters are given in Tables A1 and A2 (Appendix A). Table 1. Values of the constants "a", "b", DPCI, 90% percentile of rain and maximum daily rainfall (mm) at each of the 41 stations (with full names given in Table A1).

Spatial Correlation
In the second stage of data analysis, a Moranʹs spatial autocorrelation technique was used to measure spatial autocorrelation based on rainfall station locations and DPCI values [29]. Given the set of 41 rainfall stations and associated DPCIs, it evaluates whether the pattern expressed is clustered, dispersed or random ( Figure A1 in Appendix B). The tool calculates the Moranʹs I Index value and both a z−score and p−value to evaluate the significance of that Index. The Moranʹs I statistic for spatial autocorrelation is given by where Zi is the deviation of an attribute for feature (i.e., a particular rainfall station's DPCI) from its mean, Wi,j is the spatial weight between stations i and j (designated as the significance, i.e., the p−value, of the correlation of rain between the two stations), N is the total number of stations and S0 is the aggregate of the spatial weights by: For the current study, the ZI score for the statistic is computed by applying the following equations.
In which E is the expectation value and V the variance. Under the case of no spatial autocorrelation, ⌊ ⌋ 1/ 1 . Subsequently, a spatial interpolation method, known as the Kriging technique, was applied to yield better results than other techniques ( [30,31]). The Kriging technique assumes that the statistical surface to be interpolated has a certain degree of continuity ( [32]). The technique applies moving averages and has the advantage of producing the standard error for the estimated values. Among all the Kriging methods, the ordinary mode was applied, as an advanced geostatistical procedure. This method was well fitted to all data layers to generate estimated DPCI surfaces from a re−projected set of point values [33]. The Kriging model is based on a statistical technique that includes autocorrelation; that is, the statistical relationships among the measured points. Potentially, geostatistical techniques not only have the capability to produce a prediction surface but also provide some measure of the certainty or accuracy of the predictions. Kriging tools weight the surrounding measured values to derive a prediction for each DPCI unmeasured location. There are variations of the techniques, such as the Ordinary Cokirging and those that consider topographical information, that can further improve the performance ( [34,35]). The general formula for both interpolators is formed as a weighted sum of the data: (10) where Z (Si) is regarded as the measured DPCI values at the ith location, λi shows an unknown weight for the measured value at the ith rainfall station location, S0 specifies the prediction location and N indicates the number of stations. With the Kriging method, the weights are based not only on the distance between the measured points and the prediction location but also on the overall spatial arrangement of the measured points. To use the spatial arrangement in the weights, the spatial autocorrelation must be quantified. Thus, in ordinary Kriging, the weight, λi, depends on a fitted model to the measured points, the distance to the prediction location and the spatial relationships among the measured DPCI values around the prediction location. In the current study, an Ordinary Kriging formula is used to create maps of the prediction DPCI and ʺbʺ constant surfaces and associated accuracy models. Ordinary Kriging assumes the second−order trend removal model with no transformation type: In the above equation, μ is an unknown constant whereas one of the main issues concerning ordinary Kriging is whether the assumption of a constant mean is reasonable.
Sometimes there are good scientific reasons to reject this assumption. However, in this study, it was found that applying a second−order trend removal following an exponential Kernal Function (as a simple prediction method) gives remarkable flexibility in final interpolation method accuracy. Once again, the Kriging method was also applied to illustrate the variation and spatial distribution of the constant ʺbʺ values in the study area. This arbitrary way allows direct interstation comparison of the distribution of ʺbʺ value at each rainfall station across all districts.
To calculate the Pearson Overall Correlation Coefficient, a band collection statistic tool was furthermore computed among the DPCI and one of the other rainfall related parameters [36]. These parameters include the mean annual precipitation (AP), coefficient of variation (CV) of rainfall, the total number of rainfall days (TN), maximum rainfall observed (MxR) and the ʺaʺ and ʺbʺ constants taken from Equation (1). This tool was applied to provide statistics for the bivariate analysis of a set of raster bands by computing covariance and correlation for every event. The following equation was accordingly used to determine the covariance between layers i and j.
In the above equation, Z indicates for example DPCIs observed of a cell, i, j are layers of a stack, u is the mean of cells and N is the number of cells. The overall correlation between the rainfall datasets was then computed as: where the σ's are standard deviations. As usual, the correlation coefficient is between −1 and 1.

Spatial Distribution of DPCI
By calculating the annual DPCI values it was found that they range greatly, between 0.54 and 0.63, and are spread across the study area represented by the 41 rainfall stations. This range is consistent with the global results in [2] that showed high values (>0.5) of the Gini Index (which has the same concept of the DPCI but without the assumed mathematical form of the Lorenz curve as in Equation (1), thus the Gini index is highly correlated with the DPCI) over eastern Australia. Table 1 indicates the DPCI values and the rainfall percentage contributed by 90% of the rainiest days for the 41 weather stations across GSMA from 1950 to 2015. Also values of the constants ʺaʺ and "b" (as the exponential curves are given by Equation (1)) and observed maximum daily rainfall are represented in the table. DPCI values present strongly different daily precipitation regimes, as Woonona station (0.63) is located in the southeast of the study area and precipitation there has a higher concentration and is more irregular than in Wombeyan station (0.54) which is located on the Tableland somewhere in the outlying southwest of the study area. The concentration can be considered a function of the relative separation of the equidistribution line, which is greater in Albion Park (with the highest maximum daily rainfall observed) than others (Figure 3).
Applying the Global Moranʹs I statistic it is possible to test an existing spatial autocorrelation based on rainfall station locations and DPCI values. The Spatial Autocorrelation tool returns five values: the Moranʹs I index, expected index, variance, z−score and p−value. Given the z−score of 3.55, there is a less than 1% likelihood that this clustered pattern could be the result of random chance, expressing the fact that there are spatially significant clusters of DPCI values among the existing dataset based on the spatial autocorrelation report.  To find more about the nature of the spatial distribution of intense rains inside of the GSMA, the geographical distribution of the ʺbʺ constant, which from Equation (1) is the parameter to control the shape of the rainfall concentration curve and thus carries important information on the rainfall distribution, has been converted to classes of intensity level of rainfall occurrence ( Figure 5). Very high intense amounts can be seen near the topography southeast of the study area, just over the Illawarra Escarpment. Besides, in some parts of the Sydney Metropolitan district, for example in the west of the City, and areas located in the northwestern corner of the Parramatta River, very intense ʺbʺ values can be observed. In comparison with the lowlands of the GSMA, over the Blue Mountains (Katoomba station), intense rain events are also relatively high. In contrast, non−intense classes of ʺbʺ values can be seen over the inland parts of the GSMA.
For comparison, the geographic position of the flash flood events (observed during 1989−2015 with a thundery−rain more than 50 mm) is overlaid on the distribution of the ʺbʺ constant map. In the GSMA, flash flood events are mostly induced by several weather systems, such as the local thunderstorms and east coast lows over the ocean. It can be seen that most of these flash flood events occurred in the areas with the high "b" values, which determine the shape of the concentration curve. The spatial pattern of "b" also highly resembles that of the severe thunderstorms, especially those with hail occurrence [26]. These facts indicate that the storm activity in the GSMA largely determine the CI pattern on the daily timescale.

Spatial Correspondences
A cross−covariance model was presented to assess the spatial dependence (covariance) between two vector-based datasets. Here the first dataset is the DPCI, while the second is one of the important rainfall-related parameters such as mean annual rainfalls (AP), coefficient of variation (CV), the total number of rainfall days (TN), maximum rainfalls observed (MxR) and the ʺaʺ and ʺbʺ constants taken from Equation (1) that control the concentration curve. In the analysis, the attribute of one point (i.e., the DPCI) is correlated with the second attribute (i.e., one of the rain−related parameters) at another point, and this is repeated for all pairs of geographic points. The spatial distribution of the correlation (termed cross-covariance surface or cloud) can then be applied to examine the local characteristics of spatial correlation between the two attributes (datasets). The details of this cross-covariance model has been documented in Appendix C. This technique was applied to look for spatial shifts in existing correspondences between the DPCI and the other datasets throughout the GSMA.
A covariance surface with directional search capabilities was also involved in the modeling. For this reason, the values in the cross−covariance cloud were put into six bins based on the direction and distance separating a pair of locations [37]. These binned values were then averaged and smoothed to produce a cross−covariance surface (and associated correlations) for each pair of dataset throughout the study area ( Figure 6). It can be seen that the DPCI possesses regions of high covariance with most of the rain parameters and also the "a" and "b" constants in the concentration curve, however, there is variability in the locations with the highest covariance. For example, the DPCI has high covariance with the climatological parameters AP and TN over the northwest. The highest covariance with the 'magnitude' of the concentration curve ("a") is also on the western side. This may be due to the topographic variation. However, the pattern of covariance with the two parameters directly related to the DPCI, namely the MxR and "b", has a southwest−northeast orientation. This is aligned with the distribution of the DPCI in Figure 4. In Table 2, the values of Pearson's correlation coefficient for the five pairs of variables are indicated. The Pearson overall correlations (r) for all rainfall related parameters, except the total number of rainy days (TN), are statistically significant at 0.95 and 0.99 levels, respectively. Correlation between TN and annual DPCI is nearly +0.24 (p < 0.5) and not significant; in other words high number of rain days is not a good indicator of high DPCI. The reason is that similar annual values could be achieved with different daily distributions.

Summary
In the current study, daily rainfall observations (1950-2015) from 41 rainfall stations inside the GSMA have been analyzed. According to the applied criteria and techniques used, the outcomes are summarized as follow:  Within the GSMA, the essential features of climate in different districts are characterized by narrow rainfall zones close to the coast, under the combined influence of the Tasman Sea and the topography and land use patterns, leading to very different rainfall spatial distributions.  The DPCI values in the Illawarra coastal elevated areas, parts of the Sydney Metropolitan area and the Blue Mountains are high, with concentration index values close to 0.60-0.63. This reflects the fact that very few rainy days could bring a high percentage of annual precipitation.  The DPCI values obtained and distribution pattern of constant ʺbʺ are largely subject to influences from the topography and land use of the region. Generally, western and central regions inside GSMA are areas where rainfall is regular compared to eastern regions, while the southeastern districts and small parts of Metropolitan areas show the most aggressive DPCI values.  Despite the significant variations in spatial cross−correlating models between the DPCI and 6 other rain−related parameters (AP, CV, TN, MxR, "a" and "b"), there are considerable positive relationships among data layers at 0.95 significance levels for most parts of the study area.  The spatial patterns of the DPCI and "b" constant highlight the importance of catastrophic effects of such intense rainfall events, predominantly originating with severe thunderstorm and flash flood events.

Discussion
Inside of the Sydney Metropolitan area, daily precipitation is one of the factors in the processes of creating flash floods, and accordingly, differences in the spatial distribution of precipitation can lead to dissimilar precipitation regimes and various climatic conditions [38]. As was indicated in Table A1 (Appendix A), even if the annual total amounts are similar in many of the rainfall stations, precipitation processes may be different due to a different degree of concentrated rainfall in the time and space of the study area. Accordingly, the spatial distribution of precipitation can produce noticeably different impacts on natural and social processes across the GSMA-of particular interest for water management-flood control programs, and water availability for natural ecosystems. As the results show, the daily concentration of precipitation on an annual scale (expressed by the DPCI values in Table 1) is characterized by two different spatial gradients. One lies from the east to the west and the second is detectable from south to north, the latter characterized by the Tasman Sea coastal areas.
Overall, the spatial distribution of DPCIs follows a gradient between inland and the coastal areas, which may indicate approaching intense rainfall from different geographic directions. The results in this study have indicated that most parts inside GSMA are subject to severe rain, but with different likelihood of high DPCI (Figure 4). For example, the gigantic water resources of the Tasman Sea may influence the distribution of intense rainfall. On the other hand, a large proportion of rainfall comes from severe thunderstorms that occur over the northeast GSMA, the CBD and over the inner metropolitan area [26]. Also, the increased roughness associated with variation in topography and heat island phenomena may affect the spatial distribution of concentrated precipitation [39].
However, the pronounced dissimilar DPCI values and the subsequent cross−covariance surfaces (Figures 4 and 6) support the overall picture of multi−subjected developing areas and approaching weather systems from various directions in the region, which are under dissimilar synoptic patterns causing atmospheric instability [40,41]. It was found by previous studies that at least four types of weather patterns account for most of the rainfall in the region [42,43], and logically the amount, frequency, and intensity of precipitation events vary substantially in the region, as shown in the records during a long period from 1950-2015 [44,45]. Another weather pattern occurs in summer and involves the location of the Tropical Convergence Zone bringing torrential rainfalls [46]. Occasionally, weather systems from the southeast generate storms striking the region with torrential precipitation. During the warm months (October to March) the prevailing easterly moist winds provide much of the moisture needed in the intensification of widespread and severe thunderstorm activity in the region [26]. Given the short duration of typical thunderstorm activity in terms of hours, likely it would contribute substantially to the DPCI.
Not all variations in the total precipitation and associated differences in the DPCI can be explained simply in terms of differences between dissimilar weather systems and the nature of the prevailing air masses [47]. The geographical distribution of DPCI and the ʺbʺ values illustrate that the coastal areas are subject to a high probability of intense rainfall ( Figure 5). In the southwest extension of the coastal area, over the Illawarra Escarpment, topography has clear influences on the rainfall amounts. The high ʺbʺ distribution in the vicinity of elevated topography of the Illawarra Escarpment suggests an orographic enhancement of instability, particularly for sites facing the east (as indicated by Figure 4). Similarly, in the highland area west of Sydney, there appears at least two different patterns of intense rainfall events. The Blue Mountain ranges, located at the northwest of the study area, have some of the highest DPCI values, particularly in the summer months. Thus, the issue arises whether the limited number of rain stations, especially over the high mountains, can capture such topographic effect to extreme precipitation adequately. One way to improve is to extend the data sources representing rainfall distribution, which may include a radar−based estimate and gridded reanalysis dataset, the latter able to reduce the uncertainty during the spatial interpolation process. The other method is to incorporate theoretical topographic rainfall models (e.g., [48,49]) to improve the representation of extreme precipitation over high elevations.
Internationally, the values of CIs found across Europe are similar to those described in Iran by [6] and are lower than those offered by [7] in China. It has been proposed by [7] as a general explanation for differences between results from [27] in the Iberian Peninsula and China, that different climate systems and precipitation mechanisms were responsible for rainfall (such as a typhoon). Generally, it has been suggested that precipitation trends based on annual maximum daily events observed in most parts of the world have nearly the same signs. However, the trend of heavy precipitation is disproportionately larger than the trend of the total [50]. Some of the previous investigations and the more recent work of [5] demonstrated the prominence and precision of CI applications in different parts of the world. It was suggested that even without any change in total precipitation, there may be changes in the frequency of intense daily precipitation in a climate change context; a fact that would have led to meaningful variations in the precipitation concentration patterns [51][52][53].

Appendix A
The tables in this appendix document basic information of the rain stations and rainfall statistics in this study (Table A1), and parameters for computing the DPCI using Albion Park station as an example (Table A2). Table A1. The geographic coordinates, study period, average annual rainfall (AP), coefficient of variation (CV) and total number of rainy days (TN) for the 41 rain stations across the GSMA. Appendix B Figure A1 in this appendix illustrates the physical interpretation of the Global Moran's I Statistic. Depending on the z-score value, the rainfall distribution changes from a dispersed pattern (negative extreme), random pattern (most of the z-score around the mean) to a clustered pattern (positive extreme). The values of the Moran's Index (−0.001755), z-score (3.553463) and the p-value (0.000380) based on the dataset in this study are given in the upper left corner of the figure. Figure A1. Illustration of the Global Moran's statistic and association with the dispersed, random and clustered precipitation patterns.

Appendix C
In this appendix the details in the procedure of performing spatial cross-covariance analysis between two attributes (datasets) are documented. An example of the DPCI (first attribute) and the constant "b" (second attribute) is illustrated in Figure A2. There are several steps in the analysis:  The empirical cross-covariance for a pair of locations (NSW rainfall stations) between two datasets (DPCI and "b") is first plotted as a function of the distance between the two locations ( Figure A2 upper panel). In this illustration, each red dot shows the empirical cross-covariance between the pair of stations, with the attribute of one station taken from the first dataset and the attribute of the second station taken from the second dataset. The Cross-covariance cloud can be used to examine the local characteristics of spatial correlation between two datasets, and it can be used to look for spatial shifts in the correlation between two datasets. A cross-covariance cloud looks something like the NSW example.


The values in the cross-covariance cloud are put into bins based on the direction and distance separating a pair of locations. These binned values are then averaged and smoothed to produce a cross-covariance surface. The legend (Figure A2 lower panel  left) shows the colors and values separating classes of covariance values.  A covariance surface with search direction capabilities is also provided in the ArcGIS tool. The extent of the cross-covariance surface is controlled by the lag size and number of lags that are specified ( Figure A2 lower panel right). The search direction and width are indicated by the blue and red lines over the cross-covariance surface. One example has been shown in the figure and these options can be modified. Figure A2. Cross-covariance surface or cloud between the DPCI and the constant "b" in the concentration curve. The upper panel has all the covariance values according to the distance between the two points for computing the covariance. The lower panel has the cross-covariance map and an example of the directional search arrow. The search arrow is set based on the parameters on the right hand side (such as the direction, lag size and number of lags).