Spatial and Seasonal Variations and Inter-Relationship in Fitted Model Parameters for Rainfall Totals across Australia at Various Timescales

Probabilistic models are useful tools for understanding rainfall characteristics, generating synthetic data and predicting future events. This study compares the probabilistic nature of daily, monthly and seasonal rainfall totals using data from 1327 rainfall stations across Australia. The main objective is to develop a relationship between the parameters of models fitted to daily, monthly and seasonal rainfall totals. The study also examines the possibility of estimating the parameters for daily data from the parameters fitted to monthly rainfall. Three distributions within the Exponential Dispersion Model (EDM) family (Normal, Gamma and Poisson-Gamma) were found to be optimal for modelling the daily, monthly and seasonal rainfall totals. Within the EDM family, Poisson-Gamma distributions were optimal in most cases, whereas the Normal distribution was rarely optimal except for stations from the wet region. Results showed large differences between regional and seasonal φ-index values (dispersion parameter), indicating the necessity of fitting separate models for each season. However, strong correlations were found between the parameters of the combined data and those derived from individual seasons (0.70–0.81). This indicates that the parameters of individual seasons can be estimated from the parameters of the combined data. A similar relationship was found between the parameters of the monthly and daily models. The findings could be useful for understanding the probabilistic features of daily, monthly and seasonal rainfall and for generating daily rainfall from monthly data for rainfall stations elsewhere.


Introduction
Probabilistic models have extensive applications in understanding rainfall characteristics, generating synthetic data and predicting future events [1,2]. Model-based prediction has been used in ecology, hydrology, water resources management and agricultural planning [3][4][5][6]. Synthetic data obtained through models are useful when the observed rainfall record is inadequate in length, completeness or spatial coverage [7][8][9].
Determining theoretical probability distributions for modelling rainfall at various timescales has gained interest in contemporary literature. For example, either Markov chain [10,11] or logistic regression models [12] have been used to model the occurrence of daily rainfall. Positively skewed daily [...]
Variations in average, dispersion and extreme rainfall events have been observed between the case study stations. For example, mean daily rainfall ranges from 0.80 mm for Woodgreen to 4.41 mm for Mount Olive. The spread in seasonal rainfall totals, as measured by the coefficient of variation, varied from 48.65% to 114.20% between the stations (Table 1). Extreme monthly rainfall totals, as measured by the 95th percentiles, vary from 96.0 mm for the dry station (Woodgreen) to 545.9 mm for the wet tropical summer-dominated station (Mount Olive). All statistical analyses were conducted using the open-access statistical software R [29]. This software is freely available and can be downloaded through the R-project web portal (http://www.R-project.org). The tweedie package [30] within the R environment was used to fit statistical models.

Methods
The study investigated a subfamily of the exponential dispersion model (EDM) family of distributions to model daily, monthly and seasonal rainfall totals of Australia. The variable y (e.g., daily, monthly or seasonal rainfall totals) following the EDM family of distributions has the probability function:
$$f(y; \theta, \phi) = a(y, \phi)\,\exp\left\{\frac{1}{\phi}\left[y\theta - k(\theta)\right]\right\},$$
where θ is the canonical parameter and φ the dispersion parameter, and a and k are suitable functions (of y and φ, and of θ, respectively) that link the variable y to an exponential probability function. The parameter θ can be positive or negative while φ is always positive [31]. Tweedie distributions are those EDMs for which the variance is proportional to some power p (also called the index parameter) of the mean, that is, var(y) = φµ^p; they have been used for modelling rainfall totals of Australia, Malaysia and India [7,32,33]. The Tweedie distribution with mean µ, dispersion parameter φ and index parameter p is denoted as Tw_p(µ, φ), with p ∉ (0, 1). Important properties of the Tweedie distributions are documented in the scientific literature [34,35]. The normal (p = 0), Poisson (p = 1, φ = 1), gamma (p = 2) and inverse Gaussian (p = 3) distributions are special cases of Tweedie distributions. For p ≥ 2, the distributions are suitable for modelling positive, right-skewed data. The distributions for which 1 < p < 2 are called the Poisson-Gamma (or P-G) distributions [36] and are capable of modelling positive rainfall data that include zero values (days with no rainfall). Dunn and Smyth [36] have shown theoretically that sometimes the maximum likelihood estimate of p is found on the boundary of the parameter space so that p → 1. One interpretation is that the normal distribution (p = 0) may be optimal within the Tweedie family (as the Tweedie distribution is not defined for 0 < p < 1). No application of negative p values has been proposed in modelling rainfall. Except for four special cases, the Tweedie probability function cannot be written in closed form, and hence, maximum likelihood estimates of the parameters cannot be obtained directly. Therefore, the tweedie.profile function of the R package tweedie initially considers a set of values for the index parameter and computes the log-likelihood for each. The maximum likelihood value of the parameter is obtained from the plot of the log-likelihood against the index parameter.
The models were fitted with the first half of the available data series and the rest were kept for validation. Stochastic datasets were generated using the distributions and parameters obtained from the fitted models. The 5th, 25th, 95th and 99th percentiles and the probability of no rainfall of the observed (validation dataset) and generated rainfall amounts were compared.
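The grid search over the index parameter can be illustrated with a minimal Python sketch. It is not the tweedie.profile implementation used in the study: it assumes 1 < p < 2, uses the standard compound Poisson-gamma representation of Tw_p(µ, φ) (Poisson rate λ = µ^(2−p)/[φ(2−p)], gamma shape α = (2−p)/(p−1), gamma scale γ = φ(p−1)µ^(p−1)), evaluates the density by a truncated series, takes the sample mean as the estimate of µ, and maximises over a coarse grid of φ values. The function names, grids and synthetic data are all hypothetical.

```python
import math
import random


def pg_params(mu, phi, p):
    """Compound Poisson-gamma form of Tw_p(mu, phi) for 1 < p < 2."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate of rain events
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape of each event
    gam = phi * (p - 1.0) * mu ** (p - 1.0)     # gamma scale of each event
    return lam, alpha, gam


def pg_logpdf(y, mu, phi, p, n_max=60):
    """Log-density: point mass exp(-lam) at zero, truncated series for y > 0."""
    lam, alpha, gam = pg_params(mu, phi, p)
    if y == 0:
        return -lam
    log_y, terms = math.log(y), []
    for n in range(1, n_max + 1):               # n = number of rain events
        shape = n * alpha
        terms.append(-lam + n * math.log(lam) - math.lgamma(n + 1)
                     + (shape - 1.0) * log_y - y / gam
                     - math.lgamma(shape) - shape * math.log(gam))
    top = max(terms)                            # log-sum-exp for stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def profile_p(data, p_grid, phi_grid):
    """Return (p_hat, phi_hat, profile) with profile[p] = (loglik, phi)."""
    mu = sum(data) / len(data)                  # sample mean as estimate of mu
    profile = {}
    for p in p_grid:
        profile[p] = max((sum(pg_logpdf(y, mu, phi, p) for y in data), phi)
                         for phi in phi_grid)
    p_hat = max(profile, key=lambda q: profile[q][0])
    return p_hat, profile[p_hat][1], profile


def rpg(mu, phi, p, rng):
    """Draw one total as a Poisson number of gamma-distributed amounts."""
    lam, alpha, gam = pg_params(mu, phi, p)
    n, prod, limit = 0, rng.random(), math.exp(-lam)
    while prod > limit:                         # Knuth's Poisson sampler
        n += 1
        prod *= rng.random()
    return sum(rng.gammavariate(alpha, gam) for _ in range(n))


# Hypothetical demonstration on synthetic data (not the study's records).
rng = random.Random(1)
data = [rpg(3.0, 2.0, 1.5, rng) for _ in range(100)]
p_grid = [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]
phi_grid = [0.5 * 1.3 ** k for k in range(11)]  # 0.5 up to about 6.9
p_hat, phi_hat, profile = profile_p(data, p_grid, phi_grid)
for p in p_grid:
    print(f"p = {p:.1f}  profile log-likelihood = {profile[p][0]:.2f}")
print("p_hat =", p_hat, " phi_hat =", round(phi_hat, 2))
```

Plotting the printed log-likelihood values against p gives the kind of profile plot described above; a finer grid near the maximum would sharpen the estimate.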

Results
For the six case study stations, estimated p-indices, dispersion parameters (φ) and optimal probability distributions within the Tweedie family are presented in Table 2. Within the family, P-G distributions were found to be optimal for modelling daily rainfall totals for all case study stations. For the monthly rainfall totals for Pemberton and the seasonal rainfall totals for Springwood, Mount Olive, Maryborough and Pemberton, the p-indices were greater than but close to two, and hence, Gamma distributions were considered near-optimal within the Tweedie family. It is notable that, even for strictly positive rainfall totals, a P-G distribution may be optimal within the Tweedie family. In these situations, the studied data series may not include dry events; however, the model allows the possibility of future dry events. Based on the 1327 stations studied, within the Tweedie family, P-G distributions were found optimal for modelling the daily rainfall at every location. For monthly rainfall totals, P-G distributions were found optimal for 97.8% of stations and Gamma distributions were found near-optimal for the remaining gauges. For seasonal rainfall totals, the P-G, Gamma and Normal distributions were found optimal for 71.4%, 27.0% and 1.6% of stations, respectively. For the cases where P-G distributions were optimal, the medians of the p-indices for daily, monthly and seasonal rainfall totals were 1.53, 1.58 and 1.71, respectively. Relatively smaller and more consistent p-indices were noticed for models fitted to the daily rainfall data compared to those for monthly and seasonal timescales. The median dispersion parameters for the models fitted to the daily, monthly and seasonal rainfall were 10.95, 3.85 and 1.62, respectively (Figure 2). Relatively smaller φ-indices were found for cases where gamma distributions were optimal.
Climate 2019, 7, x FOR PEER REVIEW

Comparing the Fit of the Model Using Tweedie Distributions
Once the optimal distribution within the Tweedie family and its parameters were obtained, the performance of the models in generating extreme events was examined. For this purpose, statistics representing extreme rainfall events for the validation and simulated data were compared. Considering the large proportion of dry days, the 95th and 99th percentiles of daily rainfall totals were compared, whereas the 25th and 95th percentiles were compared for monthly rainfall totals and the 5th and 95th percentiles for seasonal rainfall totals. The probabilities of no rainfall were compared for all timescales. For validation, at each timescale and station, 1000 samples were generated using the respective distributions and fitted parameters. The abovementioned statistics were estimated for each of the samples. Thus, for a specific timescale and station, 1000 values of each statistic were obtained. Medians of the statistics from all samples were then computed and compared with the statistics of the validation data of the respective timescale and station (Figure 3).
The models slightly overestimate the 95th percentiles of daily rainfall and underestimate the 99th percentiles of daily and the 95th percentiles of monthly rainfall totals. The models generate data reasonably well for the other percentiles considered in the analysis. For the probability of no rainfall, the models generate data reasonably well at all timescales. The extreme rainfall amounts, for example the 99th percentiles of daily and the 95th percentiles of monthly rainfall, have a long tail, so the usual distributions cannot capture the extreme events well. Extreme value distributions, such as the Weibull, the generalised Pareto and the three-parameter log-normal, may be useful to fit such data. However, these distributions do not capture the lower parts of the data, that is, the major portion of the dataset. A mixture of models may perform better in capturing both extremes of the datasets; however, the approach requires extra parameters, which may lead to a higher degree of parameter uncertainty in the modelling. The three-parameter Tweedie models strike a balance in the sense that they capture the probability of no rain and the lower parts of the datasets well, and the extremely high events reasonably well.
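The validation procedure described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the study: the sampler assumes 1 < p < 2 and draws each total as a Poisson number of gamma amounts, the mean of 2.0 mm is hypothetical (p and φ are set to the reported daily medians), the "validation" series is a synthetic stand-in for held-out observations, and 200 replicates are used instead of 1000 to keep the run short.

```python
import math
import random
import statistics


def rpg(mu, phi, p, rng):
    """Draw one Poisson-Gamma total (1 < p < 2) as a Poisson sum of gammas."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape
    gam = phi * (p - 1.0) * mu ** (p - 1.0)     # gamma scale
    n, prod, limit = 0, rng.random(), math.exp(-lam)
    while prod > limit:                         # Knuth's Poisson sampler
        n += 1
        prod *= rng.random()
    return sum(rng.gammavariate(alpha, gam) for _ in range(n))


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summary(sample):
    """Statistics compared for daily data: 95th, 99th percentile, P(no rain)."""
    return {
        "p95": percentile(sample, 95),
        "p99": percentile(sample, 99),
        "p_dry": sum(1 for y in sample if y == 0) / len(sample),
    }


def simulated_medians(mu, phi, p, n_obs, n_rep, rng):
    """Median of each statistic over n_rep simulated samples of size n_obs."""
    reps = [summary([rpg(mu, phi, p, rng) for _ in range(n_obs)])
            for _ in range(n_rep)]
    return {key: statistics.median([r[key] for r in reps]) for key in reps[0]}


# Hypothetical daily model: mean is assumed; p and phi are the reported medians.
MU, PHI, P = 2.0, 10.95, 1.53
rng = random.Random(42)
validation = [rpg(MU, PHI, P, rng) for _ in range(365)]  # stand-in for held-out data
observed = summary(validation)
simulated = simulated_medians(MU, PHI, P, n_obs=365, n_rep=200, rng=rng)
for key in observed:
    print(f"{key}: validation = {observed[key]:.3f}, simulated median = {simulated[key]:.3f}")
```

Plotting the validation statistic against the simulated median for every station gives scatterplots of the kind shown in Figure 3; points near the 1:1 line indicate that the fitted model reproduces that statistic.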

Spatial and Seasonal Variations of Parameters
The parameters of the fitted models were compared for regions with diverse rainfall climates as defined by the Bureau of Meteorology (BoM). From the arid zone (the middle and parts of the west coast of the country), 104 stations were considered. Next, 323 stations were studied from parts of the mid-east coast that represent a climate with wet summer and low winter rainfall. The northern regions, with 76 studied stations, represent a rainfall climate with wet summer and dry winter. From the regions with fairly uniform seasonal rainfall totals, 225 stations were studied. The largest sample of rainfall stations (504) was in regions with a wet winter and low summer rainfall climate. Finally, 95 stations were considered from regions with wet winter and dry summer.
As shown in the top-left panel of Figure 4, seasonal variations in the mean rainfall amounts were evident across the studied regions. For the models fitted to daily rainfall, consistent p-indices were observed across the regions and over the seasons (top-middle and top-right panels). Standard deviations of the dispersion parameters are presented in the bottom panels of Figure 4. Compared to daily or seasonal rainfall, relatively lower variation in the dispersion parameter was observed for the monthly timescale. The largest inter-seasonal variations in the standard deviation of the dispersion parameters were observed for the stations located in the dry and summer-dominated northern regions. The highest dispersion parameters were observed for winter rainfall, especially in the arid and summer-dominated regions. Significant seasonal variations in the estimated parameters demand fitting separate models to individual seasons.

Estimating Parameters for Seasonal Models Using the Combined Model and Daily Timescales from Monthly Timescale
This study aimed at exploring the possibility of generating rainfall data for individual seasons using estimated parameters based on annual total rainfall. For this purpose, the correlation coefficients (r) between the parameters from the overall datasets and those from individual seasons were estimated. A relatively weak relationship (r = 0.39) was observed between the φ-indices obtained from the fit of the overall and winter daily models. The reason may be the higher variation in the estimated dispersion parameters for that season. Strong relationships among the parameters (r = 0.70–0.91) for the other seasons justify the possibility of predicting the parameters for individual seasons from those obtained from the combined data. Table 3 presents results from the fitted linear models. The regression parameters are statistically significant in all cases; hence, the parameters for individual seasons can be estimated from the parameters of the combined dataset.
Scatterplots in Figure 5 represent the relationships between the parameters obtained from the daily and monthly timescales. A strong correlation between daily and monthly p-indices was observed (left panel). Regression equations were fitted to predict the parameters for daily data from the parameters for monthly data. Statistically significant positive slope parameters were observed for both models. The results confirm that the parameters for the daily timescale can be predicted using the parameters obtained through models fitted to monthly data.
Simulated data obtained with the parameters of the fitted models have characteristics (probability of no rainfall, extreme rainfall events) similar to those of the observed data. However, the models slightly overestimate the 95th percentiles of daily rainfall and underestimate the 99th percentiles of daily and the 95th percentiles of monthly rainfall totals. The results indicate that the heavy right tail of the observed data with extreme events may not be covered well by the proposed model. Larger variations in rainfall amounts over the seasons and across the regions were obvious for the studied stations. However, the mean and standard deviations of the p-indices were consistent in all cases. Standard deviations of the φ-indices were more consistent for the monthly models than for the daily or seasonal models. Higher values of the dispersion parameter for daily data may be due to higher variation in the datasets, whereas the variations in the seasonal models may be because of the smaller sample size. Higher variations in the parameter were obtained for the stations from arid and summer-dominated rainfall regions, and for winter seasons. Strong correlations and significant slope parameters between the dispersion parameters of the combined data and those from individual seasons indicate the possibility of estimating the parameters for individual seasons from those estimated from the combined data. The results also justify the possibility of estimating the parameters for daily data using the fitted parameters of monthly data.
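The correlation and regression step can be illustrated with a short Python sketch. It assumes ordinary least squares with a single predictor, which is one plausible reading of the "fitted linear models" and not a confirmed detail of the study; the eight pairs of φ values are hypothetical and are not taken from Table 3.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def slope_t_stat(r, n):
    """t statistic for H0: slope = 0 in simple regression (n - 2 d.f.)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical dispersion parameters for eight stations (illustration only).
phi_combined = [6.2, 8.9, 10.4, 11.0, 12.7, 14.1, 16.8, 19.5]
phi_summer = [5.1, 8.0, 8.7, 10.9, 10.6, 13.5, 14.2, 18.3]

r = pearson_r(phi_combined, phi_summer)
intercept, slope = fit_line(phi_combined, phi_summer)
t_value = slope_t_stat(r, len(phi_combined))
print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}, t = {t_value:.1f}")

# Predict the seasonal parameter of a new station from its combined-data value.
print("predicted summer phi for combined phi = 12.0:", round(intercept + slope * 12.0, 2))
```

The same three functions apply unchanged to the daily-versus-monthly comparison: the monthly parameter takes the place of the combined-data parameter as the predictor.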

Figure 1. Map of Australia showing rainfall zones, locations of the studied stations (grey dots). The case study stations are named and represented by black squares.

Figure 2. Boxplots representing Q1 (first quartile, bottom edge of the box), median (thick horizontal line in the middle), Q3 (third quartile, top edge of the box) and outliers (circles) for: (a) index parameter (p) and (b) dispersion parameter (φ) for the 1327 stations at various timescales.

Figure 3. Scatterplots of simulated and observed (validated) rainfall for 95th, 99th percentiles and probability of no rain for daily data (plots a, b and c respectively), 25th, 95th percentiles and probability of no rain for monthly data (plots d, e and f respectively) and 5th, 99th percentiles and probability of no rain for seasonal data (plots g, h and i respectively).

Figure 4. Plots representing mean daily rainfall, mean and SD of p-index for daily model (plots a, b and c respectively) and SD of φ-index for daily, monthly and seasonal data (plots d, e and f respectively) for all studied timescales over the seasons and across regions of Australia.

Figure 5. Scatterplots representing the relationships between monthly and daily p-indices (a) and φ-index (b) for the 1327 gauging stations.

Table 1. Statistics of rainfall totals at various timescales for six case study stations.
* coefficient of variation.

Table 2. Index parameter (p) values, optimal distribution function within the Tweedie family and dispersion parameters (φ) values of daily, monthly and seasonal rainfall data for the six case study stations.

Table 3. Correlation and regression coefficients of fitted models of seasonal parameters on the parameters for combined data. Separate results have been presented for p- and φ-indices.