Synthetic Rainfall Modeling Using a Modified Hybrid Gamma-GP Distribution

Jin, Hyang Gon; Hong, Seunghyun; Kim, Yongku

doi:10.3390/app15179563

by Hyang Gon Jin ¹, Seunghyun Hong ¹ and Yongku Kim ¹,²,*

¹ Department of Statistics, Kyungpook National University, Daegu 41566, Republic of Korea

² KNU G-LAMP Research Center, Institute of Basic Sciences, Kyungpook National University, Daegu 41566, Republic of Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(17), 9563; https://doi.org/10.3390/app15179563

Submission received: 27 July 2025 / Revised: 29 August 2025 / Accepted: 29 August 2025 / Published: 30 August 2025

Abstract

Stochastic weather generators are commonly employed to create synthetic sequences of daily weather variables across diverse fields, including hydrological, ecological, and agricultural studies. Realistic precipitation sequences, in particular, serve as essential inputs in numerous modeling frameworks. Generalized linear models (GLMs) that incorporate covariates to capture seasonality and teleconnections represent one effective approach for stochastic weather generation. However, these models often underestimate the interannual variability of seasonally aggregated variables, notably precipitation intensity during wet seasons. Recent methods developed to mitigate the issue of overdispersion have nevertheless struggled to adequately replicate observed precipitation intensities in wet seasons. To overcome this limitation, we propose integrating a modified hybrid gamma and generalized Pareto distribution into the GLM-based weather generator. This enhanced method was evaluated using daily precipitation data from Seoul, Korea, and successfully reproduced realistic precipitation intensities while effectively addressing the overdispersion issue.

Keywords:

generalized linear model; modified hybrid gamma with generalized Pareto distribution; overdispersion; stochastic precipitation generator

1. Introduction

Stochastic weather generators are statistical tools used to simulate synthetic sequences of precipitation data that retain the essential statistical and temporal characteristics of observed rainfall. These models are particularly valuable in hydrology, ecology, climate impact assessment, and water resource management, where long-term precipitation data are required but often unavailable [1]. For instance, synthetic daily weather sequences have been used as model inputs to investigate the impacts of climate variability on crop yields [2]. Among the foundational work on stochastic precipitation generation, Richardson (1981) [3] introduced one of the first daily rainfall generators based on Markov chains. This approach models rainfall occurrence as a two-state (wet or dry) Markov process, capturing the persistence and transition probabilities of rain events. Richardson’s model paved the way for the further development of stochastic weather generators that couple rainfall occurrence with amount distributions. Markov chain-based models have remained popular due to their simplicity and ability to represent temporal dependence [4]. These models typically involve estimating transition probabilities between wet and dry states and fitting probability distributions (such as the gamma or exponential) to rainfall amounts. However, single-site Markov models often struggle to capture longer-term climate variability. More sophisticated statistical and machine learning techniques have therefore been incorporated to improve the realism of simulated precipitation. For example, hidden Markov models (HMMs) [5] and generalized linear models (GLMs) [6] provide greater flexibility in capturing the complex behavior of precipitation occurrence and intensity.
Furthermore, nonparametric approaches and weather generators based on neural networks and deep learning have emerged, though their adoption in operational hydrology remains limited due to interpretability and data requirements.

The primary objective of stochastic weather generators is to replicate the specific statistical characteristics of meteorological variables. These generators typically rely on parametric modeling [3], resampling techniques [7], or a combination of both [8]. A notable parametric method uses generalized linear models (GLMs) for the stochastic modeling of daily weather variables. Furrer and Katz (2007) [9] applied a GLM-based stochastic weather generator to daily weather data from Pergamino, Argentina, a location characterized by a pronounced wet season. Parametric stochastic weather generators frequently underestimate the observed interannual variability of seasonally aggregated variables, a phenomenon referred to as overdispersion [10,11,12]. To address this limitation, Kim et al. (2012) [13] introduced smoothed seasonal total precipitation and seasonal mean minimum and maximum temperatures as covariates into a GLM-based weather generator. They applied locally weighted scatterplot smoothing (LOESS; [14,15]) to effectively mitigate underdispersion. Furthermore, the integration of seasonally aggregated climate statistics into the statistical downscaling of seasonal climate forecasts has yielded robust methodologies for generating weather sequences consistent with seasonal forecasts, significantly benefiting resource planning and management [16]. Nevertheless, challenges remain regarding the selection of appropriate smoothing parameters and forecasting reliability, because the approach requires future values of the seasonal aggregate covariates. More recently, Kim et al. (2017) [17] included seasonal dry/wet indicators, derived from hidden Markov model-based decoding of seasonal total precipitation, as covariates in the GLM weather generator [18]. Despite these advancements, the model still struggles to produce sufficient precipitation intensity during the wet season (July–September) (Figure 1).

Stochastic precipitation generators are widely used for climate change impact studies [1], hydrologic modeling, and risk assessment. However, challenges persist in accurately simulating extreme precipitation events and adapting to non-stationary climate conditions [19]. Consequently, ongoing research focuses on improving extreme event modeling and integrating climate model outputs into stochastic generators. Gamma [20] and log-normal distributions are commonly used to model rainfall amounts, although mixed distributions have gained popularity more recently. The gamma distribution, however, tends to perform poorly for heavy precipitation, that is, high rainfall amounts occurring with low frequency, where accuracy is most important. The generalized Pareto (GP) distribution has been applied effectively to such extreme rainfall events. Yet the GP distribution is less suitable when light rain is frequent.

To overcome these limitations, spliced distributions, which combine two different distributions over separate supports, have become a focus of research. Hanum et al. (2015) [21] demonstrated that a spliced distribution composed of a gamma distribution for the lower range and a Pareto distribution for the upper range fits tropical heavy rainfall data from Jakarta, Indonesia, better than single distributions such as the gamma, Pareto, or GP alone (see also [22]). However, spliced distributions typically lack continuity and differentiability at the threshold separating the two component distributions. To address this, hybrid distributions have been developed that ensure continuity at the threshold, although differentiability is still not guaranteed. Building on this, Kim et al. (2019) [23] introduced a modified hybrid gamma–generalized Pareto distribution, which is a generalized form of the spliced distribution. This model uses a gamma distribution for the ‘head’ (lower values) and a generalized Pareto distribution for the ‘tail’ (extreme values). They analyzed the threshold conditions for this modified hybrid distribution, derived its negative log-likelihood function, and estimated its parameters by approximate maximum likelihood using the differential evolution algorithm. Recent methodological advances in stochastic weather generation highlight the value of non-stationary frameworks that represent extremes accurately. Nguyen et al. (2024) [24] developed a climate-informed nsRWG that conditions precipitation distributions on circulation and temperature, successfully reproducing spatial and extreme characteristics for future climate scenarios. Guan et al. (2024) [25] demonstrated the ability of nsRWG to capture heavy precipitation events across scales using extremity indices. Abbas et al. (2025) [26] proposed a zero-inflated extended GP model that seamlessly models dry days, typical accumulations, and extremes, while Reulen and Mehrkanoon (2024) [27] leveraged attention-based GANs for enhanced nowcasting of extreme precipitation.

Building on these contemporary directions, our study introduces the MHGGP–GLM framework, which further advances the stochastic modeling of precipitation intensity by ensuring smooth transitions between distributional regimes and addressing overdispersion issues in both frequent and extreme rainfall. The novelty of this study lies in the integration of a modified hybrid gamma–generalized Pareto (MHGGP) distribution into a GLM-based stochastic weather generator, enabling the simultaneous modeling of frequent light rainfall and rare extreme events within a unified framework. This approach effectively mitigates the long-standing problem of overdispersion by capturing interannual variability in wet-season precipitation more realistically. Unlike conventional spliced models, the proposed method preserves continuity and differentiability at the distribution threshold, ensuring stable parameter estimation and smoother simulation outputs. Empirical validation using a 51-year dataset from Seoul, Korea, confirms the model’s superior ability to reproduce rainfall statistics, highlighting its potential utility for hydrology, agriculture, and climate risk assessment. Section 2 briefly reviews the foundational GLM approach for stochastic weather generation and details the mixture of gamma and generalized Pareto distributions with a threshold, highlighting its efficacy in producing realistic weather sequences. Section 3 evaluates the model’s performance using daily weather data from Seoul, Korea, with particular attention to reducing overdispersion and accurately modeling precipitation intensity. Finally, Section 4 provides a discussion of the findings and their implications.

2. GLM Precipitation Generator Using Modified Hybrid Gamma with GP Distribution

2.1. GLM Precipitation Generator

The GLM-based stochastic weather generator employed in this study draws upon the foundational structure originally proposed by [3], which introduced the concept of using a two-part model to separately simulate precipitation occurrence and intensity. This foundational framework has since been extended and refined in numerous ways, most notably by [9], who developed a stochastic weather generator grounded in generalized linear modeling (GLM) principles to capture the stochastic nature of daily weather events. In the present study, we adopt the GLM-based approach introduced by [9] while implementing several modifications to suit the objectives of our analysis. A concise overview of the model structure is provided below for completeness; however, a more comprehensive treatment, including algorithmic implementation and parameter estimation procedures, can be found in the original work by [9].

To simplify the interpretation of results, particularly those related to overdispersion in precipitation processes, we intentionally exclude large-scale climate drivers, specifically the El Niño–Southern Oscillation (ENSO), as covariates in our modeling framework. This represents a deliberate departure from the methodology adopted by [9], who incorporated ENSO indices to account for interannual climate variability. By omitting ENSO-related terms, we focus on the intrinsic temporal structure and seasonal variability of local precipitation processes, thereby facilitating a clearer examination of the generator’s behavior in the absence of exogenous climate signals.

The modeling of daily precipitation within this GLM-based framework follows the methodology of [28], who proposed a chain-dependent process to model precipitation occurrence, with the transition probabilities governed by seasonal functions. Specifically, let $J_t$ denote the binary precipitation indicator on day $t$, where

$$J_t = \begin{cases} 1, & \text{if precipitation occurs}, \\ 0, & \text{otherwise}. \end{cases}$$

The temporal dependence between successive days is captured through a first-order Markov process. The transition probability $p_{ij}(t)$ is defined as the probability that precipitation state $j \in \{0, 1\}$ occurs on day $t$, given that state $i \in \{0, 1\}$ occurred on day $t-1$:

$$p_{ij}(t) = \Pr(J_t = j \mid J_{t-1} = i), \quad i, j = 0, 1,$$

and is modeled using logistic regression.

This transition probability is specified via a binomial GLM with a logit link function. To capture the inherent seasonality in precipitation occurrence, the logistic model incorporates sinusoidal terms representing the annual cycle, defined as $S_t = \sin(2\pi t/365)$ and $C_t = \cos(2\pi t/365)$, which reflect periodic behavior with a one-year cycle.

The conditional probability of precipitation on day $t$, denoted $p_t$, is then modeled as

$$\log\left(\frac{p_t}{1 - p_t}\right) = \kappa_0 + a J_{t-1} + b_1 C_t + b_2 S_t + c_1 C_t J_{t-1} + c_2 S_t J_{t-1}, \tag{1}$$

where $\kappa_0$ represents the baseline log-odds of precipitation occurrence, and the coefficient $a$ quantifies the influence of the preceding day’s precipitation occurrence on the current day’s precipitation probability. Setting $J_{t-1} = 0$ or $J_{t-1} = 1$ in (1) yields $p_{01}(t)$ and $p_{11}(t)$, respectively. The coefficients $b_1$ and $b_2$ jointly determine the amplitude and phase of the seasonal cycle. The interaction terms $c_1$ and $c_2$ allow for seasonal modulation of the autocorrelation in precipitation occurrence, thereby enabling the model to reflect differing seasonal dynamics in wet and dry spell persistence.
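Once the lagged indicator, the harmonics, and their products are placed in a design matrix, model (1) is an ordinary binomial GLM and can be fitted with any GLM routine. The following is a minimal NumPy sketch, not the authors' code. It assumes a 0/1 occurrence series that starts on 1 January and uses 365-day years (29 February removed, as for the Seoul data). It fits (1) by Newton-Raphson, which for the logit link is equivalent to iteratively reweighted least squares, and it evaluates $p_{01}(t)$ and $p_{11}(t)$ by setting $J_{t-1}$ to 0 or 1. The function names and the demo coefficients are hypothetical; they are not the estimates reported for Seoul.

```python
import numpy as np

def harmonics(doy, period=365.0):
    """C_t = cos(2*pi*t/365) and S_t = sin(2*pi*t/365) for day-of-year t."""
    w = 2.0 * np.pi * np.asarray(doy, dtype=float) / period
    return np.cos(w), np.sin(w)

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def occurrence_design(J):
    """Design matrix of model (1); the first day is dropped because it has no lag."""
    J = np.asarray(J, dtype=float)
    doy = np.arange(J.size) % 365 + 1           # assumes the series starts on 1 Jan
    C, S = harmonics(doy[1:])
    lag = J[:-1]                                 # J_{t-1}
    X = np.column_stack([np.ones_like(lag), lag, C, S, C * lag, S * lag])
    return X, J[1:]

def fit_logit_glm(X, y, n_iter=50, tol=1e-10):
    """Binomial GLM with logit link, fitted by Newton-Raphson (equivalent to IRLS)."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ coef)
        w = p * (1.0 - p)                        # GLM working weights
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef                                  # kappa0, a, b1, b2, c1, c2

def transition_probs(coef, doy):
    """p01(t) (previous day dry) and p11(t) (previous day wet) implied by model (1)."""
    k0, a, b1, b2, c1, c2 = coef
    C, S = harmonics(doy)
    p01 = expit(k0 + b1 * C + b2 * S)
    p11 = expit(k0 + a + (b1 + c1) * C + (b2 + c2) * S)
    return p01, p11

# Demo: simulate 51 years from hypothetical coefficients, then refit them.
rng = np.random.default_rng(1)
true_coef = np.array([-1.2, 1.0, -0.6, 0.2, 0.1, -0.05])
n = 51 * 365
p01, p11 = transition_probs(true_coef, np.arange(n) % 365 + 1)
J = np.zeros(n, dtype=int)
for i in range(1, n):
    J[i] = rng.random() < (p11[i] if J[i - 1] else p01[i])
coef = fit_logit_glm(*occurrence_design(J))
```

With a fitted `coef`, `transition_probs(coef, np.arange(1, 366))` traces both transition probabilities over the calendar year. A full GLM library would additionally report the coefficient standard errors.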

For the precipitation intensity component, conditional on $J_t = 1$, we follow the approach of [28] in modeling daily precipitation amounts using a gamma distribution, which is commonly used due to its flexibility and positive support. The conditional mean precipitation intensity at time $t$, $\mu_t = \alpha \beta_t$, where $\alpha$ is the shape and $\beta_t$ the scale of the gamma distribution, is modeled as a function of time through a sinusoidal formulation that captures the inherent annual cycle in precipitation:

$$\log(\mu_t) = \kappa_1 + d_1 C_t + d_2 S_t. \tag{2}$$

Here $C_t = \cos(2\pi t/365)$ and $S_t = \sin(2\pi t/365)$ are the standard cosine and sine terms representing annual periodicity. Note that the shape and scale parameters of the gamma distribution can be reparameterized in terms of its mean and variance. The coefficients $d_1$ and $d_2$ jointly determine the amplitude and phase of the seasonal cycle, while $\kappa_1$ represents the baseline log-mean intensity. This formulation allows the mean of the gamma distribution to vary seasonally and thus reflect realistic seasonal patterns in precipitation intensity. Despite its empirical utility, however, this method tends to underestimate precipitation intensity during peak wet periods, which in turn leads to a systematic underestimation of the interannual variability of aggregated precipitation indices such as seasonal or annual totals. This limitation is well documented in the literature and highlights an ongoing challenge in accurately simulating the full distribution of daily precipitation, particularly in climates characterized by highly variable wet seasons. The GLM-based precipitation generator adopted here therefore offers a flexible and statistically rigorous framework for simulating daily precipitation processes, capturing both temporal dependence and seasonality. Its representation of extreme events and aggregated variability, however, requires further refinement to ensure realistic simulation outcomes across a range of temporal scales.
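The hypothetical NumPy sketch below shows how (2) and the constant-shape assumption yield day-specific gamma scales $\beta_t = \mu_t/\alpha$. It also shows how the pair $(d_1, d_2)$ maps to an amplitude $A = \sqrt{d_1^2 + d_2^2}$ and a peak day via the identity $d_1 C_t + d_2 S_t = A\cos(2\pi t/365 - \phi)$ with $\phi = \operatorname{atan2}(d_2, d_1)$. The same identity applies to $(b_1, b_2)$ in (1). Function names and any coefficient values passed in are illustrative placeholders.

```python
import numpy as np

def seasonal_gamma(doy, kappa1, d1, d2, alpha):
    """Model (2): log(mu_t) = kappa1 + d1*C_t + d2*S_t, with mu_t = alpha * beta_t."""
    w = 2.0 * np.pi * np.asarray(doy, dtype=float) / 365.0
    mu = np.exp(kappa1 + d1 * np.cos(w) + d2 * np.sin(w))
    return mu, mu / alpha                        # (mu_t, beta_t); shape alpha is fixed

def amplitude_and_peak(d1, d2, period=365.0):
    """Write d1*cos(wt) + d2*sin(wt) as A*cos(wt - phi); return A and the peak day."""
    amplitude = np.hypot(d1, d2)
    phi = np.arctan2(d2, d1)                     # phase in radians
    return amplitude, (phi * period / (2.0 * np.pi)) % period

def simulate_gamma_amounts(J, doy, kappa1, d1, d2, alpha, rng):
    """Gamma-only intensity: an amount on each wet day (J_t = 1), zero otherwise."""
    _, beta = seasonal_gamma(doy, kappa1, d1, d2, alpha)
    wet = np.asarray(J) == 1
    amounts = np.zeros(wet.size)
    amounts[wet] = rng.gamma(alpha, beta[wet])   # shape alpha, day-specific scale
    return amounts
```

This gamma-only baseline is what Section 2.2 replaces with the MHGGP distribution in the wet season.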

2.2. Modified Hybrid Gamma with GP Distribution

In the modeling of daily precipitation intensity, especially within the context of stochastic weather generators and hydrological simulations, the selection of an appropriate probability distribution is critical for accurately capturing the statistical characteristics of observed rainfall. The gamma distribution is widely employed for this purpose, particularly in the simulation of frequent, low to moderate rainfall events. Its mathematical properties, including a relatively thin, exponentially decaying right tail, make it well suited to precipitation regimes in which light rainfall is dominant. This suitability arises from the gamma distribution’s flexibility in accommodating positively skewed data while ensuring a non-negative support consistent with the physical nature of precipitation. However, the gamma distribution often exhibits limitations when applied to datasets that contain a substantial number of heavy rainfall events. In such cases, the empirical distribution of precipitation intensities frequently displays a long right tail, a characteristic that the gamma distribution fails to accommodate adequately. This mismatch can result in poor goodness of fit, particularly in the upper quantiles of the distribution, thereby impairing the model’s ability to accurately represent extreme precipitation events that are of significant interest in risk assessment, climate change impact studies, and water resource planning.

To better capture the statistical properties of heavy rainfall events, the generalized Pareto (GP) distribution is often employed. The GP distribution is well known for its heavy-tailed nature, making it particularly effective for modeling exceedances over high thresholds, as commonly encountered in the analysis of extremes. Its theoretical basis lies in extreme value theory, where it arises as the limiting distribution of scaled excesses above a sufficiently high threshold. This makes it a natural candidate for modeling the tail behavior of precipitation data. Nonetheless, while the GP distribution can provide a superior fit for high-intensity precipitation events, it often performs poorly for low and moderate intensities, which form the bulk of the distribution rather than its tail. Furthermore, applying the GP distribution typically requires discarding data below a predefined threshold, which results in the loss of valuable information contained in the bulk of the distribution. This trade-off between tail accuracy and data completeness introduces practical challenges, particularly when attempting to construct a unified model capable of capturing the full spectrum of precipitation intensities.

Given these limitations, several studies have explored the use of hybrid or mixture distributions to leverage the respective strengths of the gamma and GP distributions. In particular, approaches that combine a gamma distribution for the lower and moderate ranges of precipitation intensity with a GP distribution for the upper tail have shown promise in producing synthetic precipitation sequences that are both realistic and statistically consistent with observed data. Such composite models provide enhanced flexibility and accuracy across a broader range of precipitation values, allowing for improved representation of both common and extreme events. Nevertheless, a commonly cited drawback of conventional mixture models lies in the potential discontinuity or non-differentiability at the threshold separating the two component distributions. This discontinuity can introduce artifacts in simulation output and complicate both parameter estimation and model interpretation. In response to these concerns, recent methodological advancements have proposed modified hybrid distributions that ensure smooth transitions at the threshold point. Specifically, Kim et al. (2019) [23] introduced a refined hybrid model in which the gamma distribution governs the lower segment of the intensity spectrum, while the GP distribution governs the upper segment, with careful parameterization to maintain continuity and differentiability at the junction. This construction not only preserves the statistical integrity of the model across the full range of precipitation intensities but also facilitates more robust simulation of both frequent and extreme rainfall events. By adopting this modified hybrid approach, the present study aims to overcome the shortcomings of traditional single-distribution and discontinuous mixture models. 
In doing so, we enhance the realism and reliability of synthetic daily precipitation generation, thereby improving the utility of stochastic weather generators for applications in climate modeling, hydrological impact analysis, and infrastructure design under conditions of climatic variability and extremes.

A spliced distribution has a probability density function $f(x)$ constructed from component densities $f_1(x), f_2(x), \dots, f_k(x)$, each applied over a separate interval of the support [29,30]. In the two-component special case, $f(x)$ combines the head of $f_1(x)$ with the tail of $f_2(x)$:

$$f(x) = \begin{cases} a_1 f_1^*(x), & -\infty < x \le \theta, \\ a_2 f_2^*(x), & \theta < x < \infty, \end{cases}$$

where $a_1$ and $a_2$ are positive mixing weights satisfying $a_1 + a_2 = 1$, and $f_1^*(x)$ and $f_2^*(x)$ are the truncated densities

$$f_1^*(x) = \frac{f_1(x)}{F_1(\theta)} \quad \text{and} \quad f_2^*(x) = \frac{f_2(x)}{1 - F_2(\theta)},$$

where $F_1(x)$ and $F_2(x)$ are the cumulative distribution functions of $f_1(x)$ and $f_2(x)$, and $\theta$ is the threshold separating the two parts of the domain, which is also treated as a model parameter.
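Here is a small SciPy sketch of the two-component construction, built from any two frozen `scipy.stats` distributions. The gamma head, log-normal tail, threshold, and weight are arbitrary illustrative choices and do not come from the paper. The density integrates to one by construction. Nothing, however, forces the two pieces to meet at $\theta$, which is the discontinuity discussed next.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def spliced_pdf(x, head, tail, theta, a1):
    """Spliced density: a1 * f1*(x) for x <= theta, a2 * f2*(x) above, with a2 = 1 - a1."""
    x = np.asarray(x, dtype=float)
    f1_star = head.pdf(x) / head.cdf(theta)     # f1 truncated to x <= theta
    f2_star = tail.pdf(x) / tail.sf(theta)      # f2 truncated to x > theta (sf = 1 - F2)
    return np.where(x <= theta, a1 * f1_star, (1.0 - a1) * f2_star)

# Hypothetical components and weight, chosen only for illustration.
head = stats.gamma(2.0, scale=3.0)
tail = stats.lognorm(1.0, scale=5.0)
theta, a1 = 8.0, 0.8

def density(v):
    return float(spliced_pdf(v, head, tail, theta, a1))

total = quad(density, 0.0, theta)[0] + quad(density, theta, np.inf)[0]  # equals 1
jump = density(theta) - density(theta + 1e-9)   # generally nonzero: no continuity constraint
```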

Suppose that the head, $f_1(x)$, is the probability density function of a gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$, and the tail, $f_2(x)$, is the probability density function of a GP distribution with location $\theta$, scale parameter $\sigma$, and shape parameter $\xi$. The spliced density suggested by [29] is, in general, not continuous, so additional conditions are required to make it continuous and differentiable [31]. Under both the continuity and differentiability of $f(x)$, the threshold $\theta$ satisfies

$$\frac{d}{d\theta} \log\left[\frac{f_1(\theta)}{f_2(\theta)}\right] = 0,$$

which gives $\theta = (\alpha - 1)\beta$ for $\alpha > 1$. That is, the threshold $\theta$ of the modified hybrid gamma and generalized Pareto distribution depends only on the parameters of the gamma distribution, under the condition $\alpha > 1$.
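The step from this condition to the closed-form threshold is short once one notes that the GP component is located at $\theta$. Its density at the threshold is then $f_2(\theta) = 1/\sigma$, which does not vary with $\theta$, so only the gamma log-density contributes:

```latex
\begin{aligned}
\log f_1(\theta) &= (\alpha - 1)\log\theta - \frac{\theta}{\beta} - \log\Gamma(\alpha) - \alpha\log\beta, \\
\log f_2(\theta) &= -\log\sigma, \\
\frac{d}{d\theta}\log\left[\frac{f_1(\theta)}{f_2(\theta)}\right]
  &= \frac{\alpha - 1}{\theta} - \frac{1}{\beta} = 0
  \quad\Longrightarrow\quad \theta = (\alpha - 1)\beta .
\end{aligned}
```

The threshold therefore coincides with the mode of the gamma head, and requiring $\alpha > 1$ is exactly what makes it positive.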

The probability density function (PDF) of this modified hybrid gamma–generalized Pareto (MHGGP) distribution is

$$f(x) = \begin{cases} \dfrac{1}{1+\delta}\, \dfrac{1}{\beta^{\alpha}\, \gamma(\alpha, \theta/\beta)}\, x^{\alpha-1} e^{-x/\beta}, & 0 < x \le \theta, \\[2ex] \dfrac{\delta}{1+\delta}\, \dfrac{1}{\sigma} \left(1 + \dfrac{\xi (x - \theta)}{\sigma}\right)^{-1/\xi - 1}, & x > \theta, \end{cases}$$

and is denoted MHGGP$(\alpha, \beta, \theta, \sigma, \xi)$. Under the continuity condition, the positive mixing weights of the MHGGP distribution are $1/(1+\delta)$ and $\delta/(1+\delta)$, respectively, where

$$\delta = \frac{f_1(\theta)\,[1 - F_2(\theta)]}{f_2(\theta)\, F_1(\theta)} = \frac{\sigma\, \theta^{\alpha-1} e^{-\theta/\beta}}{\gamma(\alpha, \theta/\beta)\, \beta^{\alpha}},$$

and $\gamma(\alpha, \theta/\beta)$ is the lower incomplete gamma function. The second equality follows because, for a GP distribution located at $\theta$, $F_2(\theta) = 0$ and $f_2(\theta) = 1/\sigma$, while $F_1(\theta) = \gamma(\alpha, \theta/\beta)/\Gamma(\alpha)$. Note that $\delta$ is determined by the parameters of both the generalized Pareto and the gamma distributions.
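The sketch below transcribes the MHGGP density and $\delta$ directly. It can be used to check numerically that the two pieces meet at $\theta$ and that the total probability mass is one. It assumes $\alpha > 1$ and $\xi \neq 0$. SciPy's `gammainc` is the regularized incomplete gamma function, so it is multiplied by $\Gamma(\alpha)$ to obtain $\gamma(\alpha, \theta/\beta)$. The parameter values are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn, gammainc

def lower_inc_gamma(a, x):
    """Unregularized lower incomplete gamma; scipy's gammainc is regularized."""
    return gammainc(a, x) * gamma_fn(a)

def mhggp_pdf(x, alpha, beta, sigma, xi):
    """MHGGP density with theta = (alpha - 1) * beta; assumes alpha > 1 and xi != 0."""
    x = np.asarray(x, dtype=float)
    theta = (alpha - 1.0) * beta
    g = lower_inc_gamma(alpha, theta / beta)
    delta = sigma * theta ** (alpha - 1.0) * np.exp(-theta / beta) / (g * beta ** alpha)
    xp = np.clip(x, 0.0, None)                   # avoid fractional powers of negatives
    head = xp ** (alpha - 1.0) * np.exp(-xp / beta) / (beta ** alpha * g)
    z = np.maximum(1.0 + xi * (x - theta) / sigma, 0.0)
    with np.errstate(divide="ignore"):           # z = 0 only beyond a finite GP endpoint
        tail = np.where(z > 0, z ** (-1.0 / xi - 1.0), 0.0) / sigma
    dens = np.where(x <= theta, head / (1.0 + delta), delta / (1.0 + delta) * tail)
    return np.where(x > 0, dens, 0.0)

params = (1.8, 12.0, 15.0, 0.2)                  # hypothetical alpha, beta, sigma, xi
theta = (params[0] - 1.0) * params[1]

def density(v):
    return float(mhggp_pdf(v, *params))

total = quad(density, 0.0, theta)[0] + quad(density, theta, np.inf)[0]  # equals 1
gap = density(theta) - density(theta + 1e-9)    # continuity at the threshold
```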

2.3. Statistical Analysis

Formally, the model employed for precipitation occurrence in this study is identical in structure to the basic model previously described, retaining the same first-order Markov chain and generalized linear modeling (GLM) approach to simulate the binary occurrence of daily precipitation. However, in modeling precipitation intensity, we depart from the conventional framework by incorporating a more flexible distributional form—specifically, a mixture of the gamma distribution and the generalized Pareto (GP) distribution—instead of relying solely on the gamma distribution as traditionally done. This hybrid model, denoted as MHGGP (modified hybrid gamma–generalized Pareto), is designed to better capture the dual behavior of precipitation intensities, which often exhibit light to moderate values frequently but also include sporadic, high-intensity events that lie in the tail of the distribution. We estimate the parameters of the modified hybrid gamma–generalized Pareto (GP) distribution using maximum likelihood estimation (MLE). Based on the general form of the likelihood function for the spliced distribution,

$$L(\alpha, \beta, \theta, \sigma, \xi) = \left(\frac{1}{1+\delta}\right)^{n} \delta^{m} \prod_{x_i \le \theta} f_1(x_i) \left[\frac{1}{F_1(\theta)}\right]^{M} \prod_{x_i > \theta} f_2(x_i) \left[\frac{1}{1 - F_2(\theta)}\right]^{m},$$

the log-likelihood function for the parameters $(\alpha, \beta, \theta, \sigma, \xi)$ is

$$\log L(\alpha, \beta, \theta, \sigma, \xi) = -n \log(1+\delta) + m \log\delta + \sum_{x_i \le \theta} \log f_1(x_i) - M \log F_1(\theta) + \sum_{x_i > \theta} \log f_2(x_i),$$

where $n = M + m$, $M = \sum_{i=1}^{n} I(x_i \le \theta)$, $m = \sum_{i=1}^{n} I(x_i > \theta)$, and $F_1(x)$ is the cumulative distribution function of the gamma distribution. The term $-m \log[1 - F_2(\theta)]$ vanishes because $F_2(\theta) = 0$ for a GP distribution with location $\theta$. Because explicit maximum likelihood estimators (MLEs) of $(\alpha, \beta, \theta, \sigma, \xi)$ cannot be obtained by directly maximizing $\log L(\alpha, \beta, \theta, \sigma, \xi)$, we employ the differential evolution (DE) algorithm for global optimization. DE is particularly advantageous because it does not require the objective function to be differentiable, a condition typically necessary for classical optimization methods. This makes DE suitable for non-differentiable optimization problems, objectives with multiple local minima, and complex nonlinearities.
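The following sketch minimizes the negative of the log-likelihood above with SciPy's `differential_evolution`. It treats $\theta = (\alpha - 1)\beta$ as derived rather than free. It uses $\log\delta = \log f_1(\theta) + \log\sigma - \log F_1(\theta)$, which follows from $f_2(\theta) = 1/\sigma$ and $F_2(\theta) = 0$. The search bounds, the penalty returned for infeasible points, and the log-normal stand-in data are assumptions for illustration only; they are not the settings used for the Seoul data.

```python
import numpy as np
from scipy import stats
from scipy.optimize import differential_evolution

def mhggp_nll(params, x):
    """Negative MHGGP log-likelihood; theta = (alpha - 1) * beta is not a free parameter."""
    alpha, beta, sigma, xi = params
    theta = (alpha - 1.0) * beta
    head, tail = x[x <= theta], x[x > theta]
    n, m = x.size, tail.size
    M = n - m
    log_F1 = stats.gamma.logcdf(theta, alpha, scale=beta)
    log_delta = stats.gamma.logpdf(theta, alpha, scale=beta) + np.log(sigma) - log_F1
    loglik = (-n * np.logaddexp(0.0, log_delta)             # -n log(1 + delta)
              + m * log_delta                               # + m log(delta)
              + stats.gamma.logpdf(head, alpha, scale=beta).sum() - M * log_F1
              + stats.genpareto.logpdf(tail, xi, loc=theta, scale=sigma).sum())
    return -loglik if np.isfinite(loglik) else 1e12        # penalize infeasible points

# Hypothetical stand-in for wet-season daily amounts; not the Seoul data.
rng = np.random.default_rng(0)
x = stats.lognorm.rvs(1.0, scale=8.0, size=2000, random_state=rng)

bounds = [(1.05, 5.0), (0.5, 50.0), (0.5, 50.0), (-0.4, 1.0)]   # alpha, beta, sigma, xi
result = differential_evolution(mhggp_nll, bounds, args=(x,), seed=0,
                                maxiter=50, polish=False)
alpha_hat, beta_hat, sigma_hat, xi_hat = result.x
theta_hat = (alpha_hat - 1.0) * beta_hat
```

Gradient-based polishing is switched off (`polish=False`). The split into head and tail observations changes as $\theta$ moves, so the objective is not smooth in $(\alpha, \beta)$.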

A synthetic sequence of daily precipitation time series is generated as follows. First, models (1) and (2) are calibrated using data from the entire study period. To ensure parameter stability and interpretability, the shape parameter $\alpha$ of the gamma distribution is estimated globally and treated as time-invariant. In contrast, the scale parameter $\beta_t$ is allowed to vary with time to account for seasonal variations in precipitation intensity. Specifically, the conditional mean precipitation intensity at time $t$, $\mu_t = \alpha \beta_t$, is reparameterized based on the conditional mean of non-zero precipitation intensities for each calendar day $t$. Conditional on $J_t = 1$, precipitation during the non-wet season is generated from the fitted gamma distribution, whereas precipitation during the wet season is simulated using the modified hybrid gamma–generalized Pareto distribution. This hybrid framework effectively overcomes the limitations of single-distribution approaches by simultaneously capturing both the bulk and the tail behavior of daily precipitation. The threshold $\theta_t$, which defines the point of transition from the gamma to the generalized Pareto (GP) distribution, is specified as $\theta_t = (\alpha - 1)\beta_t$. Observations exceeding this threshold ($y_t > \theta_t$) are classified as exceedances, and only these data are used to estimate the GP parameters $(\sigma, \xi)$, ensuring an accurate representation of tail behavior.
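One way to draw wet-season amounts is inverse-CDF sampling from the MHGGP distribution. A uniform draw at or below the head weight $1/(1+\delta)$ is mapped through the gamma quantile function truncated to $(0, \theta]$, and any larger draw through the GP quantile function above $\theta$. The sketch below assumes $\xi \neq 0$. The function names and the wet-season window (1 July to 30 September, days 182 to 273 of a 365-day year) are hypothetical. In the generator, `beta` would be the day-specific $\beta_t$.

```python
import numpy as np
from scipy import stats

def sample_mhggp(alpha, beta, sigma, xi, size, rng):
    """Inverse-CDF draws from MHGGP with theta = (alpha - 1) * beta; assumes alpha > 1, xi != 0."""
    theta = (alpha - 1.0) * beta
    head = stats.gamma(alpha, scale=beta)
    F1 = head.cdf(theta)
    delta = head.pdf(theta) * sigma / F1         # f1(theta) * sigma / F1(theta)
    w = 1.0 / (1.0 + delta)                      # probability mass of the gamma head
    u = rng.random(size)
    x_head = head.ppf(np.minimum(u / w, 1.0) * F1)            # gamma truncated to (0, theta]
    v = np.clip((u - w) / (1.0 - w), 0.0, None)               # rescaled uniform for the tail
    x_tail = theta + sigma / xi * ((1.0 - v) ** (-xi) - 1.0)  # GP quantile above theta
    return np.where(u <= w, x_head, x_tail)

def draw_wet_day_amount(doy, alpha, beta_t, sigma, xi, rng, wet_season=(182, 273)):
    """Hypothetical seasonal switch: MHGGP inside the wet season, plain gamma otherwise."""
    if wet_season[0] <= doy <= wet_season[1]:    # 1 Jul to 30 Sep in a 365-day year
        return float(sample_mhggp(alpha, beta_t, sigma, xi, 1, rng)[0])
    return float(rng.gamma(alpha, beta_t))
```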

3. Real Data Analysis

Our statistical approach is based on linking long-term (interannual) predictor variables with short-term (daily) predictands. However, the use of observed seasonal climate statistics derived from the gamma distribution may introduce substantial noise into the daily weather data, potentially leading to underdispersion in the aggregated climate statistics. This study analyzes daily precipitation data for Seoul obtained from the Korea Meteorological Administration (KMA). The dataset covers a 51-year period, from 1961 to 2011, with data from February 29 of leap years excluded to maintain consistency. Analysis reveals a pronounced annual cycle in Seoul’s precipitation, with a notable peak occurring during late spring and summer, and a clear minimum observed throughout the winter months. Descriptive statistics summarizing Seoul’s rainfall data from 1961 to 2011 are presented in Table 1.

The 51-year rainfall dataset is positively skewed: the maximum lies far above both the first and third quartiles (see Figure 2). The data contain many days with low rainfall and relatively few days with large rainfall amounts. Table 2 summarizes the approximate maximum likelihood estimates (MLEs) of the modified hybrid gamma–generalized Pareto distribution based on Seoul’s rainfall data from 1961 to 2011. In particular, the modified hybrid gamma–generalized Pareto distribution provides a better fit to the observed rainfall data in Seoul during the wet season.

It should be noted that the shape parameter of the gamma distribution, $\alpha$, is estimated using the entire 51-year dataset and is subsequently applied uniformly across the entire study period. In contrast, the scale parameter $\beta_t$ of the gamma distribution is derived from the conditional mean precipitation intensity, allowing it to vary temporally. Since the threshold $\theta_t$ depends solely on the gamma distribution parameters, the parameters of the generalized Pareto distribution are estimated exclusively from observations exceeding this threshold $\theta_t$.

Table 3 provides the estimated coefficients and their corresponding standard errors for each component of the proposed stochastic precipitation generator for Seoul. Notably, the precipitation models employ the daily mean precipitation rate as a covariate rather than total precipitation. The results indicate that all covariates are statistically significant except for the interaction terms in the precipitation occurrence model. Synthetic daily precipitation series are generated from the estimated models over the same 51-year period for which observational data are available, and aggregated statistics are computed from them. This simulation process is repeated 500 times to evaluate the performance of the proposed model. From these 500 realizations, key statistical features are derived, and the ability of the precipitation generator to reproduce selected daily statistics is assessed.

Figure 3 demonstrates the effectiveness of the proposed model in reproducing the variance of annual and summer total precipitation, depicted through boxplots of the standard deviations (SD) computed from the aggregated statistics of the 500 simulation runs. Specifically, Figure 3 presents the minimum, lower quartile, median, upper quartile, and maximum SD of these aggregated statistics alongside their observed counterparts derived from the historical 51-year dataset. The boxplots illustrate the range of variability of the simulated statistics and allow direct comparison with the observed historical data. The proposed model substantially reduces overdispersion, particularly for annual and summer total precipitation. In winter, precipitation variability is naturally lower, so even the original model reproduces winter variability effectively.
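The comparison in Figure 3 reduces to two steps. First, for each simulation run, compute the standard deviation of annual (or seasonal) totals across years. Second, summarize those 500 values with the five statistics shown in the boxplots. Below is a minimal NumPy sketch, assuming the runs are stored as an array of shape (runs, years, 365). The array layout and function names are hypothetical.

```python
import numpy as np

def interannual_sd(daily, days=None):
    """SD across years of annual (or seasonal) totals.

    daily: array (..., n_years, 365), e.g. (500, 51, 365) for 500 simulation runs;
    days: optional 0-based day-of-year indices to sum over (a season).
    """
    x = np.asarray(daily, dtype=float)
    if days is not None:
        x = x[..., days]
    totals = x.sum(axis=-1)                      # one total per year
    return totals.std(axis=-1, ddof=1)           # SD across years, one value per run

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    return np.percentile(values, [0, 25, 50, 75, 100])

def share_below_observed(sim_sds, obs_sd):
    """Fraction of runs whose SD falls below the observed SD (near 1 signals overdispersion)."""
    return float(np.mean(np.asarray(sim_sds) < obs_sd))
```

For summer totals one could pass `days=np.arange(151, 243)`, which selects 1 June to 31 August in a 365-day year. That season definition is only an example, not necessarily the one used for Figure 3.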

To further evaluate the performance of the proposed GLM weather generator, several daily statistics are considered. Figure 4 compares the observed and simulated distributions of summer dry spells in Seoul over the 51-year study period. Figure 5, Figure 6 and Figure 7 illustrate, respectively, the temporal variation in the transition probabilities ($p_{11}(t)$ and $p_{01}(t)$), in the unconditional probability of rain and the first-order autocorrelation coefficient, and in the mean and SD of daily precipitation intensity. The mean curves generated by the proposed stochastic precipitation model closely match the observed daily statistics. The results reveal pronounced mid-summer maxima in the transition probability $p_{01}(t)$, the unconditional rain probability, and the mean and SD of precipitation intensity, all of which the proposed model captures. Importantly, the proposed method adequately generates precipitation intensity during the wet season, thereby addressing the tendency of previous models to underestimate the observed interannual variance of seasonally aggregated variables during summer.
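The occurrence statistics in Figures 4 to 6 are linked through the first-order Markov structure of the occurrence model. The sketch below turns the Table 3 occurrence coefficients into $p_{01}$ and $p_{11}$, then applies the standard two-state Markov chain results: stationary wet probability $\pi = p_{01}/(1 + p_{01} - p_{11})$, lag-1 autocorrelation $p_{11} - p_{01}$, and a geometric dry-spell length distribution. Assumptions, all hedged: a logit link; $C_{t}$ and $S_{t}$ are the seasonal covariates defined earlier in the paper, so their values are supplied by the caller rather than recomputed here; and the stationary formulas are only a day-by-day approximation when the transition probabilities vary over the year. The helper names are hypothetical.

```python
import math

# Occurrence coefficients copied from Table 3 (logit link assumed).
OCCURRENCE_COEF = {"mu": -1.67, "J": 1.17, "C": 0.69, "S": 0.25,
                   "CJ": 0.06, "SJ": 0.06}


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def transition_probs(c_t, s_t, coef=OCCURRENCE_COEF):
    """Return (p01, p11) = (Pr(wet | dry yesterday), Pr(wet | wet yesterday))
    for given values of the seasonal covariates C_t and S_t."""
    eta_dry = coef["mu"] + coef["C"] * c_t + coef["S"] * s_t          # J_{t-1} = 0
    eta_wet = eta_dry + coef["J"] + coef["CJ"] * c_t + coef["SJ"] * s_t  # J_{t-1} = 1
    return logistic(eta_dry), logistic(eta_wet)


def markov_summary(p01, p11):
    """Stationary wet-day probability and lag-1 autocorrelation of a
    two-state first-order Markov chain."""
    pi_wet = p01 / (1.0 + p01 - p11)
    lag1 = p11 - p01
    return pi_wet, lag1


def dry_spell_pmf(k, p01):
    """Pr(dry spell lasts exactly k days) for a constant p01 (geometric)."""
    return p01 * (1.0 - p01) ** (k - 1)


def dry_spell_lengths(occurrence):
    """Lengths of maximal runs of dry days (0s) in a 0/1 sequence; a spell
    still running at the end of the sequence is counted as it stands."""
    lengths, run = [], 0
    for wet in occurrence:
        if wet:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths


p01, p11 = transition_probs(0.0, 0.0)  # baseline with seasonal terms at zero
pi_wet, lag1 = markov_summary(p01, p11)
print(f"p01={p01:.3f} p11={p11:.3f} pi={pi_wet:.3f} r1={lag1:.3f}")
print(dry_spell_lengths([0, 0, 1, 0, 1, 1, 0, 0, 0]))  # [2, 1, 3]
```

`dry_spell_lengths` applied to observed and simulated summer occurrence series gives the two samples compared in a dry-spell plot, while `dry_spell_pmf` shows the distribution a constant-probability chain would imply.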

4. Conclusions

This study proposed and evaluated a GLM-based stochastic weather generator enhanced with a modified hybrid gamma–generalized Pareto (MHGGP) distribution to overcome limitations in simulating precipitation intensity, especially during the wet season. The results demonstrated that the conventional gamma-only framework systematically underestimates precipitation variability, particularly during wet seasons, leading to pronounced overdispersion. By introducing the MHGGP distribution, the generator effectively reproduced both the bulk distribution of light-to-moderate rainfall and the heavy-tailed nature of extreme precipitation events, thereby providing a more robust probabilistic representation of daily rainfall sequences. The ability of the enhanced generator to reproduce realistic seasonal statistics (such as the variance of annual and summer rainfall, the distribution of dry spells, and the intensity of extreme wet events) positions it as a valuable tool for diverse applications in climate-sensitive sectors. In particular, the model’s demonstrated capacity to mitigate overdispersion makes it suitable for integration into broader modeling frameworks, including crop yield forecasting, flood risk assessment, and infrastructure design under climate uncertainty.

Nonetheless, some challenges remain. The current study relied on long-term precipitation data from a single urban site (Seoul), which, while illustrative, limits immediate generalization to other climatic regimes. Further work is warranted to test the robustness of the approach across different geographic contexts and to explore its adaptability to regions with distinct precipitation dynamics. In addition, incorporating larger-scale climatic drivers (e.g., ENSO or monsoon indices) or coupling with regional climate model outputs could enhance predictive power, particularly for seasonal forecasting applications. Future research will also compare the proposed MHGGP–GLM framework with other statistical distributions widely applied to hydrological problems and extend the model to multisite or spatially explicit settings. In particular, distributions such as the two-component extreme value (TCEV) distribution (designed for hydrological extremes), the five-parameter Lambda distribution, and the Wakeby distribution (constructed from two Pareto components) have demonstrated strong performance in modeling precipitation extremes and flood frequency analysis [32,33]. Systematic comparisons with these established models will provide a more objective evaluation of the strengths and limitations of the MHGGP–GLM framework and further enhance its applicability to hydrological risk assessment.

In conclusion, the modified hybrid gamma–generalized Pareto distribution offers a flexible and powerful extension to GLM-based weather generators. By better capturing the dual behavior of precipitation intensities (a bulk of light-to-moderate rainfall and a heavy upper tail), this approach mitigates the long-standing overdispersion problem while ensuring realistic simulation of extremes. As climate variability and extremes become increasingly central to planning and risk management, the proposed framework represents a meaningful step toward more reliable stochastic weather generation. Continued development and rigorous validation of this framework will further establish its practical utility as a dependable tool for generating realistic daily precipitation scenarios aligned with evolving climatic conditions.

Author Contributions

H.G.J.: Conceptualization, Methodology, Writing—original draft; S.H.: Data curation, Methodology, Writing—original draft; Y.K.: Conceptualization, Formal Analysis, Writing—original draft, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Global-Learning & Academic research institution for Master’s PhD students, and Postdocs (G-LAMP) Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. RS-2023-00301914) and R&D Program for Forest Science Technology (RS-2025-02213492) provided by Korea Forest Service (Korea Forestry Promotion Institute).

Data Availability Statement

References to the data analyzed are provided within the paper.

Acknowledgments

We are grateful to the Editor-in-Chief, the Associate Editor, and the anonymous referees for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wilks, D.S.; Wilby, R.L. The weather generator game: A review of stochastic weather models. Prog. Phys. Geogr. 1999, 23, 329–357.
2. Grondona, M.O.; Podestá, G.P.; Bidegain, M.; Marino, M.; Hordu, H. A Stochastic Precipitation Generator Conditioned on ENSO Phase: A Case Study in Southeastern South America. J. Clim. 2000, 13, 2973–2986.
3. Richardson, C.W. Stochastic simulation of daily precipitation, temperature, and solar radiation. Water Resour. Res. 1981, 17, 182–190.
4. Wilks, D.S. Multisite generalization of a daily stochastic precipitation generation model. J. Hydrol. 1998, 210, 178–191.
5. Bromhead, E.; Robertson, D.E.; Taylor, C.M. A hidden Markov model for rainfall occurrence and intensity. Hydrol. Earth Syst. Sci. 2005, 9, 177–190.
6. Bárdossy, A. Generating precipitation time series using simulated annealing. Water Resour. Res. 1997, 33, 2861–2868.
7. Rajagopalan, B.; Lall, U. A k-nearest neighbor simulator for daily precipitation and other weather variables. Water Resour. Res. 1999, 35, 3089–3101.
8. Apipattanavis, S.; Podestá, G.P.; Rajagopalan, B.; Katz, R.W. A semiparametric multivariate and multisite weather generator. Water Resour. Res. 2007, 43, W11401.
9. Furrer, E.M.; Katz, R.W. Generalized linear modeling approach to stochastic weather generators. Clim. Res. 2007, 34, 129–144.
10. Benestad, R.E.; Hanssen-Bauer, I.; Chen, D. Empirical-Statistical Downscaling; World Scientific: Singapore, 2008.
11. Buishand, T.A. Some remarks on the use of daily rainfall models. J. Hydrol. 1978, 47, 235–249.
12. Katz, R.W.; Parlange, M.B. Overdispersion phenomenon in stochastic modeling of precipitation. J. Clim. 1998, 11, 591–601.
13. Kim, Y.; Katz, R.W.; Rajagopalan, B.; Furrer, E.M.; Podestá, G. Reducing overdispersion in stochastic weather generators using a generalized linear modeling approach. Clim. Res. 2012, 53, 13–24.
14. Cleveland, W.S. Robust locally-weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 1979, 74, 829–836.
15. Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models; Chapman and Hall: New York, NY, USA, 1990.
16. Kim, Y.; Rajagopalan, B.; Lee, G.W. Temporal statistical downscaling of precipitation and temperature forecasts using a stochastic weather generator. Adv. Atmos. Sci. 2016, 33, 175–183.
17. Kim, Y.; Lee, G.W. Stochastic precipitation generator with hidden state covariates. Asia-Pac. J. Atmos. Sci. 2017, 53, 1–7.
18. Zucchini, W.; MacDonald, I.L. Hidden Markov Models for Time Series; Chapman and Hall: New York, NY, USA, 2009.
19. Fatichi, S.; Ivanov, V.Y.; Caporali, E. A mechanistic ecohydrological model to investigate complex interactions in cold water ecosystems under climate change. Environ. Model. Softw. 2016, 76, 45–59.
20. Coe, R.; Stern, R.D. Fitting models to daily rainfall data. J. Appl. Meteorol. 1982, 21, 1024–1031.
21. Hanum, H.; Hamim, A.; Djuraidah, A.; Mangku, I.W. Modeling extreme rainfall with gamma-Pareto distribution. Appl. Math. Sci. 2015, 9, 6029–6039.
22. Li, C.; Singh, V.P.; Mishra, A.K. Simulation of the entire range of daily precipitation using a hybrid probability distribution. Water Resour. Res. 2012, 48, W03521.
23. Kim, Y.; Kim, H.; Lee, G.; Min, K. A modified hybrid gamma and generalized Pareto distribution for precipitation data. Asia-Pac. J. Atmos. Sci. 2019, 55, 609–616.
24. Nguyen, V.T.; Mai, T.; Pham, T.T.; Bárdossy, A. A nonstationary climate-informed regional weather generator for climate change impact assessment. Adv. Sci. Res. Clim. Model. Obs. 2024, 10, 195–210.
25. Guan, X.; Zhang, Z.; Fischer, T.; Bárdossy, A. Evaluation of a non-stationary regional weather generator for heavy precipitation across spatial and temporal scales in Germany. Nat. Hazards Earth Syst. Sci. Discuss. 2024, preprint.
26. Abbas, K.; Akhundjanov, S.B.; Ozkan, S. A zero-inflated extended generalized Pareto distribution for daily precipitation modeling. arXiv 2025, arXiv:2504.11058.
27. Reulen, N.; Mehrkanoon, S. GA-SmaAt-GNet: Attention-enhanced generative adversarial networks for extreme precipitation nowcasting. arXiv 2024, arXiv:2401.09881.
28. Stern, R.D.; Coe, R. A model fitting analysis of daily rainfall data. J. R. Stat. Soc. Ser. A 1984, 147, 1–34.
29. Klugman, S.A.; Panjer, H.H.; Willmot, G.E. Loss Models: From Data to Decisions, 2nd ed.; John Wiley and Sons: New York, NY, USA, 2004.
30. Nadarajah, S.; Bakar, S. New composite models for the Danish reinsurance data. Scand. Actuar. J. 2014, 2014, 180–187.
31. Bakar, S.; Hamzah, N.A.; Maghsoudi, M.; Nadarajah, S. Modeling loss data using composite models. Insur. Math. Econ. 2015, 61, 146–154.
32. Martins, E.S.; Stedinger, J.R. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resour. Res. 2000, 36, 737–744.
33. Rossi, F.; Fiorentino, M.; Versace, P. Two-component extreme value distribution for flood frequency analysis. Water Resour. Res. 1984, 20, 847–856.

Figure 1. Modeled mean (left) and standard deviation (right) of daily precipitation intensity in Seoul, including seasonal dry and wet indices. Dots represent the empirical daily mean and standard deviation of precipitation intensity, calculated separately for each day of the year.

Figure 2. Histogram of rainfall data from several weather stations in Seoul during 1961–2011.

Figure 3. Standard deviations of annual and summer total precipitation (mm) in Seoul, based on the basic model (left) and the proposed model (right). Horizontal solid lines indicate the observed values from the historical data.

Figure 4. Distribution of observed (dashed line) and simulated (solid line) summer dry spell durations in Seoul over a 51-year period.

Figure 5. Modeled transition probabilities $p_{11}(t)$ (left) and $p_{01}(t)$ (right) for Seoul based on the proposed model. Dots represent empirical transition probabilities, calculated as the observed transition frequencies for each day of the year.

Figure 6. Modeled unconditional probability of precipitation (left) and first-order autocorrelation coefficient (right) for Seoul, based on the proposed model. Dots represent empirical estimates, calculated separately for each day of the year.

Figure 7. Modeled mean (left) and standard deviation (SD) (right) of daily precipitation intensity in Seoul, based on the proposed model. Dots represent empirical mean and SD values, calculated separately for each day of the year.

Table 1. Descriptive statistics for rainfall data in Seoul during 1961–2011 (values in mm except the sample size n; $Q_{1}$ and $Q_{3}$ are the lower and upper quartiles).

n	Mean	Min	Max	Median	$Q_{1}$	$Q_{3}$
2051	22.27	0.10	332.80	8.30	1.50	28.80

Table 2. Approximate maximum likelihood estimates of modified hybrid gamma and generalized Pareto distribution using rainfall data in Seoul during 1961–2011.

Parameter	$α$	$β$	$ξ$	$σ$	$θ$
Est.	1.9576	0.1519	0.6260	5.9313	0.1455
SD	0.7070	0.0010	0.0316	0.1842	0.0132

Table 3. Estimated coefficients (Coef) and standard errors (SEs) for all components of the stochastic precipitation generator model for Seoul.

Covariate Category	Precipitation Occurrence			Precipitation Intensity (mm)
	Term	Coef.	SE	Term	Coef.	SE
Mean	$μ$	−1.67	0.032	$μ$	2.18	0.025
Autocorrelation	$J_{t - 1}$	1.17	0.040	−	−	−
Seasonality	$C_{t}$	0.69	0.027	$C_{t}$	0.91	0.034
	$S_{t}$	0.25	0.027	$S_{t}$	0.30	0.034
Interaction	$C_{t} J_{t - 1}$	0.06	0.056	−	−	−
	$S_{t} J_{t - 1}$	0.06	0.053	−	−	−
AIC	19,097			32,881
BIC	19,150			32,903

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jin, H.G.; Hong, S.; Kim, Y. Synthetic Rainfall Modeling Using a Modified Hybrid Gamma-GP Distribution. Appl. Sci. 2025, 15, 9563. https://doi.org/10.3390/app15179563

AMA Style

Jin HG, Hong S, Kim Y. Synthetic Rainfall Modeling Using a Modified Hybrid Gamma-GP Distribution. Applied Sciences. 2025; 15(17):9563. https://doi.org/10.3390/app15179563

Chicago/Turabian Style

Jin, Hyang Gon, Seunghyun Hong, and Yongku Kim. 2025. "Synthetic Rainfall Modeling Using a Modified Hybrid Gamma-GP Distribution" Applied Sciences 15, no. 17: 9563. https://doi.org/10.3390/app15179563

APA Style

Jin, H. G., Hong, S., & Kim, Y. (2025). Synthetic Rainfall Modeling Using a Modified Hybrid Gamma-GP Distribution. Applied Sciences, 15(17), 9563. https://doi.org/10.3390/app15179563

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
