Stochastic Modelling of the Rainfall Process in the Asia Region: A Systematic Review

Abstract: In recent years, the use of stochastic models has grown because of the high complexity and dynamics of the atmosphere, especially of the rainfall process. Various concepts have been applied to rainfall modeling, ranging from simple approaches to more complex models. It is important to understand the different stochastic rainfall modeling approaches as well as their advantages and limitations. This paper reviews the latest developments in stochastic rainfall models in the Asia region and highlights the different concepts behind them. It covers the methodologies used, including rainfall forecasting, spatio-temporal analysis, and extreme events. We selected 30 articles from 1571 publications indexed in the Scopus database and published between 2013 and 2022. The results show that the stochastic models most often used in the literature are the Markov chain, weather generator, probability distribution, ARIMA, and Bayesian models. In recent work in Asia, stochastic rainfall models are widely used to generate the occurrence and amount of rainfall, for statistical downscaling, to project future rainfall trends, and to estimate extreme values. Differences in spatio-temporal scale, climate conditions, and model parameters cause the performance of each model to differ.


Introduction
The development of a climate model is an attempt to simplify the understanding of the climate system. Stochastic models are now an important topic in climate research and are starting to be widely used in more comprehensive climate predictions. Stochastic methods for numerical weather and climate prediction allow for an accurate representation of uncertainty, reduced bias, and improved representation of long-term climate variability. No systematic review of stochastic climate models, especially rainfall models, is yet available for the Asian region. Examining the stochastic modeling methods and assessing the accuracy of their output will provide an overview of how robustly stochastic models represent rainfall.

Markov Chain
The Markov chain defines the state of a particular day as "wet" or "dry" and describes the relationship between today's state and the state of the previous day or days. The number of previous days used varies between studies: 1 day [2,6], 0-3 days [8], and 0-5 days [9]. Most of the Markov chain models mentioned in the literature are first-order models. Although the first-order model is satisfactory, the longest simulated dry spells are slightly shorter than the observed ones, which may be due to the short-term memory of the first-order Markov model [6]. A Markov chain of order 2 or higher can be used to overcome this limitation. In tropical areas such as Malaysia, the second-order Markov chain performs best for estimating monthly rainfall, while the third-order chain is best for estimating annual rainfall [9]. In sub-tropical areas, which have four seasons, daily rainfall is predicted better in summer than in other seasons [8].
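The first-order, two-state occurrence model described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in any of the reviewed studies; the function names `fit_first_order` and `simulate_occurrence` and the toy record are assumptions introduced here.

```python
import random


def fit_first_order(wet):
    """Estimate P(wet | previous day dry) and P(wet | previous day wet)
    from a daily series coded as 1 (wet) or 0 (dry)."""
    counts = {0: [0, 0], 1: [0, 0]}  # counts[previous][current]
    for prev, curr in zip(wet, wet[1:]):
        counts[prev][curr] += 1
    p01 = counts[0][1] / max(1, sum(counts[0]))
    p11 = counts[1][1] / max(1, sum(counts[1]))
    return p01, p11


def simulate_occurrence(p01, p11, n_days, seed=0, start=0):
    """Simulate a wet/dry sequence from the two transition probabilities."""
    rng = random.Random(seed)
    state, series = start, []
    for _ in range(n_days):
        p_wet = p11 if state == 1 else p01
        state = 1 if rng.random() < p_wet else 0
        series.append(state)
    return series


# Hypothetical ten-day record, used only to show the calling convention.
observed = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
p01, p11 = fit_first_order(observed)
synthetic = simulate_occurrence(p01, p11, n_days=365, seed=1)
```

A higher-order chain would condition on the states of the previous two or three days instead of one, at the cost of more transition probabilities to estimate.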
As stochastic model research has developed, the Markov chain model has been modified to improve its accuracy. Examples are the Modified Markov Model (MMM) [2], the Hidden Markov Model (HMM) [11], the Non-homogeneous Hidden Markov Model (NHMM) [12], the Decadal and Hierarchical Markov Chain (DHMC) [13], and the Stochastic Daily Rainfall Model-Markov Chain Rainfall Event Model (SDRM-MCRE) [5]. MMM includes atmospheric predictors that capture the effect of changing climatic conditions and other variables to represent specific rainfall characteristics. HMM contains hidden (unobserved) states. Compared to the HMM, the NHMM introduces non-homogeneity by allowing the components of the transition matrix or emission matrix to vary with other relevant variables. SDRM-MCRE can comprehensively preserve the characteristics of the rainfall time series (e.g., monthly mean rainfall and extreme rainfall percentiles) and of rainfall events (e.g., different classes of rainfall duration, rainfall depth, rainfall intensity, and drought).
For rainfall amounts, most studies report that three-parameter distributions give better results than other distributions [6,14]. Three-parameter distributions, especially the mixed exponential distribution, perform better in reproducing the daily rainfall variance in subtropical regions such as China. In contrast, the skewed normal and Weibull distributions better simulate extreme rainfall characteristics above the 95th percentile [6]. In the tropics, the mixed exponential distribution (three parameters) estimates the hourly average and maximum values better than the Weibull distribution (two parameters) [14].
Spatiotemporal differences also affect how well a distribution model applies, so three-parameter distributions are not always better. For example, in the Kelantan watershed, Malaysia, the mixed exponential distribution was not chosen as the best distribution, and statistical tests showed no significant difference between the performance of one-, two-, and three-parameter distributions [9]. For extreme values, the exponential (one parameter) and log-normal (two parameters) distributions perform better than other distributions [9], while the double gamma distribution (two parameters) can capture extreme and average rainfall at the same time [5].
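To make the three-parameter mixed exponential distribution concrete, the sketch below writes out its density, mean, and a sampler. It assumes the common parameterization with a mixing weight `alpha` and two exponential means `beta1` and `beta2`; the parameter values are illustrative and are not taken from the reviewed studies.

```python
import math
import random


def mixed_exp_pdf(x, alpha, beta1, beta2):
    """Density of a weighted mixture of two exponential distributions."""
    return (alpha / beta1) * math.exp(-x / beta1) + \
           ((1.0 - alpha) / beta2) * math.exp(-x / beta2)


def mixed_exp_mean(alpha, beta1, beta2):
    """Theoretical mean of the mixture."""
    return alpha * beta1 + (1.0 - alpha) * beta2


def sample_mixed_exp(alpha, beta1, beta2, rng):
    """Draw one wet-day amount: pick a component, then sample from it."""
    beta = beta1 if rng.random() < alpha else beta2
    return rng.expovariate(1.0 / beta)  # expovariate takes the rate 1/mean


# Illustrative parameters (assumed): mostly light rain, occasionally heavy.
rng = random.Random(42)
amounts = [sample_mixed_exp(0.7, 2.0, 15.0, rng) for _ in range(1000)]
```

The second component is what gives the distribution a heavier tail than a single exponential, which is one plausible reason it reproduces the daily rainfall variance well.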
Several studies found that models based on the spell-length approach, such as LARS-WG, perform worse than Markov chain-based models such as WeaGETS [7] and than SDSM, a hybrid model that combines a regression model with a stochastic weather generator [3].
Chen and Brissette [7] compared five stochastic weather generators for generating rainfall data on China's Loess Plateau. WGEN, CLIMGEN, and CLIGEN use first-order two-state Markov chains to generate precipitation occurrence. To calculate the amount of rainfall, WGEN uses the gamma distribution, CLIMGEN the Weibull distribution, and CLIGEN the skewed normal distribution. WeaGETS uses a combination of third-order Markov chains and a mixed exponential distribution. For simulating daily rainfall amounts, weather generators based on three-parameter distributions (CLIGEN and WeaGETS) generally perform better than those based on two-parameter distributions (WGEN and CLIMGEN), especially in simulating extreme rainfall.
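The two-part structure shared by these generators (a Markov chain for occurrence, then a distribution for wet-day amounts) can be sketched as follows. The example pairs a first-order two-state chain with gamma-distributed amounts, the combination the text attributes to WGEN, but it is only an illustration of the structure and not a reimplementation of WGEN; the function name and parameter values are assumptions.

```python
import random


def generate_daily_rainfall(n_days, p01, p11, shape, scale, seed=0):
    """Generate a daily rainfall series.

    p01, p11 : probability of a wet day after a dry / wet day
    shape, scale : gamma parameters for the wet-day amount
    """
    rng = random.Random(seed)
    wet, rain = False, []
    for _ in range(n_days):
        p_wet = p11 if wet else p01
        wet = rng.random() < p_wet
        # Dry days get zero rainfall; wet days get a gamma-distributed amount.
        rain.append(rng.gammavariate(shape, scale) if wet else 0.0)
    return rain


# Illustrative parameters (assumed, not calibrated to any station).
series = generate_daily_rainfall(365, p01=0.3, p11=0.6, shape=0.8, scale=10.0)
wet_days = sum(1 for r in series if r > 0)
```

Swapping the gamma draw for another sampler (Weibull, skewed normal, or mixed exponential) changes only the amounts step, which is why the generators can be compared on that component alone.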
Stochastic weather generators commonly used for multisite applications are MSRG [17], MulGETS [16], and the new multivariate-multisite WG [18]. MSRG can simultaneously simulate the spatial dependence of the occurrence and amount of multisite daily rainfall using the SSRN method. MSRG also has the potential to be applied in relatively large basins or areas [17]. MulGETS is an extension of SSWG created by driving a single-site model with temporally dependent and spatially correlated random numbers. Ahn [18] combines annual and daily weather generators to overcome the limited variability of low-frequency climate variables and to generate extreme rainfall events.
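The idea of driving single-site models with spatially correlated random numbers can be illustrated for two sites. The sketch below is not the MulGETS algorithm: it covers only the spatial-correlation step, for a pair of sites, and `rho` is the correlation of the underlying normal variates rather than of the resulting occurrences. All names and values are assumptions.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def correlated_uniform_pairs(n, rho, seed=0):
    """Pairs of uniform numbers derived from normals with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * e
        pairs.append((normal_cdf(z1), normal_cdf(z2)))
    return pairs


def two_site_occurrence(n_days, rho, p01, p11, seed=0):
    """Run one first-order wet/dry chain per site, driven by correlated
    uniforms. p01 and p11 hold one transition probability per site."""
    states, out = [0, 0], []
    for u in correlated_uniform_pairs(n_days, rho, seed):
        states = [
            1 if u[i] < (p11[i] if states[i] else p01[i]) else 0
            for i in range(2)
        ]
        out.append(tuple(states))
    return out


# Illustrative parameters (assumed): two neighbouring stations.
days = two_site_occurrence(365, rho=0.8, p01=(0.3, 0.25), p11=(0.6, 0.55))
```

Each site keeps its own transition probabilities, so the single-site statistics are preserved while wet and dry days tend to coincide across sites.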
The STREAP WG model, designed for remote sensing data such as radar [19], was used to quantify the sub-pixel variability of extreme radar rainfall and to downscale the radar rainfall recorded at a particular pixel. Peleg and Morin [20] developed a slightly different WG based on analysis of rainfall fields derived from weather radar data, together with synoptic parameters. It explicitly represents convective rain cells, which are known to have a significant impact on the catchment hydrological response. HiReS-WG is used to periodically generate rain fields with a high spatial and temporal resolution.

ARIMA
ARIMA is a typical statistical time-series model that uses past data to forecast future trends. The ARIMA approach can outperform most other statistical models, for example in hydrological time series. The relative advantage of the ARIMA model is due to its statistical nature, as well as the well-known methodology for building the model [21]. The ARIMA model combines Autoregressive (AR) and Moving Average (MA) components with order-d differencing of the data at seasonal and non-seasonal levels, and it belongs to the group of linear forecasting methods [8]. The ARIMA model has been widely applied in rainfall data analysis for various purposes, especially in drought analysis [21,22].
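How the AR and differencing components fit together can be shown with the simplest non-trivial case, ARIMA(1,1,0): difference the series once, fit an AR(1) model to the differences, then integrate the forecasts back. The sketch below uses ordinary least squares and omits the MA and seasonal terms; it is an assumption-laden illustration, and the reviewed studies would normally rely on a statistical package for estimation and order selection.

```python
def difference(series):
    """First-order differencing (d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + error."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = sxy / sxx if sxx > 0 else 0.0
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast by predicting future differences and summing them back."""
    d = difference(series)
    c, phi = fit_ar1(d)
    level, last_diff, out = series[-1], d[-1], []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level = level + last_diff
        out.append(level)
    return out


# Hypothetical monthly values, used only to show the calling convention.
history = [0.0, 8.0, 12.0, 14.0, 15.0, 15.5]
next_three = forecast_arima_110(history, steps=3)
```

Because each forecast feeds the next, errors accumulate with the horizon, which is consistent with the limitation noted for long forecast periods.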

Bayesian
The Bayesian approach is used in many hydrological studies, for example in uncertainty quantification, water quality modeling, and hydroclimatic analysis. One study developed a Bayesian model to evaluate changes in the maximum thickness of seasonally frozen ground (MTSFG) [23]. The Bayesian method has also been used to estimate snow depth and soil organic carbon content in permafrost areas using the Markov Chain Monte Carlo (MCMC) sampling method [24,25].
More recently, the Bayesian model has been modified to improve its accuracy in rainfall analysis. Variants mentioned in the literature are the Gaussian Copula Model, the Bernoulli-Gamma hierarchical Bayesian Model, and the Bayesian Time-Varying Downscaling Model (TVDM). The Gaussian Copula model has been used to explore future changes in extreme rainfall [26]. In addition, the Gaussian Copula can be used as a new scheme to correct biases in the spatial correlation as well as in the marginal distribution of the simulated rainfall. The Bernoulli-Gamma hierarchical Bayesian Model was used to build a hierarchical Bayesian mixture model for simulating and forecasting daily rainfall using endogenous and external information [27]. The proposed Bayesian TVDM is used to derive monthly rainfall in India from the large-scale output of general circulation models (GCM) [28]. The TVDM methodology uses a Bayesian approach to update the parameters, as previously adopted in the Bayesian dynamic linear model.
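The parameter updating of a Bayesian dynamic linear model can be sketched for the simplest case: one predictor with a regression coefficient that drifts over time as a random walk. This is a generic textbook update under assumed Gaussian noise, not the TVDM of [28]; the predictor `x` stands in for a hypothetical large-scale GCM variable, and the variances `V` and `W` are illustrative.

```python
def dlm_update(m, C, x, y, V, W):
    """One Bayesian update of the coefficient in y = x * theta + noise.

    m, C : posterior mean and variance of theta from the previous step
    V    : observation noise variance
    W    : variance of the random-walk drift of theta
    """
    a, R = m, C + W                # prior for theta at the current step
    f = x * a                      # one-step forecast of y
    Q = x * x * R + V              # forecast variance
    A = R * x / Q                  # adaptive gain
    m_new = a + A * (y - f)        # posterior mean
    C_new = R - A * A * Q          # posterior variance
    return m_new, C_new


def filter_coefficients(xs, ys, m0=0.0, C0=10.0, V=1.0, W=0.01):
    """Run the update through a record and return the coefficient path."""
    m, C, path = m0, C0, []
    for x, y in zip(xs, ys):
        m, C = dlm_update(m, C, x, y, V, W)
        path.append(m)
    return path


# Hypothetical predictor and rainfall values with a true coefficient of 2.
xs = [1.0 + (i % 5) for i in range(200)]
ys = [2.0 * x for x in xs]
coefficients = filter_coefficients(xs, ys)
```

A larger `W` lets the coefficient adapt faster to a changing predictor-rainfall relationship, at the cost of a noisier estimate.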

Strengths and Limitations of the Models
The Markov chain model is easy to use, can simulate rainfall across a station network while preserving the important spatial attributes, preserves the characteristics of the rainfall time series, has great potential for flood and drought risk assessment, and can simulate monthly and annual rainfall events. The ARIMA studies in the literature were mainly carried out in dry areas (Pakistan, Saudi Arabia, and Iraq). This is because ARIMA can forecast drought on different timescales and can outperform most other statistical models. Its weakness is that it cannot forecast over long periods. The WG model is excellent for generating data in small areas. The Bayesian models found in this literature review include purely Bayesian models as well as combined models, such as TVDM and the Bayesian NHMC. WGs are appropriate for climate change projections because their time-varying components allow the transition probabilities or emission probabilities to vary with external factors. In k-nearest neighbor resampling, little effort is needed to estimate the parameters, so this model can be an excellent alternative for simulating multisite precipitation events.

Conclusions
Research on stochastic rainfall modeling in the Asian region is highly complex. The variety of scopes, approaches, focuses, methodologies, and limitations in rainfall modeling hinders a common understanding of the stochastic models used. The results of this review indicate that stochastic models are used in rainfall research for climate data generation, statistical downscaling, projection of future rainfall trends, and estimation of extreme values, among other purposes. Of these, climate data generation and statistical downscaling are the most common. Rainfall data generators are used to estimate the occurrence and amount of rainfall. The stochastic models most often used in the literature are the Markov chain, weather generator, probability distribution, ARIMA, and Bayesian models. The performance of these models differs between regions in Asia; spatiotemporal differences, the study area, and the choice of parameters can all cause the results of each model to differ. Stochastic models are easy to use, and the temporal scale, spatial scale, and type of model can be adapted to the research objectives; the reviewed studies suggest that combining models tends to improve the results. In general, therefore, stochastic models are very flexible and can be tailored to user needs.