Towards Generic Simulation for Demanding Stochastic Processes

We outline and test a new methodology for genuine simulation of stochastic processes with any dependence structure and any marginal distribution. We reproduce time dependence with a generalized, time symmetric or asymmetric, moving-average scheme. This implements linear filtering of non-Gaussian white noise, with the weights of the filter determined by analytical equations, in terms of the autocovariance of the process. We approximate the marginal distribution of the process, irrespective of its type, using a number of its cumulants, which in turn determine the cumulants of white noise, in a manner that can readily support the generation of random numbers from that approximation, so that it is applicable to stochastic simulation. The simulation method is genuine as it uses the process of interest directly, without any transformation (e.g., normalization). We illustrate the method in a number of synthetic and real-world applications, with either persistence or antipersistence, and with non-Gaussian marginal distributions that are bounded, thus making the problem more demanding. These include distributions bounded from both sides, such as uniform, and bounded from below, such as exponential and Pareto, possibly having a discontinuity at the origin (intermittence). All examples studied show the satisfactory performance of the method.


Introduction
Reviews on the historical evolution of simulation of stochastic processes, with its different schools, have recently been provided by Koutsoyiannis [1,2] and Beven [3]. In most scientific disciplines, the dominant methods are those of the so-called time series school, which developed families of models, known by the acronym ARMA (autoregressive-moving average). These are also called Box-Jenkins models, after the influential book by these authors [4], thus confirming Stigler's law of eponymy [5], because, in fact, they were introduced earlier by Whittle [6][7][8]. Despite their popularity, these models have several problems, such as their lack of parsimony (except for the simplest of them, e.g., the ARMA(1,1), summarized in the Appendix A), as well as the inability to model long-range dependence (LRD) and to simulate non-Gaussian processes. On the other hand, both of these features are profoundly present in most geophysical processes [9]. An extension of these models, applicable to processes with LRD, was proposed by Hosking [10] under the acronym ARFIMA (with the letter 'F' standing for fractional differencing and the letter 'I' for integrated). Again, these are good for Gaussian processes. Koutsoyiannis (2000) [11] introduced the symmetric moving average (SMA) scheme to replace ARMA models with a generic approach (more recently advanced in [12]), capable of reproducing any aspect of time dependence, short-range (SRD) or long-range (LRD), in a parsimonious manner, i.e., with a low number of parameters that are estimated from the data. This scheme can also preserve the skewness of non-Gaussian processes, but has difficulty in dealing with higher-order moments, particularly with strongly intermittent processes, such as rainfall at small time scales.
For the latter, point process (clustered) models were devised [13][14][15][16]. One advantage of these types of models is the mechanistic representation of certain aspects of the process, such as the arrival and cease of a storm event. The disadvantages are mainly focused on the preservation of the dependence structure at multiple scales and their difficulty in application in multivariate or multiscale schemes. For this reason, Koutsoyiannis et al. [17], even though they used a 3D extension of a point process model (the so-called Gaussian displacement spatial-temporal rainfall [18]), resorted to a linear generation scheme for an application to multivariate rainfall disaggregation.
Several other modelling schemes use transformations of the process of interest, mostly within a copula context [19][20][21][22], with the most widely applied transformation resulting in a Gaussian process (normalization) [23,24]. However, such transformation schemes inherit some of the limitations of the parent process. For example, it is well known that a Gaussian process is necessarily symmetric in time and, thus, cannot capture time directionality, otherwise known as irreversibility or time's arrow [25]. On the other hand, it is known that, in several natural processes, time's arrow is present [26,27], and to reproduce it, we need processes with asymmetric distributions, which can also exhibit asymmetry in time.
A more general algorithm for generation of any type of marginal distribution was recently proposed by Lombardo et al. [28], but only under the condition of Markov dependence, thus leaving out problems with more complex dependence, including LRD. Recent advances include the use of machine learning methods in stochastic simulation, e.g., [29], which, however, have the disadvantages of being implicit in their mathematical structure, and non-parsimonious.
For these reasons, it is necessary to develop genuine stochastic simulation procedures, which will be able to generate non-Gaussian processes without any transformation to a Gaussian or other distribution. Such procedures have already been discussed in earlier works, referring to the explicit preservation of four moments in a time-symmetric setting [30] as well as preservation of distributions in terms of cumulants, rather than moments [2,31]. However, the general idea of the latter works has never been applied in practice to test its effectiveness. This is the subject of this paper.
The new methodology advances the state-of-the-art in stochastic generation by providing a general framework, capable of dealing with challenging Monte Carlo applications within geophysics, engineering, and other fields. The merits of the methodology rely on its ability to cope with the following aspects: 1. Complex dependence structures that extend way beyond the Markov dependence, and incorporate long-range dependence and short-scale fractal (smoothness/roughness) behavior. This is achieved by using a symmetric moving average scheme, which can involve a large number of white noise terms, with their weights determined in an explicit analytical manner. 2. Marginal distributions that extend beyond Gaussian and incorporate heavy tails, boundedness, and intermittence. This is achieved by using an appropriate number of cumulants, analytically determined from the distribution function, thus resulting in genuine simulation of the process (without a transformation). 3. Time asymmetry (irreversibility), achieved by using a non-Gaussian distribution function, combined with an asymmetric moving average scheme, with the weights again determined in an explicit analytical manner.
In the following sections, we outline the new methodology for genuine simulation (Section 2), and illustrate it in a number of synthetic and real-world applications (Section 3). In addition, we study the problem of approximating any distribution, if a number of its cumulants are known, in a manner that can readily support the generation of random numbers from that approximation (Section 2.5 and discussion in Section 4). Such approximation is suitable for analytical derivations, as well as for stochastic simulation in geophysical and engineering applications and beyond.
The simulation model developed is a linear stochastic model. As nonlinearity is fashionable, some may think that the linearity of the approach proposed is a limitation or even a severe drawback. The reality, however, is different, because linearity and nonlinearity have different meanings in deterministic and stochastic approaches. In the latter, linearity is a powerful characteristic, enabling its extension to demanding problems, such as multivariate models and coupling of models of different temporal or spatial scales [32] (also known as downscaling or disaggregation). In this respect, it is relevant to recall the notion of Wold decomposition of stochastic processes. Specifically, Wold [33,34] proved that any stochastic process (even though he referred to it as a time series) can be decomposed into a regular process (i.e., a process linearly equivalent to a white noise process) and a predictable process (i.e., a process that can be expressed in terms of its past values). Thus, nonlinearity is relevant to the predictable part, as this is purely deterministic, while for the regular part linearity suffices.

Preliminaries
We denote $\underline{x}$ a stochastic (random) variable (underlining its symbol in order to distinguish it from a regular variable $x$), $F(x) := P\{\underline{x} \le x\}$ its probability distribution function, $\overline{F}(x) := 1 - F(x) = P\{\underline{x} > x\}$ its tail function (probability of exceedance) and $f(x) := \mathrm{d}F(x)/\mathrm{d}x$ its density function. Furthermore, we denote $\underline{x}(t)$ a stochastic process at continuous time $t$ (i.e., a family of stochastic variables indexed by time $t$) and $\underline{x}_\tau$ its discrete-time representation at equidistant times with temporal resolution $D$, i.e., $\underline{x}_\tau = \underline{x}(\tau D)$, for an integer $\tau$. In a discrete-time stochastic process, it is convenient to define the return period, $T$, of the event $\{\underline{x} > x\}$ as the average time between two occurrences of the event. It is shown [2] that the following relationship holds true for any stochastic process (irrespective of time dependence):

$$\frac{T}{D} = \frac{1}{\overline{F}(x)} \tag{1}$$

In other words, this one-to-one correspondence allows the return period to be used in place of the tail function or the distribution function in several applications (e.g., in probability plots); this has been the case for many years, particularly in engineering applications.
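As a minimal numerical illustration of the return period relationship $T/D = 1/\overline{F}(x)$, the following Python sketch (not part of the original methodology; the exponential tail is chosen only for illustration) computes the return period of a given exceedance level:

```python
import math

def return_period(tail_prob, D=1.0):
    """Return period T = D / Fbar(x) of the event {x > x} for a
    discrete-time process with time step D; this holds irrespective
    of the time dependence of the process."""
    if not 0.0 < tail_prob <= 1.0:
        raise ValueError("tail probability must be in (0, 1]")
    return D / tail_prob

# Standard exponential tail Fbar(x) = exp(-x): the level exceeded on
# average once every 100 time steps is x = ln(100).
x100 = math.log(100.0)
T = return_period(math.exp(-x100))
```

The one-to-one correspondence also works in reverse: given a design return period, the tail probability is $D/T$ and the design level follows by inverting the tail function.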

Moments and Cumulants
The expectation of any function $g(\underline{x})$ of the stochastic variable $\underline{x}$ is defined as:

$$E[g(\underline{x})] := \int_{-\infty}^{\infty} g(x)\, f(x)\,\mathrm{d}x \tag{2}$$

where we remind that $g(\underline{x})$ is a stochastic variable per se. For $g(\underline{x}) = \underline{x}^p$, we get the noncentral moment of order $p$ (or $p$th raw moment or $p$th moment about the origin):

$$\mu_p' := E[\underline{x}^p] \tag{3}$$

with the particular case $p = 1$ defining the mean:

$$\mu := \mu_1' = E[\underline{x}] \tag{4}$$

The central moment of order $p$ is the expectation of $g(\underline{x}) = (\underline{x} - \mu)^p$:

$$\mu_p := E[(\underline{x} - \mu)^p] \tag{5}$$

with the particular case $p = 2$ defining the variance:

$$\sigma^2 := \mu_2 = E[(\underline{x} - \mu)^2] \tag{6}$$

where its square root $\sigma$ is the standard deviation. By choosing $g(\underline{x}) = \mathrm{e}^{t\underline{x}}$ for any $t$, the logarithm of the resulting expectation is called the cumulant generating function:

$$K(t) := \ln E[\mathrm{e}^{t\underline{x}}] \tag{7}$$

The power series expansion of the cumulant generating function, i.e.,

$$K(t) = \sum_{p=1}^{\infty} \kappa_p \frac{t^p}{p!} \tag{8}$$

defines the cumulants $\kappa_p$. It is noted that the cumulants were introduced by Thiele as early as in 1889 [35] and refined in 1899 [36,37] under the name half-invariants. The name cumulants was first used by Fisher [38] at the suggestion of Hotelling [39].
Cumulants are related to noncentral moments of the same and lower order by:

$$\mu_p' = \kappa_p + \sum_{q=1}^{p-1} \binom{p-1}{q-1} \kappa_q\, \mu_{p-q}' \tag{9}$$

with $\mu_0' = 1$. A simple proof of these equations has been provided by Smith (1995) [40], but the recursive relationships had already been implied by Thiele [35,37]. Note that Equation (9) links cumulants with noncentral moments. The relationship of cumulants with central moments is generally more complex, but for small $p$ it takes the following simple forms:

$$\kappa_1 = \mu, \quad \kappa_2 = \mu_2 = \sigma^2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2 \tag{10}$$

Equation (9) is very powerful as it allows simple calculation of cumulants from noncentral moments and vice versa in a recursive manner. Notably, for the calculation of the moment or the cumulant of order $p$, the sums appearing in Equation (9) contain terms of order not higher than $p$.
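The recursion of Equation (9) can be sketched in a few lines of Python (an illustration, not the authors' code; the unit exponential, whose moments are $p!$ and cumulants $(p-1)!$, is used as a check):

```python
from math import comb, factorial

def cumulants_from_moments(m):
    """Given noncentral moments m[0..P] (with m[0] = 1), return cumulants
    k[0..P] via the recursion of Equation (9):
        m_p = k_p + sum_{q=1}^{p-1} C(p-1, q-1) * k_q * m_{p-q}."""
    k = [0.0] * len(m)
    for p in range(1, len(m)):
        k[p] = m[p] - sum(comb(p - 1, q - 1) * k[q] * m[p - q] for q in range(1, p))
    return k

def moments_from_cumulants(k):
    """The same recursion applied in the opposite direction."""
    m = [1.0] + [0.0] * (len(k) - 1)
    for p in range(1, len(k)):
        m[p] = k[p] + sum(comb(p - 1, q - 1) * k[q] * m[p - q] for q in range(1, p))
    return m

# Unit exponential: m_p = p!, and the recursion yields k_p = (p-1)!.
m = [float(factorial(p)) for p in range(7)]
k = cumulants_from_moments(m)
m_back = moments_from_cumulants(k)
```

Note that, as stated in the text, computing the order-$p$ quantity uses only quantities of order up to $p$, so the recursion runs in a single forward pass.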
The importance of cumulants results from their homogeneity and additivity properties, as seen in Table 1. Most importantly, for a stochastic variable that is the linear combination (weighted sum) of r independent variables, the cumulants of the resultant are also a linear combination of the cumulants of the constituents. On the other hand, application of conditioning, also contained in Table 1, is similarly useful as it allows simulation of distributions that are mixtures of other distributions or have discontinuities in their distribution functions. As seen in Table 1, the effect of conditioning is more easily expressed in terms of moments, but Equation (9) readily allows the subsequent evaluation of cumulants.

Table 1. Useful properties of cumulants (and moments) of a stochastic variable $\underline{x}$.

Operation | Mathematical Relationship | Eqn. no.

Shift of origin by a constant $c$ | $\kappa_p[\underline{x} + c] = \kappa_p[\underline{x}]$ for $p \ge 2$; $\kappa_1[\underline{x} + c] = \kappa_1[\underline{x}] + c$ | (11)

Multiplication by a constant $c$ | $\kappa_p[c\,\underline{x}] = c^p\, \kappa_p[\underline{x}]$ | (12)

Linear combination of independent variables | $\kappa_p[c_1 \underline{x}_1 + \dots + c_r \underline{x}_r] = c_1^p\, \kappa_p[\underline{x}_1] + \dots + c_r^p\, \kappa_p[\underline{x}_r]$ | (13)

Conditioning on an event $\Omega_1$ with probability $P_1 := P(\Omega_1)$, where the complementary event $\Omega_2$ has probability $1 - P_1 = P(\Omega_2)$ | $\mu_p'[\underline{x}] = P_1\, \mu_p'[\underline{x} \mid \Omega_1] + (1 - P_1)\, \mu_p'[\underline{x} \mid \Omega_2]$ | (14)

Conditioning on an event $\Omega_1$ with probability $P_1 := P(\Omega_1)$, where $\underline{x} = c$ (constant) upon the complementary event $\Omega_2$ | $\mu_p'[\underline{x}] = P_1\, \mu_p'[\underline{x} \mid \Omega_1] + (1 - P_1)\, c^p$ | (15)

Conditioning on an event $\Omega_1$ with probability $P_1 := P(\Omega_1)$, where $\underline{x} = 0$ upon the complementary event $\Omega_2$ | $\mu_p'[\underline{x}] = P_1\, \mu_p'[\underline{x} \mid \Omega_1]$ | (16)

All common distribution functions used in a wide range of stochastic applications have elegant analytical expressions of either their moments or their cumulants of any order, and in some cases of both. These are gathered in Table 2 for distributions with finite domain, in Table 3 for distributions with infinite domain but with all their moments finite, and in Table 4 for the heavy-tailed distributions with upper-tail index $\xi$; in the latter case, both moments and cumulants exist for $p < 1/\xi$ and are infinite for larger $p$. The following notes apply to these tables: 1. The meaning of the parameters is the following. 3. Distributions named "half" have their "full" version, whose density $f(x)$ and tail function $\overline{F}(x)$ are obtained by dividing those given in the tables by 2. The "half" version given in the tables corresponds to $x \ge 0$, while in the "full" version $x \in \mathbb{R}$. The moments $\mu_p'$ of the "full" version are: (a) for odd $p$, 0; (b) for even $p$, equal to those of the half version. 4. All other distributions, defined for $x \ge 0$ but not named "half", can also be extended to the whole real line by replacing $x$ with $|x|$ and dividing $f(x)$ by 2. Again, the moments $\mu_p'$ of this extended version are: (a) for odd $p$, 0; (b) for even $p$, equal to those of the original version.
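The linear-combination and conditioning properties of Table 1 can be demonstrated numerically. The following Python sketch (an illustration under the stated properties, not the authors' code) recovers the cumulants of a gamma variable as a sum of exponentials, and the moments of an intermittent variable that is zero with some probability:

```python
from math import factorial

def kappa_exp(p, scale=1.0):
    # Cumulants of the exponential distribution: kappa_p = (p-1)! * scale**p
    return factorial(p - 1) * scale ** p

def kappa_linear_combo(coeffs, kappa_funcs, p):
    """Property (13): kappa_p of c_1 x_1 + ... + c_r x_r for independent x_l."""
    return sum(c ** p * kap(p) for c, kap in zip(coeffs, kappa_funcs))

def moment_conditioned(P1, mom_given_1, p, c=0.0):
    """Noncentral moment of order p when x follows a given conditional
    distribution on an event of probability P1 and equals the constant c
    otherwise (the conditioning rows of Table 1)."""
    return P1 * mom_given_1(p) + (1.0 - P1) * c ** p

# The sum of 3 independent unit exponentials is gamma with shape 3, whose
# cumulants are 3 * (p-1)!; property (13) recovers this directly:
k4 = kappa_linear_combo([1.0, 1.0, 1.0], [kappa_exp] * 3, 4)   # 3 * 3! = 18

# Intermittent variable: x = 0 with probability 0.6, unit exponential
# otherwise (moments p!):
mu2 = moment_conditioned(0.4, lambda p: float(factorial(p)), 2)  # 0.4 * 2! = 0.8
```

The second construction is exactly the mechanism used later for distributions with a discontinuity (probability mass) at the origin, such as intermittent rainfall.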

Second Order Properties
For a stochastic process $\underline{x}(t)$ in continuous time $t$, or $\underline{x}_\tau$ in discrete time $\tau$, we define the cumulative process $\underline{X}(k) \equiv \underline{X}_\kappa$, for continuous time scale $k := \kappa D$, where $\kappa$ denotes discrete time scale, as:

$$\underline{X}(k) := \int_0^k \underline{x}(t)\,\mathrm{d}t \tag{17}$$

The time average of the original process for discrete time scale $\kappa$ is

$$\underline{x}_\tau^{(\kappa)} := \frac{\underline{X}_{\tau\kappa} - \underline{X}_{(\tau-1)\kappa}}{\kappa} \tag{18}$$

The variability of the time-averaged process is quantified by the variance:

$$\gamma(\kappa) := \operatorname{var}\!\left[\underline{x}_\tau^{(\kappa)}\right] \tag{19}$$

This can be extended to a continuous-time process, for which

$$\gamma(k) := \operatorname{var}\!\left[\frac{\underline{X}(k)}{k}\right] \tag{20}$$

Clearly, this is a function of the time scale $k$ and is termed the climacogram of the process, from the Greek climax (κλίμαξ, meaning scale) [41].
For sufficiently large $k$ (theoretically as $k \to \infty$), we may approximate the climacogram as:

$$\gamma(k) \propto k^{2H-2} \tag{21}$$

where $H$ is termed the Hurst parameter. The theoretical validity of such (power-type) behavior of a process was implied by Kolmogorov (1940) [42,43]. The quantity $2H - 2$ is visualized as the slope of the double logarithmic plot of the climacogram for large time scales. In a purely random process, $H = 1/2$, while in most natural processes $1/2 \le H \le 1$, as first observed by Hurst in 1951 [44]. This natural behavior is known as LRD, (long-term) persistence, or Hurst-Kolmogorov (HK) dynamics. A high value of $H$ (approaching 1) indicates enhanced presence of patterns, enhanced change and enhanced uncertainty (e.g., in future predictions). A low value of $H$ ($< 1/2$) indicates enhanced fluctuation or antipersistence.
A stochastic process $\underline{x}(t)$ for which the property (21) is valid not only asymptotically, but precisely for any scale $k$, i.e.,

$$\gamma(k) = \lambda \left(\frac{k}{\alpha}\right)^{2H-2} \tag{22}$$

where $\alpha$ and $\lambda$ are scale parameters with units of time and $[x^2]$, respectively, is termed the Hurst-Kolmogorov (HK) process [12].
The HK process is a simple mathematical model offering acceptable approximations for large scales, but it is not physically plausible for small scales because it yields infinite variance of the instantaneous process (as $k \to 0$) [45]. Therefore, filtered versions thereof (FHK) with finite variance at all scales are better options to model natural processes. Here we use two versions of FHK, namely:

• The generalized Cauchy-type FHK (FHK-C) with climacogram:

$$\gamma(k) = \lambda \left(1 + \left(\frac{k}{\alpha}\right)^{2M}\right)^{\frac{H-1}{M}} \tag{23}$$

• The mixed Cauchy-Dagum-type FHK (FHK-CD) with climacogram:

$$\gamma(k) = \lambda_1 \left(1 + \frac{k}{\alpha}\right)^{2H-2} + \lambda_2 \left(1 - \left(1 + \frac{\alpha}{k}\right)^{2M-2}\right) \tag{24}$$

In addition to the Hurst parameter $H$, which characterizes the global scaling behavior as $k \to \infty$, the filtered models include a second scaling exponent $M$, characterizing the local scaling (or smoothness or fractal behavior) as $k \to 0$. Furthermore, the FHK-CD model contains two scale parameters of state, $\lambda_1$ and $\lambda_2$, instead of the single $\lambda$ of the FHK-C, offering greater flexibility.
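The FHK-C climacogram and its asymptotic slope can be sketched in Python; the functional form used below is inferred from the parameter sets of Section 3 ($H$, $M$, $\alpha$, $\lambda$, giving unit variance at scale 1) and should be treated as an assumption of this sketch:

```python
import math

def climacogram_fhk_c(k, H, M, alpha, lam):
    """FHK-C climacogram gamma(k) = lam * (1 + (k/alpha)**(2M))**((H-1)/M);
    finite variance lam as k -> 0, HK power law as k -> infinity."""
    return lam * (1.0 + (k / alpha) ** (2.0 * M)) ** ((H - 1.0) / M)

# Parameter set of the first application (H = 0.8, M = 0.5, alpha = 1,
# lam = 1.32) gives unit variance at scale 1 ...
g1 = climacogram_fhk_c(1.0, 0.8, 0.5, 1.0, 1.32)

# ... and the log-log slope at large scales tends to 2H - 2 = -0.4:
slope = (math.log(climacogram_fhk_c(2.0e6, 0.8, 0.5, 1.0, 1.32))
         - math.log(climacogram_fhk_c(1.0e6, 0.8, 0.5, 1.0, 1.32))) / math.log(2.0)
```

As the code shows, the filtering only affects scales comparable to or smaller than $\alpha$; for $k \gg \alpha$ the model is indistinguishable from the pure HK process.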
Once the model climacogram is given, all other second-order properties of the process are uniquely determined through simple mathematical expressions. Thus, the autocovariance function in continuous and discrete time, for lags $h$ and $\eta = h/D$, respectively, is derived from the climacogram through the relationships [2,12]:

$$c(h) = \frac{1}{2} \frac{\mathrm{d}^2\!\left(h^2 \gamma(h)\right)}{\mathrm{d}h^2} \tag{25}$$

for continuous time and

$$c_\eta = \frac{(\eta+1)^2\, \gamma\big((\eta+1)D\big) - 2\eta^2\, \gamma(\eta D) + (\eta-1)^2\, \gamma\big((\eta-1)D\big)}{2} \tag{26}$$

for discrete time, where $c_\eta := \operatorname{cov}\!\left[\underline{x}_\tau^{(D)}, \underline{x}_{\tau+\eta}^{(D)}\right]$ and cov[ ] stands for covariance. Finally, the power spectrum $s(w)$ of the process is the Fourier transform of the autocovariance, so that:

$$s(w) = 4\int_0^\infty c(h) \cos(2\pi w h)\,\mathrm{d}h \tag{27}$$

for continuous time and

$$s_\mathrm{d}(\omega) = 2c_0 + 4\sum_{\eta=1}^{\infty} c_\eta \cos(2\pi \omega \eta) \tag{28}$$

for discrete time.
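The climacogram-to-autocovariance transformation in discrete time is a second finite difference of $\Gamma(k) := k^2 \gamma(k)$, which can be sketched as follows (an illustration using the pure HK climacogram; the lag-0 value must return the variance $\gamma(D)$ of the averaged process):

```python
def autocov_discrete(gamma, eta, D=1.0):
    """Discrete-time autocovariance at integer lag eta from a climacogram
    gamma(k), as the second finite difference of Gamma(k) = k**2 * gamma(k)
    (the discrete counterpart of c(h) = (1/2) d^2[h^2 gamma(h)]/dh^2)."""
    def G(k):
        k = abs(k)                       # Gamma is even; Gamma(0) = 0
        return k * k * gamma(k) if k > 0.0 else 0.0
    return (G((eta + 1) * D) - 2.0 * G(eta * D) + G((eta - 1) * D)) / (2.0 * D * D)

# HK climacogram with H = 0.8 (alpha = lam = 1): at lag 0 the formula
# returns the unit variance gamma(D), and the autocovariance decays slowly.
gamma_hk = lambda k: k ** (2 * 0.8 - 2)
c0 = autocov_discrete(gamma_hk, 0)
c1 = autocov_discrete(gamma_hk, 1)
```

This is the computation performed in step 1 of the generation algorithm described in the next subsection.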

Stochastic Simulation
To simulate the discrete-time stochastic process $\underline{x}_\tau$ with any autocovariance function, we can use the generalized moving average scheme [1,11,12]:

$$\underline{x}_\tau = \sum_{j=-J}^{J} a_j\, \underline{v}_{\tau+j} \tag{29}$$

where $a_j$ are weights to be calculated from the autocovariance function, $\underline{v}_\tau$ is white noise averaged in discrete time (in the general case assumed non-Gaussian), and $J$ is theoretically infinite, so that in all theoretical calculations we assume $J = \infty$, while in the generation case $J$ is a large integer chosen so that the resulting truncation error is negligible.
As explained in [1], the above scheme is opposite to the common schemes of the time series school. Specifically, (a) we use a purely moving average scheme without any autoregressive term and (b) we do not connect the generating scheme with observations, as the observations have already been used in the model-fitting phase, which is totally isolated from generation. Specifically, the fitting consists of a choice of an appropriate climacogram expression such as (23) or (24) and the estimation of its parameters, as well as the choice of a distribution function, such as those contained in Tables 2-4, and the estimation of its parameters. This tactic assures modelling parsimony. More details on the fitting procedure, which is not covered here, can be found in [2]. Here we only stress the methodological suggestion that we never estimate from data classical moments and cumulants of order greater than 2, because these are unknowable from data [31]. While the methodology that we follow heavily depends on high-order moments and cumulants, it is stressed that these are determined by theoretical calculations and never from the data.
Assuming unit variance of the white noise $\underline{v}_\tau$, writing Equation (29) for $\underline{x}_{\tau+\eta}$, multiplying it by (29) and taking expected values, we find the convolution expression for $J = \infty$:

$$c_\eta = \sum_{j=-\infty}^{\infty} a_j\, a_{j+\eta} \tag{30}$$

We need to find the sequence of $a_j$, $j = \dots, -1, 0, 1, \dots$, so that (30) holds true. The following generic solution of the generating scheme, giving the coefficients $a_j$, has been proposed by Koutsoyiannis [1]:

$$a_j = \int_{-1/2}^{1/2} \sqrt{\frac{s_\mathrm{d}(\eta)}{2}}\; \mathrm{e}^{\mathrm{i}\left(2\pi j \eta + \theta(\eta)\right)}\,\mathrm{d}\eta \tag{31}$$

where $\mathrm{i} := \sqrt{-1}$, $\theta(\eta)$ is any (arbitrary) odd real function (meaning $\theta(-\eta) = -\theta(\eta)$) and $s_\mathrm{d}(\eta)$ is the discrete-time power spectrum of the process. As proved by Koutsoyiannis [1], the sequence of $a_j$: 1. Consists of real numbers, despite the expression in (31) involving complex numbers; 2. Satisfies precisely Equation (30); and 3. Is easy and fast to calculate using the fast Fourier transform (FFT).
This theoretical result is readily converted into a numerical algorithm, which consists of the following steps [1]: 1. From the continuous-time stochastic model, expressed through its climacogram $\gamma(k)$, we calculate its autocovariance function $c_\eta$ in discrete time (assuming time step $D$) by Equation (26). (This step is obviously omitted if the model is already expressed in discrete time through its autocovariance function.) 2. We choose an appropriate number of coefficients $J$ that is a power of 2 and perform inverse FFT (using common software) to calculate the discrete-time power spectrum $s_\mathrm{d}(\eta_m)$ for an array of frequencies $\eta_m = m\eta_1$, $m = 0, 1, \dots, J$, $\eta_1 := 1/(2J)$. 3. We choose $\theta(\eta)$ (see below) and we form the arrays (vectors) $A^\mathrm{R}$ and $A^\mathrm{I}$, both of size $2J$ indexed as $0, \dots, 2J-1$, with the superscripts R and I standing for the real and imaginary part of a vector of complex numbers, respectively. 4. We perform FFT on the vector $A^\mathrm{R} + \mathrm{i}\, A^\mathrm{I}$ (using common software), and get the real part of the result, which is precisely the sequence of $a_j$.
By choosing J as a power of 2, the vectors R and I will have size 2J which is also a power of 2, thus maximizing the speed of the FFT calculations. (More details are contained in a supplementary file in [1], which includes numerical examples along with the simple code needed to do these calculations on a spreadsheet).
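The spectral-factorization idea behind these steps can be illustrated with a small, self-contained Python sketch. This is not the algorithm of [1] verbatim (which handles the truncation at $\pm J$ carefully and uses a fast FFT); it is a circular-convolution variant with plain $O(N^2)$ transforms, in which the Fourier transform of the symmetric weight sequence is the square root of the power spectrum (the $\theta(\eta) = 0$, i.e., SMA, case):

```python
import cmath

def dft(z):
    """Plain O(N^2) discrete Fourier transform (illustration only)."""
    N = len(z)
    return [sum(z[e] * cmath.exp(-2j * cmath.pi * m * e / N) for e in range(N))
            for m in range(N)]

def idft(Z):
    N = len(Z)
    return [sum(Z[m] * cmath.exp(2j * cmath.pi * m * j / N) for m in range(N)) / N
            for j in range(N)]

def circular_autocov(a, eta):
    """sum_j a_j a_{j+eta} with circular (mod N) indexing."""
    N = len(a)
    return sum(a[j] * a[(j + eta) % N] for j in range(N))

def sma_weights_circular(c):
    """Weights a_j whose circular autocovariance reproduces c, obtained by
    taking the square root of the power spectrum (the SMA case)."""
    S = [max(s.real, 0.0) for s in dft(c)]   # power spectrum, real and >= 0
    A = [s ** 0.5 for s in S]                # square root of the spectrum
    return [z.real for z in idft(A)]         # real, symmetric weights

# Build a valid autocovariance as the circular autocovariance of a known
# kernel, then check that the computed weights reproduce it.
b = [1.0, 0.6, 0.3, 0.1] + [0.0] * 12
c = [circular_autocov(b, e) for e in range(len(b))]
a = sma_weights_circular(c)
err = max(abs(circular_autocov(a, e) - c[e]) for e in range(len(c)))
```

Replacing the real square root of the spectrum by $\sqrt{S_m}\,\mathrm{e}^{\mathrm{i}\theta_m}$ with an odd phase sequence $\theta_m$ yields asymmetric (AMA) weight sets with the same autocovariance, mirroring the family of solutions of Equation (31).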
Remarkably, Equation (31) gives, instead of a single solution, a family of infinitely many solutions. All of them preserve exactly the second-order characteristics of the process, and each of them is characterized by the chosen function $\theta(\eta)$. Even assuming $\theta(\eta) = \theta_0 \operatorname{sign}(\eta)$ with constant $\theta_0$, again there are infinitely many solutions, each one characterized by the value of $\theta_0$. Also, even if the sequence of $\theta(\eta_m)$ is constructed as a sequence of random numbers, again Equation (30) will be satisfied and the resulting $a_j$ can be directly used in generation. The availability of infinitely many solutions enables preservation of additional statistics, such as those related to time asymmetry [1,27].
The special case $\theta(\eta) = 0$ gives a symmetric solution with respect to positive and negative $j$:

$$a_j^\mathrm{S} = a_{-j}^\mathrm{S} = \int_{-1/2}^{1/2} \sqrt{\frac{s_\mathrm{d}(\eta)}{2}}\, \cos(2\pi j \eta)\,\mathrm{d}\eta \tag{36}$$

where the superscript S stands for symmetric. This has been known as the symmetric moving average (SMA) scheme [11], while any other solution denotes an asymmetric moving average (AMA) scheme.
In addition, there exist several options related to the distribution of the white noise $\underline{v}_\tau$, which in general is not Gaussian. Hence, preservation of moments and cumulants of any order becomes possible. Specifically, by virtue of Equation (13), the $p$th cumulants $\kappa_p$ and $\kappa_p^{(v)}$ of the processes $\underline{x}_\tau$ and $\underline{v}_\tau$, respectively, are related by:

$$\kappa_p = \kappa_p^{(v)} \sum_{j=-J}^{J} a_j^p \tag{37}$$

Solving for $\kappa_p^{(v)}$ we find:

$$\kappa_p^{(v)} = \frac{\kappa_p}{\sum_{j=-J}^{J} a_j^p} \tag{38}$$

Given the so-calculated $\kappa_p^{(v)}$ for any order $p$, the distribution function of the white noise is fully determined.
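The mapping from process cumulants to white-noise cumulants is a one-line consequence of the linear-combination property (13), as the following sketch shows (illustrative weights only):

```python
def white_noise_cumulant(kappa_x, a, p):
    """Cumulant of order p of the white noise v from the target cumulant
    kappa_p of the process x and the weights a_j, inverting the
    linear-combination property: kappa_p[x] = kappa_p[v] * sum_j a_j**p."""
    return kappa_x / sum(aj ** p for aj in a)

# With a single unit weight the noise cumulants equal those of the process;
# spreading the weights inflates the higher-order noise cumulants:
kv3_single = white_noise_cumulant(2.0, [0.0, 1.0, 0.0], 3)       # = 2.0
kv3_spread = white_noise_cumulant(2.0, [0.5, 0.5, 0.5, 0.5], 3)  # = 2.0 / 0.5 = 4.0
```

The inflation seen in the second call is the reason why, for strongly persistent processes (many comparable weights), the white noise must be markedly more non-Gaussian than the target process itself.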

Distribution Function Approximation
A problem usually met in practice, including in the present simulation framework, is to approximate a distribution function up to an order $p_{\max}$. A convenient way to make the approximation is to choose a number $L$ of elementary distribution functions from Tables 2-4, thus defining the white-noise processes $\underline{v}_l$, $l = 1, \dots, L$, and obtaining the approximation $\underline{v}'$ of $\underline{v}$ as a linear combination of $\underline{v}_l$ with weights $w_l$, i.e.:

$$\underline{v}' = \sum_{l=1}^{L} w_l\, \underline{v}_l \tag{39}$$

The cumulants $\kappa_p^{(v_l)}$ of $\underline{v}_l$ are then determined from Tables 2-4 and those of $\underline{v}'$, by virtue of (13), are:

$$\kappa_p^{(v')} = \sum_{l=1}^{L} w_l^p\, \kappa_p^{(v_l)} \tag{40}$$

The goodness of the approximation up to order $p_{\max}$ is given by an error expression such as:

$$E_1 := \sum_{p=2}^{p_{\max}} \left(\left(\kappa_p^{(v')}\right)^{1/p} - \left(\kappa_p^{(v)}\right)^{1/p}\right)^2, \qquad E_2 := \sum_{p=2}^{p_{\max}} \left(\left(\frac{\kappa_p^{(v')}}{\kappa_p^{(v)}}\right)^{1/p} - 1\right)^2 \tag{41}$$

where the second form ($E_2$) is more appropriate if all cumulants are positive and increasing fast. In order for the above equations to work in all cases, even when $\kappa_p$ is negative and $p$ is even, the quantity $(\kappa_p)^{1/p}$ is meant to denote the quantity $\operatorname{sign}(\kappa_p)\, |\kappa_p|^{1/p}$; this convention is followed throughout the entire paper. By minimizing either $E_1$ or $E_2$ using a common solver, we simultaneously find the series of weights $w_l$ and the parameters of the marginal distribution of each of the $\underline{v}_l$. Further details will be given in the applications of Section 3, where it will also be seen that, for a sufficient approximation, the number of constituent distributions $L$ is small, usually 1 or 2. It is stressed that, in each of the above error expressions, we have intentionally excluded the error in the cumulants of order 1, i.e., the mean values. Therefore, we expect that with this procedure the mean will not be preserved. However, this can be easily tackled by adding a constant to $\underline{v}'$. Apparently, the required shift should be

$$c = \kappa_1^{(v)} - \kappa_1^{(v')} \tag{42}$$

Based on the above approximation, the generation process will produce the stochastic process $\underline{x}'_\tau$, where, if the approximation is satisfactory, we reasonably expect that the statistical properties of $\underline{x}'_\tau$ will be equal to those of $\underline{x}_\tau$.
This proves to be always the case if the domain of the stochastic variable is unbounded in both directions (i.e., ∈ ℝ ), but some additional manipulation (post-processing) may be needed if the domain of is not the entire real line, or if the distribution function of has discontinuities, as will be illustrated in the applications of the next section.
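The ingredients of this fitting step can be sketched in Python. The exact algebraic form of the $E_1$ error is reconstructed here from its description in the text (signed $p$th roots, orders 2 to $p_{\max}$, mean excluded) and should be treated as an assumption of this sketch:

```python
from math import factorial

def signed_root(x, p):
    """sign(x) * |x|**(1/p): the convention adopted so that the error
    measures also work for negative cumulants with even p."""
    return (-1.0 if x < 0 else 1.0) * abs(x) ** (1.0 / p)

def mixture_cumulants(weights, constituents, pmax):
    """Cumulants of v' = sum_l w_l v_l for independent constituents
    (property (13)); constituents[l][p] holds the cumulant of order p."""
    return [0.0] + [sum(w ** p * k[p] for w, k in zip(weights, constituents))
                    for p in range(1, pmax + 1)]

def error_E1(kappa_target, kappa_approx, pmax):
    """E1-type error over orders 2..pmax (order 1, the mean, is excluded
    and fixed afterwards by a constant shift)."""
    return sum((signed_root(kappa_approx[p], p) - signed_root(kappa_target[p], p)) ** 2
               for p in range(2, pmax + 1))

# Target: unit exponential, kappa_p = (p-1)!.  A single exponential
# constituent with weight 1 matches all cumulants, so E1 = 0.
kappa_exp = [0.0] + [float(factorial(p - 1)) for p in range(1, 5)]
kappa_mix = mixture_cumulants([1.0], [kappa_exp], 4)
e1 = error_E1(kappa_exp, kappa_mix, 4)
```

In a realistic application the weights and the constituents' shape parameters would be passed to a general-purpose minimizer of `error_E1`; the mean is then restored by the constant shift $\kappa_1^{(v)} - \kappa_1^{(v')}$.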

Applications and Results
We illustrate the methodology by five applications for bounded $\underline{x}$, as this case is more demanding (the unbounded case is much easier). Three applications are synthetic mathematical examples used as benchmarks, namely one with the exponential distribution, which is bounded from below, and two with the uniform distribution, which is bounded from both below and above. The next two are real-world applications dealing with one of the most challenging natural processes, namely the precipitation process, which is bounded from below (by 0), highly intermittent, and with heavy distribution tail. The latter two applications refer to two different time scales, fine (hourly) and coarse (annual). In the synthetic example with the exponential distribution and in the two real-world applications, the stochastic processes are persistent, with a large Hurst parameter ranging from 0.80 to 0.92. In the synthetic examples of the uniform distribution, we use both a persistent and an antipersistent process, with Hurst parameters 0.70 and 0.20, respectively.

Simulating a Persistent Process with Exponential Distribution
For a process with exponential distribution, which is a subcase of the gamma distribution, there exist generation algorithms for the case of short-range (Markov) dependence (e.g., [46]). As already mentioned, a more general algorithm for generation of any type of marginal distribution has recently been proposed by Lombardo et al. [28], but again under the condition of the Markov dependence. However, the method proposed here can generate such a process irrespective of the type of the dependence, whether SRD or LRD.
For illustration we assume an FHK-C model (Equation (23)) with parameters $H = 0.8$, $M = 0.5$, $\alpha = 1$, $\lambda = 1.32$, so that $\gamma(1) = 1$. The FHK-C climacogram is shown in Figure 1b, marked as "theoretical", while the resulting autocorrelation function is shown in Figure 1c. As in the exponential distribution (from Table 3) the scale parameter is $\sqrt{\gamma(1)} = 1$, the cumulants of the process $\underline{x}_\tau$ are $\kappa_p = (p-1)!$. These are depicted in Figure 1a, along with the cumulants of $\underline{v}_\tau$ determined from Equation (38), where, to avoid big numbers, the quantities $(\kappa_p)^{1/p}$ are plotted. The coefficients $a_j$, needed to evaluate $\kappa_p^{(v)}$ in Equation (38), are determined from the SMA (symmetric) generation scheme (Equation (36)) with $J = 1024$. Coming to the approximation $\underline{v}'$ of $\underline{v}$, we use two constituents with gamma distributions and allow a discontinuity at 0 in each of them. Assuming unit variance in each of them, from the equations of Table 3 we have $\kappa_2 = 1$, so that the continuous part of each distribution is fully determined by its shape parameter. Hence, the approximation $\underline{v}'$, according to Equation (39), is determined by the fitted parameters of the two constituents. The cumulants of $\underline{v}'$ are also plotted in Figure 1a, where it can be seen that they are indistinguishable from those of $\underline{v}$, and thus the achieved approximation is very good.
The generation of values of $\underline{v}'$ is quite easy using a random number generator for the gamma distribution. From a series of random numbers $\underline{v}'$, a total of $n$ = 10,000 values of $\underline{x}_\tau$ are then determined from Equation (29). A small number (6.6%) of them are small negative values. To remedy this problem, we reflect these values about zero, or, in other words, replace $x$ with $-x$. Theoretically, this remedy will have a distorting effect on the multivariate distribution of $\underline{x}_\tau$, but in fact, this effect turns out to be negligible.
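The generation step just described can be sketched as follows. For simplicity, a single gamma constituent (rather than the fitted two-constituent approximation) and a short illustrative weight set are assumed:

```python
import random

def generate_process(a, shape, scale, n, seed=42):
    """Sketch of the generation step of Equation (29): gamma-distributed
    white noise filtered by 2J+1 weights a_j, with any negative outputs
    reflected about zero as described in the text.  The single gamma
    constituent is an assumed stand-in for the fitted approximation."""
    rng = random.Random(seed)
    J = (len(a) - 1) // 2
    v = [rng.gammavariate(shape, scale) for _ in range(n + 2 * J)]
    x = []
    for t in range(n):
        s = sum(a[j] * v[t + j] for j in range(2 * J + 1))
        x.append(-s if s < 0.0 else s)   # reflection about zero
    return x

x = generate_process([0.2, 0.6, 0.2], 1.0, 1.0, 1000)
mean_x = sum(x) / len(x)   # close to 1, the mean of the unit exponential noise
```

With the illustrative all-positive weights the reflection never triggers; it matters in the realistic setting, where the fitted noise can produce occasional negative filtered values.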
Comparison of the theoretical statistical characteristics of the distribution of $\underline{x}_\tau$ to the empirical ones of the generated sample is shown in the panels of Figure 1. In the climacogram (Figure 1b), the plotted points correspond to unbiased estimates of variance; this is achieved by adding the quantity $\gamma(nD) = 0.0331$ to the classical statistical estimates, as explained in [2]. The empirical climacogram agrees well with the theoretical one. The empirical autocorrelation is shown in Figure 1c. Here, the bias correction was applied using an approximate method from [47], according to which the unbiased estimate is the weighted sum of the classical autocorrelation estimate and the number 1, with the weight of the latter being equal to $1/n'$, where $n' := \gamma(D)/\gamma(nD)$ is the so-called equivalent sample size of the process, and differs substantially from $n$ if the process is persistent [48]. (We note that a precisely unbiased estimate of autocovariance has been provided by [49], but this is more laborious.) Finally, Figure 1d shows a comparison of the theoretical and empirical marginal distribution of $\underline{x}_\tau$. The empirical distribution of each value of the generated time series, arranged in ascending order so that $x_{(i:n)}$ is the $i$th smallest value of the series of $n$ values, was estimated on the basis of unbiasedness of the logarithm of the return period $T(x_{(i:n)})$. As shown in [2], this estimate is

$$\frac{T(x_{(i:n)})}{D} = \exp\big(\psi(n+1) - \psi(n+1-i)\big) \tag{43}$$

where $\psi$ is the digamma function. Again, the agreement between the theoretical and the empirical distributions is very good. For comparison, a conventional method using an ARMA(1,1) model and a normalizing transformation is given in Appendix A for the same case study.

Simulating a Persistent Process with Uniform Distribution
The simulation of a persistent process with uniform distribution is more demanding because of the double boundedness and the sharp discontinuities of the density function at the bounds, while linear generation procedures tend to generate unbounded processes with smooth density. On the other hand, the double boundedness offers an option of approximation with a process $\underline{v}'$ that takes on a finite number of values. In other words, we assume that the stochastic variable $\underline{v}'$ is discrete, taking on values $v'_l$ with probabilities $P_l$, as illustrated in Figure 2. The details of this approximation will be explained in a while. Despite $\underline{v}'$ being assumed discrete, thanks to the fact that the generation of $\underline{x}_\tau$ via Equation (29) involves a linear combination of very many variables $\underline{v}'$, the variable $\underline{x}_\tau$ will in effect be continuous. As in the previous case, for illustration, we assume an FHK-C model (Equation (23)) with $\gamma(1) = 1$. We note that the fourth cumulant of this uniform distribution, which in this case of unit variance coincides with the kurtosis, is $\kappa_4 = -1.2$. The fourth cumulant of $\underline{v}$ ($\kappa_4^{(v)}$) should necessarily be lower than that ($\kappa_4^{(v)} < -1.2$) for a persistent process. On the other hand, it is known that the kurtosis of any distribution cannot be lower than $-2$. Therefore, the margin for having a positively autocorrelated process with uniform distribution is rather small. An FHK-C model with parameters $H = M = 0.7$, $\alpha = 1$, $\lambda = 1.346$ (so that $\gamma(1) = 1$) was finally assumed. Its climacogram is shown in Figure 3b, marked as "theoretical", while the resulting autocorrelation function is shown in Figure 3c. In order for the uniform distribution to have variance $\gamma(1) = 1$, its upper bound should be $b = \sqrt{12} = 3.464$, with lower bound 0. The cumulants of the process $\underline{x}_\tau$, determined from Table 2 and Equation (9), are shown in Figure 3a, along with the cumulants of $\underline{v}$ determined from Equation (38) (for the convention used for $(\kappa_p)^{1/p}$ for negative quantities and $p$ even, see the note in Section 2.4 below Equation (41)).
The coefficients $a_j$, needed to evaluate $\kappa_p^{(v)}$ in Equation (38), are determined from the SMA (symmetric) generation scheme (Equation (36)) with $J = 1024$. Comparisons of the theoretical statistical characteristics of the distribution of $\underline{x}_\tau$ to the empirical ones of the generated sample are shown in the panels of Figure 3, arranged as in Figure 1. A difference is that in panel (d), instead of estimating the return period of each $x_{(i:n)}$ (the $i$th smallest value of the series of $n$ values), we give the non-exceedance probability $F(x_{(i:n)})$, estimated on the basis of its unbiasedness. In this case, the unbiased estimate is [2]:

$$F(x_{(i:n)}) = \frac{i}{n+1} \tag{44}$$

The approximation $\underline{v}'$ of $\underline{v}$ is done through the discretization of the former, described above. Twenty equidistant values $v'_l$ with probabilities $P_l$ are assumed, where $v'_l = l b/20$, $l = 1, \dots, 20$. The distribution of $\underline{v}'$ was assumed symmetric, i.e., $P_l = P_{21-l}$, so that the unknown parameters to be optimized are ten, namely $P_1, \dots, P_{10}$. These are calculated by minimizing $E_1$ in Equation (41), assuming $p_{\max} = 10$. The resulting values are shown graphically in Figure 2. It is remarkable that the distribution of $\underline{v}'$ is far from uniform, despite the fact that the cumulants of $\underline{v}$, as seen in Figure 3a, are not very different from those of $\underline{x}_\tau$, which has uniform distribution. The cumulants of $\underline{v}'$, also plotted in Figure 3a, are indistinguishable from those of $\underline{v}$; thus, the achieved approximation is very good. An exception is seen in the first cumulants of $\underline{v}'$ and $\underline{v}$, which are quite different; thus, the required shift of Equation (42) is not negligible, namely $c = -1.503$.
The generation phase is quite easy, as values of x′ are readily generated by inverse-transform sampling, given the staircase-like distribution function of a discrete stochastic variable. A total of n = 10 000 values of x are then generated from Equation (29). A small number of them (~2%) are either small negative values or somewhat greater than b. As in the previous case, we reflect the negative values about zero, replacing x with −x. Likewise, we reflect the values exceeding b about b, replacing x with 2b − x.
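A minimal end-to-end sketch of this generation phase follows, with hypothetical ingredients throughout: equal atom probabilities instead of the optimized ones, an illustrative Gaussian-shaped weight sequence instead of the analytically derived SMA weights, and an ad hoc shift into the interval. It only demonstrates the mechanics of inverse-transform sampling of the discrete noise, linear filtering of the type of Equation (29), and reflection at the bounds:

```python
import numpy as np

rng = np.random.default_rng(42)
b = np.sqrt(12.0)                        # bounds [0, b] of the variance-1 uniform

# Discrete noise: 20 atoms with placeholder (equal) probabilities
atoms = (np.arange(20) + 0.5) * b / 20
cdf = np.cumsum(np.full(20, 1 / 20))
cdf[-1] = 1.0                            # guard against floating-point round-off

def sample_discrete(n):
    """Inverse-transform sampling from the staircase CDF."""
    return atoms[np.searchsorted(cdf, rng.random(n))]

# Illustrative symmetric weights (NOT the paper's analytical SMA solution)
q = 64
a = np.exp(-0.5 * (np.arange(-q, q + 1) / 10.0) ** 2)
a /= np.sqrt(np.sum(a ** 2))             # preserve the noise variance

n = 10_000
v = sample_discrete(n + 2 * q)
x = np.convolve(v - v.mean(), a, mode="valid") + b / 2   # center in [0, b]

# Reflection post-processing at the bounds, as described in the text
x = np.where(x < 0, -x, x)
x = np.where(x > b, 2 * b - x, x)
```

Because x is a weighted sum of very many discrete variables, it is effectively continuous, as the text notes.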
In all panels of Figure 3, the agreement between the theoretical and empirical characteristics is very good.

Simulating an Antipersistent Process with Uniform Distribution
For further illustration, we examine the same uniform distribution as above, but for an antipersistent process (with H < 1/2). Actually, this case is easier, as the required change in kurtosis is smaller than in the previous case; thus, the feasibility of the solution is assured.
Again, an FHK-C model was assumed, now with parameters H = 0.2, M = 0.8, α = 1, λ_0 = 2 (so that γ(1) = 1, while the fourth cumulant of the white noise is −1.265). All other choices are the same as in the previous application (e.g., upper bound b = √12 = 3.464, etc.). The approximation x′ of x through discretization is depicted in Figure 4. Again, this differs substantially from the uniform distribution, even though the cumulants of x′, as seen in Figure 5a, are virtually indistinguishable from those of x and of the white noise. Yet there is a substantial difference in the first cumulants of x′ and x, so that the required shift of Equation (42) is again not negligible. Comparisons of the theoretical statistical characteristics of the distribution of x to the empirical ones of the generated sample are shown in the panels of Figure 5. In all panels, the agreement between the theoretical and empirical characteristics is very good.

Simulating the Precipitation Process at the Hourly Time Scale
Here we use a recently developed [2] full stochastic model of the precipitation process at any time scale k. This model directly gives the ombrian relationships (otherwise known as intensity-duration-frequency curves), but it also provides any stochastic characteristic of the precipitation process that is required for stochastic simulation. In [2], this model was applied to construct ombrian curves by fitting it at several locations, but it was not used for stochastic simulation. Among the locations studied in [2], here we provide a stochastic simulation for rainfall in Bologna, using the parameter values fitted there. The application in this subsection is for the hourly scale, while an additional application for the annual scale is given in the next subsection.
The model is based on the following assumptions, which are mathematically consistent (with one exception, as detailed below):
1. Pareto distribution with a discontinuity at the origin for small time scales (Table 5, Equation (46), left). The tail index ξ is constant for all time scales k, while the probability wet, P_1(k), and the state scale parameter, λ(k), are functions of the time scale k.
2. Continuous PBF distribution, possibly with a discontinuity at zero, for large time scales (Table 5, Equation (46), right). In this case, a new parameter ζ(k) is introduced, which is again a function of the time scale. The Pareto distribution is the special case of the PBF for ζ(k) = 1. In contrast to the Pareto distribution, whose density is a consistently decreasing function of x, the PBF tends to be bell-shaped for increasing ζ(k), a property consistent with empirical observation and reason.
3. Constant mean of the time-averaged process.

4. Climacogram of type FHK-CD (Equation (24)), where, to reduce the number of parameters, it is assumed that the two smoothness parameters are complementary (summing to 1), thus yielding Equation (48) in Table 5. By inspection of Equation (48), it is seen that, as k → ∞, γ(k) → 0, which makes the process ergodic; for k = 0, γ(0) = λ_0 = λ_1 + λ_2, which is finite, as required for physical consistency.
5. Probability wet, P_1(k) = 1 − P_0(k), varying with time scale according to Equation (49) in Table 5. It is clarified that two different expressions are used for the small and the large scales, where the transition time scale from the Pareto to the PBF distribution is denoted as k_*. In the Pareto case, P_1(k) can be determined directly from the climacogram and the mean (left column of Equation (49) in Table 5). For the PBF case, an additional equation is required, which has been derived based on maximum entropy considerations [50] and involves an additional parameter taking values in [0, 1]. Continuity of the transition demands that ζ(k_*) = 1.
(Table 5 lists, as functions of the time scale, the distribution function, the climacogram, the probability wet, the lower and upper tail indices, and the scale parameter.)
Both the decreasing (Pareto) and the bell-shaped (PBF) types of probability densities are consistent with natural behaviors for small and large time scales, respectively. It can be seen that the tail index of the PBF distribution, in the form given in Table 4, is not ξ but ξ′ = ξ/ζ(k), and tends to zero as k → ∞. For large time scales, this violates the requirement of a constant tail index, which is theoretically justified in [2]. The alternative of keeping a constant tail index would result in a finite variance as k → ∞, i.e., in a nonergodic process, which clearly is not an option in stochastic simulation.
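To make assumption 1 concrete, a marginal of this type (an atom at zero plus a Pareto-distributed wet part) can be sampled by inverse transform. The sketch below uses, purely for illustration, the tail index, scale, and probability-wet values that appear later in the text for the approximating variable; it is not the model fit itself:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (quoted later in the text for x')
xi, lam, P1 = 0.121, 3.681, 0.0171   # tail index, scale, probability wet

def sample_intermittent_pareto(n):
    """Pareto marginal with an atom of probability 1 - P1 at zero:
    P(X > x) = P1 * (1 + xi * x / lam) ** (-1 / xi) for x >= 0."""
    wet = rng.random(n) < P1
    u = rng.random(n)
    # inverse transform of the Pareto survival function for the wet part
    return np.where(wet, (lam / xi) * (u ** (-xi) - 1.0), 0.0)

x = sample_intermittent_pareto(100_000)
print((x == 0).mean())   # fraction of dry values, close to 1 - P1 = 0.9829
```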
To complete the model, the functions λ(k) and ζ(k) should be determined from the mean μ and the climacogram γ(k). This has been done in [2] and the results are shown in Table 5. The final relationships rely on the mean μ, the climacogram γ(k), the probability wet P_1(k) and the tail index ξ. For the precipitation process in Bologna, the model parameters estimated in [2] were adopted, while the transition time scale was set to k_* = 96 h. The climacogram is shown in Figure 6b (marked as "theoretical"), while the resulting autocorrelation function is shown in Figure 6c. The cumulants of the process x are shown in Figure 6a, along with the cumulants of the white noise determined from Equation (38). The coefficients a_j, needed to evaluate the latter in Equation (38), are determined from an AMA (asymmetric) generation scheme (Equation (31)) with q = 1024 and phases generated randomly (this contributes to a realistic shape of the generated rainfall events). For the approximation x′ of x, we use a single Pareto distribution and allow a discontinuity at x′ = 0. For mathematical consistency, the tail index of x′ should necessarily be ξ = 0.121, so that the moments of order beyond 1/ξ = 8.2 are infinite, as is the case with the moments of x. The other parameters of the Pareto distribution of x′ are calculated by minimizing the error E_2 in Equation (41), setting p_max = 8, and are found to be a scale parameter of 3.681 and a probability wet of 0.0171, while the required shift of Equation (42) is negligible (zero). The cumulants of x′ are also plotted in Figure 6a, where it can be seen that they are indistinguishable from those of x; thus, the achieved approximation is very good.
Because of the very small probability wet of x′, a very large number of the x′ values (98.3%) will be zero. The nonzero values will determine the locations of rainfall events, i.e., sequences of nonzero x. It is not reasonable to make these locations purely random, and for this reason we devised the following procedure. A first model run is done with the probability wet of x′ set to 1 (no zeros).
Subsequently, we find a threshold x_0 such that the fraction of x values greater than x_0 equals the required probability wet. In a second model run, we set x′ = 0 at those times τ where, in the first run, x < x_0. For the remaining τ, we generate x′ from the continuous part of its distribution. This procedure allows clustering of the precipitation events, as typically happens in reality. The x values in the second run will unavoidably be nonzero, because the generating Equation (29) involves a linear combination of very many x′ values, and this can hardly result in zero values. Therefore, post-processing of the generated time series is required, in order to reinstate the required number of zeros. This consists of replacing x by a transformed value, determined as a function of x involving the parameters of the post-processing phase. These are determined by minimizing the total error (in effect making it zero) in preserving the probability wet and the first and second cumulants of the distribution. In our application, the post-processing parameters have been found to be x_0 = 3.18 mm/h and λ_1 = 1.15 mm/h, with the remaining two parameters equal to 1.877 and 0.832.
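The two-run thresholding idea can be sketched as follows. A surrogate autocorrelated positive series stands in for the first model run (probability wet equal to 1), and, for brevity, the second run is imitated by zeroing the same series below the threshold rather than re-generating it; the point is only that the threshold reproduces the target probability wet while the persistence of the first run makes the wet instants cluster:

```python
import numpy as np

rng = np.random.default_rng(7)

# Surrogate "first run": a strongly autocorrelated positive series
n = 10_000
v = rng.exponential(1.0, n)
x1 = np.empty(n)
x1[0] = v[0]
for t in range(1, n):
    x1[t] = 0.9 * x1[t - 1] + 0.1 * v[t]

P1 = 0.0171                             # target probability wet
x0 = np.quantile(x1, 1.0 - P1)          # threshold: fraction above x0 equals P1

# "Second run": zeros wherever the first run fell below the threshold
wet = x1 >= x0
x2 = np.where(wet, x1, 0.0)
print(wet.mean())                        # close to P1; wet instants come in spells
```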
Comparisons of the theoretical statistical characteristics of the distribution of x to the empirical ones of the generated sample, both before and after post-processing, are shown in the panels of Figure 6. The empirical climacogram is shown in Figure 6b. Before post-processing, there is a marked difference of the empirical climacogram from the theoretical one. This does not indicate a weakness of the algorithm. It just reflects the fact that, with a Hurst parameter as high as H = 0.92, there is high uncertainty and variability, while a sample of n = 10 000 is too short to eliminate this uncertainty; note that the equivalent sample size (which indicates the sampling variability) in this case is n′ ≔ γ(1)/γ(n) ≈ 7, instead of n = 10 000. Interestingly, the post-processing substantially decreases the difference from the theoretical curve. The improvement due to post-processing is spectacular in panel (d), which shows a comparison of the theoretical and empirical marginal distribution of x. Before post-processing, even though the cumulants are preserved, the initially generated small values are problematic, as no zero values are generated. This is fully remedied by the post-processing technique. Finally, panel (c) shows that the autocorrelations are well preserved both before and after post-processing.
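The notion of equivalent sample size can be illustrated with any climacogram. Below we use an assumed FHK-Cauchy-type form, γ(k) = λ(1 + (k/α)^(2M))^((H−1)/M), with illustrative parameters (not the FHK-CD model fitted for Bologna, so the resulting n′ differs from the value of 7 quoted above); the point is that, under strong persistence, n′ = γ(1)/γ(n) is orders of magnitude smaller than n:

```python
def gamma_fhk_c(k, lam=1.0, alpha=1.0, M=0.5, H=0.92):
    """Assumed FHK-Cauchy-type climacogram (illustrative parameterization):
    gamma(k) = lam * (1 + (k/alpha)**(2*M)) ** ((H - 1) / M)"""
    return lam * (1.0 + (k / alpha) ** (2 * M)) ** ((H - 1.0) / M)

n = 10_000
n_eq = gamma_fhk_c(1) / gamma_fhk_c(n)   # equivalent sample size n' = gamma(1)/gamma(n)
print(round(n_eq, 1))                     # a few units, despite n = 10 000
```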
Further information on the form of the generated time series is provided in Figure 7.

Simulating the Precipitation Process at the Annual Time Scale
The same precipitation model as in the previous subsection was used for generation at the annual scale. Now the distribution is no longer Pareto but PBF, whose treatment is more laborious. On the other hand, the probability dry at the annual scale is zero, and thus the distribution is continuous. This makes the generation simpler, as no post-processing is required.
While at the hourly scale all cumulants are positive, tending fast to infinity (Figure 6a), at the annual scale some of the cumulants (most notably the fourth) are negative (Figure 8a). According to the model, the cumulants again tend to infinity, but for much higher p (>33), as now ξ′ = 0.030. The other parameters of the PBF distribution are ζ = 4.00 and a scale parameter of 0.089 mm/h. The approximation x′ of x is made by another PBF distribution with slightly different parameters, ζ = 4.01 and a scale parameter of 0.098 mm/h. As seen in Figure 8a, the achieved approximation is good, except for a substantial difference in the first cumulants of x′ and x, so that the required shift of Equation (42) is not negligible, being 0.0871 mm/h.
Comparisons of the theoretical statistical characteristics of the distribution of x to the empirical ones of the generated sample are shown in the panels of Figure 8. In all panels, the agreement between the theoretical and empirical characteristics is very good.

Discussion and Conclusions
Stochastic simulation of complex processes necessarily relies on approximations of distribution functions. Typically, these approximations are made with reference to the normal distribution, e.g., the Gram-Charlier series, the Edgeworth approximation, etc. [37,51]. These, however, are not good for simulation, as no generic random number generation algorithms are available for such types of approximations; they can also be too complicated. Here, we provide more general and more powerful approximations of distribution functions based on cumulants. These are quite flexible and can take several forms, such as (a) the sum of a few (e.g., two or even just one) stochastic variables with typical distributions of an appropriate type (such as those contained in Tables 2-4); (b) the occasional involvement of discontinuities in constituent distributions (usually at their lower bounds); and (c) the discretization of the stochastic variable, in the case that its domain is bounded from both above and below. As random number generation algorithms are readily available for these typical distributions, the proposed approximation is useful in stochastic simulation.
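Option (a) works because cumulants of independent summands are additive, κ_p(X+Y) = κ_p(X) + κ_p(Y), so the cumulants of a target distribution can be split between a few constituents with readily available generators. A quick numerical check for the third cumulant (equal to the third central moment), using hypothetical gamma and exponential constituents:

```python
import numpy as np

rng = np.random.default_rng(5)

# Third cumulant (= third central moment) estimator
k3 = lambda a: np.mean((a - a.mean()) ** 3)

# Independent constituents: gamma(shape 3, scale 1) has kappa_3 = 2*3*1 = 6;
# exponential(scale 2) has kappa_3 = 2 * 2**3 = 16; their sum should give 22.
x = rng.gamma(3.0, 1.0, 1_000_000)
y = rng.exponential(2.0, 1_000_000)
print(k3(x) + k3(y), k3(x + y))   # both close to 22
```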
The approximation of a distribution via cumulants turns out to provide a very powerful means for stochastic simulation of processes of any type, with short- and long-range dependence. The combination of this approximation with the asymmetric (AMA) or symmetric (SMA) moving-average generation schemes can tackle demanding simulation problems. The genuine stochastic simulation approach studied here, which does not perform transformations of the stochastic variables involved, is useful, convenient, and powerful. This is particularly the case for problems where time directionality is important; recall that a Gaussian process, even when (back-)transformed to non-Gaussian by any nonlinear transformation, cannot provide a process with time asymmetry.
The case studies conducted confirm the excellent performance of the method for a variety of demanding problems and a variety of distributions and time scales. In particular, long-range dependence, however strong, as well as antipersistence, does not entail any difficulty in applying the method. In contrast, some characteristics of the marginal distribution, such as single or double boundedness, and especially possible intermittence, may cause difficulties. For this reason, all case studies conducted involve non-Gaussian marginal distributions that are bounded, thus making the problems more challenging. These include double-bounded distributions, such as the uniform, and single-bounded ones, such as the exponential, Pareto and PBF, with the Pareto distribution also having a discontinuity at the origin (intermittence). The examples studied show how the problems of boundedness and discontinuity can be handled through simple post-processing procedures, thus achieving an overall satisfactory performance.
In conclusion, the method seems promising and expandable to several future research directions, such as multivariate stochastic modelling, downscaling, disaggregation, and stochastic modelling of two or more processes simultaneously, particularly in cases where time directionality is important (e.g., rainfall-runoff modelling at small time scales).
Stochastic simulation has recently acquired tremendous importance, as conventional energy sources are being replaced with renewables, whose nature is stochastic and whose assessment thus needs stochastic tools. Its utility should now be appreciated more than ever, after various spectacular failures of aspirations to achieve satisfactory predictions of geophysical processes in deterministic terms, and after reconciliation with the fact that uncertainty is an intrinsic characteristic of nature, not subject to elimination.

Acknowledgments: Discussions with Theano Iliopoulou, Federico Lombardo, and Ioannis Tsoukalas helped us with the method conceptualization. We thank the two anonymous reviewers for the positive evaluation of the paper and their helpful comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Comparison with a Conventional Approach
This Appendix A (not contained in version 1 of our paper) was added following a suggestion by an anonymous reviewer that the paper would benefit from comparisons with traditional approaches, which include transformations from Gaussian processes. As a traditional approach, we choose the ARMA(1,1) model and, as a case study, the one presented in Section 3.1, which deals with the exponential distribution. (The case studies of Sections 3.2-3.5 can hardly be dealt with using traditional approaches.) As the exponential distribution assumed in this case study is a special case of the gamma distribution, we use the traditional Wilson-Hilferty-Kirby transformation [52], which transforms a standard Gaussian variable z to a variable M with approximately (three-parameter) gamma distribution with mean 0, standard deviation 1 and coefficient of skewness s. In its original (Wilson-Hilferty) form, the transformation is:

M = (2/s)(1 + sz/6 − s²/36)³ − 2/s (A1)

Kirby [52] gave a better approximation by modifying the transformation to the form of Equation (A2), which involves four coefficients depending on s, three of which are given by Kirby [52] in tabulated form, while the fourth is calculated from them. Plugging into Equation (A2), we see that if the value of z is too low (strongly negative), then the lowest admissible value of M is −2/s. For the exponential distribution, s = 2 and the tabulated values are 1.03571, 0.99968 and 1.93606, while the calculated one is 0.32446. We note that the so-calculated variable M has lower bound −1, and hence, to achieve the standard exponential distribution, we have to take M + 1.
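The original Wilson-Hilferty form (Equation (A1)) is straightforward to apply. The sketch below checks numerically that, for s = 2, the shifted variable M + 1 approximates a standard exponential (mean and variance close to 1, skewness close to 2), while also exhibiting the lower-bound defect (a small fraction of values below −2/s) that Kirby's modification removes:

```python
import numpy as np

rng = np.random.default_rng(3)

def wilson_hilferty(z, s):
    """Equation (A1): standard normal z -> approx. gamma with mean 0, sd 1, skew s."""
    return (2.0 / s) * (1.0 + s * z / 6.0 - s ** 2 / 36.0) ** 3 - 2.0 / s

z = rng.standard_normal(200_000)
m = wilson_hilferty(z, s=2.0)
x = m + 1.0                           # shift toward the standard exponential
skew = np.mean((x - x.mean()) ** 3) / x.std() ** 3
print(round(x.mean(), 2), round(x.var(), 2), round(skew, 2))
print((m < -1).mean())                # small but nonzero: the defect Kirby fixes
```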
The ARMA(1,1) model for the Gaussian process z is

z_τ = φ z_(τ−1) + v_τ + θ v_(τ−1)

where v_τ is Gaussian white noise with mean 0 and variance σ², and φ and θ are model parameters. Given the model parameters, the autocovariance c_j of the process is given as follows [2]:

c_0 = σ² (1 + 2φθ + θ²)/(1 − φ²), c_1 = σ² (φ + θ)(1 + φθ)/(1 − φ²), c_j = φ c_(j−1) for j ≥ 2

In our case study, we have c_0 = 1, c_1 = 0.701, c_2 = 0.509, while the model cannot preserve autocovariances for lags higher than 2. The resulting model parameters (obtained by a solver) are σ² = 0.509, φ = 0.727, θ = −0.0517.
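The quoted parameter values can be checked against the standard ARMA(1,1) autocovariance formulas; the short calculation below (assuming the model form z_τ = φz_(τ−1) + v_τ + θv_(τ−1)) recovers the target autocovariances:

```python
# Verify that the quoted ARMA(1,1) parameters reproduce the target
# autocovariances c0 = 1, c1 = 0.701, c2 = 0.509
s2, phi, theta = 0.509, 0.727, -0.0517

c0 = s2 * (1 + 2 * phi * theta + theta ** 2) / (1 - phi ** 2)
c1 = s2 * (phi + theta) * (1 + phi * theta) / (1 - phi ** 2)
c2 = phi * c1                       # c_j = phi * c_{j-1} for j >= 2
print(round(c0, 3), round(c1, 3), round(c2, 3))  # -> 1.001 0.702 0.51
```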
We expect that the approximate transformation (A2), by construction, will give variance 1, which in the case study is equal to c_0. However, there is no guarantee that the values of c_1, c_2 will be preserved after applying the transformation. An analytical calculation of the values of c_1, c_2 after the transformation is not possible and, therefore, we would have to resort to numerical methods [19][20][21][22][23][24], of which a Monte Carlo method is the easiest. However, for simplicity, here we assume that the changes in the autocorrelations, ρ_1 = c_1/c_0 and ρ_2 = c_2/c_0, are negligible. With this assumption, we easily run the model to generate 10 000 synthetic values, from which we constructed Figure A1. This should be viewed in comparison to Figure 1. One can see in Figure A1c that the transformation is satisfactory in preserving the marginal distribution. The problems appear in the climacogram and the autocorrelogram: clearly, the conventional ARMA model cannot reproduce the LRD. On the other hand, the autocorrelations ρ_1, ρ_2 are preserved, and indeed the changes due to the transformation are negligible, which confirms the validity of our assumption.