#### 4.1. Preprocessing: Parameter Estimation and Matrix Preparation

To specify the wet and dry occurrences, a random uniform variate y ~ U(0, 1) must be drawn and compared with the transition probabilities obtained from Equations (1) and (2). For multi-site precipitation, the anomalies (referred to as Y ∈ ℝ) that identify the states at k locations must be correlated so that the generated states S reproduce the correlations of the historical observations. Wilks’ method was selected to generate correlated anomalies Y ~ N(0, 1) at multiple sites. It is simple and more efficient than hidden Markov and k-nearest neighbor methods [52], accurate in reproducing the monthly inter-station correlations [53], and the most cited of the available approaches [54].

Assume S_{(1,m)} and S_{(2,m)} are the precipitation states in month m at sites k = 1 and k = 2. To generate realistic sequences of the precipitation states at these two sites, the correlation (ω) between their corresponding anomalies Y, ω_{(1,2)} = corr(Y_{(1,m)}, Y_{(2,m)}), must be computed. The parameter ω was determined by generating different sets of Ý at the two sites with different arbitrary correlation values {ώ_{1}, ώ_{2}, …}, where ώ_{1} = corr(Ý_{(1,m)}, Ý_{(2,m)}); identifying the precipitation states at the two locations, Ś_{1} and Ś_{2}; and calculating the corresponding state correlations {έ_{1}, έ_{2}, …}, where έ_{1(1,2)} = corr(Ś_{(1,m)}, Ś_{(2,m)}). A regression line was then fitted between the έ and ώ sets to identify the relationship between them. Using this regression equation with the observed precipitation state correlation ξ, the parameter ω can then be found. A synthetic example is shown in Figure 2a, in which selecting a correlation of 0.858 between the paired anomalies (ω) produces a correlation of 0.785 between the paired states (ξ) at the two locations.
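The regression step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Equations (1) and (2) are not reproduced in this section, so a first-order two-state Markov chain with hypothetical transition probabilities `p01` (dry to wet) and `p11` (wet to wet), identical at both sites, stands in for them. All function names are illustrative, and the numbers are not intended to reproduce Figure 2a.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF, used to map an anomaly Y to a uniform variate."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def simulate_states(anomalies, p01, p11):
    """Assumed state rule: wet (1) when Phi(Y) is at or below the
    transition probability selected by the previous state."""
    states, prev = [], 0
    for y in anomalies:
        p_crit = p11 if prev == 1 else p01
        prev = 1 if norm_cdf(y) <= p_crit else 0
        states.append(prev)
    return states

def state_correlation(omega_trial, n, rng, p01=0.3, p11=0.6):
    """State correlation produced by anomalies with correlation omega_trial."""
    y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Build the second series so that corr(y1, y2) equals omega_trial.
    scale = math.sqrt(1.0 - omega_trial ** 2)
    y2 = [omega_trial * a + scale * rng.gauss(0.0, 1.0) for a in y1]
    s1 = simulate_states(y1, p01, p11)
    s2 = simulate_states(y2, p01, p11)
    return pearson(s1, s2)

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def anomaly_correlation(xi_observed, trials, n=20000, seed=1):
    """Find the anomaly correlation that yields the observed state correlation."""
    rng = random.Random(seed)
    xi_trial = [state_correlation(w, n, rng) for w in trials]
    # Regress the trial anomaly correlations on the resulting state
    # correlations, then evaluate the line at the observed value.
    slope, intercept = fit_line(xi_trial, trials)
    return slope * xi_observed + intercept

# Hypothetical observed state correlation of 0.6 for one station pair.
omega = anomaly_correlation(0.6, trials=[0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
```

Because thresholding a continuous anomaly into two states discards information, the anomaly correlation returned here is larger than the state correlation it is meant to reproduce, which is the same direction as the 0.858 versus 0.785 example in the text.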

The process is repeated for each station pair, which gives k(k−1)/2 pairwise correlations, and for each month m, to build the anomaly correlation matrix ω_{s} ∈ ℝ. The ω_{s} matrix is then used to generate the Y that produces correlated precipitation states at the k locations for month m, using the multivariate normal distribution as follows,

The variable µ_{y} denotes the 1-D mean vector for the anomalies Y, Σ denotes the covariance matrix, and d is an independent parameter. In this case, µ = [0, 0, …, 0]_{k×1} and the variance is 1, so the covariance matrix Σ_{s} becomes the correlation matrix ω_{s}.
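As a rough illustration of drawing the anomalies from a zero-mean multivariate normal distribution whose covariance equals the correlation matrix, the sketch below uses NumPy. The 3 × 3 matrix is a made-up, positive-definite example for three hypothetical sites, and the function name is illustrative only.

```python
import numpy as np

def correlated_anomalies(omega_s, n_days, seed=0):
    """Draw n_days anomaly vectors Y ~ N(0, omega_s); one column per site."""
    rng = np.random.default_rng(seed)
    k = omega_s.shape[0]
    # Zero mean vector and unit variances, so the covariance matrix
    # is the correlation matrix itself.
    return rng.multivariate_normal(mean=np.zeros(k), cov=omega_s, size=n_days)

# Hypothetical anomaly correlation matrix for k = 3 sites.
omega_s = np.array([[1.0, 0.8, 0.6],
                    [0.8, 1.0, 0.7],
                    [0.6, 0.7, 1.0]])

Y = correlated_anomalies(omega_s, n_days=50000)
sample_corr = np.corrcoef(Y, rowvar=False)  # should be close to omega_s
```

Comparing `sample_corr` with `omega_s` is a quick check that the generated anomalies carry the intended spatial correlation before they are converted to wet and dry states.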

The matrix ω_{s} must be positive definite (i.e., symmetric with all eigenvalues positive) to be used in Equation (13). Since the elements of ω_{s} were calculated empirically, pair by pair, ω_{s} is usually not positive definite. Compared with other approaches, the most precise method for obtaining a positive-definite matrix is the iterative spectral method with Dykstra’s correction (ISDC) [55], as follows:

- 1) Assume ${\omega}_{i}=\omega $, $\Delta {\Omega}_{i}=0$, and $i=1$, in which ω is a non-positive-definite correlation matrix.

- 2) Let ${R}_{i}={\omega}_{i}-\Delta {\Omega}_{i}$.

- 3) Find ${L}_{i}$ and ${\Omega}_{i}$ such that ${R}_{i}={\Omega}_{i}{L}_{i}{\Omega}_{i}^{T}$ (the eigendecomposition of ${R}_{i}$, with the eigenvalues on the diagonal of ${L}_{i}$).

- 4) Replace the negative eigenvalues in ${L}_{i}$ with a small positive value to construct ${L}_{i}^{+}$.

- 5) Set ${\omega}_{i+1}={\Omega}_{i}{L}_{i}^{+}{\Omega}_{i}^{T}$ and $\Delta {\Omega}_{i+1}={\omega}_{i+1}-{R}_{i}$. Then, replace all diagonal elements of ${\omega}_{i+1}$ with 1.

- 6) Test whether ${\omega}_{i+1}$ is positive definite. If not, repeat Steps 2 to 6 with $i=i+1$ and ${\omega}_{i}={\omega}_{i-1}$.
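The six steps above can be sketched in NumPy as follows. This is an illustrative reading of the listed steps rather than the reference implementation from [55]: the replacement value `eps`, the iteration cap `max_iter`, the treatment of zero eigenvalues, and the symmetrization line are assumptions added here, and the test matrix is a made-up example.

```python
import numpy as np

def isdc(omega, eps=1e-6, max_iter=1000):
    """Iterative spectral method with Dykstra's correction (sketch)."""
    omega_i = np.array(omega, dtype=float)
    delta = np.zeros_like(omega_i)                 # Step 1: correction starts at 0
    for _ in range(max_iter):
        r = omega_i - delta                        # Step 2
        eigvals, eigvecs = np.linalg.eigh(r)       # Step 3: R = Omega L Omega^T
        # Step 4: replace negative (and, as an assumption, zero) eigenvalues.
        eigvals_pos = np.where(eigvals <= 0.0, eps, eigvals)
        # Step 5: rebuild the matrix and update Dykstra's correction.
        omega_next = (eigvecs * eigvals_pos) @ eigvecs.T
        omega_next = 0.5 * (omega_next + omega_next.T)  # guard against round-off
        delta = omega_next - r
        np.fill_diagonal(omega_next, 1.0)          # restore the unit diagonal
        # Step 6: stop once every eigenvalue is strictly positive.
        if np.all(np.linalg.eigvalsh(omega_next) > 0.0):
            return omega_next
        omega_i = omega_next
    raise RuntimeError("no positive-definite matrix found within max_iter")

# Hypothetical empirical matrix: pairwise correlations of -0.8 among three
# sites cannot occur jointly, so the matrix has a negative eigenvalue.
omega_raw = np.array([[1.0, -0.8, -0.8],
                      [-0.8, 1.0, -0.8],
                      [-0.8, -0.8, 1.0]])
fixed = isdc(omega_raw)
```

For this example the off-diagonal entries move from −0.8 toward −0.5, which is the boundary at which a three-site equicorrelation matrix stops having a negative eigenvalue.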

After generating the matrix S for each k and m, the next step is to simulate the weather variables (e.g., P, T_{X}, T_{N}, and WS). The idea is to examine the anomalies of these variables and generate weather variables with the same properties as the observations. To account for all the spatial and cross correlations between the variables, their anomalies (θ, ʋ, δ, and λ) must be correlated. The temporal correlation, identified by the lag-1 day auto-correlations of T_{X}, T_{N}, and WS, must also be considered. Since the precipitation amount is an intermittent variable, its auto-correlation is not considered. The following procedure was suggested to achieve this purpose. First, arrange the weather variable matrix V as follows,

where V represents the observed weather variable value and n denotes the weather variable index (P, T_{X}, T_{N}, and WS), n = {1, 2, 3, 4}. The total number of rows is T = (days in the month) × (number of years), the number of columns is K × N, and the third dimension (aisle) has size M. This arrangement makes it possible to consider all the spatial and cross correlations between the weather variables. Next, extract the anomalies matrix Z ∈ ℝ from V using Equations (3) and (4) for P, Equations (6)–(9) for T_{X} and T_{N}, and Equations (11) and (12) for WS, after estimating their parameters (e.g., µ_{p}, σ_{p}, γ_{p} for P; μ_{X0}, μ_{X1}, σ_{X0}, σ_{X1} for T_{X}; μ_{N0}, μ_{N1}, σ_{N0}, σ_{N1} for T_{N}; and α_{0}, α_{1}, β_{0}, β_{1} for WS).

The Z matrix represents the anomalies of the weather variables, and its elements carry spatial, cross-, and auto-correlations. To generate a Z matrix with the same properties as the observations, these correlations must be preserved. The first step was to estimate the coefficient of a first-order autoregressive model, AR(1), for the anomalies (φ_{z}) so that the generated variables have the observed AR(1) value (φ_{v}), applying Wilks’ technique. For illustration, assume synthetically that the values of μ_{X0}, μ_{X1}, σ_{X0}, and σ_{X1} are 11.72, 9.12, 3.71, and 2.21 (°C/day), respectively, and that φ_{v} is 0.82 at station k in month m. The adopted procedure for obtaining φ_{z} is as follows:

- 1) Generate the standard normal random deviate set y; y ~ N(0, 1).

- 2) Use y with Equations (1) and (2) to identify the dry and wet days.

- 3) Generate a standard normal random deviate set x; x ~ N(0, 1).

- 4) Apply an AR(1) model to x with an arbitrary coefficient between –1 and 1 (e.g., φ’_{z}).

- 5) Obtain the anomalies z by standardizing the AR(1) series from Step 4.

- 6) Apply Equations (6) and (7) to obtain T’_{X}.

- 7) Calculate the AR(1) coefficient of T’_{X} (e.g., φ’_{v}) and plot it against φ’_{z}. Repeat for several values of φ’_{z}, then regress one on the other.

- 8) Use the regression equation obtained in Step 7 with the observed value φ_{v} (e.g., 0.82) to determine φ_{z}; in this case, 0.88 (as shown in Figure 2b).
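The eight steps can be sketched in plain Python as shown below. Several pieces are assumptions, because Equations (1), (2), (6), and (7) are not reproduced in this section: the wet and dry days come from a two-state Markov chain with hypothetical transition probabilities `p01` and `p11`, and T’_{X} is taken to be the state-dependent mean plus the state-dependent standard deviation times the anomaly. Under these stand-in assumptions the result need not match the 0.88 read from Figure 2b.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lag1_autocorr(series):
    """Sample lag-1 auto-correlation."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

def standardize(series):
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def simulated_tx(phi_trial, states, rng, mu=(11.72, 9.12), sigma=(3.71, 2.21)):
    """Steps 3 to 6 for one trial coefficient; index 0 = dry, 1 = wet."""
    ar, prev = [], 0.0
    for _ in states:
        prev = phi_trial * prev + rng.gauss(0.0, 1.0)  # Steps 3 and 4
        ar.append(prev)
    z = standardize(ar)                                # Step 5
    # Step 6 (assumed form of Equations (6) and (7)).
    return [mu[s] + sigma[s] * zt for s, zt in zip(states, z)]

def estimate_phi_z(phi_v, trials, n=20000, p01=0.3, p11=0.6, seed=1):
    rng = random.Random(seed)
    states, prev = [], 0
    for _ in range(n):                                 # Steps 1 and 2
        p_crit = p11 if prev == 1 else p01
        prev = 1 if norm_cdf(rng.gauss(0.0, 1.0)) <= p_crit else 0
        states.append(prev)
    # Step 7: AR(1) of the generated temperature for each trial coefficient.
    phi_sim = [lag1_autocorr(simulated_tx(p, states, rng)) for p in trials]
    # Step 8: regress trial coefficients on simulated values, evaluate at phi_v.
    slope, intercept = fit_line(phi_sim, trials)
    return slope * phi_v + intercept

phi_z = estimate_phi_z(0.82, trials=[0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
```

The anomaly coefficient comes out larger than the target φ_{v} because switching between the wet-day and dry-day parameters adds variability that is less persistent than the anomaly series itself.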

This procedure must be carried out for T_{X} and T_{N} at each k and m. For WS, the procedure is the same except for Step 5, in which x is converted to a uniformly distributed variate to obtain the WS anomalies. For example, assume that α_{0}, α_{1}, β_{0}, and β_{1} are 4.04, 3.22, 0.62, and 0.71, respectively, and that φ_{v} is 0.54. The corresponding φ_{z} is 0.56, as shown in Figure 2c. This procedure preserves the auto-correlation of T_{X}, T_{N}, and WS.
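One common way to perform the modified Step 5 for WS is the probability integral transform, in which the standardized AR(1) series is passed through the standard normal CDF. The text does not state which conversion was used, so the sketch below is an assumption, and the function name is illustrative. It stops at the uniform anomalies, since Equations (11) and (12) are not reproduced in this section.

```python
import math
import random

def uniform_ws_anomalies(phi, n, seed=0):
    """AR(1) series with coefficient phi, standardized, then mapped to U(0, 1)."""
    rng = random.Random(seed)
    ar, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        ar.append(prev)
    mean = sum(ar) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in ar) / n)
    # Assumed conversion: the standard normal CDF of a standard normal
    # variate is uniformly distributed on (0, 1).
    return [0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0)))) for v in ar]

# Uses the anomaly coefficient 0.56 from the WS example in the text.
u = uniform_ws_anomalies(phi=0.56, n=20000)
```

The transform is monotonic, so the ordering of the anomalies in time is retained even though their marginal distribution changes from normal to uniform.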

The final step of the preprocessing stage is to construct the positive-definite correlation matrix of the variable anomalies ω_{V}, as done for precipitation states using ISDC. Building the ω_{V} allows us to preserve all the spatial, temporal, and cross correlations between the variables.