Simulation of Wave Time Series with a Vector Autoregressive Method

Valsamidis, Antonios; Cai, Yuzhi; Reeve, Dominic E.

doi:10.3390/w14030363


by Antonios Valsamidis ¹, Yuzhi Cai ² and Dominic E. Reeve ¹,*

¹ Coastal Engineering Research Group, College of Engineering, Bay Campus, Swansea University, Swansea SA1 8EN, UK

² School of Management, Bay Campus, Swansea University, Swansea SA1 8EN, UK

* Author to whom correspondence should be addressed.

Water 2022, 14(3), 363; https://doi.org/10.3390/w14030363

Submission received: 7 December 2021 / Revised: 17 January 2022 / Accepted: 20 January 2022 / Published: 26 January 2022

(This article belongs to the Section Oceans and Coastal Zones)


Abstract: Joint time series of wave height, period and direction are essential input data to computational models which are used to simulate diachronic beach evolution in coastal engineering. However, it is often impractical to collect a large amount of the required input data due to the expense. Based on the nearshore wave records offshore of Littlehampton in Southeast England over the period from 1 July 2003 to 30 June 2016, this paper presents a statistical method to obtain simulated joint time series of wave height, period and direction covering an extended time span of a decade or more. The method is based on a vector auto-regressive moving average algorithm. The simulated time series show a satisfactory degree of stochastic agreement with the original time series, including average value, marginal distribution, autocorrelation and cross-correlation structure, which are important for Monte Carlo modelling of shoreline evolution, thereby allowing ensemble prediction of shoreline response to a variable wave climate.

Keywords: VAR model; wave time series; autocorrelation; cross-correlation

1. Introduction

Simulation of time series plays an important role in many areas due to the high cost of obtaining in situ measurements. In the field of coastal engineering, simulation of wave time series has been used for estimating the duration of storm events and their spacing in time, see e.g., [1,2], with the aim of assessing the risk of serious beach erosion [3,4]. Typically, only a relatively short record of measured wave conditions is available; hence, to perform Monte Carlo simulation of coastal flooding or erosion, wave sequences with similar statistics are required. This problem is particularly challenging and may be stated as simulating a non-stationary, non-normal, correlated, trivariate stochastic process, given a sample or realisation of the process. Several methods have been proposed for tackling this type of problem. For example, Li and Winker [5] proposed a Monte Carlo method, or a quasi Monte Carlo method, to obtain simulated time series from a vector autoregressive moving average (VARMA) model fitted to sample data. Barone [6] described a simulation method that can be used to generate realisations of a VARMA process, while Shea [7] discussed a direct method of computing the initial state covariance matrix required by the simulation method. However, these methods cannot ensure that the marginal distributions and/or the autocorrelation patterns of the simulated data are the same as those of the sample. On the other hand, representative wave events obtained via a K-Means clustering technique have been used for studying port operability and long-term longshore sediment transport, but this approach suffers from not taking extreme wave events into account [8].

Methods to simplify the problem include a technique for simulating time series of wave height, period and direction, accounting for seasonality on a monthly mean basis [2]. This can suffer from jumps in the mean of the variables at the changeover between consecutive months. Focusing on wave height and period only, bivariate autoregressive models were proposed by Soares and Cunha [9]. Recently, Cai [10] and Cai et al. [11] generalised the work of [12,13] in order to obtain simulated multivariate time series. However, these methods are suitable for stationary time series only, and hence they cannot be used to generate the non-stationary wave conditions observed widely in practice. Although Cai et al. [14] developed a simulation method for non-stationary time series, this method is suitable only for short non-stationary time series due to its relatively high computational cost, and is therefore not suitable for wave time series corresponding to long-term measurements.

The purpose of this paper is to extend the method of [10] to recreate both the statistics and lag correlation properties of a non-stationary, multivariate wave record spanning many years. The data used here are multivariate, correlated and non-stationary; the reader is referred to [15,16,17] for background. Furthermore, this study relaxes the constraint of describing the seasonality of all wave variables with a single functional form. More specifically, the seasonality of the wave height and period time series was not assumed to have the same form as the seasonality of wave direction.

This paper is organised as follows. In Section 2, the theoretical background of the model is presented; in Section 3 the measurements from the study site and the approaches we used for removing the trends and seasonality are discussed. The simulation results are demonstrated in Section 4. Finally, discussion and conclusions are presented in Section 5.

2. Methodology

2.1. Outline

Let E_t = (E_1t,E_2t,E_3t) denote observed sea condition data, where E_1t is the wave height, E_2t is the wave period and E_3t is the wave direction, and the subscript t indicates waves at time t. As discussed in the previous section, the main objective of this research is to obtain simulated sea condition data, denoted by Ẽ_t = (Ẽ_1t,Ẽ_2t,Ẽ_3t), such that the simulated data will have a trend, seasonality, autocorrelation structures and marginal distributions similar to those of the observed E_t.

It is worth noting that the method of [10] requires that the observed data are stationary. Hence, it is important to convert the observed wave data into stationary ones. More specifically, our method includes the following steps: (i) Remove the trend and seasonality from the observed data to produce stationary series; (ii) apply the method of [10] to the series obtained from step (i); (iii) include the trend and seasonality into the simulated data obtained from step (ii) to generate the final simulated data that we require. Figure 1 illustrates our proposed methodology:

First note that in this application, we consider the trend and seasonality to be additive. That is, each E_kt for k = 1, 2, 3 can be expressed by E_kt = T_kt + S_kt + y_kt, where T_kt and S_kt represent the trend and seasonality components, respectively, and y_kt is the detrended and deseasonalised data that will be used in step (ii) of our method.

For wave directions, the sample was first transformed to create a variable with similar magnitude and range as that of wave height and period, (see Section 3), before the trend was estimated and removed as per the treatment of wave height and period. Then, a linear trend for each variable and an autoregressive process model of order 5 for wave height and period were estimated, which were subtracted from the original data to give the detrended data E_kt − T_kt.

For seasonality S_kt, as waves are driven by atmospheric winds, their characteristics will reflect the seasonal changes in the weather. However, at any particular location, the wave conditions may be a combination of waves from several sources so that ‘seasonality’ in the literal sense may not be reflected in wave height, period and angle in an identical manner. The seasonality parameters were determined using least squares estimation. Specifically, to estimate the seasonal component of each of the three wave time series, sums comprising gradually increasing numbers of sine or Fourier terms were successively tested via the Augmented Dickey-Fuller test [18], until a suitable combination of seasonal components was obtained. Then, the selected seasonal components were removed from the wave time series. To this end, an existing routine in MATLAB [19] was applied for testing the stationarity of the wave time series via the Augmented Dickey-Fuller test. This routine gives as output the value 0 for non-stationary data and 1 for stationary data. Following this method, the seasonality of the wave height and period time series was described via a sum of sine terms, while the seasonality of wave direction was described via a sum of Fourier terms. After removal of the trend and seasonal component, the detrended and deseasonalised series are given by y_kt = E_kt − T_kt − S_kt, which is a vector sequence (trivariate, with wave height, period and direction) that is stationary, correlated and non-normal.

Once the detrended and deseasonalised series y_kt for k = 1, 2, 3 have been computed and successfully tested for stationarity, a vector auto-regressive (VAR) method, outlined in the next section, may be applied to obtain simulated data for y_kt. These may then be transformed back to obtain simulated data for E_kt with the correct seasonality and trend.

2.2. Detailed Methodology

Following [10], let the base process of the method be a vector AR (auto-regressive) process of order p, denoted by VAR(p), defined by

z_{t} = φ_{1} z_{t - 1} + φ_{2} z_{t - 2} + \dots + φ_{p} z_{t - p} + u_{t}

(1)

where z_t = (z_1t,z_2t,z_3t)′, z_kt ∼ N(0,1) for k = 1, 2, 3, ϕ_i are fixed 3 × 3 coefficient matrices, i = 1, ..., p, and u_t = (u_1t,u_2t,u_3t)′ is a 3-dimensional normal random variable with mean 0 and covariance matrix Σ_u such that

E (u_{t} u_{t - h}^{'}) = \begin{cases} Σ_{u} & \text{if } h = 0 \\ 0_{3 \times 3} & \text{otherwise} \end{cases}

(2)

where

Σ_{u} = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix}, \quad s_{i j} = E (u_{i t} u_{j t}) \text{ for } i, j = 1, 2, 3

and

φ_{i} = \begin{pmatrix} φ_{i 11} & φ_{i 12} & φ_{i 13} \\ φ_{i 21} & φ_{i 22} & φ_{i 23} \\ φ_{i 31} & φ_{i 32} & φ_{i 33} \end{pmatrix}

It follows, see e.g., [20], that the correlation matrix function of z_t is given by Γ(h) = E(z_t z′_{t−h}), where h = 0, 1, 2, ..., H, and

Γ (0) = \sum_{l = 1}^{p} φ_{l} Γ (0 - l) + Σ_{u}, \quad Γ (h) = \sum_{l = 1}^{p} φ_{l} Γ (h - l) \text{ for } h \geq 1

(3)

with

Γ (h) = {(r_{i j h})}_{3 \times 3} = \begin{pmatrix} r_{11 h} & r_{12 h} & r_{13 h} \\ r_{21 h} & r_{22 h} & r_{23 h} \\ r_{31 h} & r_{32 h} & r_{33 h} \end{pmatrix}, \quad Γ (0) = {(r_{i j 0})}_{3 \times 3} = \begin{pmatrix} 1 & r_{120} & r_{130} \\ r_{120} & 1 & r_{230} \\ r_{130} & r_{230} & 1 \end{pmatrix}

where H is a fixed number that defines the maximum lag value we would like to consider for matching the autocorrelation structures between the simulated and the observed time series, r_ijh is the correlation between z_it and z_{j,t+h} for i, j = 1, 2, 3, and Γ(−h) = Γ′(h).
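As an illustration of how Equation (3) can be used in practice, the following minimal Python sketch recovers the coefficient matrices and Σ_u from a given set of lag matrices Γ(0), ..., Γ(p). It is not the authors' code: the function names `gamma_at` and `solve_yule_walker` are hypothetical, NumPy is assumed to be available, and the lag matrices are assumed to be consistent with a stationary VAR(p) process.

```python
import numpy as np

def gamma_at(gammas, h):
    """Return Gamma(h) for any integer h, using Gamma(-h) = Gamma(h)'."""
    return gammas[h] if h >= 0 else gammas[-h].T

def solve_yule_walker(gammas):
    """Recover ([phi_1, ..., phi_p], Sigma_u) from [Gamma(0), ..., Gamma(p)].

    Hypothetical helper: solves Gamma(h) = sum_l phi_l Gamma(h - l), h = 1..p,
    then uses the h = 0 relation of Equation (3) to obtain Sigma_u.
    """
    p = len(gammas) - 1
    k = gammas[0].shape[0]
    # Block matrix G with block (l, h) = Gamma(h - l), for l, h = 1..p.
    G = np.block([[gamma_at(gammas, h - l) for h in range(1, p + 1)]
                  for l in range(1, p + 1)])
    rhs = np.hstack([gammas[h] for h in range(1, p + 1)])  # [Gamma(1) ... Gamma(p)]
    # Solve Phi @ G = rhs, written as G.T @ Phi.T = rhs.T.
    Phi = np.linalg.solve(G.T, rhs.T).T
    phis = [Phi[:, i * k:(i + 1) * k] for i in range(p)]
    # Sigma_u = Gamma(0) - sum_l phi_l Gamma(-l).
    sigma_u = gammas[0] - sum(phi @ gamma_at(gammas, -(l + 1))
                              for l, phi in enumerate(phis))
    return phis, sigma_u
```

For p = 1 this reduces to φ_1 = Γ(1) Γ(0)⁻¹ and Σ_u = Γ(0) − φ_1 Γ′(1).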

The simulation method requires the estimation of ϕ_i, r_ijh and Σ_u. The correlations r_ijh can be obtained by solving the following non-linear equations

ρ_{i j h} = \frac{\int_{- \infty}^{\infty} \int_{- \infty}^{\infty} F_{i}^{- 1} (Φ (z_{i t})) F_{j}^{- 1} (Φ (z_{j t + h})) ξ_{r_{i j h}} (z_{i t}, z_{j t + h}) d z_{i t} d z_{j t + h} - E (y_{i t}) E (y_{j t + h})}{\sqrt{\mathrm{var} (y_{i t}) \mathrm{var} (y_{j t + h})}}

(4)

where ρ_ijh is the correlation between two stationary time series corresponding to the original input data; F_j(·) is the marginal distribution of y_j and hence F_j⁻¹(·) represents its inverse function; Φ(·) is the distribution function of the standard normal distribution; ξ_{r_ijh}(·,·) is the joint density function of two normal variables with mean zero and correlation r_ijh; E(y_it) is the mean of y_it, and var(y_it) is the variance of y_it. An explanation of how Equation (4) is derived is presented in Appendix A.

Thus, given the sample data E_t, we determined the detrended and deseasonalised data y_it. Setting F_j(·) equal to the empirical distribution of y_jt, setting E(y_it) and var(y_it) equal to the sample mean and variance of y_it respectively, and replacing ρ_ijh with the sample correlation between y_it and y_{j,t+h}, the r_ijh can be obtained by solving the resulting non-linear equations using the methods detailed in Appendix B. Once the r_ijh are available, ϕ_i for i = 1, ..., p and Σ_u can be obtained by solving Equation (3).

Hence, with r_ijh, ϕ_i and Σ_u determined, the simulated data can be produced as follows: (i) Obtain simulated data from the base process (Equation (1)); (ii) use the simulated z_t = (z_1t,z_2t,z_3t) to obtain simulated y_it for i = 1, 2, 3. To this end, the values of the standard normal cumulative distribution function, Φ(z_t), are calculated from the simulated values z_t. Then, the Φ(z_t) values are interpolated in the empirical marginal distribution of the corresponding wave parameter (height for i = 1, period for i = 2 and direction for i = 3) to yield the simulated data y_it; (iii) finally, the trend and seasonality components that had been removed before (see Figure 1) are re-added to y_it to yield the simulated time series E_it for i = 1, 2, 3 as required. The whole process is schematised in Figure 2.
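Step (ii) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `empirical_quantile` and `base_to_marginal` are hypothetical, and linear interpolation between order statistics is one of several reasonable ways to invert an empirical distribution; the paper does not specify the interpolation rule.

```python
from statistics import NormalDist

def empirical_quantile(sorted_sample, u):
    """Linearly interpolate the empirical quantile function at probability u."""
    n = len(sorted_sample)
    pos = u * (n - 1)            # fractional index into the sorted sample
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1.0 - frac) + sorted_sample[hi] * frac

def base_to_marginal(z_values, observed):
    """Map base-process values z to the empirical marginal of `observed`.

    Implements y = F^{-1}(Phi(z)), with F taken as the empirical
    distribution of the detrended, deseasonalised observations.
    """
    sorted_obs = sorted(observed)
    phi = NormalDist()           # standard normal: mean 0, standard deviation 1
    return [empirical_quantile(sorted_obs, phi.cdf(z)) for z in z_values]
```

Because Φ and the empirical quantile function are both non-decreasing, the mapping preserves the ordering of the simulated z values, and z = 0 is mapped to the sample median.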

3. Case-Study and Data Processing

For our test case we have chosen Littlehampton which is located in Southeast England (Figure 3a).

Wave measurements were accessed from the Channel Coastal Observatory, a UK organisation which collects and archives coastal field-data. Specifically, time series of significant wave height (H_s), peak wave period (T_p) and wave direction (α) relative to North, between 1 July 2003 and 30 June 2016, were gathered. Integrated wave parameters were available at an interval of 30 min at a location 4 miles SSE of Littlehampton harbour entrance; observations were made with a Datawell Directional WaveRider Mk III buoy moored in approximately 10 m of water. The wave rose of the 13-year record is illustrated in Figure 3b. This shows a wave climate with a predominant southwesterly approach and a secondary peak in waves from the southeast. Waves from the southwest are typically a mixture of locally generated wind waves and swell waves from Atlantic storms. Southeasterly waves are fetch-limited storm waves generated by the northern part of low-pressure systems that track to the south of the UK [21].

The wave records were averaged to create a time series of daily wave conditions, as our focus is on storm events rather than wave by wave fluctuations. The daily data are shown in Figure 4, in which seasonality is visually evident.

If the magnitudes of the variables of interest are very different, it is normal practice to standardise or transform them before beginning the modelling process in order to improve the fitting. Here the range of wave direction is much larger than the other two variables. Hence, the following transformation of wave direction in the data preparation stage was performed:

θ = \begin{cases} \sqrt{1 - \cos^{2} (α)}, & \text{if } 0 \leq α < \frac{π}{2} \quad (0 \leq θ < 1) \\ 1 + \sqrt{1 - \cos^{2} (α)}, & \text{if } \frac{π}{2} \leq α < π \quad (1 \leq θ < 2) \\ - \sqrt{1 - \cos^{2} (α)}, & \text{if } π \leq α < \frac{3 π}{2} \quad (- 1 \leq θ < 0) \\ - 1 - \sqrt{1 - \cos^{2} (α)}, & \text{if } \frac{3 π}{2} \leq α < 2 π \quad (- 2 \leq θ < - 1) \end{cases}

(5)
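The piecewise mapping of Equation (5) can be written as a short Python function. This is a minimal sketch, assuming the direction α is supplied in radians in the interval [0, 2π); the function name `transform_direction` is hypothetical.

```python
import math

def transform_direction(alpha: float) -> float:
    """Map a wave direction alpha (radians, 0 <= alpha < 2*pi) to theta, per Equation (5)."""
    if not 0.0 <= alpha < 2.0 * math.pi:
        raise ValueError("alpha must lie in [0, 2*pi)")
    # sqrt(1 - cos^2(alpha)) equals |sin(alpha)|; max() guards against round-off.
    s = math.sqrt(max(0.0, 1.0 - math.cos(alpha) ** 2))
    if alpha < math.pi / 2:
        return s            # first quadrant
    if alpha < math.pi:
        return 1.0 + s      # second quadrant
    if alpha < 3 * math.pi / 2:
        return -s           # third quadrant
    return -1.0 - s         # fourth quadrant
```

Measured directions reported in degrees would need converting first, for example with `math.radians`.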

Before any further processing of the wave time series, the available dataset was split into a training and a validation subset. The training subset constituted the first 75% of the whole dataset, thus covering the period from 1 July 2003 to 31 March 2013. Over this time span, the aim was to parametrise the VAR model properly; specifically, to estimate the trend and the seasonal elements of the wave time series, plus the φ_i and u_t parameters of Equation (1). The validation was performed on the remaining 25% of the whole dataset, namely from 1 April 2013 to 30 June 2016.

Next, the trend and seasonal elements of the wave timeseries are assessed, and subsequently, removed to achieve stationarity.

3.1. Detrending

To estimate T_kt, the following model was used to represent the trend, with coefficients obtained by fitting it to the observations using least-squares estimation:

x_{t} = a + b t + \sum_{u = 1}^{p} α_{u} x_{t - u} + ε_{t}

(6)

where ε_t are independent, identically distributed random variables, p = 0 for wave direction and p = 5 for wave height and wave period. The value of the order p was chosen to ensure the stationarity of the detrended and deseasonalised series. The estimated linear trend for each variable is shown in Figure 5. The estimated values of the parameters α_u for wave height are α₁ = 0.5996; α₂ = −0.08316; α₃ = 0.11206; α₄ = −0.02666; and α₅ = 0.0373, while the corresponding values for wave period are α₁ = 0.5007; α₂ = −0.04946; α₃ = 0.03786; α₄ = 0.0182; and α₅ = −0.019. The detrended series, E_kt − T_kt, was taken as the residual series ε_t of the respective models.
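A least-squares fit of the model in Equation (6) can be sketched as follows. This is an illustrative implementation rather than the authors' code: the function name `fit_trend_ar` is hypothetical, NumPy is assumed to be available, and the time index is taken as the zero-based sample number.

```python
import numpy as np

def fit_trend_ar(x, p):
    """Least-squares fit of x_t = a + b*t + sum_u alpha_u * x_{t-u} + eps_t.

    Returns (a, b, alphas, residuals); residuals cover t = p, ..., n-1.
    With p = 0 the model reduces to a plain linear trend.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(p, n, dtype=float)
    # Design matrix columns: intercept, time index, then the p lagged values.
    cols = [np.ones(n - p), t] + [x[p - u:n - u] for u in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    residuals = x[p:] - X @ coef
    return coef[0], coef[1], coef[2:], residuals
```

Calling `fit_trend_ar(series, 5)` would correspond to the order used for wave height and period, and `fit_trend_ar(series, 0)` to the order used for wave direction; the returned residuals play the role of ε_t.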

3.2. Seasonality

For seasonality, a best fit curve was chosen to describe the seasonal element for each variable. For significant wave height H_s and peak wave period T_p, this curve is given by a sum of sine terms: S_t = a₁ × sin(b₁ × n + c₁) + … + a₈ × sin(b₈ × n + c₈), while for the wave direction α, the seasonal element was described via a sum of Fourier terms: S_t = a₀ + a₁ × cos(nw) + b₁ × sin(nw) + a₂ × cos(2nw) + b₂ × sin(2nw) + … + a₈ × cos(8nw) + b₈ × sin(8nw), where n is the index of the consecutive time step. The fitting parameters a_i, b_i and c_i for the wave height and period time series, and the corresponding parameters a_i, b_i and w for wave direction, were estimated via the least-squares method. A different formulation of the seasonal component for wave direction was required in order to remove all the non-stationarity in the series. Figure 4 suggests that the seasonal components in the wave direction are different from those in the other two wave variables, and this is borne out by the nature of the non-stationarity of the respective variables.
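The Fourier form of the seasonal component lends itself to a compact least-squares sketch. The example below makes a simplifying assumption that differs from the procedure described above: the fundamental frequency w is held fixed (for daily data, an annual cycle would give w = 2π/365.25), so that the fit is linear in a_k and b_k, whereas in the paper w is itself estimated. The function name `fit_fourier_seasonality` is hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def fit_fourier_seasonality(y, n_terms, w):
    """Least-squares fit of a0 + sum_k [a_k cos(k n w) + b_k sin(k n w)], k = 1..n_terms.

    Returns (coef, seasonal), where coef = [a0, a1, b1, a2, b2, ...] and
    seasonal is the fitted seasonal component at each time step n.
    """
    y = np.asarray(y, dtype=float)
    n = np.arange(y.size, dtype=float)       # consecutive time-step index
    cols = [np.ones_like(n)]
    for k in range(1, n_terms + 1):
        cols.append(np.cos(k * n * w))
        cols.append(np.sin(k * n * w))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef
```

Subtracting the returned `seasonal` array from the detrended series gives a candidate deseasonalised series, whose stationarity can then be checked before deciding whether further terms are needed.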

The extracted seasonal components are presented in Figure 6. Note that monthly averaged values of wave parameters have been used for visual clarity.

Finally, the augmented Dickey-Fuller test [18] was applied to the detrended and deseasonalised data y_it to determine whether the series was stationary. This process was performed iteratively, adding additional terms to the seasonality model, until the augmented Dickey-Fuller test indicated that y_it were stationary.

4. Simulation Results

As mentioned in the previous section, the trend and seasonal component of the first 75% of the available measurements at Littlehampton were taken into account. The marginal distributions of wave height, period and direction and their autocorrelations and cross-correlations were estimated from this sequence. The correlations were used to estimate the values of r_ijh, ϕ_i and Σ_u by solving Equations (3) and (4) (see also Appendix B), where h = 0, 1, …, 3. Hence, in this study we let H = 3, corresponding to matching the correlation structure of the data up to three days, or approximately the storm duration at the site. Note that these parameter values define the correlation structure of the model (Equation (1)). Then, the estimated model (Equation (1)) was used to obtain simulated data z_t, i.e., the detrended and deseasonalised synthetic data (see Appendix B). The length of the simulated data was taken to be the same as that of the original series for illustration purposes. Finally, the trend and seasonality removed in the initial steps were added back to create the output synthetic time series, Ẽ_t (Figure 7).

If the method is working well, the simulated data Ẽ_t should have similar statistical properties to those of the original data E_t. As a check on this, marginal distributions and correlations of the original and simulated series were compared. The marginal distributions of the observed and simulated data were estimated with the non-parametric kernel estimation method [22]. The estimated marginal density functions are given in Figure 8, where the blue curves correspond to the observed data and the red curves correspond to the simulated data. It can be seen that the two sets of density functions are very close for all three sea condition variables. Moreover, the estimated means of the observed and simulated data are shown by the blue and red vertical lines respectively in Figure 8, and show extremely close agreement.

Figure 9 shows the estimated autocorrelation and cross-correlation between variables for the original and simulated data series.

Overall, Figure 9 shows very good agreement between the original and synthetic data. The final step is the validation of the VAR model on an independent section of measurements, that is, the period from 1 April 2013 to 30 June 2016. The simulated wave time series and their density functions are presented in Figure 10 and Figure 11 respectively.

Figure 11 illustrates good agreement between the means of wave height, period and direction. Some divergence between the original and simulated wave time series, particularly at the peaks of the density functions, is evident.

A comparison between the detrended and deseasonalised original and synthetic time series was conducted. Results are shown in Figure 12, which demonstrates a good level of agreement in the correlation structure of the original and synthetic series. The temporal correlation scales in the auto- and cross-correlations are captured well, although the slight negative cross-correlation between wave height and period at lags up to one week is absent in the synthetic series. Example input data and processed data can be found in the Supplementary Materials.

5. Discussion and Conclusions

A methodology for simulating multi-variate wave sequences via a vector autoregressive (VAR) stochastic model was presented in this study and its application illustrated with measurements over a 13-year period taken at Littlehampton, UK. The measurement record was split into two non-overlapping subsets, the first extending from 1 July 2003 to 31 March 2013 and the second from 1 April 2013 to 30 June 2016, to provide independent training and validation data sets. The model was successfully calibrated and validated.

A key element of the procedure is a detailed treatment of non-stationarity in the wave time sequences. The non-stationarity may have different manifestations within each element of the wave conditions; with wave height, period and direction each exhibiting different non-stationarity. Our methodology allows for such variation and is able to create synthetic sequences of wave conditions that have very similar statistical properties to the original dataset.

We applied the method to a site in the UK that experiences mid-latitude storms that have a typical duration of several days. A crucial quantity in the method is the parameter H, which controls the number of lag correlations that are modelled. For our dataset, we found that H = 3, corresponding to a lag of three days, provided a good representation of the storm-scale correlation between wave parameters. Should greater fidelity in the correlation structure be required, for instance to resolve intra-storm conditions, the method allows this. It would require analysing the original data at a finer temporal resolution, say hourly or three-hourly, and finding correlations at a larger number of lags, which entails additional calculation.

We note that the purpose of removing trend and seasonality in our study is to ensure that the detrended and deseasonalised series y_kt are stationary so that we can use our simulation method to obtain simulated data for the original wave height, direction and period data. We have used the simplest method of [20] to remove trend and seasonality, where the trend and seasonality are estimated using our methods. Furthermore, we use the ADF test to check the stationarity of the detrended and deseasonalised data. On the other hand, after we re-add trend and seasonality to the simulated y_kt, any potential biases in the residuals will disappear.

More sophisticated approaches are available. For example, where seasonal components vary significantly in amplitude and frequency over the dataset a least-squares wavelet analysis applied in a window-wise manner may be more suitable [15]. Other methods, such as the anti-leakage least-squares spectral analysis, allow simultaneous estimation of the trend and seasonal components.

The method described in this paper does not yield unrealistic jumps in the time series, as some earlier techniques did. The vector autoregressive (VAR) stochastic model presented in this study can be developed further to optimise the modelling of the seasonal component. In addition, the choice of parameter H, corresponding to the number of lag correlations that are modelled, could be automated rather than specified via a sequence of trial simulations.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w14030363/s1, Table S1: model training wave data; Table S2: model validation wave data.

Author Contributions

Conceptualisation, Y.C. and D.E.R.; methodology, Y.C.; software, A.V.; validation, A.V., Y.C. and D.E.R.; writing—original draft preparation, A.V. and Y.C.; writing—review and editing, D.E.R.; supervision, Y.C. and D.E.R.; project administration, D.E.R.; funding acquisition, Y.C. and D.E.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under the MORPHINE project (grant EP/N007379/1).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The wave measurements used in this study are available, without charge, from the Channel Coastal Observatory, a UK organisation which collects and archives coastal field-data (https://coastalmonitoring.org/cco/, accessed on 21 January 2022).

Acknowledgments

The support of the UK Engineering and Physical Sciences Research Council (EPSRC) under the MORPHINE project (grant EP/N007379/1) is gratefully appreciated.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The correlation between y_it and y_{j,t+h} is given by the following equation:

ρ_{i j h} = \frac{\mathrm{cov} (y_{i t}, y_{j t + h})}{\sqrt{\mathrm{var} (y_{i t}) \mathrm{var} (y_{j t + h})}}

(A1)

where cov(y_it, y_{j,t+h}) is the covariance between y_it and y_{j,t+h}. Expanding the covariance gives

\begin{aligned} \mathrm{cov} (y_{i t}, y_{j t + h}) &= E [(y_{i t} - E (y_{i t})) (y_{j t + h} - E (y_{j t + h}))] \\ &= E [y_{i t} y_{j t + h} - y_{i t} E (y_{j t + h}) - y_{j t + h} E (y_{i t}) + E (y_{i t}) E (y_{j t + h})] \\ &= E [y_{i t} y_{j t + h}] - E [y_{i t} E (y_{j t + h}) + y_{j t + h} E (y_{i t})] + E [E (y_{i t}) E (y_{j t + h})] \\ &= E (y_{i t} y_{j t + h}) - E (y_{i t}) E (y_{j t + h}) - E (y_{j t + h}) E (y_{i t}) + E (y_{i t}) E (y_{j t + h}) \\ &= E (y_{i t} y_{j t + h}) - E (y_{i t}) E (y_{j t + h}) \end{aligned}

Thus, Equation (A1) is modified as follows:

ρ_{i j h} = \frac{E (y_{i t} y_{j t + h}) - E (y_{i t}) E (y_{j t + h})}{\sqrt{\mathrm{var} (y_{i t}) \mathrm{var} (y_{j t + h})}}

(A2)

The expectation E of the product y_it × y_{j,t+h} can be expressed analytically via the following equation:

E (y_{i t} y_{j t + h}) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} F_{i}^{- 1} (Φ (z_{i t})) F_{j}^{- 1} (Φ (z_{j t + h})) ξ_{r_{i j h}} (z_{i t}, z_{j t + h}) d z_{i t} d z_{j t + h}

Substituting this into Equation (A2) yields Equation (4) of Section 2.2:

ρ_{i j h} = \frac{\int_{- \infty}^{\infty} \int_{- \infty}^{\infty} F_{i}^{- 1} (Φ (z_{i t})) F_{j}^{- 1} (Φ (z_{j t + h})) ξ_{r_{i j h}} (z_{i t}, z_{j t + h}) d z_{i t} d z_{j t + h} - E (y_{i t}) E (y_{j t + h})}{\sqrt{\mathrm{var} (y_{i t}) \mathrm{var} (y_{j t + h})}}

Appendix B

Marginal distribution of y_kt:

The marginal distribution F_k(y) of y_kt is described via an empirical distribution function of the observed time-series y_kt.

Sample mean, variance and autocorrelations:

Let the sample mean of y_kt be ȳ_k, the sample variance σ̂_k² and the sample correlation between y_it and y_{j,t+h} be ρ̂_ijh. Then

{\bar{y}}_{k} = \frac{1}{n} \sum_{t = 1}^{n} y_{k t}, \quad {\hat{σ}}_{k}^{2} = \frac{1}{n - 1} \sum_{t = 1}^{n} {(y_{k t} - {\bar{y}}_{k})}^{2}, \quad k = 1, 2, 3

{\hat{ρ}}_{i j h} = \frac{\sum_{t = 1}^{n - h} (y_{i t} - {\bar{y}}_{i}) (y_{j t + h} - {\bar{y}}_{j})}{\sqrt{\sum_{t = 1}^{n} {(y_{i t} - {\bar{y}}_{i})}^{2} \sum_{t = 1}^{n} {(y_{j t} - {\bar{y}}_{j})}^{2}}}, \quad i, j = 1, 2, 3; \; h = 0, \dots, H

where H is a fixed number that defines the maximum lag value we would like to consider when matching the autocorrelation structures between the simulated and the observed time series, and in this study has been set equal to 3.
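A minimal Python sketch of the sample lag correlation is given below. It is an illustration under stated assumptions: the function name `sample_cross_correlation` is hypothetical, the numerator sums the lagged products over t = 1, ..., n − h, and the denominator uses the full-sample sums of squares of both series.

```python
import math

def sample_cross_correlation(y_i, y_j, h):
    """Sample correlation between y_i[t] and y_j[t + h] for a lag h >= 0."""
    n = len(y_i)
    if len(y_j) != n or not 0 <= h < n:
        raise ValueError("series must have equal length and 0 <= h < n")
    mean_i = sum(y_i) / n
    mean_j = sum(y_j) / n
    # Lagged cross-products over the n - h overlapping time steps.
    num = sum((y_i[t] - mean_i) * (y_j[t + h] - mean_j) for t in range(n - h))
    # Full-sample sums of squares for the normalisation.
    den = math.sqrt(sum((v - mean_i) ** 2 for v in y_i) *
                    sum((v - mean_j) ** 2 for v in y_j))
    return num / den
```

Setting `y_i` and `y_j` to the same series gives the autocorrelation at lag h; evaluating the function for h = 0, ..., 3 over all pairs of variables would give the set of target correlations used in this study.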

Solve non-linear equations and Yule-Walker Equation (3):

We need to solve the non-linear Equation (4) (here Equation (A3)) for r_ijh for all possible i, j and h.

{\hat{ρ}}_{i j h} = \frac{\int_{- \infty}^{\infty} \int_{- \infty}^{\infty} F_{i}^{- 1} (Φ (z_{i t})) F_{j}^{- 1} (Φ (z_{j t + h})) φ_{r_{i j h}} (z_{i t}, z_{j t + h}) d z_{i t} d z_{j t + h} - {\bar{y}}_{i} {\bar{y}}_{j}}{{\hat{σ}}_{i} {\hat{σ}}_{j}}

(A3)

where Φ(·) is the standard normal distribution function and φ_{r_ijh}(·,·) is the bivariate normal density, written as ξ_{r_ijh} in Equation (4):

φ_{r_{i j h}} (z_{i t}, z_{j t + h}) = \frac{1}{2 π \sqrt{1 - r_{i j h}^{2}}} \exp \left\{ - \frac{z_{i t}^{2} - 2 r_{i j h} z_{i t} z_{j t + h} + z_{j t + h}^{2}}{2 (1 - r_{i j h}^{2})} \right\}

(A4)

Then r_ijh can be estimated by solving Equation (A3) using an iterative technique such as the Newton–Raphson method:

r_{i j h}^{(m + 1)} = r_{i j h}^{(m)} - \frac{f (r_{i j h}^{(m)})}{f^{'} (r_{i j h}^{(m)})}, \quad m = 0, 1, \dots,

(A5)

where

f (r_{i j h}^{(m)}) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} F_{i}^{- 1} (Φ (z_{i t})) F_{j}^{- 1} (Φ (z_{j t + h})) φ_{r_{i j h}}^{(m)} (z_{i t}, z_{j t + h}) d z_{i t} d z_{j t + h} - {\bar{y}}_{i} {\bar{y}}_{j} - {\hat{σ}}_{i} {\hat{σ}}_{j} {\hat{ρ}}_{i j h}

f^{'} (r_{i j h}^{(m)}) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} F_{i}^{- 1} (Φ (z_{i t})) F_{j}^{- 1} (Φ (z_{j t + h})) {φ^{'}}_{r_{i j h}}^{(m)} (z_{i t}, z_{j t + h}) d z_{i t} d z_{j t + h}

{φ^{'}}_{r_{i j h}}^{(m)} (z_{i t}, z_{j t + h}) = φ_{r_{i j h}}^{(m)} (z_{i t}, z_{j t + h}) ξ_{r_{i j h}}^{(m)} (z_{i t}, z_{j t + h})

and

ξ_{r_{i j h}}^{(m)} (z_{i t}, z_{j t + h}) = \frac{r_{i j h}^{(m)}}{1 - r_{i j h}^{(m) 2}} + \frac{z_{i t} z_{j t + h} + r_{i j h}^{(m) 2} z_{i t} z_{j t + h} - r_{i j h}^{(m)} z_{i t}^{2} - r_{i j h}^{(m)} z_{j t + h}^{2}}{{(1 - r_{i j h}^{(m) 2})}^{2}}

to evaluate two double integrations more efficiently, the following method may be applied:

f (r_{i j h}^{(m)}) \approx \frac{1}{M} \sum_{l = 1}^{M} F_{i}^{- 1} (Φ (W_{l}^{(1)})) F_{j}^{- 1} (Φ (W_{l}^{(2)})) - {\bar{y}}_{i} {\bar{y}}_{j} - {\hat{σ}}_{i} {\hat{σ}}_{j} {\hat{ρ}}_{i j h}

(A6)

f^{'} (r_{i j h}^{(m)}) \approx \frac{1}{M} \sum_{l = 1}^{M} F_{i}^{- 1} (Φ (W_{l}^{(1)})) F_{j}^{- 1} (Φ (W_{l}^{(2)})) ξ_{r_{i j h}}^{(m)} (W_{l}^{(1)}, W_{l}^{(2)})

(A7)

where

(W_{l}^{(1)}, W_{l}^{(2)})

, l = 1, ..., M, is a random sample from the standard bivariate normal distribution (zero means, unit variances) with correlation

r_{i j h}^{(m)}

.

Then, a bisection method is applied to find the root corresponding to

f (r_{i j h}^{(m)}) \approx 0

. The index m counts the consecutive applications of the bisection method until r_ijh is estimated.
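
The root search described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the search bracket (−0.999, 0.999), the tolerance and the reuse of one fixed set of random draws across all evaluations of f (so that the estimate in Equation (A6) varies smoothly with r) are assumptions made for the example.

```python
import math
import numpy as np

# Standard normal CDF built from math.erf, so only NumPy is required.
_erf = np.vectorize(math.erf, otypes=[float])


def std_normal_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def solve_base_correlation(inv_i, inv_j, ybar_i, ybar_j, s_i, s_j, rho_hat,
                           M=100_000, tol=1e-4, seed=0):
    """Find r such that the Monte Carlo estimate of f(r) in (A6) is ~ 0.

    inv_i, inv_j : callables for the inverse marginal CDFs F_i^{-1}, F_j^{-1}
    ybar_*, s_*  : sample means and standard deviations of y_i and y_j
    rho_hat      : target sample correlation between y_it and y_j,t+h
    """
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal(M)       # W^(1)
    e = rng.standard_normal(M)        # independent noise used to build W^(2)
    a = inv_i(std_normal_cdf(w1))     # F_i^{-1}(Phi(W^(1))), independent of r

    def f(r):
        # (W^(1), W^(2)) is standard bivariate normal with correlation r.
        w2 = r * w1 + math.sqrt(1.0 - r * r) * e
        b = inv_j(std_normal_cdf(w2))
        return float(np.mean(a * b)) - ybar_i * ybar_j - s_i * s_j * rho_hat

    lo, hi = -0.999, 0.999            # assumed bracket for the root
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("target correlation is not bracketed on (-1, 1)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid     # root lies in [mid, hi]
    return 0.5 * (lo + hi)
```

For uniform marginals on (0, 1) the relation between the two correlations is known in closed form, ρ = (6/π) arcsin(r/2), which gives a convenient check: calling `solve_base_correlation(lambda u: u, lambda u: u, 0.5, 0.5, 12 ** -0.5, 12 ** -0.5, 0.5)` should return a value close to 2 sin(π/12).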

Once the r_ijh are available, ϕ_i and Σ_u, for i = 1, ..., p, can be obtained directly by solving Equation (2).

Obtain simulated data for the base process z_t:

The simulated base process can be obtained by the following steps.

(1): Construct the covariance matrix P and find Q such that P = QQ′, where

$P = [\begin{matrix} Γ (0) & Γ^{'} (1) & \dots & Γ^{'} (p - 1) \\ Γ (1) & Γ (0) & \dots & Γ^{'} (p - 2) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ Γ (p - 1) & Γ (p - 2) & \dots & Γ (0) \end{matrix}]$

where Γ′ is the transpose matrix of Γ.
(2): Obtain the initial values (z_−p+1, z_−p+2, ..., z₀) by simulating v_i ∼ N(0,1), where i = 1, ..., 3p, and letting (z_−p+1, z_−p+2, ..., z₀) = Q(v₁, v₂, ..., v_3p)′.
(3): Find matrix Q₁ such that $Σ_{u} = Q_{1} Q_{1}^{'}$.
(4): Obtain simulated data for u_t by letting u_t = Q₁(v_1t, v_2t, v_3t)′, where v_kt ∼ N(0,1), k = 1, 2, 3 and t = 1, 2, ..., T, where T is the length of the simulated data.
(5): Obtain the simulated data for the base process z_t by letting

$z_{t} = ϕ_{1} z_{t - 1} + ϕ_{2} z_{t - 2} + \dots + ϕ_{p} z_{t - p} + u_{t}, \quad t = 1, \dots, T .$
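
Steps (1) to (5) can be sketched in a few lines of NumPy. The sketch below is illustrative only and makes two assumptions that are not taken from the text: the function names are hypothetical, and the stationary covariance of the p stacked initial vectors is computed from the VAR coefficients through the companion form (solving P = A P A′ + S) instead of being assembled from previously estimated Γ(h). The stacked state is ordered (z_t, z_{t−1}, ..., z_{t−p+1}), which is the reverse of the block ordering of P shown above but describes the same joint distribution.

```python
import numpy as np


def companion(phis):
    """Companion matrix A of a VAR(p) with k x k coefficient matrices phis."""
    p, k = len(phis), phis[0].shape[0]
    A = np.zeros((k * p, k * p))
    A[:k, :] = np.hstack(phis)
    if p > 1:
        A[k:, :-k] = np.eye(k * (p - 1))
    return A


def stationary_cov(phis, sigma_u):
    """Covariance P of (z_t, z_{t-1}, ..., z_{t-p+1}) for a stationary VAR(p)."""
    k = sigma_u.shape[0]
    A = companion(phis)
    n = A.shape[0]
    S = np.zeros((n, n))
    S[:k, :k] = sigma_u
    # P = A P A' + S  <=>  vec(P) = (I - A kron A)^{-1} vec(S)
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(A, A), S.reshape(-1))
    P = vec_p.reshape(n, n)
    return 0.5 * (P + P.T)            # remove round-off asymmetry


def simulate_var(phis, sigma_u, T, rng):
    """Simulate z_1, ..., z_T, starting from the stationary distribution."""
    k = sigma_u.shape[0]
    A = companion(phis)
    Q = np.linalg.cholesky(stationary_cov(phis, sigma_u))   # step (1): P = Q Q'
    Q1 = np.linalg.cholesky(sigma_u)                        # step (3)
    state = Q @ rng.standard_normal(A.shape[0])             # step (2)
    z = np.empty((T, k))
    for t in range(T):
        u_t = Q1 @ rng.standard_normal(k)                   # step (4)
        z_t = A[:k, :] @ state + u_t                        # step (5)
        state = np.concatenate([z_t, state[:-k]])
        z[t] = z_t
    return z


# Example with illustrative (not fitted) coefficients for a 3-variable VAR(2).
example_phis = [0.4 * np.eye(3), 0.2 * np.eye(3)]
example_z = simulate_var(example_phis, np.eye(3), T=1000,
                         rng=np.random.default_rng(0))
```

A Cholesky factor is used here for Q and Q₁; any matrix square root satisfying P = QQ′ and Σ_u = Q₁Q₁′ would serve equally well.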

Obtain simulated series

{\tilde{y}}_{k t}

:

{\tilde{y}}_{k t} = F_{k}^{- 1} (Φ (z_{k t}))

where k = 1, 2, 3 and t = 1, 2, ..., T, and F_k⁻¹(x) is the inverse function of the empirical distribution of the y process and

{\tilde{y}}_{k t}

can be obtained by interpolation of the empirical distribution.
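
The transformation from the Gaussian base process to the observed marginal can be sketched as below. The function names are hypothetical, and linear interpolation between order statistics (the default behaviour of `np.quantile`) is assumed as the interpolation scheme; the text does not specify which scheme the authors used.

```python
import math
import numpy as np

# Standard normal CDF built from math.erf, so only NumPy is required.
_erf = np.vectorize(math.erf, otypes=[float])


def std_normal_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def to_marginal(z_k, y_obs_k):
    """Map base-process values z_kt to y_kt = F_k^{-1}(Phi(z_kt)).

    F_k^{-1} is the inverse empirical distribution of the observed
    (detrended, deseasonalised) series y_obs_k, evaluated by linear
    interpolation between its order statistics.
    """
    u = std_normal_cdf(z_k)               # uniform scores in (0, 1)
    return np.quantile(np.asarray(y_obs_k, dtype=float), u)


# z = 0 maps to the median of the observations.
y_example = to_marginal(np.array([0.0]), [1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the empirical quantile function is bounded by the smallest and largest observations, values simulated this way never fall outside the observed range.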

Transform

{\tilde{y}}_{k t}

back to obtain simulated data for E_kt:

Include the trend, seasonal components and stochastic trend into the simulated process

{\tilde{y}}_{k t}

to get the simulated data

{\tilde{E}}_{k t} = T_{k t} + S_{k t} + {\tilde{y}}_{k t}

, where k = 1, 2, 3 and t = 1, 2, ..., T.
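
The recomposition is a simple element-wise sum. In the sketch below the function name is hypothetical, and the linear trend and annual sinusoid are placeholders chosen only to make the example runnable; they are not the trend and seasonal components fitted in the paper.

```python
import numpy as np


def recompose(y_sim, trend, seasonal):
    """Return E_kt = T_kt + S_kt + y_kt for one variable k."""
    y_sim = np.asarray(y_sim, dtype=float)
    trend = np.asarray(trend, dtype=float)
    seasonal = np.asarray(seasonal, dtype=float)
    if not (y_sim.shape == trend.shape == seasonal.shape):
        raise ValueError("all three components must have the same length")
    return trend + seasonal + y_sim


# Placeholder components for two years of daily values (illustrative only).
days = np.arange(730)
trend_example = 1.0 + 1e-4 * days                         # assumed linear trend
seasonal_example = 0.3 * np.cos(2 * np.pi * days / 365.25)  # assumed annual cycle
y_example = np.random.default_rng(0).standard_normal(730) * 0.2
e_example = recompose(y_example, trend_example, seasonal_example)
```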


Figure 1. Schematic representation of the wave data generation methodology.

Figure 2. Schematised flow of data processing.

Figure 3. (a) Littlehampton beach lies in Southeast England; (b) a rose diagram illustrating the predominant wave direction from southwest to northeast, where the numbers indicate degrees from North.

Figure 4. Time series of the observed daily wave data over the period 1 July 2003 to 30 June 2016. The horizontal axis shows days from the start of the period. The top panel shows wave heights, the middle panel shows wave periods and the bottom panel the transformed wave direction.

Figure 5. Trends for the period 1 July 2003 to 31 March 2013: (a) significant wave height (H_s); (b) peak wave period; (c) wave direction. Dates are shown in day, month, year format.

Figure 6. Seasonal components: (a) significant wave height (H_s); (b) peak wave period; (c) wave direction θ(α).

Figure 7. Time series of the simulated daily wave data for the period 1 July 2003 to 31 March 2013.

Figure 8. Density function plots for the observed and simulated wave height (m), wave period (s) and wave direction (dimensionless, transformed) for the period 1 July 2003 to 31 March 2013. Blue curves correspond to observed data and red curves to simulated data. Similarly, blue and red vertical lines mark the means of the observed and simulated data, respectively.

Figure 9. Correlation functions for the period 1 July 2003 to 31 March 2013. First row: autocorrelation up to 50 lags for each sea condition variable. Second row: cross-correlation between pairs of sea condition variables. Blue lines are the auto/cross-correlation functions of the original data, and red lines are those of the simulated data. Note: in all plots the horizontal axis is the lag time in days and the vertical axis is the normalised correlation value, which lies between 1 and −1.

Figure 10. Time series of the simulated daily wave data for the period 1 April 2013 to 30 June 2016.

Figure 11. Density function plots for the observed and simulated wave height (m), wave period (s) and wave direction (dimensionless, transformed) for the period 1 April 2013 to 30 June 2016. Blue curves correspond to observed data and red curves to simulated data. Similarly, blue and red vertical lines mark the means of the observed and simulated data, respectively.

Figure 12. Correlation functions for the period 1 April 2013 to 30 June 2016. First row: autocorrelation up to 50 lags for each sea condition variable. Second row: cross-correlation between pairs of sea condition variables. Blue lines are the auto/cross-correlation functions of the original data, and red lines are those of the simulated data. Note: in all plots the horizontal axis is the lag time in days and the vertical axis is the normalised correlation value, which lies between 1 and −1.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

