Sea Spectral Estimation Using ARMA Models

This paper deals with the spectral estimation of sea wave elevation time series by means of ARMA models. To start, the procedure to estimate the ARMA coefficients, based on the use of the Prony’s method applied to the auto-covariance series, is presented. Afterwards, an analysis on how the parameters involved in the ARMA reconstruction procedure—for example, the signal time length, the number of poles and data used—affect the spectral estimates is carried out, providing evidence on their effect on the accuracy of results. This allowed us to provide guidelines on how to set these parameters in order to make the ARMA model as accurate as possible. The paper focuses on mono-modal sea states. Nevertheless, examples also related to bi-modal sea states are discussed.


Introduction
Knowledge of the weather conditions in the context of marine engineering applications is often a key point for structural safety, and also for people's comfort in the case of ships in navigation. In this context, it is very important to have a reliable estimation of the sea spectrum because it has a significant influence on ship motion [1,2], but also on the forces to which ships and offshore structures are subject [3]. Moreover, it constitutes the input of the procedures for the estimation of the sea parameters which are then used for several purposes such as fatigue lifetime estimation [3], or to avoid ship dynamic instabilities [4]. Consequently, different approaches were proposed in the literature to estimate the sea features, starting from either onboard measurements (e.g., ship motion [5][6][7][8][9]) or time signals of wave level. Taking into consideration the latter approach, on the one hand, it is possible to successfully use non-parametric methods such as the Welch's [10] and Thomson's [11] methods for estimating sea spectrum [12][13][14]. On the other hand, parametric methods could also be employed, with the consequent advantage of obtaining a model of the sea spectrum, which can be then used for advanced predictions of sea actions on ships and structures. The availability of a model able to predict the sea state is essential not only in the fields of navigation, ship safety and for the estimation of the forces acting on offshore structures, but also, as an example, in the field of the wave energy converters, where the prediction of the wave elevation allows for the optimization of the control action, leading in turn to the possibility of maximizing the power extraction [15].
Among parametric methods, here the use of Auto-Regressive Moving Average (ARMA) models is taken into consideration because it was already evidenced in the literature that it can be fruitfully employed for output-only identification processes in both sea spectrum estimation and, more generally, in structural dynamics [16].
Considering the specific problem related to sea wave spectrum estimation, Spanos [17] and Mandal et al. [18] showed that ARMA models can be successfully applied and provided examples, mainly related to mono-modal sea state, of their application. Nevertheless, some points are still missing in the literature about the use of ARMA models in sea spectral reconstruction. First of all, since the Auto-regressive (AR) and Moving Average (MA) parts of the ARMA model are usually estimated with a two-step procedure, also involving numerical minimisations, a statistical analysis of the results that can be obtained is important for comprehending the actual potentiality of the ARMA approach in the context of sea estimation. Currently, such an analysis is missing. This in turn implies that no specific analysis can be found related to the influence of the different input parameters, set by the user at the beginning of the identification procedure, on the differences between the reference and the reconstructed sea power spectral estimate. Indeed, one of the main drawbacks related to the use of such models is related to the fact that their estimation procedure requires the setting of some free parameters that can significantly affect the quality of the estimated model. In the literature, no clear indications are given about how to choose the values of these input parameters and only generic indications are provided. In this context, this paper will address some of these points in order to clarify strengths and drawbacks of the considered identification approach, also providing guidelines for setting some of the key parameters of the procedure such as the number of samples that can be used, given a certain time series, and the number of the poles of the ARMA model, which are shown to strongly affect the effectiveness of the power spectral density reconstruction.
To this purpose, the paper is organised into nine sections where Sections 2-4 are related to the methods on which the proposed estimation procedure is based, Section 5 describes the method used to simulate the data for testing this approach, while Sections 6-8 are aimed at presenting the results and at their discussion. In particular, the paper briefly recalls in Section 2 the model assumed for the description of the sea wave elevation time series, which is here considered as a stochastic process, and provides some useful mathematical relations needed for the subsequent estimates of the model parameters. The estimation procedure is then described in Section 3; it combines different established techniques such as the Prony's and the Shanks' method, applied to the auto-covariance of the time series and used together to achieve the best results in terms of parameter accuracy. Despite that the used techniques are not new, they are recalled in Section 3 to summarize the whole estimation procedure and to let the reader understand the reasons behind the choice of the selected algorithms and the subsequent analysis of Section 4, related to the choice of the parameters involved in the estimation procedure. Section 5 explains how the data of the sea wave elevation and the sea power spectral density are simulated for given sea state conditions. Then, Section 6 proposes an analysis of the effect of the parameters involved in the proposed estimation procedure on the estimate accuracy. Particularly, the attention is focused on the effect of the number of auto-covariance samples used for the estimation process on the accuracy of the results, also in relation to the whole length of the time series, on the effect of the initial number of poles selected for the ARMA model and on how to improve the reliability of the estimates in the case of short-time series. All the analyses presented in Section 6 are related mono-modal sea states, where only the wind contribution is considered. Section 7 applies instead the procedure to different mono-modal sea states and to a case of bi-modal sea where both the wind and swell contributions are present. Despite that no detailed analysis is proposed here about this complex sea condition, the method is shown to be promising and worthy of further investigation. Finally, Section 8 provides guidelines on how to set the main parameters of the procedure in order to maximise the accuracy of the spectral reconstruction.

ARMA Modelling of the Wave Elevation Time Series
The wave elevation time series x(n) is considered as a stationary real valued (as all the other time signals considered in the following), zero-mean stochastic process. For the Wold's decomposition theorem, any discrete stationary stochastic process can be expressed as the sum of two uncorrelated processes: one deterministic and another stochastic. Therefore, it is possible to model the time series x(n) as a process generated by the filtering of white noise w(n) with a Linear-Shift-Invariant (LSI) filter, having unit sample response h(n). In the considered case, the LSI filter in the z-domain is represented as a stable, causal rational function with p poles and q zeros (with p ≥ q): where H(z) is the z-transform of h(n) and λ i are the poles of the LSI system and are the solutions of the polynomial equation 1 The random processes x(n) and w(n) are thus linked by the following linear difference equation with constant coefficients: where the first summation represents the AR and the second the MA parts of the ARMA model assumed to describe the stochastic process x(n).
Given the input-output relation between x(n) and w(n), the time series x(n) can be expressed as the convolution between the driving input w(n) and the unit sample response h(n): where is the convolution operator. The cross-covariance between the input and output can be then expressed as: where E{·} indicates the expected value, w(n) is assumed wide sense stationary and r w is the auto-covariance of the input. Analogously, the cross-covariance r wx can be derived as: Finally, using Equations (3)-(5), the auto-covariance of the output results: Noting that h(k) h(−k) can be also seen as the auto-covariance of the unit sample response h(n) and that the auto-covariance of the driving input can be expressed as the variance σ 2 w of the stochastic process w(n) multiplied by the unit sample [19], the autocovariance of the process x(n) of Equation (6) can be also expressed as a scaled version of the auto-covariance, r h (n), of h(n): The power spectral estimate S x of x(n) can be finally obtained from the z-transform of Equation (7), that can be expressed, using Equation (1), as:

ARMA Model Estimation Method
With the aim of estimating the model parameters to have a spectral reconstruction of the wave signal, it is convenient to manipulate Equation (2), multiplying both sides by x(n − k) and taking the expected values: using Equation (5) and recalling that h(n) is assumed to be causal (i.e., h(n) = 0 for n < 0), Equation (9) can be reduced to: Then, calling c q (k) = ∑ q−k i=0 b i+k h(i) and noting that the right-hand side of Equation (9) is equal to zero for k > q, Equation (10) becomes: Equation (11) are the Yule Walker equations of the ARMA process of Equation (2). Looking at their expression, it is possible to notice that the second set of equations (i.e., those for k > q) can be used to solve the linear problem related to the estimation of the AR coefficients a i . This procedure is referred to as the Modified Yule Walker Method (MYW) [20] and uses the unbiased sample correlation coefficients as estimates of the auto-covariance r x : x i x i−k (12) where N is the total number of samples of x(n). However, in order to find the best fit of the AR parameters in the least square sense and to improve the MYW estimates, one of the best procedures is to apply the Prony's method to the auto-covariance function of the stochastic process x(n) [21][22][23].
As for the MA parameters, a better approach with respect to Prony's method is that suggested by Shanks [24], which allows finding the MA coefficients in a least square sense as well. Therefore, a two-step procedure is herein followed to find the ARMA parameters.

Prony Based Estimation of the AR Parameters
Equation (7) expresses the auto-covariance of the stochastic process x(n) as a scaled version of the auto-covariance of the unit sample response h(n). Since h(n) is a causal sequence that can be expressed as a sum of non-increasing exponentials, the causal part of its auto-covariance function can also be expressed as a sum of damped exponentials with the same set of poles [25,26]. Therefore, recalling Equations (1) and (7), it is possible to write: Recalling that the model of the time series assumed by the Prony's method [27] has the same form of the auto-covariance of Equation (13), it is possible to estimate the AR coefficients by deriving the poles λ i applying the Prony's method to the auto-covariance sequence r x [22] and solving for β, with a least square approach, the following problem: where k 0 is an integer equal to or higher than p (usually, k 0 > p to have more equations than unknowns) and β i are the coefficients of the polynomial equations having the roots equal to It is noticed that, sometimes, the auto-covariance for k = 0 is not used in Equation (15) (as done here) due to the possible contribution of additional noise to the signal [28]. It is also noticed that the number of auto-covariance samples used to write Equation (14) will be referred to as ϕ from here on, where ϕ = p + k 0 + 1 ≥ 2p + 1.
Finally, the ARMA poles λ i can be found solving Equation (14) for β and then solving Equation (17), obtaining the roots of the polynomial equation. The AR coefficients can be then estimated using Equation (1) and the α i values can be found using Equation (13).

MA Coefficients and Signal Spectral Estimation
The Shanks' method for the MA coefficient estimation is based on the description of the one-sided auto-covariance sequence r x as the unit sample response of a linear system H r (z), whose poles are those of the ARMA model (see Equation (13)): where N q (z) = n 1 z −1 + n 2 z −2 + . . . + n q z −q . Therefore, with the aim of estimating the numerator coefficients of Equation (18), it is possible to minimize the error e(n) between the auto-covariance r x (k) and its estimatê r x (k) obtained using Equation (18): where h ra is the impulse response of the function 1/A p (z). The problem thus reduces to the least square solution with respect to n of the problem [h r ]n = r 0 : Once that N q (z) is derived, the power spectral estimate of x(n) can be expressed, using Equation (18), as: Finally, the MA coefficients can be estimated using Equations (8) and (21).

Setting the ARMA Estimation Procedure: The Free Parameter Selection
Section 2 explained how the stochastic process related to the sea wave elevation can be described by an ARMA model and Section 3 proposed a procedure, based on the auto-covariance estimates of the time series, to derive the parameters of the model and thus allowing us to derive, using Equation (21), the sea spectral estimates. However, to apply the ARMA estimation procedure, the values of some free parameters have to be defined since they affect the quality of the spectral estimates. Therefore, the involved free parameters have to be properly set, as described in Sections 4.1 and 4.2.

AR Order Selection
The first problem to be addressed is related to the choice of the unknown p order of the AR model of the ARMA process. Several studies show that increasing the AR order improves the effectiveness of the various available algorithms [29,30]. Moreover, Friedlander and Porat [31] showed that, since the AR problem (i.e., that related to the AR part of the ARMA model) can be seen as a least square predictor of an AR process on the auto-covariance sequence, increasing the p order leads to more accurate estimates and that the poles of the actual AR model are a subset of those of the higher order predictorp.
In order to set the value of the increased orderp of the AR problem, several criteria can be used. Among them, the well-known Akaike Information Criterion [32] (AIC) is used here, where the order is chosen as the value which minimizes the following expression [31,33]: Once the increased-order AR model is derived using the procedure of Section 3.1, only the poles that fit the main signal components have to be selected, discarding those mainly related to the fit of the noise. This procedure can be carried out following an energetic criterion. Knowing the poles λ i , it is possible to factorise the z-transform of Equation (13) into first, L i (z), and second, M i (z), order contributions: If L i (z) or M i (z) are found to be unstable, they can be stabilized by reflecting the unstable roots into the unit circle [34].
Then, the energy associated to each system is calculated as: Finally, only the systems with the highest energy are considered (i.e., those whose energy is higher than a fixed threshold level t r , see Section 8).

Number of Equations of the Least-Square Problems
The second parameter that has to be defined is the number of equations used to solve the minimisation problems. Indeed, the estimates of the ARMA parameters are based on the least square solution of the two problems of Equations (14) and (20). Therefore, the value of the auto-covariance samples used in the procedure, ϕ, constrains the number of equations used to define the problems which in turn affects the accuracy of the results. Particularly, an increase of the equations is expected to improve the quality of the results. However, it is noticed that since the problem involves the auto-covariance estimates obtained using Equation (12), ϕ cannot be increased indefinitely. Indeed, for large lags k, the variance of the auto-covariance estimates increases also due to a decrease of the denominator N − k in Equation (12) [19,35,36], leading to an annulment of the beneficial effect of the increase of the number of equations.
Further details about how to choose the best ϕ value are given in Section 6.

Case-Studies: Wave Elevation Time Series Generation and Spectral Description
The algorithm described in Section 3 for the spectral estimation of the sea wave time series, based on an ARMA model of the phenomenon, has been tested on simulated signals representing the sea wave elevation in different sea conditions. To generate these signals, the reference sea power spectral density (PSD) has been defined at first and then used to generate the signal time histories x(n).
In general, the sea state can be described by a combination of wind sea and uncorrelated swell and its PSD, S(ω), can be obtained joining the two contributions [37,38]: where ω is the circular frequency and the spectral wind, S wind (ω), and swell, S swell (ω), components can be modelled by the JONSWAP spectrum S J (ω) [37]: In Equation (26) γ is the peak enhancement factor and A γ is the normalizing factor expressed as A γ = 1 − 0.287 ln(γ). Given the wave peak period T p , ω p = 2π/T p is the peak frequency and σ is the spectral width parameter and is equal to 0.07 if ω ≤ ω p , 0.09 otherwise. In the event that no information about T p is available, it can be expressed, in the range of γ between 1 and 7, as a function of the wave mean period T m [37]: with C = 0.7303, D = 0.04936, c= −0.006556 and d = 0.000361. Finally, S PM in Equation (26) is the Pierson-Moskowitz spectrum [39]: where H s is the significant wave height. Given the sea PSD, S(ω), the random wave elevation time series x(n) can be derived as the superposition of M equally spaced harmonic components at frequencies ω i = i∆ω, with random phase ϕ i [40]: where ∆ t is the sampling time, defined as the inverse of the sampling frequency f samp expressed in Hertz (i.e., ∆ t = 1/ f samp ). The data used in the following analyses are mainly related to sea state conditions that can be described by single peak wave spectra related to sea wind. Nevertheless, the method for the sea PSD reconstruction has been applied also to double peak wave spectra due to both wind and swell to show the effectiveness and reliability of the procedure. The simulated sea conditions were chosen to represent different sea states from slight to rough, mono and bi-modal. In the following sections the analyses will be carried out on three case-studies. The values of the parameters used for the simulations are gathered in Table 1. Further information about the sampling frequency, the frequency resolution, the length of the time series and the value of the added noise will be specified each time in the

Sea Spectral Estimation: Results and Discussion
In the estimation process of an ARMA model starting from time series, in addition to the proper choice of the available algorithms and of the procedure to be followed to estimate the parameters among all those available, great attention must be paid on how to properly apply the selected method. Indeed, also in the case considered here, and despite that the selected method-described in Section 3-is based on well-known algorithms, their effectiveness when used together in estimating ARMA models able to accurately describe the sea frequency behaviour strongly depends on the choice of the free parameters.
Even if the parameter selection procedure is fundamental for the achievement of good spectral estimates, no detailed analyses are available in the literature in the field of sea spectral reconstruction. In light of this, this section aims at explaining how to choose the parameters (i.e., the ARMA orderp and the number of auto-covariance samples, ϕ) highlighting the limits of both the procedures presented in Section 4 and the effect of the various parameters on the accuracy of the sea spectral estimates. Moreover, it will be shown that also the choice related to the global time length T (i.e., T = N∆ t ) of the wave elevation time signal x is a parameter to take into account when applying the estimation procedure since it affects both the accuracy of the spectral estimates and the choice of the free parameters.
In order to analyse the accuracy of the method in estimating the sea spectral behaviour as a function of the free parameters, the index Y has been used. It represents the difference between the reconstructed S x , see Equation (21), and the actual JONSWAP spectrum S, see Equation (25), and it is defined as: The Y value is evaluated in a frequency range between ω 1 =10 −3 rad/s and ω M = π rad/s, with a frequency resolution of ∆ ω =10 −3 rad/s. These bounds allow including all the main frequency components of the wave elevation signal.
The following analyses will be thus based on the value of the index Y obtained by applying the method using different values of the free parameters. Before discussing in detail the results of the analyses, it is important to evidence the meaning of Y in terms of goodness of fit of the sea spectral reconstruction.
To this purpose, Figure 1 depicts some examples of comparisons in terms of PSD between S x and S in the case of test signal 1 in Table 1. The curves are aimed at showing the meaning of a given percentage Y error, and thus the information about the value of the free parameters is not relevant at this stage (i.e., the same Y can be obtained with different combinations of the free parameters). Figure 1a shows the PSD reconstruction associated with low values of Y expressed in percentage (i.e., Y ≤ 0.1) while Figure 1b shows the PSDs associated with larger values of Y (i.e., Y > 0.1). Obviously, the lower the value of Y, the better the PSD reconstruction. However, the maximum admissible error on the spectral reconstruction depends on the specific application.
The first parameter that has to be defined is the number of equations that can be used to solve the least-square problems involved in the ARMA parameter estimation procedure. Despite the increased AR order ̂ may seem the first and most important point to be addressed, it strongly depends on the maximum number of useful information available, given a certain time series ( ). To understand this point, some considerations about the auto-covariance series have to be discussed.  Table 1) with low (a) and larger (b) values.
As mentioned previously, the auto-covariance of the signal is modelled as a sum of damped exponential components, each corresponding to one pole of the ARMA model (see Equation (13)). Therefore, after some lags, the value of the estimated auto-covariance becomes low. Considering this aspect and recalling that the auto-covariance is estimated on a limited number of samples (see Equation (12)), two factors must be taken into consideration to judge the reliability of the estimated values: the unavoidable variance of the estimated auto-covariance and, in real cases, the presence of electrical noise superimposed to the physical signal provided by the transducer. Both these factors cause a scatter on the auto-covariance values, which becomes more and more significant as increases, when the oscillations of the auto-covariance become low. Therefore, from the mentioned -th auto-covariance sample on, the reliability of the auto-covariance curve becomes low and this makes its use for estimating the ARMA model improper. Thus, it is not enough to avoid the use of very large lag values, as suggested in the literature, but it is fundamental to estimate the length of the reliable part of the auto-covariance curve to the aim of estimating the ARMA model. This in turn has consequences on the number of  Table 1) with low (a) and larger (b) Y values.
The analyses presented further in this section are related to sea state conditions characterized just by the wind, and thus the considered spectra are mono-modal. The sea parameters considered are those related to type 1 signal in Table 1. However, to show the reliability of the outcomes and of the proposed procedure, also examples related to a different mono-modal sea state and a bi-modal case, where both wind and swell components are present, are discussed in Section 7.
Finally, it is worth mentioning that the proposed estimation procedure does not imply a high computational cost, and that the processing time is of the same order of magnitude of other competing approaches such as Welch's and Thomson's methods.

Effect of the Number of the Auto-Covariance Samples Used
The first parameter that has to be defined is the number of equations that can be used to solve the least-square problems involved in the ARMA parameter estimation procedure. Despite the increased AR orderp may seem the first and most important point to be addressed, it strongly depends on the maximum number of useful information available, given a certain time series x(n). To understand this point, some considerations about the auto-covariance series have to be discussed.
As mentioned previously, the auto-covariance r x of the signal is modelled as a sum of damped exponential components, each corresponding to one pole of the ARMA model (see Equation (13)). Therefore, after some k lags, the value of the estimated auto-covariance becomes low. Considering this aspect and recalling that the auto-covariance is estimated on a limited number of samples (see Equation (12)), two factors must be taken into consideration to judge the reliability of the estimated r x values: the unavoidable variance of the estimated auto-covariance and, in real cases, the presence of electrical noise superimposed to the physical signal provided by the transducer. Both these factors cause a scatter on the auto-covariance values, which becomes more and more significant as k increases, when the oscillations of the auto-covariance become low. Therefore, from the mentioned k-th auto-covariance sample on, the reliability of the auto-covariance curve becomes low and this makes its use for estimating the ARMA model improper. Thus, it is not enough to avoid the use of very large lag values, as suggested in the literature, but it is fundamental to estimate the length of the reliable part of the auto-covariance curve to the aim of estimating the ARMA model. This in turn has consequences on the number of equations that can be written to solve the Prony's and Shanks' problems and thus on the maximum orderp of the AR model.
One possible approach to select the proper lengthφ of r x to use in the estimation procedure, is to refer to the estimate of the auto-covariance of a white noise random process with unitary standard deviation (i.e., σ w = 1), acquired for N samples. The expected value of its normalised auto-covariance ρ(k) (i.e., the auto-covariance divided by its value at the null lag) is null for k > 0 and its standard deviation can be estimated as 1/ √ N. Therefore, the auto-covariance samples of a white random process, for k = 0, are assumed to be included in a range of ±3/ √ N with a confidence level of approximately 99%. Considering now the normalised auto-covariance of x (i.e., ρ x (k) = r x (k)/r x (0)), the correlation between the signal and its shifted version can be thus considered as not significant if the ρ x curve falls continuously within a range defined as ±3/ √ N. Thus, in that region, the low oscillations of ρ x are considered as not useful for the estimation process. This implies that only the first ϕ samples of the auto-covariance r x will be used for the estimation of the ARMA model because larger k values show a non-significant correlation level. Thus,φ indicates the number of auto-covariance lags ϕ (see Section 3.1) estimated using the ±3/ √ N range, that has to be used to write Equations (14) and (20).
An example of the application of the proposedφ selection procedure is shown in Figure 2 for signals of type 1 in Table 1, sampled at 1 Hz for 3600 s (i.e., 1 h). In order to give a reference of the theoretical shape of the signal auto-covariance, a sea-state of type 1 has been simulated also considering a time length of 24 h with sampling frequency equal to 1 Hz. Being this auto-covariance computed on a very large number of samples, it can be considered as the reference since, for the amount of lags depicted in the figure, the ratio k/N between the lag order and the total number of samples is considerably low. Figure 2 compares the auto-covariance sequence of three different time-series with the reference curve and with the corresponding ±3/ √ N range. It is possible to notice that the auto-covariance sequences obtained for the three different simulations become significantly different form each other when they are almost contained in the ±3/ √ N range, highlighting the low reliability associated with these data. Instead, except for small differences, the three curves follow the reference auto-covariance in the region of small lags when the ρ x values are not contained in the ±3/ √ N allowing us to identify the reliable part of the auto-covariance sequence. In this way it is possible to identify, for each time series, theφ number of samples that can be used in the spectral estimation procedure. Obviously, theφ-th sample slightly changes simulation by simulation because of its associated scatter (for the three time series of Figure 1 it is between 34 and 36). Nevertheless, this difference has negligible effects on the quality of the ARMA model identification, because the first and significant oscillations of ρ x are properly described.
In order to show the importance of choosing the right number of auto-covariance samples to be considered, and to show the reliability of the proposed selection procedure, several simulations have been carried out, varying the number of the auto-covariance samples used in the estimation process, ϕ. Particularly, 500 simulations for each selected ϕ value have been performed. For each of them, the Y index has been evaluated to quantify the accuracy of the estimation procedure. The results are presented in terms of box-plots (showing the Y percentage values of the median, the minimum, the maximum and the 25-th and 75-th percentiles) and of the number of outliers obtained for different ϕ values, starting from the minimum needed to solve Equation (14) (i.e., 2p + 1). Figure 3 shows the results obtained considering signals of type 1 of Table 1, acquired at a sampling frequency of 1 Hz for 1 h. In this case, the increased number of polesp has been set equal to 10 (details about the order selection procedure are given in Section 6.2) and q has been set equal to p, without any loss of generality (the same will apply for all the examples further in the paper). As mentioned previously, for this kind of sea wave elevation signal, the typicalφ values are between 34 and 36 (see Figure 2).  Table 1 for a signal 24 h long and three signals 1 h long, sampled at 1 Hz. The ±3/√ range is related to the three 1 h long signals. These three signals have between 34 and 36, depending on the considered time series.
Looking at Figure 3 it is interesting to notice that, when increases over approximately , the value of slightly increases (a larger increase occurs for larger values, not shown in the figure) because of the use of non-significant values of the auto-covariance sequence. Further details and evidence of this aspect are also given in Section 6.2. In the same way, the use of a number of samples that is too low (i.e., few samples more than those strictly necessary for solving Equation (14)) leads to poor results. Indeed, the left part of Figure 3, if compared to the area close to ≃ 34, shows higher values of the median error , a wider range between the maximum and the minimum and between the 25-th and 75-th percentiles, as well as a higher number of outliers, evidencing a higher dispersion of the estimation results and a greater variability of the spectral estimates. Therefore, the proposed approach to find shows to be effective since the range of the best values in the graph (i.e., the lowest values) is found to be close to the values. Moreover, it is noticed that slight changes of , such as those deriving from the use of different time series (e.g., ≃ 34 − 36) does not imply considerable changes in the estimation performance, but rather a slight increase (i.e., few lags) with respect to slightly improves the estimates. Thus, a slight increased value of is suggested in most of the cases (see also Section 6.3 for further details).
Therefore, this analysis evidences that it is essential to properly estimate the value of to be used in the estimation procedure in order to improve the identification of the ARMA model.
Other examples of the effect on the estimation accuracy of the choice of the number of auto-covariance samples used for the estimation are given in the following sections, where also the effect of the choice of the AR order and the length of the time series are considered.  Table 1

Choice of the Increased AR Order ̂
Once the value is defined, the other parameter that has to be defined is the AR order of the model, . As explained in Section 4.1, it is convenient to set the pole number to ̂> and then, once the ̂ poles are identified, only the most significant ones in terms of energy are selected. Here, in the most part of the simulations the energy threshold was set in order to discard the poles associated with systems whose energy was lower than 10% of the value associated with the system with the highest energy (see Section 4.1 and Section 8 for further details). In Section 4.1, the use of the AIC index is suggested to properly choose the ̂ value. Nevertheless, there are cases in which the AIC index (as well as other indexes such as, for example, the Bayesian Information Criterion [41]) does not provide clear indications about how to set the value of ̂ (e.g., [31]). In these cases, such as the one considered here (see Figure 4), the value of ̂ can be chosen as that allowing us to obtain a first stabilization in the AIC value, even if the AIC is not at its minimum (e.g., it is still slightly decreasing). As an example, Figure 4 shows the trend of the AIC Looking at Figure 3 it is interesting to notice that, when ϕ increases over approximatelŷ ϕ, the value of Y slightly increases (a larger increase occurs for larger ϕ values, not shown in the figure) because of the use of non-significant values of the auto-covariance sequence. Further details and evidence of this aspect are also given in Section 6.2. In the same way, the use of a number of samples that is too low (i.e., few samples more than those strictly necessary for solving Equation (14)) leads to poor results. Indeed, the left part of Figure 3, if compared to the area close toφ 34, shows higher values of the median error Y, a wider range between the maximum and the minimum and between the 25-th and 75-th percentiles, as well as a higher number of outliers, evidencing a higher dispersion of the estimation results and a greater variability of the spectral estimates. Therefore, the proposed approach to findφ shows to be effective since the range of the best ϕ values in the graph (i.e., the lowest Y values) is found to be close to theφ values. Moreover, it is noticed that slight changes ofφ, such as those deriving from the use of different time series (e.g.,φ 34 − 36) does not imply considerable changes in the estimation performance, but rather a slight increase (i.e., few lags) with respect toφ slightly improves the estimates. Thus, a slight increased value ofφ is suggested in most of the cases (see also Section 6.3 for further details).
Therefore, this analysis evidences that it is essential to properly estimate the value of ϕ to be used in the estimation procedure in order to improve the identification of the ARMA model.
Other examples of the effect on the estimation accuracy of the choice of the number of auto-covariance samples used for the estimation are given in the following sections, where also the effect of the choice of the AR order and the length of the time series are considered.

Choice of the Increased AR Orderp
Once theφ value is defined, the other parameter that has to be defined is the AR order of the model, p. As explained in Section 4.1, it is convenient to set the pole number top > p and then, once thep poles are identified, only the most significant ones in terms of energy are selected. Here, in the most part of the simulations the energy threshold t r was set in order to discard the poles associated with systems whose energy was lower than 10% of the value associated with the system with the highest energy (see Sections 4.1 and 8 for further details). In Section 4.1, the use of the AIC index is suggested to properly choose thep value. Nevertheless, there are cases in which the AIC index (as well as other indexes such as, for example, the Bayesian Information Criterion [41]) does not provide clear indications about how to set the value ofp (e.g., [31]). In these cases, such as the one considered here (see Figure 4), the value ofp can be chosen as that allowing us to obtain a first stabilization in the AIC value, even if the AIC is not at its minimum (e.g., it is still slightly decreasing). As an example, Figure 4 shows the trend of the AIC index, normalized with respect to the number of auto-covariance samples used in the estimation procedure,φ. In this case, the signal is of type 1 in Table 1, acquired for 1 h at a sampling frequency of 1 Hz. The two curves are related to two different time series for which a differentφ value was estimated. Looking at the curves, it is evident that the AIC does not provide a clear indication of the optimalp value to be used. However, it is possible to see that, after a first drop in correspondence of p almost equal to 4, the curves stabilize. One possibility to setp would be to choose a value close to the maximum allowed for the considered time series (i.e., that allowing to write a determined system in Equation (14) for the givenφ). However, this reduces the number of equations that can be written to solve the Equations (14) and (20), thus worsening the results of the least square minimisation. Therefore, even ifp needs to be larger than p, its value must be not too high and a balance between the number of equations that can be used and the increase of the pole number has to be reached. One possible solution is to choose thep value inside the stabilized region which guarantees a good number of equations to solve the least square problems and that is, at the same time, far from the first drop of the normalized AIC. Therefore,p is set to a value lower than the mentioned maximum, taking care not to move too close to the stabilisation point. In this case, a good balance is ap value of about 10. Of course, this choice is subjective (e.g., the value could be 8 or 12 as well); however, this does not modify the obtained results significantly.
To show the effect of the number of poles on the accuracy of the spectral estimates as a function of the number of the auto-covariance samples ϕ, many simulations were carried out on different wave elevation signals where both the sea state conditions and the length of the time histories were changed. For sake of clarity, and coherently with the test case study of the previous section (so that a straight comparison can be carried out), the results related to 1 h signals of type 1, sampled at 1 Hz, are shown. The consideredp range in this case was between 10 and 40 poles. The value of the maximum number of poles was set equal to 40 in the simulations to also test values very far from the maximum achievable with the usualφ in order to see its effect on the accuracy of the PSD estimates. The results are summarized in Figure 5, which shows a comparison in terms of accuracy of the spectral estimates, measured by the index Y, when the increased order of the poles isp = 10, 20 and 40. In the figure, the trends of the median and the 75-th percentile values of Y associated to eachp value are shown as functions of the value of ϕ. Comparing the curves in the figure, it is evident that, at a given value of ϕ, the value of Y is lower for the lowest value ofp and increasingp makes the identification worse (higher values of Y), with the only exception of the right part of the plot where the curves related top = 10 and 20 merge. Nevertheless, this part of the plot is related to ϕ values providing results that are not satisfactory.   Table 1 sampled at 1 Hz for 1 h.
To show the effect of the number of poles on the accuracy of the spectral estimates as a function of the number of the auto-covariance samples , many simulations were carried out on different wave elevation signals where both the sea state conditions and the length of the time histories were changed. For sake of clarity, and coherently with the test case study of the previous section (so that a straight comparison can be carried out), the results related to 1 h signals of type 1, sampled at 1 Hz, are shown. The considered ̂ range in this case was between 10 and 40 poles. The value of the maximum number of poles was set equal to 40 in the simulations to also test values very far from the maximum achievable with the usual in order to see its effect on the accuracy of the PSD estimates. The results are summarized in Figure 5, which shows a comparison in terms of accuracy of the spectral estimates, measured by the index , when the increased order of the poles is ̂ = 10, 20 and 40. In the figure, the trends of the median and the 75-th percentile values of associated to each ̂ value are shown as functions of the value of . Comparing the curves in the figure, it is evident that, at a given value of , the value of is lower for the lowest value of ̂ and increasing ̂ makes the identification worse (higher values of ), with the only exception of the right part of the plot where the curves related to = 10 and 20 merge. Nevertheless, this part of the plot is related to values providing results that are not satisfactory. The use of a lower level of overestimation of proves to provide better results because of two main reasons: (i) the use of high values of ̂ obliges to employ auto-covariance samples in the lag range where the auto-covariance is low and (ii) for a given value of , the number of equations used to solve the least square problems of Equations (14) and (20) is larger for lower values of .
Finally, it is worth highlighting that, as already shown in Section 6.1, the use of a number of auto-covariance samples higher than can significantly worsen the achievable performance. Indeed, the curves in Figure 5 show an increase of the index for high values. is 3600 s long and sampled at 1 Hz.

Effect of the Length of the Time Series
So far, all the examples used to support the presented analyses were related to long time histories, in particular 1 h time series were considered. Despite the proposed procedure and all the results are valid whichever signal is taken into account, it is worth paying attention to the problems that can arise when short-time signals are considered and showing the results that can be achieved in terms of accuracy of the estimated sea PSD. Indeed, samples in the lag range where the auto-covariance is low and (ii) for a given value of ϕ, the number of equations used to solve the least square problems of Equations (14) and (20) is larger for lower values ofp.
Finally, it is worth highlighting that, as already shown in Section 6.1, the use of a number of auto-covariance samples higher thanφ can significantly worsen the achievable performance. Indeed, the curves in Figure 5 show an increase of the Y index for high ϕ values.

Effect of the Length of the Time Series
So far, all the examples used to support the presented analyses were related to long time histories, in particular 1 h time series were considered. Despite the proposed procedure and all the results are valid whichever signal is taken into account, it is worth paying attention to the problems that can arise when short-time signals are considered and showing the results that can be achieved in terms of accuracy of the estimated sea PSD. Indeed, in these cases, problems can arise, making the use of the proposed procedure more cumbersome. To this purpose, sea wave elevation signals of 30 and 10 min are analysed, using a sea state of type 1 of Table 1, also to allow for a comparison with the already presented results. The first point that has to be analysed is the auto-covariance behaviour of the time series. Figures 6 and 7 present normalised auto-covariances ρ x achieved using x signals of 30 min and 10 min length, respectively. As mentioned in Section 6.1, the auto-covariance plot allows us to select the reliable portion of the samples that have to be used in the estimation procedure, since this allows for more accurate PSD estimates. However, the identification of theφ value using the ±3/ √ N can be not so straightforward if the length of the time series decreases too much. Indeed, looking at Figure 6, it is possible to notice that ρ x for the 30 min signal, after entering the ±3/ √ N band, starts oscillating and the amplitude of these oscillations increases as the lag increases. It can be also seen that one of the curves in Figure 6 overcomes the upper threshold due to the large scatter. Comparing Figures 2-6, it appears that this behaviour is instead not evident for the 1 h time series. Actually, also in the case of long time series (e.g., 1 h long) the behaviour is similar; however, it is related to higher lag values. This happens because, decreasing the time length (e.g., from 1 h to 30 min), even if the ±3/ √ N bounds become larger, at the same lag value the scatter associated with the auto-covariance of the short signal, is larger compared to the case of long signals since the considered lag is closer to the tail of the auto-covariance (i.e., higher k/N value).
Therefore, it is evident that, lowering the total number of samples, the auto-covariance shows a larger scatter for a fixed lag value. This can become a problem in identifying the significant portion of ρ x . However, the important point for the application of the method proposed to choose ϕ is to have the auto-covariance series within the ±3/ √ N for a significant number of lags in order to recognize when the auto-covariance becomes non-significant for the purpose of the ARMA identification. Looking at Figures 2 and 6, it is easy to notice that this is verified for time signals of 1 h and 30 min. Therefore, the procedure of Section 6.1 allows us to identifyφ values related to portions of the autocovariance series that are close to the reference auto-covariance curve, implying employing a part of r x that is significant (i.e.,φ 25 − 27 for 30 min signals, see Figure 6). However, in the two cases, the identified value ofφ is different and in particular decreases from 1 h to 30 min signals (i.e., fromφ 34 − 36 toφ 25 − 27) because of the larger value of ±3/ √ N. This helps in avoiding using portions of the r x that are too affected by its scatter, but in turn it limits the number of available points, and this has an effect on the achievable identification performance, as described in Section 6.1. To show this effect and the accuracy of the PSD estimates achievable with time histories of 30 min and 10 min, Figure 8 presents the results in terms of box plots and number of outliers, as those of Figure 3 for series 1 h long. Considering signals 30 min long, from Figure 8b it can be noticed that, as in the previous case, the best results are achieved for a ϕ value close toφ. Moreover, even if the quality of the reconstruction is lower reducing the time length (see Figures 3 and 8b), as expected, it is still satisfactory, which in turn means that 30 min is an acceptable time length for a proper ARMA identification.

Short-Time Series: Effect of the Number of Poles and the Sampling Frequency
The analyses of the previous section showed the problems that can arise when dealing with short-time series, due to the scatter of the auto-covariance samples that occur even for low lag values. One possibility to try to improve the achievable estimates is to reduce the increased order ̂ of the AR problem. This would allow increasing the overdetermination level of the least square problems of Equations (14) and (20) and thus to improve the accuracy of the estimates. However, the increase of the poles used to solve the first step of the estimation procedure will be lower than in the previous case (i.e., the difference between ̂ and will be lower). This choice can have consequences in the pole-selection procedure based on the energetic criterion and also implies the risk of choosing a ̂ value too close or even lower than . In order to evaluate the effect of the order reduction on the estimates, a new lower ̂ value has been selected for the 10 min long time series, following the criterion described in Section 6.2. The trend of the normalized AIC index is shown in Figure 9 for two different 10 min long time series. In this case the ̂ value must be close to the first drop of the AIC value in order to be able to solve in the least square sense the minimisation problem of Equation (14). Therefore, Figure 9 suggests in this case a value of ̂ of about 5, though, as in the previous case, its interpretation is not straightforward. With this value of , new simulations have been carried out to evaluate the accuracy of the obtained sea PSD estimates, which are shown in Figure 10b. In this case, in order to allow for a straight comparison with the results related to the case where the sampling frequency is increased (see further in this section), also some noise has been added to the wave elevation signal. The signal-to-noise ratio has been set equal to 30 dB (assuming a noise having flat power spectrum). Before analysing the results, it is worth mentioning that the effect of an increased noise level does not affect significantly the results shown for 30 min and 1 h time series as well as for 10 min, as mentioned further in this section, mainly thanks to the use of auto-covariance series in the estimation process and to the least square solution of the problems involved in the proposed procedure. Comparing Figure 10a where ̂= 10 to Figure 10b where ̂= 5, and the signal-to-noise-ratio If the length of the time signal is further decreased, for example to 10 min, the problems discussed previously become more and more evident. Considering Figure 7, which shows the auto-covariance for signals 10 min long, it is clear that it becomes difficult to clearly identify a lag range after the first oscillations where the curve remains within the ±3/ √ N range. Moreover, the four auto-covariance series are different even for small lag values (e.g., at lag 15), because of the lower ratio between the lag value and the total amount of samples of the original signal, meaning that the usable portion of the auto-covariance curve is strongly decreased. Therefore, sometimes, it can occur that the criterium proposed to findφ leads to a very small number of usable lag values. In these cases, there are various possibilities to manage the situation. One possibility is to use a small value forp. This case is treated in Section 6.4, as well as the possibility to use an increased sampling frequency for the time series x. The other possibility is to use ap values that is usually fine for longer time series (e.g.,p = 10) and accepting to increase ϕ to a value allowing to write a determined (or slightly overdetermined) system for the solution of the Prony's problem. This case is treated in this section in order to compare the identification results when the time length of x is changed. For example, Figure 7 shows that, in the case of 10 min signals in some cases theφ lag is around 24 while for some others the value ofφ can be around the 17-th lag. In these latter cases,φ is not enough for writing all the required equations in Equation (14) withp = 10. Therefore, the number of the auto-covariance points that has to be considered must be increased at least up to 21. In order to show the effect of these problems on the accuracy of the estimated PSD, Figure 8a presents the box plot for the signals 10 min long withp = 10. Since the observed phenomenon is always the same (i.e., sea state of type 1), the choice of maintaining thep value equal to the previous cases is reasonable. Figure 8a, shows that increasing ϕ over 21 (i.e., the number of auto-covariance samples to be used for having as many equations as the number of unknowns in Equation (14)), provides a very slight decrease of the value Y and the number of outliers. Conversely, if the number of the used auto-covariance samples increases over about 24, the value Y starts increasing due to the fact that the added auto-covariance samples are already unreliable. Indeed, even if using 26 lag samples (as an example) means to employ few unreliable auto-covariance samples, their number is not negligible compared to the total number of auto-covariance samples used for the identification, thus turning out in a worsening of the procedure results. Therefore, in the cases of short time series, it is not convenient to slightly increaseφ value, as instead occurred in the case of long-time signals. Moreover, Figure 8a shows that the Y results for 10 min are far from those achievable with either 30 or 60 min for all the possible ϕ values.
As mentioned, other possible ways to deal with short-time series are: the use of a lower number of polesp and the increase the sampling frequency. These methods, explained in the next section, will also allow to slightly improve the estimation accuracy.

Short-Time Series: Effect of the Number of Poles and the Sampling Frequency
The analyses of the previous section showed the problems that can arise when dealing with short-time series, due to the scatter of the auto-covariance samples that occur even for low lag values. One possibility to try to improve the achievable estimates is to reduce the increased orderp of the AR problem. This would allow increasing the overdetermination level of the least square problems of Equations (14) and (20) and thus to improve the accuracy of the estimates. However, the increase of the poles used to solve the first step of the estimation procedure will be lower than in the previous case (i.e., the difference betweenp and p will be lower). This choice can have consequences in the pole-selection procedure based on the energetic criterion and also implies the risk of choosing ap value too close or even lower than p. In order to evaluate the effect of the order reduction on the estimates, a new lowerp value has been selected for the 10 min long time series, following the criterion described in Section 6.2. The trend of the normalized AIC index is shown in Figure 9 for two different 10 min long time series. In this case thep value must be close to the first drop of the AIC value in order to be able to solve in the least square sense the minimisation problem of Equation (14). Therefore, Figure 9 suggests in this case a value ofp of about 5, though, as in the previous case, its interpretation is not straightforward. With this value ofp, new simulations have been carried out to evaluate the accuracy of the obtained sea PSD estimates, which are shown in Figure 10b. In this case, in order to allow for a straight comparison with the results related to the case where the sampling frequency is increased (see further in this section), also some noise has been added to the wave elevation signal. The signal-to-noise ratio has been set equal to 30 dB (assuming a noise having flat power spectrum). Before analysing the results, it is worth mentioning that the effect of an increased noise level does not affect significantly the results shown for 30 min and 1 h time series as well as for 10 min, as mentioned further in this section, mainly thanks to the use of auto-covariance series in the estimation process and to the least square solution of the problems involved in the proposed procedure. Comparing Figure 10a wherep = 10 to Figure 10b wherep = 5, and the signal-to-noise-ratio is the same, it is evident that the results obtained usingp = 5 shows an improvement of the accuracy of the spectral reconstruction of few percentage points (i.e., about 5%), though the achieved results do not reach the accuracy values achievable with longer time series (see Figures 3 and 8b).
Another possible way to improve the results achievable with short-time series is to slightly increase the sampling frequency of the signal. In order to verify if this could provide benefits, simulations have been carried out increasing the sampling frequency of the original time sequence x. When the sampling frequency f samp is increased, different factors must be considered: • in real cases, electrical noise is superimposed to the physical signal in the time sequence x, and an increase of the sampling frequency turns into an increase of the noise power. This implies a larger scatter associated to the auto-covariance sequence, especially when its value is low. Therefore, this worsens the ARMA identification; • on the other hand, a larger number of samples can be used to write Equation (14), thus improving the least square solution, because, for the same lag value in the time domain (i.e., for the same k∆ t ), a larger number of lag samples is obtained due the smaller value of ∆ t ; • given the spectral content of the wave elevation signals (i.e., far below 1 Hz) and the values of sampling frequency considered (i.e., 1 Hz), the expected improvement of the auto-covariance estimates provided by an increase of the sampling frequency is low. Indeed, according to [42], only a slight decrease of the auto-covariance scatter is obtained increasing f samp from 1 Hz. is the same, it is evident that the results obtained using ̂= 5 shows an improvement of the accuracy of the spectral reconstruction of few percentage points (i.e., about 5%), though the achieved results do not reach the accuracy values achievable with longer time series (see Figures 3 and 8b).  Table 1 sampled at 1 Hz for 10 min.
Another possible way to improve the results achievable with short-time series is to slightly increase the sampling frequency of the signal. In order to verify if this could provide benefits, simulations have been carried out increasing the sampling frequency of the original time sequence . When the sampling frequency is increased, different factors must be considered: • in real cases, electrical noise is superimposed to the physical signal in the time sequence x, and an increase of the sampling frequency turns into an increase of the noise power. This implies a larger scatter associated to the auto-covariance sequence, especially when its value is low. Therefore, this worsens the ARMA identification; • on the other hand, a larger number of samples can be used to write Equation (14), thus improving the least square solution, because, for the same lag value in the time domain (i.e., for the same Δ ), a larger number of lag samples is obtained due the smaller value of Δ ; • given the spectral content of the wave elevation signals (i.e., far below 1 Hz) and the values of sampling frequency considered (i.e., 1 Hz), the expected improvement of the auto-covariance estimates provided by an increase of the sampling frequency is low. Indeed, according to [42], only a slight decrease of the auto-covariance scatter is obtained increasing from 1 Hz.
According to the above points, no large improvements can be obtained by increasing . Simulations using = 2 Hz for the case of 10 min long signals, ̂= 10 and noise added (signal-to-noise ratio equal to 30 dB for = 1 Hz and to approximately 27 dB for = 2 Hz, assuming noise with a flat power spectrum) confirmed this expectation. It is noticed that is straightforward to adapt the procedure described in Section 3 to non-unitary sampling frequency. Comparing Figure 10a,c, where only the sampling frequency changes, the best performance obtained for the highest sampling frequency is slightly better than that obtained with = 1. The main benefit provided by employing = 2 Hz is related to the fact that changes in the number of lags used to write  Table 1 Figure 10b,c, it is clear that both the proposed strategies (i.e., decrease of ̂ and increase of ) provide the same slight performance improvement; thus, neither the use of one nor the other is preferred. Finally, it is worth mentioning that use of the two strategies together leads to further slight improvements (i.e., about 1 percentage point).
Finally, to summarize, this analysis evidenced that a decrease of the time length of (e.g., under 20 min) always implies a worsening of the model identification. Furthermore, for signals with short duration, the variance associated to is significant compared to the range ±3/√ even at low order lag values, thus making difficult the use of the proposed criterion for the choice of . Therefore, the two proposed strategies can be applied to slightly improve the estimation accuracy, that are to choose a low ̂ value or to slightly increase the sampling frequency.

PSD Estimation Results for Different Sea Conditions
Finally, to show the reliability of the proposed procedure, some results are shown for different mono-modal and bi-modal sea state conditions. In particular, sea states 2 and 3 According to the above points, no large improvements can be obtained by increasing f samp . Simulations using f samp = 2 Hz for the case of 10 min long signals,p = 10 and noise added (signal-to-noise ratio equal to 30 dB for f samp = 1 Hz and to approximately 27 dB for f samp = 2 Hz, assuming noise with a flat power spectrum) confirmed this expectation. It is noticed that is straightforward to adapt the procedure described in Section 3 to non-unitary sampling frequency. Comparing Figure 10a,c, where only the sampling frequency changes, the best performance obtained for the highest sampling frequency is slightly better than that obtained with f samp = 1. The main benefit provided by employing f samp = 2 Hz is related to the fact that changes in the number of lags used to write Equation (14) generate less changes of the results (see the right part of plot (c)), compared to the case of 1 Hz. Furthermore, as mentioned, comparing Figures 8a and 10a, it is evident that the noise does not cause significant worsening of the ARMA identification because the two figures show similar Y results, regardless of the presence of noise. Moreover, comparing Figure 10b,c, it is clear that both the proposed strategies (i.e., decrease ofp and increase of f samp ) provide the same slight performance improvement; thus, neither the use of one nor the other is preferred. Finally, it is worth mentioning that use of the two strategies together leads to further slight improvements (i.e., about 1 percentage point).
Finally, to summarize, this analysis evidenced that a decrease of the time length of x (e.g., under 20 min) always implies a worsening of the model identification. Furthermore, for signals x with short duration, the variance associated to ρ x is significant compared to the range ±3/ √ N even at low order lag values, thus making difficult the use of the proposed criterion for the choice ofφ. Therefore, the two proposed strategies can be applied to slightly improve the estimation accuracy, that are to choose a lowp value or to slightly increase the sampling frequency.

PSD Estimation Results for Different Sea Conditions
Finally, to show the reliability of the proposed procedure, some results are shown for different mono-modal and bi-modal sea state conditions. In particular, sea states 2 and 3 of Table 1 have been considered and signals of 3600 s long, withp = 10 and with f samp = 1 Hz have been simulated using the procedure of Section 5. The box plot of Y is shown in Figure 11 for sea state 2. It is evident that the performance of the reconstruction is not far from that obtained for sea state 1 using the same simulation parameters (see Figure 3). Some reconstructed PSDs compared to the reference one are also provided in Figure 12. The shown PSDs are related to Y values close to the median, 25-th percentile and 75-th percentile values and are found using a number of auto-covariance samples equal toφ. Finally, some identifications have been carried out for sea state 3 (bi-modal sea state) using again the number of samples obtained applying the range ±3/ √ N for the model identification process. The median value of Y resulted equal to approximately 13%, while the 25-th and 75-th percentiles to approximately 10% and 17%, respectively, thus evidencing a good match between the original and reconstructed PSD. Some reconstructed PSDs are compared to the reference one in Figure 13, for Y values close to the median, 25-th percentile and 75-th percentile values. Therefore, the proposed method and the suggested procedures to set the parameters involved in the analysis show good results even when different sea states are considered. Indeed, the method allows for achieving good sea PSD reconstructions even though, especially in the case of bi-modal sea state, a deeper analysis needs to be carried out and just a first analysis is presented.
procedures to set the parameters involved in the analysis show good results even when different sea states are considered. Indeed, the method allows for achieving good sea PSD reconstructions even though, especially in the case of bi-modal sea state, a deeper analysis needs to be carried out and just a first analysis is presented.  Table 1, ̂ = 10, 3600 s long and sampled at 1 Hz. The typical values of are between 23 and 27. Figure 12. Reconstructed PSD and the corresponding PSD generated with the procedure given in Section 5 (sea state 2 in Table 1) for different values.  Table 1,p = 10, x 3600 s long and sampled at 1 Hz.
The typical values ofφ are between 23 and 27.
compared to the reference one in Figure 13, for values close to the median, 25-th percentile and 75-th percentile values. Therefore, the proposed method and the suggested procedures to set the parameters involved in the analysis show good results even when different sea states are considered. Indeed, the method allows for achieving good sea PSD reconstructions even though, especially in the case of bi-modal sea state, a deeper analysis needs to be carried out and just a first analysis is presented.  Table 1 Table 1) for different values.  Table 1) for different Y values.  Figure 13. Reconstructed PSD and the corresponding PSD generated with the procedure given in Section 5 (sea state 3 in Table 1) for different values.

Guidelines
The discussions presented in the previous sections showed the effect of the parameters involved in the ARMA model estimation process on the accuracy and on the effectiveness of the proposed approach when the frequency behaviour of sea wave elevation has to be estimated. The main parameters that have to be set are the number of poles of the increased order AR model, , and the number of auto-covariance samples that can be used in the estimation process, . These two parameters are shown to be strictly related and to significantly influence the effectiveness of the procedure. The analyses carried out allow for defining guidelines that can be followed to set the ̂ and value. They can be chosen as follows: • the value of can be estimated by plotting the normalised auto-covariance of the considered time series ( ) and finding the lag for which its oscillations remain within a range ±3/√ . It is also noticed that simulations suggest to increase of some samples the value obtained with the range ±3/√ because this slightly improves the spectral reconstruction. This latter consideration does not hold for short time series; • the value of ̂ has to be overestimated compared to , that is unknown, however, it must not be increased excessively not to worsen the ARMA identification. To select a suitable value of , the AIC index can be used and ̂ can be chosen as the value which allows to obtain a stabilization in the AIC value after the first drop, even if the AIC is not at its minimum, and at the same time to solve the Prony's and Shanks' problems in the least square sense. The simulations carried out showed that when the sea state is mono-modal, a reasonable value of ̂ can be about 10. Higher values showed to be useless. For short-time series, this value can be decreased to 5 allowing the use of a reasonable number of auto-covariance samples in the minimisation procedure. For bi-modal sea, few analyses on long time histories were carried out so far, which showed similar results and suggested similar values of . However, the procedure described in Section 5 allows for simulating different sea states, making it possible to have a more detailed estimate of the possible value of ̂ in case of given sea states. Figure 13. Reconstructed PSD and the corresponding PSD generated with the procedure given in Section 5 (sea state 3 in Table 1) for different Y values.

Guidelines
The discussions presented in the previous sections showed the effect of the parameters involved in the ARMA model estimation process on the accuracy and on the effectiveness of the proposed approach when the frequency behaviour of sea wave elevation has to be estimated. The main parameters that have to be set are the number of poles of the increased order AR model,p, and the number of auto-covariance samples that can be used in the estimation process, ϕ. These two parameters are shown to be strictly related and to significantly influence the effectiveness of the procedure. The analyses carried out allow for defining guidelines that can be followed to set thep and ϕ value. They can be chosen as follows: • the value of ϕ can be estimated by plotting the normalised auto-covariance ρ x of the considered time series x(n) and finding the lag for which its oscillations remain within a range ±3/ √ N. It is also noticed that simulations suggest to increase of some samples the ϕ value obtained with the range ±3/ √ N because this slightly improves the spectral reconstruction. This latter consideration does not hold for short time series; • the value ofp has to be overestimated compared to p, that is unknown, however, it must not be increased excessively not to worsen the ARMA identification. To select a suitable value ofp, the AIC index can be used andp can be chosen as the value which allows to obtain a stabilization in the AIC value after the first drop, even if the AIC is not at its minimum, and at the same time to solve the Prony's and Shanks' problems in the least square sense. The simulations carried out showed that when the sea state is mono-modal, a reasonable value ofp can be about 10. Higher values showed to be useless. For short-time series, this value can be decreased to 5 allowing the use of a reasonable number of auto-covariance samples in the minimisation procedure. For bi-modal sea, few analyses on long time histories were carried out so far, which showed similar results and suggested similar values ofp. However, the procedure described in Section 5 allows for simulating different sea states, making it possible to have a more detailed estimate of the possible value ofp in case of given sea states.

•
A slight increase of the sampling frequency provides little improvement in the estimation accuracy when short time series are used. • A final remark is related to the energy threshold t r used for the selection of the final AR order of the ARMA model, starting from an increased AR order model,p. In these analyses, it was set equal to an energy level at least equal to a given percentage P e of the energy of the subsystem with the highest energy (see Equation (24)). Usually, in these analyses, P e was set to 10%. Therefore, the discarded poles were those associated to systems whose energy was lower than 10% of the value associated to the system with the highest energy. For mono-modal sea states, this choice often led to a final AR order equal to 4. Nevertheless, it was found that lowering P e to 5% does not generate significant changes of the results for mono-modal sea state, while it provides benefits in case of bi-modal sea state.
Finally, in light of these considerations, the whole spectral estimation procedure can be summarised by the workflow depicted in Figure 14. In case of applications aimed at continuous sea state monitoring, the values of the input parametersp and t r can be periodically checked and updated, following the abovementioned procedure. • A slight increase of the sampling frequency provides little improvement in the estimation accuracy when short time series are used. • A final remark is related to the energy threshold used for the selection of the final AR order of the ARMA model, starting from an increased AR order model, . In these analyses, it was set equal to an energy level at least equal to a given percentage of the energy of the subsystem with the highest energy (see Equation (24)). Usually, in these analyses, was set to 10%. Therefore, the discarded poles were those associated to systems whose energy was lower than 10% of the value associated to the system with the highest energy. For mono-modal sea states, this choice often led to a final AR order equal to 4. Nevertheless, it was found that lowering to 5% does not generate significant changes of the results for mono-modal sea state, while it provides benefits in case of bi-modal sea state.
Finally, in light of these considerations, the whole spectral estimation procedure can be summarised by the workflow depicted in Figure 14. In case of applications aimed at continuous sea state monitoring, the values of the input parameters ̂ and can be periodically checked and updated, following the abovementioned procedure.

Conclusions
The paper proposes a procedure to estimate an ARMA model for the purpose of the spectral reconstruction of wave elevation time series and to properly set the involved parameters. Indeed, its main focus was on an analysis aiming at comprehending the performance in terms of reconstruction accuracy when the identified ARMA model is used. Specific focus has been given to three different parameters that must be set before carrying out the identification and the spectral reconstruction: the global length of the signal , the number of poles ̂ to be initially assigned to the model to be estimated, and the number of auto-covariance samples to be employed for the identification procedure. The paper proposes some guidelines to properly set these parameters in order to obtain the best possible accuracy for a given time series.

Conclusions
The paper proposes a procedure to estimate an ARMA model for the purpose of the spectral reconstruction of wave elevation time series and to properly set the involved parameters. Indeed, its main focus was on an analysis aiming at comprehending the performance in terms of reconstruction accuracy when the identified ARMA model is used. Specific focus has been given to three different parameters that must be set before carrying out the identification and the spectral reconstruction: the global length of the signal x, the number of polesp to be initially assigned to the model to be estimated, and the number of auto-covariance samplesφ to be employed for the identification procedure. The paper proposes some guidelines to properly set these parameters in order to obtain the best possible accuracy for a given time series.
Moreover, the analysis of Section 6 has shown that the total time length T of the signal x is a critical parameter for a proper frequency reconstruction of the sea state. Indeed, for T lower than 1800 s (i.e., 30 min), that corresponds to N equal to 1800 for f samp = 1 Hz, the quality of the reconstruction worsens significantly even when optimal parameters are used. Conversely, for larger T values the results are satisfactory, and the strategy proposed to find the number of auto-covariance samples and the AR order works properly. When T is low, an increase of the sampling frequency can lead to some benefits, even if the reconstruction performance remains far from those achieved increasing T.
The main part of the analysis has been focused to mono-modal sea states, even if preliminary analyses show good performances also on bi-modal sea states. A deeper insight will be necessary in this case in order to understand how to take into consideration the different decay rates of the different components of the auto-covariance function. Moreover, the authors are currently studying some approaches to improve the spectral reconstruction through ARMA models in case of signals of short duration. The paper demonstrated the potentiality of the approach and the feasibility of its use in the field of sea state analysis when the parameters involved in the estimation process are properly chosen. This justifies future studies, currently in progress, on the comparison of the present approach with other competing spectral estimation methods, presently investigated for the purpose of sea spectral reconstruction.