Wind Speed Modeling by Nested ARIMA Processes

Sim, So-Kumneth; Maass, Philipp; Lind, Pedro G.

doi:10.3390/en12010069


So-Kumneth Sim, Philipp Maass and Pedro G. Lind *

Fachbereich Physik, Universität Osnabrück, Barbarastrasse 7, 49076 Osnabrück, Germany

* Author to whom correspondence should be addressed.

Energies 2019, 12(1), 69; https://doi.org/10.3390/en12010069

Submission received: 27 November 2018 / Revised: 19 December 2018 / Accepted: 23 December 2018 / Published: 26 December 2018


Abstract

Wind speed modelling is of increasing interest, both for basic research and for applications, e.g., for wind turbine development and strategies to construct large wind power plants. Generally, such modelling is hampered by the non-stationary features of wind speed data that, to a large extent, reflect the turbulent dynamics in the atmosphere. We study how these features can be captured by nested ARIMA models. In this approach, wind speed fluctuations in given time windows are modelled by one stochastic process, and the parameter variation between successive windows by another one. For deriving the wind speed model, we use 20 months of data collected at the FINO1 platform in the North Sea and apply a variable transformation that best maps the wind speed onto a Gaussian random variable. We find that the distributions of wind speed increments can be well reproduced for up to four standard deviations. The distributions of extreme variations, however, strongly deviate from the model predictions.

Keywords:

wind speed forecast; ARIMA model; wind power plants

1. Introduction

When planning wind power plants, the choice of their location as well as the precise positions of the constituent wind turbines depends on estimates of the wind speed at those particular spots. With efficient models that convert wind speed estimates into wind power estimates with a given accuracy, one is able to forecast deficits and surpluses of generated power and to decide in advance the amount of energy to be traded on the market and required for maintaining the stable operation of power grids.

ARIMA (Autoregressive Integrated Moving Average) models have often been used in the last decades [1] for modelling wind speed and wind power variations over large time lags, typically of the order of one hour. They are simple to implement and, if of low order, can be interpreted as discretised forms of differential equations. They have been used for modelling the output of wind power plants in connection with investment strategies in the energy market [2], variations of energy demand [3], and monthly variations of wind power generation [4]. For describing fluctuations on shorter time scales of about 10 min, ARIMA models can be applied by increasing the number of autoregressive and moving average terms (higher order) and introducing frequency decomposition [5]. A main advantage of ARIMA models is their low computational cost, since they are based on simple iterative procedures [6].

More sophisticated recursive models are the so-called artificial neural networks, which include nonlinear terms and derive the weight of each particular contribution through a learning process based on previous sets of measurements. They have also been applied to wind speed prediction [7]. Although more precise in value prediction than standard ARIMA models [8,9], artificial neural networks are difficult to interpret due to their higher complexity.

For this reason, ARIMA models remain a widely used approach for wind speed modelling [10], sometimes in combination with other methods. For example, hourly and ten-minute averages of wind speed data collected in Mexico were modelled with an ARIMA model and compared with a neural network that uses additional exogenous meteorological variables [8]. Despite using considerably more data, the neural network seems to yield improvements of at most about 6%. ARIMA models with exogenous variables have also been developed [11] for modelling 15 min average wind speeds. In a study of wind speed series from China, an ARIMA model combined with empirical mode decomposition was found to be more accurate than neural networks and other machine learning approaches [12], achieving a mean absolute error smaller than 4%. However, this accuracy is obtained for ten-minute averages; for one-hour averages, the accuracy of all models decreases considerably. A further approach, using empirical mode decomposition in combination with neural networks, was followed in [13]. In a recent review [14], different approaches for modelling ultra-short-, short-, medium- and long-term wind speed fluctuations are discussed, including neural networks, ARIMA models and wavelet methods.

With respect to wind data analysis, there are two major issues that challenge the application and usefulness of ARIMA models. First, wind speeds are not Gaussian-distributed, but can typically be well described by a Weibull distribution [15]. Second, the distribution of wind speed increments is also not Gaussian, showing fat tails [16]. This raises the question whether these problems can be resolved by a refined approach based on ARIMA processes. In this paper, we investigate to what extent ARIMA models can reproduce statistical features of wind speed data, addressing both issues. We focus on 15 min averages of wind speed measurements collected at the FINO1 platform in the North Sea [17]; see Figure 1a,b. These measurements were performed over a period of 20 months with a sampling frequency of 1 Hz. The choice of 15 min is motivated by forecasting strategies in energy markets, which commonly use predictions of energy production 15 min ahead.

To address the non-Gaussianity of wind speed distributions, we consider a transformed variable, which is a power of the wind speed. The exponent is determined by requiring that the distribution of the transformed variable be as close as possible to a Gaussian distribution. The corresponding optimisation procedure is outlined in Appendix A, where we also provide a table of exponent values as a function of the two parameters specifying the Weibull distribution. This table can be used for other wind speed data sets.

We find that the best ARIMA model for the wind speed data on the time scale of 15 min is an ARIMA(0, 1, 2) model. Its coefficients vary from one 15 min window to the next and are themselves modelled by ARIMA processes. We call this combination of the ARIMA(0, 1, 2) model with the ARIMA models for its coefficients a nested ARIMA model and show that it is able to predict distributions of wind speed increments over a range of up to four standard deviations. Beyond this range, however, the approach reaches its limits. The methodology resembles that of the so-called superstatistics approach [18] and earlier work on the distribution of velocity increments in turbulent flows [19]. Distribution parameters have also been modelled with stochastic differential equations, for instance to describe the evolution of volume-price distributions on the stock exchange [20,21]. Stochastic differential equations are a further means of data modelling and can in general be interpreted straightforwardly. They have recently been applied to describe short-time fluctuations of the power output of wind turbines and wind farms [22,23].

In Section 2, we describe the ARIMA model for the wind speed and its parameters. In Section 3, we first describe the data used and the pre-processing of these data. Then, we derive the ARIMA model that best suits the set of measurements on the scale of 15 min. In Section 4, we introduce the nested ARIMA model and compare its predictions for the distributions of wind speed increments with the corresponding distributions of the measured values. In Section 5, we summarise our main results and discuss their implications for future research.

2. ARIMA Model

ARIMA stands for Autoregressive Integrated Moving Average [10]. In an ARIMA model, the evolution of a variable x in discrete time t is described as

Δ^{d} x_{t} = \sum_{j = 1}^{p} ϕ_{j} Δ^{d} x_{t - j} + \sum_{k = 1}^{q} θ_{k} w_{t - k} + w_{t} .

(1)

Here, p, d and q are the parameters quantifying the order of the model, usually written as $(p, d, q)$; $ϕ_{j}$ ($j = 1, \dots, p$) and $θ_{k}$ ($k = 1, \dots, q$) are coefficients; and $w_{t}$ are uncorrelated Gaussian random numbers with zero mean and variance $σ_{w}^{2}$. The operator $Δ$ corresponds to a discrete derivative and is defined as

Δ x_{t} = x_{t} - x_{t - 1} .

(2)

Without loss of generality, the mean of $x_{t}$ is taken to be zero. Parameter d specifies the minimum number of differencing operations needed to map the original (possibly non-stationary) process onto a stationary process. Parameter p specifies the number of previous values $Δ^{d} x_{t - j}$ entering the ARIMA process and defines the order of the autoregressive (AR) part of the model. Parameter q gives the number of previous additive Gaussian white-noise terms and defines the order of the moving average (MA) part of the model.

For modelling a time series $\{x_{t}\}_{1 \leq t \leq N}$, an ARIMA process can be derived by estimating its order $(p, d, q)$, the sets $ϕ = \{ϕ_{j}\}_{1 \leq j \leq p}$ and $θ = \{θ_{k}\}_{1 \leq k \leq q}$ of coefficients, and the noise strength $σ_{w}$ by a maximum likelihood estimator in combination with the Bayesian Information Criterion [24]. The procedure is illustrated in Figure 2 for a test series $\{x_{t}^{(test)}\}_{1 \leq t \leq N}$ of $N = 10^{5}$ values generated by Equation (1) with $(p, d, q) = (2, 1, 2)$, $σ_{w} = 0.1$, and coefficients $ϕ_{1} = 0.5$, $ϕ_{2} = -0.1$, $θ_{1} = 0.3$, and $θ_{2} = -0.2$. The test series is shown in Figure 2a, and Figure 2b shows the series of its increments $Δ x_{t}^{(test)} = x_{t}^{(test)} - x_{t - 1}^{(test)}$; the corresponding distributions are shown in the right panels of these graphs.
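A test series of this kind can be generated with a few lines of NumPy. The sketch below iterates Equation (1) for $y_{t} = Δ^{d} x_{t}$ and integrates the result d times; the function name `simulate_arima`, the zero pre-sample values, the burn-in length and the random seed are our own assumptions, not details taken from the original study.

```python
import numpy as np


def simulate_arima(phi, theta, d, sigma_w, n, burn_in=500, seed=None):
    """Generate n values of an ARIMA(p, d, q) process following Equation (1).

    The ARMA recursion acts on y_t = Δ^d x_t. Pre-sample values of y and w are
    taken as zero, and the first `burn_in` steps are discarded to forget this
    initialisation (both are assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    w = rng.normal(0.0, sigma_w, size=total)  # Gaussian white noise w_t
    y = np.zeros(total)
    for t in range(total):
        ar = sum(phi[j] * y[t - 1 - j] for j in range(len(phi)) if t - 1 - j >= 0)
        ma = sum(theta[k] * w[t - 1 - k] for k in range(len(theta)) if t - 1 - k >= 0)
        y[t] = ar + ma + w[t]
    x = y[burn_in:]
    # Undo the d-fold differencing: each cumulative sum inverts one Δ
    # (the integration constant is set to zero).
    for _ in range(d):
        x = np.cumsum(x)
    return x


# Parameters of the test series in Figure 2.
x_test = simulate_arima(phi=[0.5, -0.1], theta=[0.3, -0.2], d=1,
                        sigma_w=0.1, n=10**5, seed=1)
```

Because the integration constant is zero, `np.diff(x_test)` returns the stationary ARMA(2, 2) increments of Figure 2b (without their first value).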

To estimate parameter d, the unit-root test [10] is performed, starting with $d = 0$. If the test fails, d is increased by one and the test is repeated. The first value of d for which $Δ^{d} x_{t}$ passes the unit-root test is chosen. For the series $\{x_{t}^{(test)}\}_{1 \leq t \leq N}$, this yielded the correct value $d = 1$.
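The text does not specify which unit-root test was used. As a hedged illustration of the d-loop only, the sketch below uses a plain (non-augmented) Dickey–Fuller regression with a constant and its asymptotic 5% critical value of −2.86; in practice, an augmented test such as `adfuller` from `statsmodels.tsa.stattools` would be the more usual choice. The names `dickey_fuller_tstat` and `estimate_d` are hypothetical.

```python
import numpy as np

# Asymptotic 5% critical value of the Dickey-Fuller t-statistic for a
# regression that includes a constant but no trend.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_tstat(y):
    """t-statistic of b in the regression Δy_t = a + b * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)             # OLS covariance of (a, b)
    return coef[1] / np.sqrt(cov[1, 1])


def estimate_d(x, max_d=3):
    """Smallest d for which Δ^d x rejects a unit root; gives up at max_d."""
    y = np.asarray(x, dtype=float)
    for d in range(max_d + 1):
        if dickey_fuller_tstat(y) < DF_CRITICAL_5PCT:
            return d          # unit root rejected: Δ^d x treated as stationary
        y = np.diff(y)        # otherwise difference once more
    return max_d
```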

To determine q and p, maximal values $τ_{q}$ and $τ_{p}$ of these parameters are first estimated from the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the time series $\{x_{t}\}_{1 \leq t \leq N}$, respectively [10,25]. The ACF measures the correlation between two increment values $Δ^{d} x_{t}$ and $Δ^{d} x_{t + τ}$; for $d = 1$, it is given by

γ(τ) = \frac{\langle Δ x_{t + τ} Δ x_{t} \rangle}{σ^{2}},

(3)

where $σ^{2}$ is the variance of the increments $Δ x_{t}$. In general, the ACF must be defined with time-dependent variances $σ_{x}^{2}(t)$ and $σ_{x}^{2}(t + τ)$, but here the time series $Δ x_{t}$ is stationary and $σ^{2}$ is time-independent. For the test series, the ACF is shown in Figure 2c. The upper bound $τ_{q}$ of q is estimated as the smallest value of $τ$ at which the ACF becomes negative, i.e., $τ_{q} = \min \{τ : γ(τ) < 0\}$. In Figure 2c, the corresponding value $τ_{q} = 3$ is indicated by the arrow.
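A minimal sketch of Equation (3) and of the rule for $τ_{q}$, assuming the biased sample estimator (division by the series length) and mean-subtracted increments; `acf` and `upper_bound_q` are illustrative names, not part of any library.

```python
import numpy as np


def acf(y, max_lag):
    """Sample autocorrelation γ(τ) for τ = 0, ..., max_lag, as in Equation (3)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                 # the text assumes zero-mean increments
    n = len(y)
    var = y @ y / n
    return np.array([y[: n - tau] @ y[tau:] / (n * var)
                     for tau in range(max_lag + 1)])


def upper_bound_q(x, d=1, max_lag=50):
    """τ_q = smallest lag τ >= 1 at which the ACF of Δ^d x becomes negative."""
    gamma = acf(np.diff(x, n=d), max_lag)
    negative = np.flatnonzero(gamma[1:] < 0)
    return int(negative[0]) + 1 if negative.size else max_lag
```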

The PACF quantifies the direct correlation between $x_{t}$ and $x_{t - τ}$ after removing the linear dependence on the intermediate values $x_{t - k}$ with $k = 1, \dots, τ - 1$. Specifically, the PACF $φ(τ)$ is the last component of the vector [25]

\begin{pmatrix} φ(1) \\ \vdots \\ φ(τ) \end{pmatrix} = \begin{pmatrix} γ(0) & γ(1) & \cdots & γ(τ - 1) \\ γ(1) & γ(0) & \cdots & γ(τ - 2) \\ \vdots & \vdots & \ddots & \vdots \\ γ(τ - 1) & γ(τ - 2) & \cdots & γ(0) \end{pmatrix}^{-1} \begin{pmatrix} γ(1) \\ \vdots \\ γ(τ) \end{pmatrix} .

(4)

For the test series, the PACF is shown in Figure 2d. We estimate the upper bound $τ_{p}$ from a least-squares fit of $|φ(τ)|$ to an exponential decay $|φ(0)| \exp(-τ / \tilde{τ}_{p})$, with $τ_{p} = \mathrm{int}(\tilde{τ}_{p}) + 1$. The corresponding value $τ_{p} = 4$ is indicated by the arrow in Figure 2d.
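The PACF of Equation (4) can be obtained by solving one Toeplitz system per lag. The sketch below does this with `numpy.linalg.solve`; for the exponential fit, it assumes $|φ(0)| = 1$ and replaces the nonlinear least-squares fit by a linear least-squares fit of $\ln|φ(τ)|$, which is a simplification on our part. The function names are hypothetical.

```python
import numpy as np


def pacf_from_acf(gamma):
    """PACF φ(τ) for τ = 1, ..., len(gamma) - 1 via Equation (4).

    gamma holds the ACF values γ(0), γ(1), ...; for each τ the Toeplitz
    system of Equation (4) is solved and its last component is kept.
    """
    gamma = np.asarray(gamma, dtype=float)
    phi = []
    for tau in range(1, len(gamma)):
        lags = np.abs(np.subtract.outer(np.arange(tau), np.arange(tau)))
        toeplitz = gamma[lags]                     # entries γ(|i - j|)
        coeffs = np.linalg.solve(toeplitz, gamma[1 : tau + 1])
        phi.append(coeffs[-1])
    return np.array(phi)


def upper_bound_p(phi):
    """τ_p = int(τ̃_p) + 1 from |φ(τ)| ≈ |φ(0)| exp(-τ / τ̃_p).

    Simplifications (assumptions): |φ(0)| = 1, and the least-squares fit is
    done for log|φ(τ)| against τ as a straight line through the origin.
    """
    tau = np.arange(1, len(phi) + 1, dtype=float)
    log_abs = np.log(np.maximum(np.abs(phi), 1e-12))  # guard against log(0)
    slope = (tau @ log_abs) / (tau @ tau)             # slope = -1 / τ̃_p
    tau_tilde = -1.0 / slope
    return int(tau_tilde) + 1
```

For an AR(1)-type ACF $γ(τ) = a^{τ}$, `pacf_from_acf` returns $a$ at lag 1 and zero at all larger lags, as expected for a first-order autoregression.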

Given d, as well as p and q in the ranges $0 \leq p \leq τ_{p}$ and $0 \leq q \leq τ_{q}$, a maximum likelihood estimator can be defined, based on the assumption of Gaussian statistics of the residuals and on reasonable initialisations of the recurrence Equation (1) (see, e.g., Ref. [10] for details). The likelihood function $L_{X}(ϕ, θ, σ_{w})$ quantifies the probability (density) of finding the time series $X = \{x_{t}\}_{1 \leq t \leq N}$ for given sets $ϕ = \{ϕ_{j}\}_{1 \leq j \leq p}$ and $θ = \{θ_{k}\}_{1 \leq k \leq q}$ of coefficients and noise strength $σ_{w}$ in the ARIMA process (1). Maximising $L_{X}(ϕ, θ, σ_{w})$ with respect to the model parameters yields the best estimators $\hat{ϕ}(p, q)$, $\hat{θ}(p, q)$, and $\hat{σ}_{w}(p, q)$ for each p and q.
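Equation (1) can be inverted recursively for the noise terms, which gives a simple way to evaluate a Gaussian likelihood. The sketch below uses the conditional (conditional-sum-of-squares) approximation with zero pre-sample values, which may differ from the initialisation used in [10]. For fixed $ϕ$ and $θ$, the maximum over $σ_{w}$ is attained at the mean squared residual, and $ϕ$, $θ$ could be optimised numerically, e.g., with `scipy.optimize.minimize` applied to $-\ln L$ (not shown). Function names are hypothetical.

```python
import numpy as np


def arma_residuals(y, phi, theta):
    """Invert Equation (1): w_t = y_t - Σ_j ϕ_j y_{t-j} - Σ_k θ_k w_{t-k}.

    y is the differenced series Δ^d x; pre-sample y and w are taken as zero.
    """
    y = np.asarray(y, dtype=float)
    w = np.zeros_like(y)
    for t in range(len(y)):
        ar = sum(phi[j] * y[t - 1 - j] for j in range(len(phi)) if t - 1 - j >= 0)
        ma = sum(theta[k] * w[t - 1 - k] for k in range(len(theta)) if t - 1 - k >= 0)
        w[t] = y[t] - ar - ma
    return w


def conditional_log_likelihood(y, phi, theta, sigma_w):
    """Gaussian log-likelihood ln L of the residuals (conditional approximation)."""
    w = arma_residuals(y, phi, theta)
    n = len(w)
    return (-0.5 * n * np.log(2.0 * np.pi * sigma_w**2)
            - 0.5 * (w @ w) / sigma_w**2)
```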

The maximum likelihood method yields parameter sets for each p and q, but one still needs to decide which p and q in the ranges $0 \leq p \leq τ_{p}$ and $0 \leq q \leq τ_{q}$ should be selected. Larger p and q yield larger maximised likelihoods, but at the price of a more complex model. To take into account both (i) how well the models fit the data, quantified by the likelihood $L_{X}(\hat{ϕ}(p, q), \hat{θ}(p, q), \hat{σ}_{w}(p, q))$, and (ii) how many parameters are needed to obtain this likelihood, the Bayesian Information Criterion (BIC) [24]

B_{X}(p, q) = (p + q + 1) \ln N - 2 \ln L_{X}(\hat{ϕ}(p, q), \hat{θ}(p, q), \hat{σ}_{w}(p, q))

(5)

is commonly used. The best values of p and q are those yielding the smallest $B_{X}(p, q)$. For the test series $\{x_{t}^{(test)}\}_{1 \leq t \leq N}$, we obtain the correct order $(p, d, q) = (2, 1, 2)$ and coefficient values that agree, within the numerical uncertainties, with those used for generating the series: $\hat{ϕ}_{1} = 0.49 \pm 0.03$, $\hat{ϕ}_{2} = -0.101 \pm 0.005$, $\hat{θ}_{1} = 0.31 \pm 0.03$, $\hat{θ}_{2} = -0.20 \pm 0.02$, and $\hat{σ}_{w}^{2} = 0.01$. The standard error of the estimated noise variance can be estimated as $SE(\hat{σ}_{w}^{2}) = \hat{σ}_{w}^{2} \sqrt{2 / (N - 1)} \sim 4 \times 10^{-5}$.
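Equation (5) and the standard error quoted above amount to a few lines of arithmetic. In the sketch, the maximised log-likelihoods passed to `select_order` are made-up numbers for illustration only, and all function names are hypothetical.

```python
import math


def bic(log_likelihood, p, q, n):
    """Equation (5): B = (p + q + 1) ln N - 2 ln L."""
    return (p + q + 1) * math.log(n) - 2.0 * log_likelihood


def select_order(max_log_likelihoods, n):
    """Return the (p, q) with the smallest BIC from a dict {(p, q): max ln L}."""
    return min(max_log_likelihoods,
               key=lambda pq: bic(max_log_likelihoods[pq], pq[0], pq[1], n))


def se_noise_variance(sigma2_hat, n):
    """Standard error σ̂_w² sqrt(2 / (N - 1)) of the estimated noise variance."""
    return sigma2_hat * math.sqrt(2.0 / (n - 1))


# Hypothetical maximised log-likelihoods for three candidate orders.
best = select_order({(0, 0): 100.0, (2, 2): 130.0, (5, 5): 131.0}, n=10**5)
```

Here `best` is (2, 2): the small likelihood gain of (5, 5) does not pay for its six extra parameters. For $\hat{σ}_{w}^{2} = 0.01$ and $N = 10^{5}$, `se_noise_variance` gives about $4.5 \times 10^{-5}$, consistent with the order of magnitude stated in the text.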

With these parameter estimates, the adapted ARIMA(2, 1, 2) model with $\hat{ϕ}$, $\hat{θ}$ and $\hat{σ}_{w}$ gives predictions whose uncertainty nearly equals the intrinsic noise of the ARIMA(2, 1, 2) process used for generating the test series. It is instructive to compare these predictions with those of the simplest prediction model, the maximum persistence model $\hat{x}_{t + 1} = x_{t}$. In Figure 2e, we show, as an example, 30 values predicted by the adapted ARIMA(2, 1, 2) process and by the maximum persistence model, in comparison with additional data ($t = 10^{5} + k$, $k = 1, 2, \dots$) generated for the test series. The right panel of this figure shows the distributions of the deviations between predicted and generated data. The smaller width of the distribution for the adapted ARIMA model shows that it significantly improves the prediction quality over the maximum persistence model.

To evaluate the accuracy of the method described above for determining p and q, we generated a number n of test series $\{x_{t}^{(j)}\}_{1 \leq t \leq N}$, $j = 1, \dots, n$, from the same ARIMA(2, 1, 2) model and determined the fraction of series for which the above analysis recovers the correct values. This accuracy increases with the length N of the test series, as shown in Figure 2f, where the accuracy, estimated from $n = 250$ test series, is plotted as a function of N for $(p, d, q) = (2, 1, 2)$. For $N \gtrsim 10^{3}$, the accuracy exceeds 90%.

3. ARIMA Model for Wind Speed Measurements

We analyse data collected at the FINO1 platform, which is located about 45 km north of the island of Borkum in the North Sea (see Figure 1a). The wind speeds were measured over 20 months, from September 2015 to April 2017, at a sampling frequency of 1 Hz. The FINO1 platform measures wind speed at eight different heights. Here, we consider the measurements at a height of 100 m, taken by a cup anemometer.

As illustrated in Figure 1b, the series $\{v(m)\}_{1 \leq m \leq M}$ of 20 months of wind speed data ($M ≅ 5.2 \times 10^{7}$) is transformed into a series of backward averages over $M_{0} = 900$ values, corresponding to 15 min,

\bar{v}(t) = \frac{1}{M_{0}} \sum_{m = M_{0}(t - 1) + 1}^{t M_{0}} v(m), \quad t = 1, 2, \dots, N ≅ 5.8 \times 10^{4} .

(6)

This series of backward-averaged values is shown in Figure 3a. All 15 min windows with more than 25% missing data (225 values) were disregarded. In total, this pre-processing resulted in only 65 missing $\bar{v}$ values. The distribution of $\bar{v}$, shown in Figure 3b, is well fitted by a Weibull form

ρ_{W}(\bar{v}) = \frac{k}{λ} \left(\frac{\bar{v}}{λ}\right)^{k - 1} \exp\left[-\left(\bar{v} / λ\right)^{k}\right],

(7)

with scale parameter $λ$ and shape parameter k. By least-squares fitting, we obtained $λ = 10.17$ m/s and $k = 2.07$, corresponding to a mean $μ_{\bar{v}} = 9.16$ m/s and standard deviation $σ_{\bar{v}} = 4.76$ m/s.
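The pre-processing of Equation (6), including the 25% missing-data rule, can be sketched as follows. We assume that missing 1 Hz samples are marked as NaN and that accepted windows are averaged over their available samples; the text does not say how the remaining gaps inside accepted windows were treated. `block_averages` is a hypothetical name.

```python
import numpy as np


def block_averages(v, m0=900, max_missing_fraction=0.25):
    """15 min averages of Equation (6) from 1 Hz data v (NaN marks a missing sample).

    Windows with more than max_missing_fraction of missing samples are set to
    NaN; otherwise the mean over the available samples is used (assumption).
    Trailing samples that do not fill a complete window are dropped.
    """
    v = np.asarray(v, dtype=float)
    n_windows = len(v) // m0
    blocks = v[: n_windows * m0].reshape(n_windows, m0)
    missing = np.isnan(blocks).sum(axis=1)
    means = np.full(n_windows, np.nan)
    ok = missing <= max_missing_fraction * m0       # at most 225 of 900 missing
    means[ok] = np.nanmean(blocks[ok], axis=1)
    return means
```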

Because ARIMA models yield Gaussian-distributed values, we consider a transformed variable u, which is a power of the wind speed, i.e.,

u = \bar{v}^{α}

(8)

with $0 < α < 1$. An optimal value $α_{opt}$ of $α$ is determined by requiring the distribution of values in the series $\{u(t)\}_{1 \leq t \leq N} = \{[\bar{v}(t)]^{α}\}_{1 \leq t \leq N}$ to be as close as possible to a Gaussian. The corresponding optimisation procedure is described in Appendix A and yields $α_{opt} = 0.586 \pm 10^{-3}$ for the series $\{\bar{v}(t)\}_{1 \leq t \leq N}$. Figure 3c shows $\{u(t)\}_{1 \leq t \leq N}$, and Figure 3d shows the respective distribution of u values (symbols). The mean and standard deviation of this distribution are $μ_{u} = 3.53$ (m/s)$^{α_{opt}}$ and $σ_{u} = 1.14$ (m/s)$^{α_{opt}}$. As can be seen from Figure 3d, the Gaussian distribution (dashed line) with $μ_{u}$ and $σ_{u}$ fits the data (symbols) well.

Previous works have used other values of $α$, e.g., $α = 1/2$ in Ref. [1] and $α = 1/3$ in Ref. [26]. In general, the optimal exponent varies from one data sample to another. The optimisation procedure in Appendix A gives $α_{opt}$ as a function of the Weibull parameters $λ$ and k, and we provide a table of $α_{opt}$ values for k and $λ$ in ranges typical for wind speed distributions.
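The $L^{2}$ criterion of Appendix A can be minimised numerically. Because the appendix text shown here does not state how $μ$ and $σ$ of the truncated Gaussian are fixed, the sketch below assumes they equal the mean and standard deviation of $u = v^{α}$ under the Weibull distribution (using the Weibull moments $E[v^{s}] = λ^{s} Γ(1 + s/k)$), truncates the integral at $10λ$, and scans a grid of $α$ values. Its output need not coincide with $α_{opt} = 0.586$, which was determined for the measured series; all names are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function


def weibull_cdf(v, lam, k):
    """Equation (A3)."""
    return 1.0 - np.exp(-(v / lam) ** k)


def truncated_gauss_cdf(u, mu, sigma):
    """Equation (A4): CDF of a Gaussian truncated to u >= 0."""
    s = math.sqrt(2.0) * sigma
    c = math.erf(mu / s)
    return (_erf((u - mu) / s) + c) / (1.0 + c)


def weibull_power_moments(alpha, lam, k):
    """Mean and std of u = v**alpha for Weibull-distributed v (assumed choice of μ, σ)."""
    m1 = lam**alpha * math.gamma(1.0 + alpha / k)
    m2 = lam ** (2 * alpha) * math.gamma(1.0 + 2.0 * alpha / k)
    return m1, math.sqrt(m2 - m1**2)


def distance(alpha, lam, k, n_grid=2000):
    """Λ(α) of Appendix A, trapezoidal rule on [0, 10 λ] instead of [0, ∞)."""
    v = np.linspace(0.0, 10.0 * lam, n_grid)
    mu, sigma = weibull_power_moments(alpha, lam, k)
    diff2 = (weibull_cdf(v, lam, k) - truncated_gauss_cdf(v**alpha, mu, sigma)) ** 2
    return float(np.sum(0.5 * (diff2[1:] + diff2[:-1]) * np.diff(v)))


def optimal_alpha(lam, k, alphas=np.linspace(0.05, 1.0, 96)):
    """Grid search for α_opt = argmin Λ(α), α in (0, 1], step 0.01."""
    values = [distance(a, lam, k) for a in alphas]
    return float(alphas[int(np.argmin(values))])


# Weibull parameters fitted to the FINO1 15 min averages in Section 3.
alpha_opt = optimal_alpha(lam=10.17, k=2.07)
```

A bounded scalar optimiser could replace the grid search; the grid keeps the sketch dependency-free and makes it easy to tabulate $α_{opt}$ over $(λ, k)$.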

We now adapt an ARIMA model to the transformed wind speed u. To this end, we consider time windows shorter than the full 20-month series, for two reasons. First, we want to test whether there exists a typical order $(p, d, q)$ of the ARIMA model. Second, in practical applications, one cannot expect data to be available over a period as long as 20 months. As illustrated above (cf. Figure 2f), about 1000 values are sufficient to estimate the parameters of the ARIMA model with good accuracy. We fix the size of the time window to $N_{s} = 3 \times 10^{3}$ (approximately one month). This yields a total of $n_{s} ≃ 5.5 \times 10^{4}$ training series $\{u(t' + t)\}_{1 \leq t \leq N_{s}}$, $t' = 1, \dots, n_{s}$. If a training series contains some of the 65 missing $\bar{v}$ values (see above), we use a correspondingly smaller $N_{s}$.

We now apply the methods described in Section 2 to obtain ARIMA models for each of the training series. In all cases, we find $d = 1$. The analysis of the ACF and PACF was performed for the complete series $\{u(t)\}_{1 \leq t \leq N}$ and yielded $τ_{p} = τ_{q} = 5$. The p and q values for the training series vary; their joint histogram is shown in Figure 4. This histogram has its maximum at $(p, q) = (0, 2)$. We therefore take $(p, d, q) = (0, 1, 2)$ as the order of the ARIMA model for predicting 15 min wind speed averages in our data set.

Next, we evaluate the power of the ARIMA(0, 1, 2) model to predict values $\hat{u}(t)$ for $u(t)$ from the series $\{u(t')\}_{t - N_{s} \leq t' \leq t - 1}$ of $N_{s} = 3 \times 10^{3}$ previous values. These previous values are used to determine the optimal model parameters $\hat{θ}_{1}$, $\hat{θ}_{2}$ and $\hat{σ}_{w}$ with the maximum likelihood estimator, and the ARIMA process with these optimal parameters gives the respective model series $\{\hat{u}(t')\}_{t - N_{s} \leq t' \leq t - 1}$. The prediction is obtained by setting $w_{t} = \langle w_{t} \rangle = 0$ in Equation (1) and by replacing $w_{t - 1}$ and $w_{t - 2}$ with the residuals $ϵ_{t - 1}$ and $ϵ_{t - 2}$,

\hat{u}(t) = u(t - 1) + θ_{1} ϵ_{t - 1} + θ_{2} ϵ_{t - 2} = u(t - 1) + θ_{1} [u(t - 1) - \hat{u}(t - 1)] + θ_{2} [u(t - 2) - \hat{u}(t - 2)] .

(9)

The corresponding prediction for the 15 min averaged wind speed is $\hat{\bar{v}}(t) = \hat{u}(t)^{1 / α_{opt}}$.
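Equation (9) and the comparison with the persistence model can be sketched as below. For brevity, the sketch keeps $θ_{1}$ and $θ_{2}$ fixed instead of refitting them on the $N_{s}$ previous values for every t, and it sets the two initial residuals to zero; both are simplifications of the procedure described in the text, and the function names are hypothetical.

```python
import numpy as np


def arima012_one_step(u, theta1, theta2):
    """One-step predictions of Equation (9) with fixed θ1, θ2.

    Returns û(t) for t = 0, ..., len(u); the last entry forecasts the next,
    not yet observed value. No prediction exists for t = 0, 1, so we set
    û = u there, i.e. ε(0) = ε(1) = 0 (assumption).
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    if n < 2:
        raise ValueError("need at least two values")
    u_hat = np.empty(n + 1)
    eps = np.zeros(n)                      # residuals ε(t) = u(t) - û(t)
    u_hat[:2] = u[:2]
    for t in range(2, n + 1):
        u_hat[t] = u[t - 1] + theta1 * eps[t - 1] + theta2 * eps[t - 2]
        if t < n:
            eps[t] = u[t] - u_hat[t]
    return u_hat


def prediction_error_std(v, theta1, theta2, alpha):
    """Std of prediction errors (t >= 2) for ARIMA(0, 1, 2) and for persistence."""
    v = np.asarray(v, dtype=float)
    u = v**alpha                                   # Equation (8)
    # Back-transform v̂ = û^(1/α); a negative û would give NaN here.
    v_hat = arima012_one_step(u, theta1, theta2)[:-1] ** (1.0 / alpha)
    v_pers = v[1:-1]                               # v̂_pers(t) = v(t - 1)
    return np.std(v_hat[2:] - v[2:]), np.std(v_pers - v[2:])
```

With $θ_{1} = θ_{2} = 0$, Equation (9) reduces to the persistence model, so both standard deviations coincide; the small difference between $σ_{(012)}$ and $σ_{pers}$ reported below indicates that the fitted MA terms add little on these data.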

Figure 5a shows the first 30 predictions $\hat{\bar{v}}(t)$ (open circles), $t = 3001, \dots, 3030$, in comparison with the measured values $\bar{v}(t)$ (full circles). Compared with the simple maximum persistence prediction $\hat{\bar{v}}_{pers}(t + 1) = \bar{v}(t)$ (crosses), the ARIMA(0, 1, 2) prediction does not give a significant improvement. This can be clearly seen from Figure 5b, which shows the distribution of the deviations between each of the models and the measurements. In both cases, we find an average deviation of approximately zero ($\langle \hat{\bar{v}}_{model} - \bar{v} \rangle ≃ 3 \times 10^{-3}$) and similar standard deviations, $σ_{(012)} = 0.7495$ and $σ_{pers} = 0.7525$.

One could suppose that the predictive power of the ARIMA modelling improves when taking higher orders $(p, q)$ (with $d = 1$ kept fixed). However, when performing an analysis with such higher-order models, we obtained results comparable to those shown in Figure 5.

To evaluate the performance of the ARIMA model, we consider the mean absolute percentage error (MAPE) of the predicted wind speed increments, defined as

MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{\hat{v}_{i} - v_{i}}{\bar{v}} \right| .

(10)

Here, $\hat{v}_{i}$ denotes the prediction of the ith measured wind speed value $v_{i}$, and $\bar{v}$ is the mean of the measured wind speeds. Since $\hat{v}_{i} - v_{i} = (\hat{v}_{i} - v_{i - 1}) - (v_{i} - v_{i - 1})$, the deviation in Equation (10) equals the error of the predicted increment. Table 1 shows the results of our ARIMA(0, 1, 2) model together with the values obtained for other models. The MAPE of the ARIMA(0, 1, 2) process is comparable to that of the most accurate ARIMA models.
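Equation (10) normalises the absolute errors by the mean wind speed rather than by each individual $v_{i}$, as in the more common definition of the MAPE. A minimal sketch (multiply by 100 for a percentage); `mape` is a hypothetical name:

```python
import numpy as np


def mape(v_pred, v_meas):
    """Equation (10): mean |v̂_i - v_i| divided by the mean measured wind speed."""
    v_pred = np.asarray(v_pred, dtype=float)
    v_meas = np.asarray(v_meas, dtype=float)
    return np.mean(np.abs(v_pred - v_meas)) / np.mean(v_meas)
```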

4. Nested ARIMA Model for Wind Speeds

Apart from predicting wind speeds from previous values, it is important to develop tools for generating surrogate wind speed data whose statistical features resemble key features of real data. Such surrogate data can be used in studies without access to real data, or when averaging over a large amount of data is required. As mentioned in the Introduction, the distribution of wind speed increments $[\bar{v}(t + τ) - \bar{v}(t)]$ for a time lag $τ$ also shows strong deviations from a Gaussian. As the ARIMA(0, 1, 2) model with fixed parameters $θ_{1}$, $θ_{2}$, and $σ_{w}$ corresponds to a random walk with Gaussian, short-range correlated increments, time variations of these parameters are essential for obtaining non-Gaussian features of the increments. In the prediction procedure described in Section 3, the time variation is mediated by the optimal adaptation of the model parameters to the $N_{s} = 3 \times 10^{3}$ previous values, which are assumed to be known real values.

For generating surrogate data, we now consider the corresponding time series of optimal parameters $\hat{θ}_{1}(t)$, $\hat{θ}_{2}(t)$, and $\hat{σ}_{w}(t)$. For convenience, we write $θ_{1}(t)$, $θ_{2}(t)$, and $σ_{w}(t)$ for these series, i.e., we drop the circumflex. For $σ_{w}(t)$, we consider its (normalised) logarithm,

ζ(t) = \log\left(\frac{σ_{w}(t)}{\bar{σ}_{w}}\right),

(11)

where $\bar{σ}_{w}$ denotes the average over all $σ_{w}(t)$. The use of the logarithm is motivated by a previous study [19], in which the distribution of velocity increments in turbulent flows was successfully modelled by assuming that the velocity increments follow a Gaussian process with a fluctuating, log-normally distributed diffusivity (variance). Note also that, owing to the logarithm, $ζ(t)$ can take positive and negative values. Figure 6 shows the time series of $θ_{1}(t)$, $θ_{2}(t)$ and $ζ(t)$ together with their respective histograms. These time series are now also modelled by ARIMA processes, resulting in a nested ARIMA model for generating surrogate data.

Applying the procedure described in Section 2, we find that $θ_{1}$ is best described by an ARIMA(3, 1, 1) process, $θ_{2}$ by an ARIMA(1, 1, 0) process, and $ζ$ by an ARIMA(2, 1, 2) process:

Δθ_{1}(t) = ϕ_{1}^{(θ_{1})} Δθ_{1}(t - 1) + ϕ_{2}^{(θ_{1})} Δθ_{1}(t - 2) + ϕ_{3}^{(θ_{1})} Δθ_{1}(t - 3) + η_{1}^{(θ_{1})} w_{t - 1}^{(θ_{1})} + w_{t}^{(θ_{1})},

(12a)

Δθ_{2}(t) = ϕ_{1}^{(θ_{2})} Δθ_{2}(t - 1) + w_{t}^{(θ_{2})},

(12b)

Δζ(t) = ϕ_{1}^{(ζ)} Δζ(t - 1) + ϕ_{2}^{(ζ)} Δζ(t - 2) + η_{1}^{(ζ)} w_{t - 1}^{(ζ)} + η_{2}^{(ζ)} w_{t - 2}^{(ζ)} + w_{t}^{(ζ)} .

(12c)

The coefficients $ϕ_{1}^{(θ_{1}, θ_{2}, ζ)}$, $ϕ_{2}^{(θ_{1}, ζ)}$, $ϕ_{3}^{(θ_{1})}$, $η_{1}^{(θ_{1}, ζ)}$ and $η_{2}^{(ζ)}$ are given in Table 2, together with the standard deviations $σ_{w}^{(θ_{1}, θ_{2}, ζ)}$ of the independent Gaussian noise terms $w_{t}^{(θ_{1}, θ_{2}, ζ)}$, $w_{t - 1}^{(θ_{1}, ζ)}$ and $w_{t - 2}^{(ζ)}$ (all with zero mean).

The nested ARIMA model, which combines the ARIMA(0, 1, 2) model

Δu_{t} = w_{t} + θ_{1}(t) w_{t - 1} + θ_{2}(t) w_{t - 2}

(13)

with Equations (12a)–(12c), provides transformed wind speed increments $Δu$, whose statistics can now be analysed.
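The nested model can be turned into a surrogate generator: simulate $θ_{1}(t)$, $θ_{2}(t)$ and $ζ(t)$ with Equations (12a)–(12c), recover $σ_{w}(t) = \bar{σ}_{w} e^{ζ(t)}$ by inverting Equation (11), and integrate Equation (13). This sketch assumes the natural logarithm in Equation (11) and that $σ_{w}(t)$ is the standard deviation of $w_{t}$ in Equation (13). All numerical coefficients below are hypothetical placeholders, not the values of Table 2, and clipping negative u to zero before the back-transform is our own choice.

```python
import numpy as np


def arima_path(phi, eta, sigma, n, x0, rng):
    """Path of an ARIMA(p, 1, q) process as in Equations (12a)-(12c).

    Δx_t = Σ_j ϕ_j Δx_{t-j} + Σ_k η_k w_{t-k} + w_t, started at x0 with zero
    pre-sample values (assumption).
    """
    dx = np.zeros(n)
    w = rng.normal(0.0, sigma, size=n)
    for t in range(n):
        ar = sum(phi[j] * dx[t - 1 - j] for j in range(len(phi)) if t - 1 - j >= 0)
        ma = sum(eta[k] * w[t - 1 - k] for k in range(len(eta)) if t - 1 - k >= 0)
        dx[t] = ar + ma + w[t]
    return x0 + np.cumsum(dx)


def nested_arima_surrogate(params, n, u0, sigma_bar, alpha, seed=None):
    """Surrogate 15 min wind speeds from the nested model, Equations (11)-(13)."""
    rng = np.random.default_rng(seed)
    theta1 = arima_path(n=n, rng=rng, **params["theta1"])  # Eq. (12a)
    theta2 = arima_path(n=n, rng=rng, **params["theta2"])  # Eq. (12b)
    zeta = arima_path(n=n, rng=rng, **params["zeta"])      # Eq. (12c)
    sigma_w = sigma_bar * np.exp(zeta)    # inverts Eq. (11); natural log assumed
    w = sigma_w * rng.normal(size=n)      # w_t with time-dependent std σ_w(t)
    du = w.copy()                         # Eq. (13): w_t
    du[1:] += theta1[1:] * w[:-1]         #   + θ1(t) w_{t-1}
    du[2:] += theta2[2:] * w[:-2]         #   + θ2(t) w_{t-2}
    u = u0 + np.cumsum(du)
    # Back-transform v = u^(1/α); negative u (possible for a random walk) is
    # clipped to zero, a choice of this sketch.
    return np.clip(u, 0.0, None) ** (1.0 / alpha)


# Hypothetical placeholder coefficients, NOT the fitted values of Table 2.
PARAMS = {
    "theta1": dict(phi=[0.1, 0.05, 0.02], eta=[-0.3], sigma=0.01, x0=0.1),
    "theta2": dict(phi=[0.1], eta=[], sigma=0.01, x0=0.05),
    "zeta": dict(phi=[0.2, 0.1], eta=[-0.2, -0.1], sigma=0.02, x0=0.0),
}
v_surrogate = nested_arima_surrogate(PARAMS, n=5000, u0=3.53, sigma_bar=0.1,
                                     alpha=0.586, seed=0)
```

The starting value `u0 = 3.53` is the mean $μ_{u}$ quoted in Section 3; `sigma_bar` is again a placeholder.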

In Figure 7a–d, we show the increment statistics of the wind speeds $v(t) = u(t)^{1 / α_{opt}}$,

Δ_{τ} v(t) = v(t + τ) - v(t),

(14)

for time lags $τ = 0.25$, 2, 4 and 8 h (red dashed lines). In the same plots, we also show the corresponding increment statistics of the 15 min averages of the measured wind speeds (solid black lines). For a time lag of 15 min (Figure 7a), the simulated increment distribution approximately follows the distribution of the measured values, except close to the maximum, where it is more sharply peaked. When the time lag is increased, the measured wind speed increment distributions become more similar to a Gaussian distribution, as reported earlier [27,28]. This feature, however, is not reproduced by the modelled increment distributions, which retain a pronounced leptokurtic shape even for large time lags.

In Figure 7e,f, we show the skewness S and the kurtosis $κ$ of the modelled and measured increment distributions as a function of the time lag $τ$. As the values of S do not deviate strongly from zero, for both the modelled and the measured data, the distributions can be considered approximately symmetric. The kurtosis $κ$, in contrast, shows significant differences: while for the measured increments it approaches the Gaussian value $κ = 3$ at very large time lags, the kurtosis of the modelled distribution remains nearly constant at a large value of about nine for all lags.
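The quantities of Equation (14) and Figure 7e,f can be computed as below. The sketch assumes that the series is sampled every 15 min, so that a lag of τ hours corresponds to 4τ samples, and it uses the non-excess kurtosis, for which a Gaussian gives κ = 3 as in the text. Function names are hypothetical.

```python
import numpy as np


def increments(v, lag):
    """Δ_τ v(t) = v(t + τ) - v(t) of Equation (14); lag in samples (1 = 15 min)."""
    v = np.asarray(v, dtype=float)
    return v[lag:] - v[:-lag]


def skewness_kurtosis(x):
    """Sample skewness S and non-excess kurtosis κ (Gaussian: S = 0, κ = 3)."""
    x = np.asarray(x, dtype=float)
    x = x[np.isfinite(x)]                  # ignore gaps marked as NaN
    z = x - x.mean()
    m2 = np.mean(z**2)
    return np.mean(z**3) / m2**1.5, np.mean(z**4) / m2**2


# Time lags of Figure 7 (hours) expressed in 15 min samples.
LAGS = {0.25: 1, 2: 8, 4: 16, 8: 32}
```

For a series `v`, the curves of Figure 7e,f then correspond to `{tau_h: skewness_kurtosis(increments(v, lag)) for tau_h, lag in LAGS.items()}`.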

Based on these findings, one cannot expect the full distributions of the measured and modelled wind speed increments to match. Indeed, a Kolmogorov–Smirnov test rejects, at a significance level of 5%, the hypothesis that the modelled and measured increments follow the same distribution. Nevertheless, we consider the nested ARIMA model a first step towards modelling the intermittent character reflected in the non-Gaussian tails of the increment distributions. Possible improvements of this approach are discussed below.
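The text does not state which variant of the Kolmogorov–Smirnov test was used. A minimal two-sample version with the large-sample 5% rejection threshold $c(α)\sqrt{(n + m)/(nm)}$, $c(0.05) ≈ 1.358$, is sketched below; `scipy.stats.ks_2samp` provides a complete implementation with p-values. The name `ks_two_sample` is hypothetical.

```python
import numpy as np


def ks_two_sample(a, b, c_alpha=1.358):
    """Two-sample KS statistic D, large-sample threshold, and rejection flag.

    c_alpha = 1.358 corresponds to a significance level of 5%.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated at every observed value.
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    d_stat = np.max(np.abs(cdf_a - cdf_b))
    n, m = len(a), len(b)
    threshold = c_alpha * np.sqrt((n + m) / (n * m))
    return d_stat, threshold, d_stat > threshold
```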

5. Conclusions

In this manuscript, we applied ARIMA models to a series of 15 min averages of wind speeds measured in the North Sea. We focused on two problems. The first was to evaluate the power of these models for wind speed forecasting. The second was to assess the ability of these models to generate surrogate wind speed data. To overcome the problem that ARIMA models yield Gaussian distributions, while measured wind speed distributions resemble a Weibull form, a power-law transformation was applied. Unlike earlier studies, which used ad hoc values for the exponent of the transformation, we determined an optimal value.

The evaluation of the predictive power was based on a subdivision of the time series into moving windows of about one month. We found that the best predictive ARIMA model for the set of wind speeds is ARIMA(0, 1, 2), though it shows no significantly better accuracy than a simple maximum persistence model. In fact, our results also showed that the predictions of the maximum persistence model could not be significantly improved by using ARIMA models of higher order.

With respect to generating surrogate data, we applied the ARIMA(0, 1, 2) model to each of the moving time windows. This yields time series of the coefficients and of the noise parameter, which were also modelled by ARIMA processes. We called this combination of ARIMA processes a nested ARIMA model. Our results provide evidence that this nested ARIMA model is able to approximately reproduce wind speed increments with strong non-Gaussian features. In particular, the tails of the increment distributions could be nearly recovered for time lags below two hours. Consequently, on these time scales, the nested ARIMA model can be used as a simple tool for generating surrogate wind speed data, for example as input data for simulations of power grid dynamics under fluctuating wind power injection [29,30].

For larger time lags, the nested ARIMA model is not well suited for generating surrogates of wind speed data, as it shows pronounced deviations between the modelled and measured increment distributions. In particular, we observed a strongly leptokurtic shape of the modelled distributions. These findings suggest favouring alternative approaches for modelling wind speed data, such as stochastic differential equations [22,31,32]. Differential equation approaches have proven helpful in minimising the amount of input data [33] in situations where large data sets are needed for training neural networks. Moreover, it has been shown that, when reconstructing stochastic series of wind tower vibrations, artificial neural networks yield accurate estimates of the mean and standard deviation of the tower vibration increments [34]. However, higher-order moments, namely skewness and kurtosis, are better reconstructed with differential equation approaches.

Alternatively, one could try to further improve the modelling based on ARIMA processes. For instance, instead of applying the power-law transformation to the wind speed, it could be better to apply a similar transformation to the wind speed increments. This may, in particular, avoid increment distributions with an overly leptokurtic shape. Furthermore, the non-Gaussian character of the parameter evolution (cf. Figure 6) could also be mitigated by applying an optimised nonlinear transformation to the parameters.

Author Contributions

P.G.L. conceived the simulations; P.M. and P.G.L. designed them; S.-K.S. performed them and analysed the output. All authors wrote the paper.

Funding

This research was funded by the Deutsche Forschungsgemeinschaft under the grants MA 1636/9-1 and LI 1599/3-1.

Acknowledgments

Financial support from the Deutsche Forschungsgemeinschaft (MA 1636/9-1) and from the bilateral cooperation FAPERJ-DFG (LI 1599/3-1) is gratefully acknowledged. The authors also thank the FINO-project and the Forschungs- und Entwicklungszentrum Fachhochschule Kiel GmbH for providing the data sets.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Optimal Non-Linear Variable Transformation of Wind Speeds into a Gaussian Variable

In previous studies, the square or cube root of the wind speed $v$ was considered to be Gaussian-distributed [1,26]. In this appendix, we derive an optimal value $α_{opt}$ of the exponent $α \in (0, 1]$ in the transformation

u = v^{α}

(A1)

that brings the distribution of $u$ as close as possible to a Gaussian when $v$ follows a Weibull distribution $ρ_{W}(v)$ (see Equation (7)). We show how the optimal exponent changes for different combinations of the Weibull parameters $k$ and $λ$, and provide a table that can be used for other data sets of wind speed measurements.
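Why a power law helps can be seen from a simple identity. If $v$ is Weibull-distributed with shape $k$ and scale $λ$, then $u = v^{α}$ is again Weibull-distributed, with shape $k/α$ and scale $λ^{α}$. A Weibull distribution is nearly symmetric for shape parameters around 3.6. The sketch below draws synthetic Weibull samples with Python's `random.weibullvariate` (scale first, shape second) and compares the sample skewness before and after the transformation. It uses the values $λ = 10.17$ m/s, $k = 2.07$ and $α_{opt} = 0.586$ reported for the FINO1 data in Figure 3. The synthetic samples only stand in for measured wind speeds, and the helper names are illustrative.

```python
import random


def sample_skewness(xs):
    """Moment (biased) estimator of the skewness of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def skewness_before_after(lam, k, alpha, n=100_000, seed=0):
    """Skewness of synthetic Weibull wind speeds v and of u = v**alpha."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
    v = [rng.weibullvariate(lam, k) for _ in range(n)]
    u = [x ** alpha for x in v]
    return sample_skewness(v), sample_skewness(u)


if __name__ == "__main__":
    before, after = skewness_before_after(lam=10.17, k=2.07, alpha=0.586)
    print(f"skewness of v: {before:.3f}, skewness of v**alpha: {after:.3f}")
```

The skewness of $v$ comes out clearly positive. After the transformation, with $k/α \approx 3.5$, it drops close to zero.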

As wind speeds $v$ are always positive, $u$ must also be positive. We therefore consider the truncated Gaussian distribution

ρ_{G} (u) = \frac{2}{1 + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})} \frac{1}{\sqrt{2 π σ^{2}}} \exp (- \frac{(u - μ)^{2}}{2 σ^{2}}), \quad u \geq 0.

(A2)

If the transformation mapped the Weibull distribution exactly onto this truncated Gaussian, we would have $ρ_{W}(v)\, dv = ρ_{G}(u)\, du$ or, equivalently, $F_{W}(v) = F_{G}(u) = F_{G}(v^{α})$, where

F_{W} (v) = \int_{0}^{v} ρ_{W} (v') \, dv' = 1 - \exp (- (\frac{v}{λ})^{k}),

(A3)

F_{G} (u) = \int_{0}^{u} ρ_{G} (u') \, du' = \frac{\mathrm{erf} (\frac{u - μ}{\sqrt{2 σ^{2}}}) + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})}{1 + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})}

(A4)

are the cumulative distribution functions of the Weibull and the truncated Gaussian distribution, respectively.
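Equations (A2)–(A4) translate directly into code. The sketch below uses only `math.erf` and `math.exp` from the Python standard library. Function and argument names are illustrative, and `sigma2` stands for $σ^{2}$. A quick consistency check is that the numerical derivative of $F_{G}$ reproduces $ρ_{G}$.

```python
import math


def truncated_gauss_pdf(u, mu, sigma2):
    """rho_G(u) of Equation (A2): Gaussian restricted to u >= 0."""
    if u < 0:
        return 0.0
    norm = 2.0 / (1.0 + math.erf(mu / math.sqrt(2.0 * sigma2)))
    gauss = math.exp(-(u - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
    return norm * gauss


def truncated_gauss_cdf(u, mu, sigma2):
    """F_G(u) of Equation (A4); zero for u <= 0."""
    if u <= 0:
        return 0.0
    s = math.sqrt(2.0 * sigma2)
    a = math.erf(mu / s)
    return (math.erf((u - mu) / s) + a) / (1.0 + a)


def weibull_cdf(v, lam, k):
    """F_W(v) of Equation (A3)."""
    return (1.0 - math.exp(-((v / lam) ** k))) if v > 0 else 0.0


if __name__ == "__main__":
    # Gaussian parameters of the transformed FINO1 series (Figure 3):
    # mu_u = 3.53 and sigma_u = 1.14, so sigma_u^2 = 1.2996.
    print(truncated_gauss_cdf(3.53, 3.53, 1.14 ** 2))
```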

An optimal exponent $α_{opt}$ can be determined by defining a distance between $F_{W}(v)$ and $F_{G}(v^{α})$ and minimising this distance with respect to $α$. One possibility would be to take the maximum distance $\sup_{v} |F_{W}(v) - F_{G}(v^{α})|$, as in the Kolmogorov–Smirnov statistic. Here, we use the $L^{2}$-norm (integral over squared deviations)

Λ (α) = \int_{0}^{\infty} [F_{W} (v) - F_{G} (v^{α})]^{2} \, dv

(A5)

as the distance measure. The determining equation for the optimal exponent $α_{opt}$ then is

\frac{dΛ}{dα} (α_{opt}) = -2 \int_{0}^{\infty} [F_{W} (v) - F_{G} (v^{α_{opt}})] \frac{d F_{G} (v^{α})}{d α} \Big|_{α = α_{opt}} dv = 0,

(A6)

where

\frac{d F_{G}}{d α} = \frac{\partial F_{G}}{\partial u} \frac{d u}{d α} + \frac{\partial F_{G}}{\partial μ} \frac{d μ}{d α} + \frac{\partial F_{G}}{\partial σ^{2}} \frac{d σ^{2}}{d α}

(A7)

and the functions $μ = μ(α)$ and $σ^{2} = σ^{2}(α)$ are given by

μ (α) = \int_{0}^{\infty} u ρ_{G} (u) \, du = \int_{0}^{\infty} v^{α} ρ_{W} (v) \, dv = λ^{α} Γ (\frac{α}{k} + 1),

(A8a)

σ^{2} (α) = \int_{0}^{\infty} (v^{α} - μ)^{2} ρ_{W} (v) \, dv = λ^{2 α} Γ (\frac{2 α}{k} + 1) - μ^{2}.

(A8b)
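The closed forms in Equations (A8a) and (A8b) follow from $\int_{0}^{\infty} v^{β} ρ_{W}(v)\, dv = λ^{β} Γ(1 + β/k)$. They can be checked against direct numerical integration. The sketch below assumes the standard two-parameter Weibull density $ρ_{W}(v) = (k/λ)(v/λ)^{k-1} \exp[-(v/λ)^{k}]$ for Equation (7), with $k > 1$. It integrates with a composite Simpson rule on $[0, v_{max}]$, where $v_{max}$ is chosen so that the neglected Weibull tail is $e^{-40}$. The cutoff and grid size are illustrative choices.

```python
import math


def weibull_pdf(v, lam, k):
    """Two-parameter Weibull density (assumed form of Equation (7))."""
    if v <= 0.0:
        return 0.0
    x = v / lam
    return (k / lam) * x ** (k - 1.0) * math.exp(-(x ** k))


def moments_closed_form(alpha, lam, k):
    """mu(alpha) and sigma^2(alpha) from Equations (A8a) and (A8b)."""
    mu = lam ** alpha * math.gamma(alpha / k + 1.0)
    var = lam ** (2.0 * alpha) * math.gamma(2.0 * alpha / k + 1.0) - mu ** 2
    return mu, var


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def moments_numerical(alpha, lam, k):
    """Same moments by quadrature of v**alpha and v**(2 alpha) against rho_W."""
    v_max = lam * 40.0 ** (1.0 / k)  # Weibull tail beyond v_max is exp(-40)
    m1 = simpson(lambda v: v ** alpha * weibull_pdf(v, lam, k), 0.0, v_max)
    m2 = simpson(lambda v: v ** (2 * alpha) * weibull_pdf(v, lam, k), 0.0, v_max)
    return m1, m2 - m1 ** 2


if __name__ == "__main__":
    print(moments_closed_form(0.586, 10.17, 2.07))
    print(moments_numerical(0.586, 10.17, 2.07))
```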

With $u = v^{α}$ and the digamma function $ψ(x) = Γ'(x)/Γ(x)$, the terms in Equation (A7) are:

\frac{\partial F_{G}}{\partial u} = \frac{1}{1 + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})} \sqrt{\frac{2}{π σ^{2}}} \exp (- \frac{(u - μ)^{2}}{2 σ^{2}}),

(A9a)

\frac{d u}{d α} = v^{α} \ln v,

(A9b)

\frac{\partial F_{G}}{\partial μ} = \sqrt{\frac{2}{π σ^{2}}} \frac{\exp (- \frac{μ^{2}}{2 σ^{2}}) [1 - \mathrm{erf} (\frac{u - μ}{\sqrt{2 σ^{2}}})] - \exp (- \frac{(u - μ)^{2}}{2 σ^{2}}) [1 + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})]}{[1 + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})]^{2}},

(A9c)

\frac{d μ}{d α} = λ^{α} Γ (\frac{α}{k} + 1) [\ln λ + \frac{1}{k} ψ (\frac{α}{k} + 1)],

(A9d)

\frac{\partial F_{G}}{\partial σ^{2}} = \frac{(μ - u) \exp (- \frac{(u - μ)^{2}}{2 σ^{2}}) [1 + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})] - μ \exp (- \frac{μ^{2}}{2 σ^{2}}) [1 - \mathrm{erf} (\frac{u - μ}{\sqrt{2 σ^{2}}})]}{\sqrt{2 π} σ^{3} [1 + \mathrm{erf} (\frac{μ}{\sqrt{2 σ^{2}}})]^{2}},

(A9e)

\frac{d σ^{2}}{d α} = 2 λ^{2 α} Γ (\frac{2 α}{k} + 1) [\ln λ + \frac{1}{k} ψ (\frac{2 α}{k} + 1)] - 2 μ \frac{d μ}{d α}.

(A9f)

Substituting Equations (A9a)–(A9f) into (A7), and the result into Equation (A6), yields a nonlinear equation that can be solved for $α$, given the parameters $k$ and $λ$ of the Weibull distribution.
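The partial derivatives of $F_{G}$ that enter Equation (A7) are easy to get wrong by hand. The sketch below codes $\partial F_{G}/\partial μ$ and $\partial F_{G}/\partial σ^{2}$, obtained by differentiating Equation (A4) directly, and compares them with central finite differences. The test points are arbitrary. One of them, $(u, μ, σ^{2}) = (1.2, 0.5, 1.0)$, has a small $μ/σ$, so that the truncation at $u = 0$ matters.

```python
import math


def cdf_g(u, mu, s2):
    """Truncated Gaussian CDF F_G(u; mu, sigma^2), Equation (A4), for u > 0."""
    r = math.sqrt(2.0 * s2)
    return (math.erf((u - mu) / r) + math.erf(mu / r)) / (1.0 + math.erf(mu / r))


def dcdf_dmu(u, mu, s2):
    """Analytic dF_G/dmu obtained by differentiating Equation (A4)."""
    r = math.sqrt(2.0 * s2)
    ea, eb = math.erf(mu / r), math.erf((u - mu) / r)
    num = (math.exp(-mu ** 2 / (2 * s2)) * (1 - eb)
           - math.exp(-(u - mu) ** 2 / (2 * s2)) * (1 + ea))
    return math.sqrt(2.0 / (math.pi * s2)) * num / (1 + ea) ** 2


def dcdf_ds2(u, mu, s2):
    """Analytic dF_G/d(sigma^2) obtained by differentiating Equation (A4)."""
    r = math.sqrt(2.0 * s2)
    ea, eb = math.erf(mu / r), math.erf((u - mu) / r)
    num = ((mu - u) * math.exp(-(u - mu) ** 2 / (2 * s2)) * (1 + ea)
           - mu * math.exp(-mu ** 2 / (2 * s2)) * (1 - eb))
    return num / (math.sqrt(2 * math.pi) * s2 ** 1.5 * (1 + ea) ** 2)


def central_diff(f, x, h=1e-6):
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    u, mu, s2 = 2.5, 3.5, 1.3
    print(dcdf_dmu(u, mu, s2), central_diff(lambda m: cdf_g(u, m, s2), mu))
    print(dcdf_ds2(u, mu, s2), central_diff(lambda s: cdf_g(u, mu, s), s2))
```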

For wind speed time series, the parameters of the associated Weibull distribution typically lie in the ranges $1.5 < k < 4$ and $5 < λ < 20$ m/s. Figure A1a shows the optimal exponent as a function of both Weibull parameters in these ranges, and Figure A1b shows the corresponding minimal value $Λ(α_{opt})$ of the distance measure. The transformation parameter $α_{opt}$ depends strongly on the shape parameter $k$, but is almost insensitive to the scale parameter $λ$. The minimum $Λ(α_{opt})$ quantifies the quality of the transformation: the larger it is, the more the transformed Weibull distribution deviates from a truncated Gaussian. The quality of the transformation improves with decreasing $λ$, i.e., series with smaller average wind speeds are better mapped onto a truncated Gaussian. The $α_{opt}$ values for different pairs of $k$ and $λ$ are listed in Table A1 and can be used for other series of wind speeds.
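Instead of solving the nonlinear equation (A6), $α_{opt}$ can also be approximated by evaluating $Λ(α)$ from Equation (A5) directly and minimising it over a grid of $α$ values. The sketch below does this with Simpson quadrature, taking $μ(α)$ and $σ^{2}(α)$ from Equations (A8a) and (A8b). Several settings are choices made for this illustration, not settings from the paper, and the result depends mildly on them:

- the grid is restricted to $0.2 \leq α \leq 1$;
- the integration range is truncated at $v_{max} = λ\, 40^{1/k}$;
- the search runs a coarse pass followed by a fine one.

For the FINO1 fit ($k = 2.07$, $λ = 10.17$ m/s), Figure 3 reports $α_{opt} = 0.586$. This brute-force estimate need not reproduce that value or the tabulated digits exactly.

```python
import math


def lambda_distance(alpha, lam, k, n=2000):
    """Distance Lambda(alpha) of Equation (A5), Simpson quadrature on [0, v_max]."""
    mu = lam ** alpha * math.gamma(alpha / k + 1.0)                       # (A8a)
    s2 = lam ** (2 * alpha) * math.gamma(2 * alpha / k + 1.0) - mu ** 2  # (A8b)
    r = math.sqrt(2.0 * s2)
    ea = math.erf(mu / r)

    def integrand(v):
        f_w = 1.0 - math.exp(-((v / lam) ** k))                    # (A3)
        f_g = (math.erf((v ** alpha - mu) / r) + ea) / (1.0 + ea)  # (A4), u = v**alpha
        return (f_w - f_g) ** 2

    v_max = lam * 40.0 ** (1.0 / k)  # Weibull CDF equals 1 - exp(-40) here
    h = v_max / n
    total = integrand(0.0) + integrand(v_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def optimal_alpha(lam, k, lo=0.2, hi=1.0):
    """Coarse grid search (step 0.01), then a finer one (step 0.001)."""
    coarse = [lo + 0.01 * i for i in range(int(round((hi - lo) / 0.01)) + 1)]
    best = min(coarse, key=lambda a: lambda_distance(a, lam, k))
    fine = [best + 0.001 * i for i in range(-10, 11) if lo <= best + 0.001 * i <= hi]
    return min(fine, key=lambda a: lambda_distance(a, lam, k))
```

For example, `optimal_alpha(10.17, 2.07)` can be compared with Figure 3 and with the neighbouring entries of Table A1. A grid search is used rather than a root finder on Equation (A6) because it needs no derivatives and cannot converge to a maximum.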

References

[1] Brown, B.; Katz, R.; Murphy, A. Time series models to simulate and forecast wind speed and wind power. J. Clim. Appl. Meteorol. 1984, 23, 1184–1195.
[2] Ghadikolaei, H.; Ahmadi, A.; Aghaei, J.; Najafi, M. Risk constrained self-scheduling of hydro/wind units for short term electricity markets considering intermittency and uncertainty. Renew. Sustain. Energy Rev. 2012, 16, 4734–4743.
[3] Ediger, V.; Akar, S. ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy 2007, 25, 667–676.
[4] Chen, P.; Pedersen, T.; Bak-Jensen, B.; Chen, Z. ARIMA-Based Time Series Model of Stochastic Wind Power Generation. IEEE Trans. Power Syst. 2010, 25, 667–676.
[5] Yunus, K.; Thiringer, T.; Chen, P. ARIMA-Based Frequency-Decomposed Modeling of Wind Speed Time Series. IEEE Trans. Power Syst. 2016, 31, 2546–2556.
[6] Lau, A.; Mcsharry, P. Approaches for multi-step density forecasts with application to aggregated wind power. Ann. Appl. Stat. 2010, 4, 1311–1341.
[7] Kadhem, A.; Wahab, N.; Aris, I.; Jasni, J.; Abdalla, A. Advanced Wind Speed Prediction Model Based on a Combination of Weibull Distribution and an Artificial Neural Network. Energies 2017, 10, 1744.
[8] Cadenas, E.; Rivera, W.; Campos-Amezcua, R.; Heard, C. Wind Speed Prediction Using a Univariate ARIMA Model and a Multivariate NARX Model. Energies 2016, 9, 109.
[9] Cao, Q.; Ewing, B.; Thompson, M. Forecasting wind speed with recurrent neural networks. Eur. J. Oper. Res. 2012, 221, 148–154.
[10] Shumway, R.; Stoffer, D. Time Series Analysis and Its Applications with R Examples; Springer: Pittsburgh, PA, USA, 2006.
[11] Zhao, E.; Zhao, J.; Liu, L.; Su, Z.; An, N. Hybrid Wind Speed Prediction Based on a Self-Adaptive ARIMAX Model with an Exogenous WRF Simulation. Energies 2016, 9, 7.
[12] Han, Q.; Wu, H.; Hu, T.; Chu, F. Short-Term Wind Speed Forecasting Based on Signal Decomposing Algorithm and Hybrid Linear/Nonlinear Models. Energies 2018, 11, 2976.
[13] Hong, Y.Y.; Yu, T.H.; Liu, C.Y. Hour-Ahead Wind Speed and Power Forecasting Using Empirical Mode Decomposition. Energies 2013, 6, 6137–6152.
[14] Alencar, D.; de Mattos Affonso, C.; Oliveira, R.; Moya Rodríguez, J.; Leite, J.; Reston Filho, J.C. Different Models for Forecasting Wind Power Generation: Case Study. Energies 2017, 10, 1976.
[15] Johnson, G. Wind Energy Systems; Prentice-Hall: Englewood Cliffs, NJ, USA, 1998.
[16] van Kuik, G.; Peinke, J. Long-term research challenges in wind energy—A research agenda by the European Academy of Wind Energy. Wind Energy Sci. 2016, 1, 1–39.
[17] FINO I Project and Database. The FINO Project Is Supported by the German Government through BMWi and PTJ. 2016. Available online: http://www.bsh.de (accessed on 24 May 2017).
[18] Beck, C.; Cohen, E. Superstatistics. Physica A 2003, 322, 267–275.
[19] Castaing, B.; Gagne, Y.; Hopfinger, E. Velocity Probability Density Functions of High Reynolds Number Turbulence. Physica D 1990, 46, 177–200.
[20] Rocha, P.; Raischel, F.; Boto, J.; Lind, P. Uncovering the evolution of non-stationary stochastic variables: The example of asset volume-price fluctuations. Phys. Rev. E 2016, 93, 052122.
[21] Estevens, J.; Rocha, P.; Boto, J.; Lind, P. Stochastic modelling of non-stationary financial assets. Chaos 2017, 27, 113106.
[22] Milan, P.; Wächter, M.; Peinke, J. Stochastic modeling and performance monitoring of wind farm power production. J. Renew. Sustain. Energy 2014, 6, 033119.
[23] Lind, P.; Herráez, I.; Wächter, M.; Peinke, J. Fatigue Load Estimation through a Simple Stochastic Model. Energies 2014, 7, 8279–8293.
[24] Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464.
[25] Brockwell, P.; Davis, R. Time Series: Theory and Methods; Springer: New York, NY, USA, 2009.
[26] Essenwanger, O. Probleme der Windstatistik. Meteorol. Rundsch. 1959, 12, 37–47.
[27] Friedrich, R.; Peinke, J. Description of a Turbulent Cascade by a Fokker-Planck Equation. Phys. Rev. Lett. 1997, 78, 863.
[28] Ragwitz, M.; Kantz, H. Indispensable Finite Time Corrections for Fokker-Planck Equations from Time Series Data. Phys. Rev. Lett. 2001, 87, 254501.
[29] Weber, J.; Zachow, C.; Witthaut, D. Modeling long correlation times using additive binary Markov chains: Applications to wind generation time series. Phys. Rev. E 2018, 97, 032138.
[30] Schäfer, B.; Beck, C.; Aihara, K.; Witthaut, D.; Timme, M. Non-Gaussian power grid frequency fluctuations characterized by Lévy-stable laws and superstatistics. Nat. Energy 2018, 3, 119–126.
[31] Morales, A.; Peinke, J. Assessment of turbulence by high-order statistics: Offshore example. In Proceedings of the EWEA Proceedings, Copenhagen, Denmark, 15–19 April 2012; pp. 1–4.
[32] Mücke, T.; Kleinhans, D.; Peinke, J. Atmospheric turbulence and its influence on the alternating loads on wind turbines. Wind Energy 2010, 14, 301–316.
[33] Raischel, F.; Russo, A.; Haase, M.; Kleinhans, D.; Lind, P. Optimal variables for describing evolution of NO₂ concentration. Phys. Lett. A 2012, 376, 2081–2089.
[34] Lind, P.; Vera-Tudela, L.; Wächter, M.; Kühn, M.; Peinke, J. Normal Behaviour Models for Wind Turbine Vibrations: Comparison of Neural Networks and a Stochastic Approach. Energies 2017, 10, 1944.

Figure 1. (a) FINO1 platform, located at the Alpha Ventus wind farm at Borkum West in the North Sea (54.3° N, 6.5° E). Source: © Forschungs- und Entwicklungszentrum Fachhochschule Kiel GmbH (Kiel, Germany). (b) Illustration of the series of wind speed measurements (solid line) taken at a height of 100 m with a sampling frequency of 1 Hz. Symbols indicate the 15 min averages analysed in this manuscript.


Figure 2. Illustration of an ARIMA model adaptation to a test series $\{x_{t}^{(test)}\}_{1 \leq t \leq N}$ generated by iterating Equation (1) for $(p, d, q) = (2, 1, 2)$ with initial conditions $x_{0} = x_{1} = 0$. (a) The series $\{x_{t}^{(test)}\}_{1 \leq t \leq N}$ together with the corresponding histogram $ρ(x)$. Because the process $x(t)$ is not stationary, the series $\{Δx_{t}^{(test)}\}_{1 \leq t \leq N}$ of first differences ($d = 1$) is considered instead; it is plotted in (b) with the corresponding histogram $ρ(Δx)$. This series passes the unit-root test [10]; (c) shows its autocorrelation function (ACF) and (d) its partial autocorrelation function (PACF). From these data, the ranges $0 \leq p \leq τ_{p}$ and $0 \leq q \leq τ_{q}$ of possible $(p, q)$ values are estimated (see text). Within these ranges, the Bayesian Information Criterion (BIC) is used to obtain an optimal $(p, q)$ choice. (e) 30 values predicted by the ARIMA model adapted to the test series and by the model of maximum persistence, compared with the values generated by the ARIMA(2, 1, 2) process. The right panel shows the distributions of the deviations between the predicted and generated values. (f) Accuracy of the estimated parameters $(p, d, q)$ as a function of the length N of the time series, given by the percentage of correct estimates over $n = 250$ test series with N data points generated by using Equation (1).
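The identification step summarised in Figure 2 works on the first differences of the series and uses their ACF and PACF to bound $p$ and $q$. It can be sketched as follows. The sample ACF uses the usual biased estimator, and the PACF is obtained from the ACF with the Durbin–Levinson recursion. These are standard textbook estimators assumed here, not code from the paper. The rule that turns them into the bounds $τ_{p}$ and $τ_{q}$ (described in the main text) is not reproduced.

```python
def difference(xs, d=1):
    """Apply the differencing operator d times (d = 1 gives x_t - x_{t-1})."""
    for _ in range(d):
        xs = [b - a for a, b in zip(xs, xs[1:])]
    return xs


def acf(xs, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (biased estimator, r_0 = 1)."""
    n = len(xs)
    mean = sum(xs) / n
    c0 = sum((x - mean) ** 2 for x in xs) / n
    return [
        sum((xs[t] - mean) * (xs[t + h] - mean) for t in range(n - h)) / n / c0
        for h in range(max_lag + 1)
    ]


def pacf_from_acf(r):
    """Partial autocorrelations phi_hh for h = 1..len(r)-1 (Durbin-Levinson)."""
    phi_prev = []  # phi_prev[j] holds phi_{h-1, j+1}
    out = []
    for h in range(1, len(r)):
        num = r[h] - sum(phi_prev[j] * r[h - 1 - j] for j in range(h - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(h - 1))
        phi_hh = num / den
        phi_prev = [phi_prev[j] - phi_hh * phi_prev[h - 2 - j] for j in range(h - 1)] + [phi_hh]
        out.append(phi_hh)
    return out
```

Applied to a theoretical AR(1) autocorrelation $r_{h} = 0.5^{h}$, `pacf_from_acf` returns 0.5 at lag 1 and zero afterwards. This cut-off after lag $p$ is the PACF signature that bounds the autoregressive order.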


Figure 3. (a) Series of the 15 min backward averages $\bar{v}$ and (b) their distribution (symbols), which is well fitted by a Weibull form (solid line) with scale parameter $λ = 10.17$ m/s and shape parameter $k = 2.07$, corresponding to a mean $μ_{\bar{v}} = 9.16$ m/s and standard deviation $σ_{\bar{v}} = 4.76$ m/s. (c) Series of the transformed variable $u(t) = [\bar{v}(t)]^{α_{opt}}$ with $α_{opt} = 0.586$, and (d) its distribution, which is well fitted by a Gaussian with mean $μ_{u} = 3.53$ (m/s)$^{α_{opt}}$ and standard deviation $σ_{u} = 1.14$ (m/s)$^{α_{opt}}$.


Figure 4. Joint histogram of the model parameters p and q obtained from an analysis of $n_{s} = 5.5 \times 10^{4}$ training series $\{u(t' + t)\}_{1 \leq t \leq N_{s}}$, $t' = 1, \dots, n_{s}$, of length $N_{s} = 3 \times 10^{3}$.


Figure 5. (a) Wind speed forecasts 15 min ahead using an ARIMA(0, 1, 2) model (open circles) and the maximum persistence model ($\hat{\bar{v}}_{pers}(t + 1) = \bar{v}(t)$, red squares), compared with the measured values (full circles). (b) Distribution of the deviations between predicted and measured values for the ARIMA(0, 1, 2) model (black line) and the maximum persistence model (red dashed line).


Figure 6. (a) Series of the coefficient $θ_{1}$ of the ARIMA(0, 1, 2) model and (b) histogram of the $θ_{1}$ values. (c,d) Similar plots for $θ_{2}$. (e,f) Time series and histogram of the noise strength $ζ$ (see Equation (11)).


Figure 7. (a–d) Increment statistics of the original wind speed measurements, $Δ_{τ} v$, for different time lags $τ = 0.25$, 2, 4, 8 and 16 h. Solid (black) lines indicate the measurements, while dashed (red) lines indicate the corresponding surrogate data set obtained with the nested ARIMA model. (e) Skewness and (f) kurtosis of the wind speed increments as a function of the time lag $τ$.


Figure A1. (a) Optimal exponent $α_{opt}$ as a function of the shape parameter k and the scale parameter $λ$ of the Weibull distribution. (b) Minimal value $Λ(α_{opt})$ of the distance measure in Equation (A5) associated with the optimal exponent.


Table 1. Performance of the ARIMA(0,1,2) model, using the metric defined in Equation (10), in comparison with different data sets and models reported in [14].

Model	MAPE
ARIMA $(0, 1, 2)$ (Speed, 15 min.)	5.53%
ARIMA (Speed, 10 min.)	17.61%
ARIMA (Speed, 20 min.)	20.59%
Neural Network (Speed, 10 min.)	5.97%
Neural Network (Speed, 20 min.)	7.03%
ARIMA with Neural Net. (Speed, 10 min.)	4.46%
ARIMA with Neural Net. (Speed, 20 min.)	5.31%
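Table 1 compares models by their mean absolute percentage error (MAPE). The sketch below assumes that Equation (10), which is not reproduced in this section, is the standard definition $100\% \cdot N^{-1} \sum_{t} |\bar{v}(t) - \hat{\bar{v}}(t)| / \bar{v}(t)$. Under that assumption, it shows how the score of the maximum persistence forecast $\hat{\bar{v}}_{pers}(t + 1) = \bar{v}(t)$ from Figure 5 would be computed. The function names and the toy series are illustrative.

```python
def mape(measured, predicted):
    """Mean absolute percentage error in percent.

    Standard definition, assumed to correspond to Equation (10);
    measured values must be nonzero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


def persistence_forecast(series):
    """One-step-ahead maximum persistence forecast: v_hat(t + 1) = v(t)."""
    return series[:-1]


if __name__ == "__main__":
    v = [10.0, 11.0, 10.0, 12.0]  # toy 15 min averages in m/s
    print(f"{mape(v[1:], persistence_forecast(v)):.2f} %")  # prints 11.92 %
```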

Table 2. Estimated coefficients entering Equations (12a)–(12c) of the ARIMA processes describing the stochastic evolution of the coefficients in Equations (11) and (13). Note that each coefficient is modelled by an ARIMA process of a different order: ARIMA(3, 1, 1) for $θ_{1}$, ARIMA(1, 1, 0) for $θ_{2}$, and ARIMA(2, 1, 2) for $ζ$.


Parameter	$θ_{1}$	$θ_{2}$	$ζ$
$ϕ_{1}$	0.976	0.0580	1.47
$ϕ_{2}$	0.0255	–	−0.477
$ϕ_{3}$	−0.0191	–	–
$η_{1}$	−0.969	–	−1.29
$η_{2}$	–	–	0.312
$σ_{w}^{2}$ ( $\times 10^{- 7}$ )	9.91	7.19	7.46
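Table 2 specifies how the parameters of the inner ARIMA(0, 1, 2) model evolve from one window to the next. The sketch below simulates a generic ARIMA(p, 1, q) process for such a parameter. It assumes the convention $Δx_{t} = \sum_{i} ϕ_{i} Δx_{t-i} + w_{t} + \sum_{j} η_{j} w_{t-j}$ with Gaussian white noise of variance $σ_{w}^{2}$. Equations (12a)–(12c) are not reproduced in this section, so this convention, in particular the sign of the MA terms, is an assumption. The starting value in the example is hypothetical, and the function name is illustrative.

```python
import random


def simulate_arima_p1q(phi, eta, sigma2_w, x0, n_steps, seed=None):
    """Simulate x_t whose first differences follow an ARMA(p, q) process.

    Assumed convention:
        dx_t = sum_i phi[i] * dx_{t-1-i} + w_t + sum_j eta[j] * w_{t-1-j},
    with w_t ~ N(0, sigma2_w) and zero pre-sample differences and noise.
    """
    rng = random.Random(seed)
    sd = sigma2_w ** 0.5
    dx_hist = [0.0] * len(phi)  # most recent difference first
    w_hist = [0.0] * len(eta)   # most recent noise term first
    xs = [x0]
    for _ in range(n_steps):
        w = rng.gauss(0.0, sd)
        dx = (sum(p * d for p, d in zip(phi, dx_hist))
              + w + sum(e * wp for e, wp in zip(eta, w_hist)))
        xs.append(xs[-1] + dx)
        dx_hist = ([dx] + dx_hist)[:len(phi)]
        w_hist = ([w] + w_hist)[:len(eta)]
    return xs


if __name__ == "__main__":
    # theta_2: ARIMA(1, 1, 0) with phi_1 = 0.0580 and sigma_w^2 = 7.19e-7 (Table 2).
    # The starting value 0.1 is hypothetical, chosen only for illustration.
    theta2 = simulate_arima_p1q([0.0580], [], 7.19e-7, x0=0.1, n_steps=96, seed=42)
    print(theta2[:5])
```

With 15 min windows, 96 steps correspond to one day. For $θ_{1}$ and $ζ$, the corresponding $ϕ_{i}$, $η_{j}$ and $σ_{w}^{2}$ columns of Table 2 would be passed in the same way.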

Table A1. Optimal exponent $α_{opt}$ for k and $λ$ values in the ranges $1.5 \leq k \leq 4$ and $5 \leq λ \leq 20$ m/s.


$λ / k$	$1.5$	$1.6$	$1.7$	$1.8$	$1.9$	2	$2.1$	$2.2$	$2.3$	$2.4$	$2.5$	$2.6$	$2.7$
$5$	0.425	0.451	0.478	0.504	0.529	0.556	0.583	0.609	0.633	0.657	0.687	0.712	0.737
$5.3$	0.426	0.452	0.479	0.505	0.530	0.557	0.584	0.609	0.634	0.658	0.688	0.713	0.738
$5.6$	0.426	0.453	0.479	0.506	0.531	0.558	0.585	0.610	0.635	0.663	0.689	0.714	0.738
$5.9$	0.427	0.453	0.480	0.506	0.532	0.559	0.585	0.611	0.636	0.664	0.689	0.715	0.739
$6.2$	0.428	0.454	0.481	0.507	0.533	0.560	0.586	0.612	0.636	0.665	0.690	0.715	0.740
$6.5$	0.428	0.455	0.481	0.508	0.533	0.560	0.587	0.612	0.637	0.665	0.691	0.716	0.740
$6.8$	0.429	0.455	0.482	0.508	0.534	0.561	0.587	0.613	0.637	0.666	0.691	0.716	0.741
$7.1$	0.429	0.456	0.482	0.509	0.534	0.561	0.588	0.613	0.638	0.666	0.692	0.717	0.741
$7.4$	0.430	0.456	0.483	0.509	0.535	0.562	0.588	0.614	0.638	0.667	0.692	0.717	0.741
$7.7$	0.430	0.456	0.483	0.510	0.535	0.562	0.589	0.614	0.639	0.667	0.693	0.718	0.742
$8$	0.431	0.457	0.484	0.510	0.536	0.562	0.589	0.615	0.639	0.668	0.693	0.718	0.742
$8.3$	0.431	0.457	0.484	0.510	0.536	0.563	0.590	0.615	0.642	0.668	0.694	0.718	0.742
$8.6$	0.431	0.457	0.484	0.511	0.537	0.563	0.590	0.615	0.642	0.668	0.694	0.719	0.742
$8.9$	0.432	0.458	0.485	0.511	0.537	0.563	0.590	0.616	0.643	0.669	0.694	0.719	0.742
$9.2$	0.432	0.458	0.485	0.511	0.537	0.564	0.591	0.616	0.643	0.669	0.695	0.719	0.742
$9.5$	0.432	0.458	0.485	0.512	0.538	0.564	0.591	0.616	0.643	0.669	0.695	0.719	0.743
$9.8$	0.432	0.459	0.486	0.512	0.538	0.564	0.591	0.616	0.644	0.670	0.695	0.719	0.743
$10.1$	0.433	0.459	0.486	0.512	0.538	0.564	0.591	0.617	0.644	0.670	0.695	0.720	0.748
$10.4$	0.433	0.460	0.486	0.512	0.538	0.564	0.592	0.617	0.644	0.670	0.696	0.720	0.749
$10.7$	0.433	0.460	0.486	0.513	0.539	0.565	0.592	0.617	0.645	0.670	0.696	0.720	0.749
$11$	0.433	0.460	0.487	0.513	0.539	0.565	0.592	0.617	0.645	0.671	0.696	0.720	0.749
$11.3$	0.434	0.460	0.487	0.513	0.539	0.565	0.592	0.617	0.645	0.671	0.696	0.720	0.749
$11.6$	0.434	0.460	0.487	0.513	0.539	0.565	0.592	0.618	0.645	0.671	0.696	0.720	0.749
$11.9$	0.433	0.461	0.487	0.513	0.540	0.566	0.593	0.618	0.645	0.671	0.696	0.720	0.750
$12.2$	0.434	0.461	0.487	0.514	0.540	0.567	0.593	0.618	0.646	0.671	0.696	0.721	0.750
$12.5$	0.434	0.461	0.487	0.514	0.540	0.567	0.593	0.618	0.646	0.672	0.697	0.721	0.750
$12.8$	0.434	0.461	0.488	0.514	0.540	0.567	0.593	0.618	0.646	0.672	0.697	0.721	0.750
$13.1$	0.435	0.461	0.488	0.514	0.540	0.567	0.593	0.620	0.646	0.672	0.697	0.725	0.750
$13.4$	0.435	0.461	0.488	0.514	0.540	0.567	0.593	0.620	0.646	0.672	0.697	0.725	0.751
$13.7$	0.435	0.461	0.488	0.514	0.541	0.567	0.594	0.620	0.646	0.672	0.697	0.725	0.751
$14$	0.435	0.462	0.488	0.514	0.541	0.568	0.594	0.620	0.646	0.672	0.697	0.725	0.751
$14.3$	0.435	0.462	0.488	0.514	0.541	0.568	0.594	0.620	0.647	0.672	0.697	0.725	0.751
$14.6$	0.435	0.462	0.488	0.515	0.541	0.568	0.594	0.621	0.647	0.673	0.697	0.725	0.751
$14.9$	0.435	0.462	0.488	0.515	0.541	0.568	0.594	0.621	0.647	0.673	0.698	0.725	0.751
$15.2$	0.435	0.462	0.489	0.515	0.541	0.568	0.594	0.621	0.647	0.673	0.698	0.726	0.751
$15.5$	0.435	0.462	0.489	0.515	0.541	0.568	0.594	0.621	0.647	0.673	0.698	0.726	0.751
$15.8$	0.436	0.462	0.489	0.515	0.541	0.568	0.594	0.621	0.647	0.673	0.698	0.726	0.752
$16.1$	0.436	0.462	0.489	0.515	0.542	0.568	0.594	0.621	0.647	0.673	0.698	0.726	0.752
$16.4$	0.436	0.462	0.489	0.515	0.542	0.568	0.595	0.621	0.647	0.673	0.699	0.726	0.752
$16.7$	0.436	0.462	0.489	0.515	0.542	0.569	0.595	0.621	0.648	0.673	0.699	0.726	0.752
$17$	0.436	0.463	0.489	0.515	0.542	0.569	0.595	0.621	0.648	0.673	0.698	0.726	0.752
$17.3$	0.436	0.463	0.489	0.515	0.542	0.569	0.595	0.622	0.648	0.673	0.698	0.726	0.752
$17.6$	0.436	0.463	0.489	0.516	0.542	0.569	0.599	0.622	0.648	0.673	0.698	0.726	0.752
$17.9$	0.436	0.463	0.489	0.516	0.542	0.569	0.595	0.622	0.648	0.674	0.698	0.727	0.752
$18.2$	0.436	0.463	0.489	0.516	0.542	0.569	0.595	0.622	0.648	0.674	0.701	0.727	0.752
$18.5$	0.436	0.463	0.489	0.516	0.542	0.569	0.595	0.622	0.648	0.674	0.701	0.727	0.752
$18.8$	0.436	0.463	0.489	0.516	0.542	0.569	0.595	0.622	0.648	0.674	0.701	0.727	0.752
$19.1$	0.436	0.463	0.490	0.516	0.542	0.569	0.595	0.622	0.648	0.674	0.701	0.727	0.753
$19.4$	0.436	0.463	0.490	0.516	0.543	0.569	0.595	0.622	0.648	0.674	0.701	0.727	0.753
$19.7$	0.436	0.463	0.490	0.516	0.543	0.569	0.595	0.622	0.648	0.674	0.701	0.727	0.753
$20$	0.436	0.463	0.490	0.516	0.543	0.570	0.595	0.622	0.648	0.674	0.701	0.727	0.753
$λ / k$	$2.8$	$2.9$	$3$	$3.1$	$3.2$	3.3	$3.4$	$3.5$	$3.6$	$3.7$	$3.8$	$3.9$	$4$
$5$	0.761	0.783	0.803	0.840	0.863	0.884	0.903	0.920	0.934	0.945	0.955	0.963	0.970
$5.3$	0.761	0.783	0.804	0.841	0.864	0.884	0.903	0.920	0.933	0.945	0.954	0.962	0.968
$5.6$	0.762	0.784	0.817	0.841	0.864	0.885	0.903	0.919	0.933	0.944	0.954	0.961	0.968
$5.9$	0.762	0.784	0.818	0.842	0.864	0.885	0.903	0.919	0.933	0.943	0.953	0.960	0.967
$6.2$	0.763	0.784	0.819	0.842	0.865	0.885	0.903	0.919	0.932	0.943	0.952	0.960	0.966
$6.5$	0.763	0.784	0.819	0.843	0.865	0.885	0.903	0.918	0.931	0.942	0.951	0.959	0.965
$6.8$	0.763	0.795	0.820	0.843	0.865	0.885	0.903	0.917	0.931	0.941	0.950	0.958	0.964
$7.1$	0.764	0.795	0.820	0.844	0.865	0.885	0.902	0.918	0.930	0.941	0.949	0.957	0.963
$7.4$	0.764	0.796	0.820	0.844	0.865	0.885	0.902	0.917	0.930	0.940	0.949	0.956	0.962
$7.7$	0.764	0.796	0.821	0.844	0.865	0.885	0.902	0.917	0.929	0.969	0.978	0.984	0.989
$8$	0.772	0.797	0.821	0.844	0.866	0.885	0.902	0.916	0.956	0.969	0.978	0.984	0.989
$8.3$	0.772	0.797	0.821	0.844	0.865	0.884	0.901	0.915	0.956	0.968	0.977	0.984	0.988
$8.6$	0.772	0.797	0.822	0.844	0.865	0.884	0.901	0.940	0.956	0.968	0.977	0.983	0.988
$8.9$	0.773	0.798	0.822	0.845	0.865	0.884	0.901	0.940	0.956	0.968	0.977	0.983	0.988
$9.2$	0.773	0.798	0.822	0.845	0.865	0.884	0.900	0.940	0.955	0.967	0.976	0.983	0.987
$9.5$	0.773	0.798	0.822	0.845	0.865	0.884	0.921	0.940	0.955	0.967	0.976	0.982	0.987
$9.8$	0.774	0.798	0.822	0.845	0.865	0.883	0.921	0.940	0.955	0.967	0.975	0.982	0.986
$10.1$	0.774	0.799	0.822	0.845	0.865	0.883	0.921	0.939	0.955	0.966	0.975	0.982	0.986
$10.4$	0.774	0.799	0.823	0.845	0.865	0.883	0.921	0.939	0.954	0.966	0.975	0.981	0.986
$10.7$	0.774	0.799	0.823	0.845	0.865	0.900	0.921	0.939	0.954	0.965	0.974	0.981	0.985
$11$	0.774	0.799	0.823	0.845	0.865	0.900	0.921	0.939	0.953	0.965	0.974	0.980	0.985
$11.3$	0.775	0.799	0.823	0.845	0.865	0.900	0.921	0.939	0.953	0.965	0.974	0.980	0.985
$11.6$	0.775	0.800	0.823	0.845	0.864	0.900	0.921	0.939	0.953	0.964	0.973	0.979	0.984
$11.9$	0.775	0.800	0.823	0.845	0.864	0.900	0.921	0.938	0.953	0.964	0.973	0.979	0.984
$12.2$	0.775	0.800	0.823	0.845	0.864	0.900	0.921	0.938	0.953	0.964	0.972	0.979	0.984
$12.5$	0.775	0.800	0.823	0.845	0.878	0.900	0.921	0.938	0.952	0.963	0.972	0.978	0.983
$12.8$	0.776	0.800	0.823	0.844	0.878	0.900	0.921	0.938	0.952	0.963	0.972	0.978	0.983
$13.1$	0.776	0.800	0.823	0.844	0.878	0.900	0.921	0.938	0.952	0.963	0.971	0.978	0.983
$13.4$	0.776	0.800	0.823	0.844	0.878	0.900	0.921	0.938	0.951	0.963	0.971	0.977	0.982
$13.7$	0.776	0.800	0.823	0.844	0.878	0.901	0.920	0.937	0.951	0.962	0.971	0.977	0.982
$14$	0.776	0.800	0.823	0.844	0.878	0.900	0.920	0.937	0.951	0.962	0.970	0.977	0.982
$14.3$	0.776	0.800	0.823	0.844	0.878	0.900	0.920	0.937	0.951	0.961	0.970	0.976	0.981
$14.6$	0.776	0.800	0.823	0.854	0.878	0.900	0.920	0.937	0.950	0.961	0.970	0.976	0.981
$14.9$	0.776	0.800	0.823	0.854	0.878	0.900	0.920	0.936	0.950	0.961	0.969	0.976	0.981
$15.2$	0.776	0.800	0.823	0.854	0.878	0.900	0.920	0.936	0.950	0.961	0.969	0.975	0.980
$15.5$	0.777	0.801	0.823	0.855	0.878	0.900	0.920	0.936	0.950	0.960	0.969	0.975	0.980
$15.8$	0.777	0.801	0.823	0.855	0.878	0.900	0.920	0.936	0.949	0.960	0.968	0.975	0.980
$16.1$	0.777	0.801	0.823	0.855	0.878	0.900	0.919	0.936	0.949	0.960	0.968	0.974	0.979
$16.4$	0.777	0.801	0.823	0.855	0.878	0.900	0.919	0.936	0.949	0.959	0.968	0.974	0.979
$16.7$	0.777	0.801	0.823	0.855	0.878	0.900	0.919	0.935	0.949	0.959	0.967	0.974	0.979
$17$	0.777	0.801	0.823	0.855	0.878	0.900	0.919	0.935	0.948	0.959	0.967	0.973	0.978
$17.3$	0.777	0.801	0.823	0.855	0.878	0.900	0.919	0.935	0.948	0.958	0.967	0.973	0.978
$17.6$	0.777	0.801	0.823	0.855	0.879	0.900	0.919	0.935	0.948	0.958	0.966	0.973	0.978
$17.9$	0.777	0.801	0.830	0.855	0.879	0.900	0.919	0.935	0.948	0.958	0.966	0.973	0.978
$18.2$	0.777	0.801	0.830	0.855	0.879	0.900	0.919	0.934	0.947	0.958	0.966	0.972	0.977
$18.5$	0.777	0.801	0.831	0.855	0.879	0.900	0.918	0.934	0.947	0.958	0.966	0.972	0.977
$18.8$	0.777	0.801	0.831	0.855	0.879	0.900	0.918	0.934	0.947	0.957	0.965	0.972	0.977
$19.1$	0.777	0.801	0.831	0.855	0.879	0.900	0.918	0.934	0.947	0.957	0.965	0.971	0.976
$19.4$	0.777	0.801	0.831	0.855	0.879	0.899	0.918	0.933	0.946	0.957	0.965	0.971	0.976
$19.7$	0.777	0.801	0.831	0.855	0.879	0.900	0.918	0.933	0.946	0.956	0.964	0.971	0.976
$20$	0.777	0.801	0.831	0.855	0.879	0.900	0.918	0.933	0.946	0.956	0.964	0.970	0.975
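Table A1 is meant for reuse with other wind speed data sets. For parameter pairs between grid points, one option is bilinear interpolation. The sketch below applies it to a small excerpt of the table ($k = 1.9$ to $2.2$, $λ = 9.8$ to $10.7$ m/s) and is an illustration, not part of the original method. The tabulated values are not smooth everywhere; for example, the $k = 3.7$ column jumps from 0.940 to 0.969 between $λ = 7.4$ and $7.7$ m/s. Interpolation across such jumps should therefore be treated with caution.

```python
from bisect import bisect_right

# Excerpt of Table A1: rows are lambda (m/s), columns are the shape parameter k.
K_GRID = [1.9, 2.0, 2.1, 2.2]
LAMBDA_GRID = [9.8, 10.1, 10.4, 10.7]
ALPHA_OPT = [
    [0.538, 0.564, 0.591, 0.616],  # lambda = 9.8
    [0.538, 0.564, 0.591, 0.617],  # lambda = 10.1
    [0.538, 0.564, 0.592, 0.617],  # lambda = 10.4
    [0.539, 0.565, 0.592, 0.617],  # lambda = 10.7
]


def _bracket(grid, x):
    """Index i and fraction t such that x = grid[i] + t * (grid[i+1] - grid[i])."""
    if not grid[0] <= x <= grid[-1]:
        raise ValueError(f"{x} outside tabulated range [{grid[0]}, {grid[-1]}]")
    i = min(bisect_right(grid, x) - 1, len(grid) - 2)
    return i, (x - grid[i]) / (grid[i + 1] - grid[i])


def interp_alpha_opt(k, lam):
    """Bilinear interpolation of alpha_opt within the table excerpt."""
    i, t = _bracket(K_GRID, k)
    j, s = _bracket(LAMBDA_GRID, lam)
    low = (1 - t) * ALPHA_OPT[j][i] + t * ALPHA_OPT[j][i + 1]
    high = (1 - t) * ALPHA_OPT[j + 1][i] + t * ALPHA_OPT[j + 1][i + 1]
    return (1 - s) * low + s * high


if __name__ == "__main__":
    print(round(interp_alpha_opt(2.07, 10.17), 4))
```

Using the full table only requires replacing the excerpt with all rows and columns of Table A1.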

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sim, S.-K.; Maass, P.; Lind, P.G. Wind Speed Modeling by Nested ARIMA Processes. Energies 2019, 12, 69. https://doi.org/10.3390/en12010069


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
