1. Introduction
Knowledge of flow disturbances in the atmospheric boundary layer (ABL) is essential for the design of wind turbines and bridges, the takeoff and landing of airplanes, etc. The kinetic energy of air mass motion in the ABL
is the sum of two components: the kinetic energy of the ordered motion
, associated with the mean wind velocity
, and the turbulence kinetic energy
, associated with the wind velocity variance
. The vector of the wind velocity components
forms the non-stationary random process defined by different
n-dimensional distributions depending on a great number of factors. Constructing a mathematical model of the wind velocity in the ABL based on these
n-dimensional distributions is extremely difficult, if it is possible at all. Usually, simpler problems with fixed stable factors (locality, underlying surface, time, altitude) are considered, for which particular mathematical models are developed and solutions are obtained. For a fixed altitude, consider the quasi-stationary period
T during which the wind velocity
with distribution
is represented as a superposition of the stationary process
characterizing the stationary motion of air masses with the prior distribution
and the small outlier fraction ε with the velocity
and the distribution
. By the outliers, we mean any wind velocities, including wind gusts, the distribution
of which deviates from the main wind velocity distribution
during the period
T [1,2,3]. For this model, the kinetic wind energy E in the ABL is the sum of two components, E = E0 + Eout, where E0 is the kinetic energy of the reference stationary process, and Eout is the kinetic energy of the outliers. The probabilistic problem formulation is defined by the prior information on the basic types of three distributions
,
, and
.
The kinetic energy of the motion of air masses thus includes two components: the one governed by the prior distribution, which characterizes the stationary motion and is used to predict the wind energy potential of wind turbines, and the one governed by the outlier distribution, which is used to estimate the effect of wind outliers on objects in the ABL, for example, light drones, bridges, and high-rise buildings.
The appearance of Doppler acoustic radars (sodars) makes it possible to obtain information on the altitude profiles of wind velocity vector components with high spatial (up to several meters) and temporal resolution [4]. However, the outliers are usually detected and then eliminated from raw wind field data [5,6]. Thus, in [7], various outlier detection and elimination methods were analyzed, and the conclusion was drawn that in the previous works, outliers were simply removed and considered as missing data. In [8], a visual inspection method was used to eliminate the outliers. In both cases, the information on the outliers and their contribution to the kinetic wind energy in the ABL was completely lost.
In the present work, we analyzed the spatiotemporal dynamics of kinetic wind energy with and without allowance for the kinetic energy of outliers. We first represented the wind velocity distribution as a mixture of the prior distribution and the outlier distribution; we then separated the contributions of the mean kinetic energy and the kinetic energy of the outliers, and estimated the latter using robust parametric, semiparametric, and semi-nonparametric algorithms developed by us.
We studied the diurnal hourly dynamics of kinetic wind energy in the lower 200 m layer of the atmosphere based on minisodar measurements of the wind velocity vector, which obeys this mixture distribution. Like all big data arrays, the minisodar measurements of altitude profiles of the wind velocity vectors contain outliers and measurement errors. The integral estimate of the kinetic wind energy in the ABL can be obtained from sample means and variances of the vertical profiles of the wind velocity vectors. To estimate the energy of the mean wind velocity, unbiased, consistent, and efficient estimates of the parameters of the prior distribution should be obtained from the measured vertical profiles of the wind velocity vector, which follow the mixture distribution. This can be achieved by using robust estimates of the parameters of the prior distribution.
Robust statistics is an actively developing branch of mathematics [8,9]. At present, different methods (including subjective ones) of obtaining robust estimates have been proposed based on the minimization of various robustness criteria. Thus, in works [9,10,11], the criteria of minimizing the maximum asymptotic bias and the maximum variance were proposed. The advantages and disadvantages of the criterion of minimizing the maximum entropy were analyzed in [6,9]. The criteria of the maximum robustness and of the maximum distance were used for the minimum distance (MD) estimates in works [11,12]. The adaptive estimators were studied in works [6,11]. For the shift parameter alone, about 50 different robust estimates have been used, based, for example, on robust M-estimators [11,13], regression credibility [14], or robust linear regression models [15]. This variety is evidently caused by the fact that robust statistics has no established robustness criterion, unlike the efficiency criterion in classical mathematical statistics [16].
First, the robustness and efficiency criteria turned out to be contradictory, which stimulated the application of robust estimates with intermediate characteristics (for example, the MD and Hellinger distance estimates [10,11] and the adaptive estimates [6,11]). Second, for the asymmetric outlier characteristics typical of the ABL [6,12,17,18], the robust estimates are biased and inconsistent [6,19]. Third, methods for obtaining robust estimates are aimed at the removal of external outliers distant from the bulk of observations, whereas the presence of internal outliers can lead to substantial errors in decision-making. All these problems stimulate the development of new robust, efficient algorithms for a wide class of outlier distributions, including asymmetric distributions and distributions with internal outliers. Such algorithms should adapt to the prior distributions. In the present work, robust algorithms are synthesized by the maximum likelihood method; for the inhomogeneous minisodar measurement data, they converge to the efficient algorithms.
Unfortunately, real measurement data usually do not fit the mathematical models on which mathematical statistics heavily relies. Their processing requires solutions of a number of mathematical problems by robust nonparametric statistical methods [
6]. Indeed, the problems of studying the spatiotemporal dynamics of kinetic wind energy in the ABL depend significantly on prior information on the distributions
,
, and
. They can be divided into different classes: parametric (when
,
, and
are parameterized to within unknown parameters
), semiparametric (when
and
are parameterized, and the form of
is unknown), and semi-nonparametric (when the forms of
,
, and
are unknown) problems of robust statistics [
6]. As to the prior distribution
, from physical considerations, most researchers think that it belongs to the class of symmetric normal-type distributions with light or medium tails [
1,
2,
3]. The dynamics of the atmosphere in the ABL depends on many significant and not always measurable parameters. In this regard, the distribution
, representing a mixture of the distributions
and
, should be assigned to the nonparametric class. This class of problems has long attracted researchers and has been studied for about a century [
20,
21]. Typical robust problems belong to the class of semiparametric statistics, provided that the fraction ε and the distribution of the outliers
are unknown. In this case, the problem arises: how many sample observations, and from which side, should be trimmed to ensure the stability of the solutions obtained [
12,
17,
20,
21]. For example, symmetric trimming provides unbiased estimates for the symmetric distributions
and eliminates remote outliers [
12,
17], but applying these procedures to asymmetric internal outliers leads to biased and inefficient estimates [
6]. The semi-nonparametric problems for the unknown forms of
,
, and
were not considered within the framework of robust statistics [
6,
12]. This is due to the fact that, for the unknown forms of the distributions
and
, the formal problem arises in the assessment of their differences. On the other hand, it should be noted that although the forms of distributions of the majority of observations
and outliers
are unknown, researchers usually implicitly use some additional prior information concerning these distributions (their semi-nonparametric forms) to distinguish them. For example, the information on the class of distributions
with external (internal) symmetric (asymmetric) outliers, and on the normal symmetric distributions
with light or medium tails, is often implicitly implied and allows symmetric trimming to be performed at a level of
. The wind outliers, including wind gusts, are investigated using the skewness and kurtosis of their distributions [
17,
18]. This additional information on the distributions should be formalized and taken into account [
22,
23] to synthesize robust semi-nonparametric algorithms [
12]. Hence, to estimate the kinetic wind energy components in the ABL from inhomogeneous data samples with the distribution
, efficient robust algorithms for the processing of experimental data with different levels of prior uncertainty should be developed.
In the present work, new robust parametric, semiparametric, and semi-nonparametric algorithms of the weighted maximum likelihood method are used to process experimental data with different levels of a priori statistical uncertainty and to estimate the kinetic wind energy components in the ABL. The robust estimates of the total kinetic wind energy with and without allowance for the contribution of the outliers, and of their difference, which equals the kinetic wind energy of the outliers, are obtained by postprocessing the measurements with an AV4000 minisodar [
4].
The problem solved in the present work is formulated in Section 2. Section 3 describes the robust parametric algorithms and their application for estimating kinetic wind energy with and without allowance for the contribution of the outliers. In Section 4, the robust semiparametric algorithms are described and the results of their application are given. Section 5 presents the robust semi-nonparametric algorithms, together with the kinetic wind energy with allowance for the contribution of the outliers, its mean and turbulence components, and the kinetic energy of the outliers retrieved by postprocessing of minisodar measurements. In the Conclusion, the main results are summarized.
2. Problem Formulation
The kinetic wind energy in the ABL
is determined by the energy of the motion of air masses—the wind energy. Below, we present the kinetic energy in the ABL reduced to unit air mass
and measured in m²/s² (m²/s² = J/kg) [24]. It is natural that the regularities in the spatiotemporal behavior of the reduced kinetic energy carry over fully to the total kinetic energy. For this reason, we use the term kinetic energy for the kinetic energy per unit air mass. It is equal to the sum of two components: the mean kinetic energy
EM, associated with the mean wind velocity
, and the turbulence kinetic energy
ET, associated with the wind velocity variance
. Following [
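24], we can write:
$$E = E_M + E_T, \qquad E_M = \frac{1}{2}\left(\bar{V}_x^2 + \bar{V}_y^2 + \bar{V}_z^2\right), \qquad E_T = \frac{1}{2}\left(\sigma_x^2 + \sigma_y^2 + \sigma_z^2\right), \tag{1}$$
where $\bar{V}_x$, $\bar{V}_y$, and $\bar{V}_z$ are mean values of the x, y, and z components of the wind velocity, and $\sigma_x^2$, $\sigma_y^2$, and $\sigma_z^2$ are their variances. The wind vector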
is a non-stationary random process. At a fixed altitude
z, we consider the quasi-stationary interval
, and for
, represent the process
as a superposition of the process
, stationary in a broad sense, with the prior distribution
characterizing the ordered movement of air masses, and the fraction ε of the outliers
with the distribution
characterizing the inhomogeneous movement of air masses. As a convenient probabilistic model of the real distributions
used for studying the ABL, we take the mixture of the distributions
and
that satisfies the regularity conditions. We consider that these models approximately coincide with the prior distribution
characterizing the ordered movement of air masses for
[
1]. For this model, the energy of movement of air masses in the ABL, E, is the sum of two components, E = E0 + Eout, where E0 is the energy of the stationary process, and Eout is the energy of the outliers. With an allowance for Formula (1), we can write
,
,
, and
.
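As an illustration of the decomposition into mean and turbulence components, the following minimal sketch computes both from velocity samples. The function name and the sample values are hypothetical, and the unbiased (n − 1) sample variance is an assumption about how the variances are estimated.

```python
from statistics import fmean, variance


def kinetic_energy_components(vx, vy, vz):
    """Return (E_M, E_T, E) per unit mass in m^2/s^2 from samples in m/s."""
    components = (vx, vy, vz)
    # Mean kinetic energy: half the sum of the squared component means.
    e_mean = 0.5 * sum(fmean(c) ** 2 for c in components)
    # Turbulence kinetic energy: half the sum of the component variances
    # (unbiased n - 1 estimator, an assumption of this sketch).
    e_turb = 0.5 * sum(variance(c) for c in components)
    return e_mean, e_turb, e_mean + e_turb


# Hypothetical samples of the three wind components at one altitude.
vx = [3.0, 5.0, 4.0, 4.0]
vy = [1.0, 1.0, 3.0, 3.0]
vz = [0.0, 0.2, -0.2, 0.0]
e_m, e_t, e = kinetic_energy_components(vx, vy, vz)
print(e_m, e_t, e)
```

Here the mean component dominates because the hypothetical samples fluctuate only weakly around their means.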
We studied the kinetic wind energy characteristics in the ABL by postprocessing the minisodar measurements of the x, y, and z components of the wind velocity, Vx, Vy, and Vz, in the jth strobe at the corresponding altitude in the kth measurement series during observation time T. We considered N observations (samples) of each component during observation time T. Based on these samples, it is required to obtain unbiased, consistent estimates of E, E0, and Eout.
The unbiased, consistent estimate of the total energy E averaged over the observation period T is calculated from the sample means and variances of the three velocity components. The unbiased and consistent estimate of E0 may be calculated from the unbiased and consistent estimates of the parameters of the prior distribution, which may be obtained by applying robust algorithms to the same samples drawn from the mixture distribution. Finally, the unbiased and consistent estimate of the outlier energy Eout is equal to the difference E − E0. Hence, the main problem is reduced to obtaining asymptotically unbiased, consistent, and efficient estimates of the parameters of the prior distribution based on samples from the mixture distribution; that is, robust estimates for the prior distribution based on prior information on the forms of the prior, outlier, and mixture distributions and their superpositions.
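The estimation scheme described here (total energy from sample moments, stationary energy from robust moments, outlier energy as their difference) can be sketched as follows. The median and scaled median absolute deviation serve only as a simple stand-in robust estimator; they are not the weighted maximum likelihood estimates developed in this work. All names and data are hypothetical.

```python
from statistics import fmean, median, variance


def energy(moments):
    """Kinetic energy per unit mass from per-component (mean, variance) pairs."""
    return 0.5 * sum(m * m + v for m, v in moments)


def sample_moments(x):
    return fmean(x), variance(x)


def median_mad_moments(x):
    """Stand-in robust estimator (assumption): median and scaled MAD."""
    med = median(x)
    mad = median([abs(xi - med) for xi in x])
    return med, (1.4826 * mad) ** 2


def energy_decomposition(components, robust=median_mad_moments):
    """Return (E, E0, Eout) with Eout = E - E0."""
    e_total = energy([sample_moments(c) for c in components])
    e_stationary = energy([robust(c) for c in components])
    return e_total, e_stationary, e_total - e_stationary


# Hypothetical samples; the last vx value plays the role of a gust.
vx = [4.0, 4.2, 3.8, 4.1, 3.9, 12.0]
vy = [1.0, 1.1, 0.9, 1.0, 1.0, 1.0]
vz = [0.0, 0.1, -0.1, 0.0, 0.0, 0.0]
e_total, e0, e_out = energy_decomposition([vx, vy, vz])
print(e_total, e0, e_out)
```

Passing a different `robust` callable changes only the estimate of E0, so the same decomposition could be reused with any estimator of the prior-distribution parameters.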
As already indicated above, robust statistics based on different robustness criteria cannot guarantee unbiased, consistent, and efficient estimates of parameters, especially for asymmetric distributions
[
6,
10,
11,
19]. Considering the algorithms for robust estimation on different levels of prior information based on the weighted maximum likelihood method, we can formalize the problem. To simplify the derivation of the robust algorithms, we designated the samples
,
, and
,
, by
. Let
be the inhomogeneous sample of independent random variables with the distribution function
, where
P is the class of distributions in the form of the mixture of the distributions satisfying the regularity conditions of the maximum likelihood method (MLM) [
6,
10,
11]:
where
is the a priori model of the distribution function,
is the outlier distribution,
is the outlier fraction, and
is the unknown parameter vector of the distribution. We designated by
,
, and
the corresponding distribution densities. Considering the problem of constructing the robust efficient estimate
for the unknown parameter
of the prior distribution
from the inhomogeneous sample
with the distribution
, the mathematical model (2) can be based on both parametric and nonparametric models of the distribution function and on their superpositions—semiparametric and semi-nonparametric models.
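To make the mixture model concrete, the sketch below draws an inhomogeneous sample in which each observation comes from the outlier distribution with probability ε and from the prior distribution otherwise, that is, a mixture of the form (1 − ε)G + εH. The symbols G and H, the function name, and the two normal distributions used in the example are assumptions made for illustration only.

```python
import random


def sample_contaminated(n, eps, draw_main, draw_outlier, rng):
    """Draw n values from (1 - eps) * G + eps * H.

    Returns (values, flags), where flags[i] is True for outliers.
    """
    values, flags = [], []
    for _ in range(n):
        is_outlier = rng.random() < eps
        values.append(draw_outlier(rng) if is_outlier else draw_main(rng))
        flags.append(is_outlier)
    return values, flags


rng = random.Random(0)
values, flags = sample_contaminated(
    1000,
    0.1,
    draw_main=lambda r: r.gauss(0.0, 1.0),     # assumed prior model G
    draw_outlier=lambda r: r.gauss(3.0, 0.5),  # assumed asymmetric outliers H
    rng=rng,
)
print(sum(flags) / len(flags))  # empirical outlier fraction
```

Because the outlier distribution in this example overlaps the bulk of the data on one side only, it mimics the asymmetric internal outliers discussed above.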
3. Robust Parametric Algorithms and Their Application for Estimating Kinetic Wind Energy
Let us consider the problem at the parametric level of prior uncertainty. For the sample
with the distribution
, we first derived the maximum likelihood estimate (MLE)
in
. By analogy with [
5,8], it can be shown that the consistent, asymptotically unbiased, efficient estimates
for the prior distribution
are determined from the system of estimation equations:
The estimates
from the system of estimation equations (3) are determined by iteration and are estimates of the weighted maximum likelihood method (WMLM). The efficiency of the estimates follows from the MLM, the unbiasedness of the estimates for
was shown in [12], and their robustness follows from the bounded variances provided by the weight functions
given by Equation (5).
As an example, we considered the algorithms for robust estimates of the mean and variance
of the prior normal distribution
:
The plots of the corresponding distribution densities, weight functions, and estimate functions for the normal distribution are shown in
Figure 1 for the indicated outlier fractions.
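The iterative character of a weighted likelihood estimate for the normal model can be illustrated with the sketch below. It assumes weights proportional to a power of the normal density, which is one common choice in the robust literature; the weight functions actually used in this work are those given by its own equations and may differ. The function name, the exponent `power`, and the simulated data are hypothetical.

```python
import math
import random
from statistics import median


def weighted_normal_fit(x, power=0.5, iterations=100):
    """Iteratively reweighted mean and variance of a normal model.

    Weights are exp(-power * (x - mu)^2 / (2 * var)), i.e. proportional to
    the normal density raised to `power` (an assumption of this sketch).
    """
    mu = median(x)
    scale = 1.4826 * median([abs(xi - mu) for xi in x])  # robust start
    var = scale * scale  # must be positive for the iteration to run
    for _ in range(iterations):
        w = [math.exp(-power * (xi - mu) ** 2 / (2.0 * var)) for xi in x]
        total = sum(w)
        mu = sum(wi * xi for wi, xi in zip(w, x)) / total
        # The factor (1 + power) keeps the variance consistent under the
        # normal model, where the weighted variance equals var / (1 + power).
        var = (1.0 + power) * sum(
            wi * (xi - mu) ** 2 for wi, xi in zip(w, x)
        ) / total
    return mu, var


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(1900)]  # stationary part
x += [rng.gauss(6.0, 0.5) for _ in range(100)]  # 5% one-sided outliers
mu_hat, var_hat = weighted_normal_fit(x)
print(mu_hat, var_hat, sum(x) / len(x))
```

In this simulation the sample mean is pulled toward the outliers, whereas the reweighted estimates stay close to the parameters of the stationary part because distant observations receive negligible weight.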
Figure 2a shows an example of the parametric estimates of the vertical profiles of the total kinetic wind energy without (
E0, the orange curve) and with an allowance for the contribution of the kinetic outlier energy (
E, the blue curve) retrieved from measurements with the commercial triaxial Doppler monostatic minisodar AV4000 (Atmospheric Systems Corporation, Santa Clara, CA, USA) [
4]; its sounding range was 5–200 m with vertical resolution Δz = 5 m. The acoustic antenna was an array of 50 loudspeakers used to both transmit and receive acoustic signals at a frequency of 4900 Hz. This loudspeaker array was electrically steered to generate three independent beams: one vertical and two others at elevation angles of 76° in two mutually orthogonal planes. The minisodar had a pulse repetition period of 4 s and a pulse duration of 60 ms. The minisodar provided one vertical signal profile in all three channels every 4 s, which was used to calculate the wind vector components
Vx(z,t), Vy(z,t), and Vz(z,t), and their variances from the well-known formulas for the Doppler frequency shifts. To investigate their dynamics, we sampled and processed 150 vertical profiles (150 × 4 s = 10 min) recorded from the beginning of each hour from 00:00 till 23:00 to obtain 10 min averages and to estimate the total kinetic wind energy E(z,t) and its components caused by the stationary air movement E0(z,t) and the wind outliers Eout(z,t).
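The retrieval of wind components from Doppler shifts can be sketched as follows for a monostatic three-beam geometry with one vertical beam and two beams tilted to an elevation of 76° in orthogonal planes. The sound speed of 340 m/s, the sign conventions, and the function names are assumptions of this sketch; the minisodar's own processing may differ in detail.

```python
import math


def radial_velocity(doppler_shift_hz, carrier_hz=4900.0, sound_speed=340.0):
    """Radial velocity (m/s) for a monostatic sodar: v_r = c * df / (2 * f0)."""
    return sound_speed * doppler_shift_hz / (2.0 * carrier_hz)


def wind_components(vr_vertical, vr_tilt_x, vr_tilt_y, elevation_deg=76.0):
    """Invert three beam radial velocities into (Vx, Vy, Vz).

    Assumes each tilted beam measures V_h * cos(el) + Vz * sin(el),
    where V_h is the horizontal component in the plane of that beam.
    """
    el = math.radians(elevation_deg)
    vz = vr_vertical
    vx = (vr_tilt_x - vz * math.sin(el)) / math.cos(el)
    vy = (vr_tilt_y - vz * math.sin(el)) / math.cos(el)
    return vx, vy, vz


print(radial_velocity(49.0))  # radial velocity for a hypothetical 49 Hz shift
```

Because cos 76° is small, errors in the tilted-beam radial velocities are amplified roughly fourfold in the horizontal components, which is one reason why outliers in the raw data matter.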
Continuous minisodar measurements were taken in the vicinity of Santa Clarita, CA, USA (34°23′29.9904″ N, 118°32′33.3096″ W) over a flat underlying surface without tall vegetation [
25] from 10 to 17 September 2003. During the period of the measurements, the weather was dry, warm and sunny. Here, we postprocessed 10-min averaged minisodar measurements that started on 10 September at 12:00 LT, and at every hour from 00:00 till 23:00 on 16 September. The temperature during the period of measurements on 10 September was 23 °C, and the mean wind speed was 3.57 m/s. The maximum daytime temperature on 16 September was 25.7 °C, and the minimum temperature at night was 16 °C. The average wind velocity was 9.1 m/s.
From
Figure 2, it can be seen that without the contribution of the kinetic outlier energy, the curve is quite smooth, whereas with an allowance made for this contribution, its layered structure can clearly be seen. Atmospheric layers with strong turbulence were also reported by Shikhovtsev et al. in [
26] and Bolbasova et al. in [
27], where the maximum changes of the turbulence layer strengths were observed in the lower layer of the atmosphere at altitudes up to 70 m, and the diurnal variations of their altitudes were also indicated. The blue curve in
Figure 2a also shows that local outlier layers were clearly manifested in the vertical profile of the total kinetic energy starting from an altitude of 50 m; moreover, the thickness of these layers remained practically unchanged to altitudes of 150 m. Above 150 m, the curves synchronously changed with altitude and had kinetic energy maxima at
z = 175 m. This suggests that new robust algorithms [
14] for the detection and selection of outliers of various origins in the observation samples at different levels of a priori statistical uncertainty are especially important for altitude ranges in which the kinetic outlier energy is significant. The physical causes of these features in the kinetic outlier energy are therefore of considerable interest.
Figure 2b shows an example of the altitude profiles of the turbulence kinetic energy component
ET without (the orange curve) and with allowance for the contribution of the kinetic outlier energy (the blue curve). The comparison of
Figure 2a,b demonstrates that, in this situation, the altitude behavior of the orange and blue curves differs quantitatively only in the layer below 25 m; above this altitude, they qualitatively agree. This suggests that at these altitudes, the predominant contribution to the total kinetic wind energy comes from the turbulence kinetic energy component.
In
Figure 2a,b, the fairly smooth (orange) curves of the parametric estimates of the vertical profiles of the kinetic energy of stationary air motion, which disregard the contribution of the kinetic outlier energy, can be seen. The blue curves, which take into account the contribution of the kinetic energy of outliers, reveal the fine structure of the altitude dependence of the total kinetic energy. This indicates that, in this case, attention should be focused on the kinetic energy of the outliers.
4. Robust Semiparametric Algorithms and Their Application for Estimating Kinetic Wind Energy
Let us consider the problem at the semiparametric level of prior uncertainty. In this case, it is assumed that
is known to within a finite number of the unknown parameters
, and the information on the outliers
, that is, on the form of the distribution function
and, hence,
is unknown. For the sample
from
, we obtained the estimate
by the maximum likelihood method. By analogy with [
12], it is possible to show that the asymptotically efficient (unbiased and consistent) estimates
of the WMLM for the prior distribution
are determined from the system of the estimation equations:
where
is the nonparametric estimate of the Parzen–Rosenblatt density,
is the kernel function, and
is the bandwidth [
6,
8]. For
to be consistent and unbiased, the following conditions must be satisfied:
.
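A minimal sketch of the Parzen–Rosenblatt density estimate with a normal kernel is given below. The bandwidth rule shown is the normal reference rule, proportional to N^(−1/5), which satisfies the usual consistency requirements that the bandwidth tends to zero while N times the bandwidth tends to infinity; whether this work uses the same rule is not stated here, so it is an assumption, as are all function names.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_rosenblatt(sample, bandwidth):
    """Return the function f_N(x) = (1 / (N h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)

    def density(x):
        total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
        return total / (n * bandwidth)

    return density


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * s * N^(-1/5) (assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((xi - mean) ** 2 for xi in sample) / (n - 1))
    return 1.06 * std * n ** (-0.2)


sample = [3.8, 4.0, 4.1, 4.3, 3.9, 4.2, 7.5]  # hypothetical wind speeds, m/s
h = normal_reference_bandwidth(sample)
f_n = parzen_rosenblatt(sample, h)
print(h, f_n(4.0), f_n(7.5))
```

The estimate assigns a small but nonzero density near the isolated value 7.5, which is the kind of information a semiparametric weight function can exploit without a parametric model of the outliers.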
The estimates
are determined from the system of estimation equations (8) by iteration. The efficiency of the estimates follows from the WMLM, and the unbiasedness and bounded variances of the estimates are provided by the weight functions
given by Equation (9). For example, for the normal distribution with the shift parameter
,
estimation Equation (8) assumes the form
In
Figure 3, the results of modeling are shown for observations of the superposition of the normal distributions of asymmetric internal outliers with
To determine the kinetic energy
of the stationary process, we needed to determine the mean
and variance
of the distribution
:
Determining the asymptotically efficient robust estimates of the parameters from Equations (8)–(10) and substituting them into Equation (8), we obtained the asymptotically efficient robust estimates of the mean and variance and determined on their basis the kinetic energy E0 of the stationary process.
5. Robust Semi-Nonparametric Algorithms and Their Application for Estimation of Kinetic Wind Energy
Let us consider the class of semi-nonparametric problems of robust statistics in which the information on the prior distribution
and the outliers
; that is, on the forms of the distribution functions
,
and, hence,
is unknown. As the parameters
of this nonparametric class of distributions
, the shift and scale parameters can be considered and, more generally, some functionals of
. Note that though the forms of distribution of the majority of observations
and of the outliers
are unknown, the researchers usually use some additional prior information on these distributions which allows them to be distinguished. We formalized this additional prior information. It is known a priori that
, where
is a nonparametric class of continuous functions satisfying the conditions
where the functions
are known. The assignment of the prior information in the form of Equation (13) allows one to consider a wide spectrum of information on
of both quantitative and qualitative character [
22,
23]. Taking into account prior information (13), the modified nonparametric estimate of the Parzen–Rosenblatt density
can be constructed [
22,
23]:
The estimate
is asymptotically unbiased, consistent, and
has the asymptotically normal distribution [
22,
23].
Let us define the WMLM estimation equation for
in the form
where the estimation function
is
is the function of the MLE contribution for
,
is the weight function:
and
is the nonparametric estimate of the Parzen–Rosenblatt density given by Equation (10). Formulas (15)–(17) define the adaptive nonparametric estimate (ANE)
of the WMLM.
For example, let
be a symmetric function of the parameter
. The modified nonparametric estimate of the Parzen–Rosenblatt density takes the form [
22,
23]
The estimate is asymptotically unbiased, consistent, and has the minimal root-mean-square error on the nonparametric class of distributions considered.
Let
be the unknown parameter of
in the form:
To derive the robust estimate
of the parameter β, consider the estimation equation in the form:
where the estimation function
has the form:
is the weight function,
is the nonparametric estimate of the Parzen–Rosenblatt density given by Equation (10), and
is the modified nonparametric estimate of the Parzen–Rosenblatt density. Formulas (20) and (21) define the robust semi-nonparametric estimate βN of the parameter β. A mathematical analysis of these estimates is beyond the scope of the present work. Note only that they are asymptotically unbiased and efficient for the distribution
, provided that the sample
is from
.
For example, let
be the class of symmetric functions of the shift parameter θ. Let us designate by
the mean and variance of
. In this case, the estimation equations for the robust nonparametric estimates
and normal kernel function (7) take the following forms:
where
are the Walsh half-sums and
is defined by Formula (10). The estimates are determined by iterations. The estimates of the kinetic energies of the stationary process without outliers,
, and with the outliers,
, were obtained using these formulas.
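The Walsh half-sums that enter these estimation equations are the pairwise averages of the observations. The sketch below computes them and, as a familiar related estimator rather than the estimate defined in this section, takes their median (the Hodges–Lehmann estimate of the shift parameter). Including the pairs with i = j is an assumption; some definitions use only i < j.

```python
from statistics import median


def walsh_half_sums(x):
    """All pairwise averages (x_i + x_j) / 2 for i <= j."""
    n = len(x)
    return [(x[i] + x[j]) / 2.0 for i in range(n) for j in range(i, n)]


def hodges_lehmann(x):
    """Median of the Walsh half-sums (a classical robust shift estimate)."""
    return median(walsh_half_sums(x))


sample = [1.0, 2.0, 3.0, 100.0]  # hypothetical data with one gross outlier
print(hodges_lehmann(sample), sum(sample) / len(sample))
```

For N observations there are N(N + 1)/2 half-sums, so for long measurement series the list grows quadratically and may need to be subsampled.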
Table 1 presents the semi-nonparametric estimates of the altitude dependence of the kinetic wind energy without outliers,
, the kinetic wind energy with an allowance for the contribution of the outliers,
E, and their difference
equal to the kinetic energy of the outliers obtained by the postprocessing of the minisodar measurements.
Consider first the semi-nonparametric estimates of the kinetic energy without outliers. Their nearly monotonic increase with altitude could be easily traced from 20 to 200 m. The large E0 value at z = 10 m is apparently caused by the effect of the underlying surface. The rate of E0 growth increased with altitude starting from z = 150 m. Our statistical analysis of the altitude behavior of the total kinetic energy E with an allowance for the contribution of the kinetic energy of outliers revealed the following. The E values differed only slightly from the corresponding E0 values at altitudes up to 50 m; that is, in this altitude range, the contribution of the kinetic outlier energy to the total kinetic energy was low. The situation changed radically above 50 m. A substantial nonmonotonic increase in E values was observed with a further increase in the altitude, accompanied by considerable deviations from the monotonic dependence.
As to the altitude dependence of the kinetic energy of outliers Eout, it was low at altitudes z ≤ 50 m. Its increase at z = 10 m was caused by the effect of the underlying surface. From 50 to 170 m, a nonmonotonic increase in Eout was observed, with enhanced Eout values between 100 and 110 m. Above 170 m, a certain decrease in Eout was observed.
Figure 4 shows the diurnal hourly dynamics of the robust semi-nonparametric estimates of the total kinetic wind energy with an allowance for the outliers E (a); its mean, EMKE (b), and turbulence, ETKE (c), components; and the kinetic energy of the outliers Eout (d) retrieved by the postprocessing of the minisodar measurements. From Figure 4a, it can be seen that the maximum values (red color area) Emax ~ 500 m²/s² were observed at night, from 00:00 till 02:00, local time, and in the evening, from 19:00 till 24:00. However, whereas at night the lower boundary of the layer with enhanced kinetic energy was about 125 m, in the evening it descended to 75 m. From 02:00 till 19:00, E did not exceed 100 m²/s² in the lower layer. The upper boundary of this layer first ascended from about 50 m to 200 m at noon and then descended to 100 m.
The robust semi-nonparametric estimates of the diurnal hourly behavior of the mean kinetic wind energy component EM (Figure 4b) demonstrated that from 04:00 till 19:00, its values were about 50 m²/s² and were practically independent of time of day and sensing altitude. From midnight till 02:30, EM underwent local variations, and reached 225 m²/s² (red area) at altitudes above 125 m. From 02:30 till 04:00, it increased to 100 m²/s² at altitudes from 25 to 75 m (light blue area). From 19:00 till 24:00, areas with enhanced values of EM ~ 225 m²/s² (red areas) were observed. As a whole, the contribution of EM to the total kinetic wind energy E was low.
Figure 4c shows the robust semi-nonparametric estimates of the diurnal hourly dynamics of the turbulence kinetic energy component ET obtained by the postprocessing of the minisodar measurements. From the figure, it can be seen that ET increased with altitude z. In the lower layer, the height of which first increased from 75 m at 00:00 to 200 m at 01:00 and then decreased to 75 m at 24:00 with significant hourly variations, ET did not exceed 80 m²/s². Above 75 m, an area of enhanced ET was clearly visible.
Figure 4d shows the diurnal hourly dynamics of the kinetic wind energy of the outliers. It can be seen that the contribution of the outliers became especially pronounced at night from 00:00 till 02:00 at altitudes above 50 m. From 05:00 till 19:00, the kinetic energy of the outliers reached 60 m²/s² at altitudes above 100 m. In the evening and at night, the kinetic energy of the outliers (red areas) reached 120 m²/s², thus making a significant contribution to the total kinetic energy E ~ 200 m²/s² in this altitude range (see Figure 4a), and its effect was more pronounced at altitudes between 50 and 100 m.