Applications and Modeling Techniques of Wind Turbine Power Curve for Wind Farms—A Review

Bilendo, Francisco; Meyer, Angela; Badihi, Hamed; Lu, Ningyun; Cambron, Philippe; Jiang, Bin

doi:10.3390/en16010180


by Francisco Bilendo ¹, Angela Meyer ², Hamed Badihi ¹, Ningyun Lu ¹,*, Philippe Cambron ³ and Bin Jiang ¹

¹ College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

² Department of Engineering and Information Technology, Bern University of Applied Sciences, 2501 Biel, Switzerland

³ Department of Wind Energy Research and Development (R&D), Power Factors, Montreal, QC J4Z 1A7, Canada

* Author to whom correspondence should be addressed.

Energies 2023, 16(1), 180; https://doi.org/10.3390/en16010180

Submission received: 28 November 2022 / Revised: 19 December 2022 / Accepted: 20 December 2022 / Published: 24 December 2022

(This article belongs to the Section A3: Wind, Wave and Tidal Energy)


Abstract: In the wind energy industry, the power curve represents the relationship between the “wind speed” at the hub height and the corresponding “active power” to be generated. It is the most versatile condition indicator and of vital importance in several key applications, such as wind turbine selection, capacity factor estimation, wind energy assessment and forecasting, and condition monitoring, among others. Ensuring an effective implementation of the aforementioned applications mostly requires a modeling technique that best approximates the normal properties of optimal wind turbine operation in a particular wind farm. This challenge has drawn the attention of wind farm operators and researchers towards the “state of the art” in wind energy technology. This paper provides an exhaustive and updated review of power curve based applications, the most common anomaly and fault types including their root causes, along with data preprocessing and correction schemes (i.e., filtering, clustering, isolation, and others), and modeling techniques (i.e., parametric and non-parametric) which cover a wide range of algorithms. More than 100 references, for the most part selected from recently published journal articles, were carefully compiled to properly assess the past, present, and future research directions in this active domain.

Keywords: power curve; applications; modeling techniques; wind farms; wind turbines

1. Introduction

Among renewable energy resources, wind energy technology has emerged as one of the most outstanding sources of power due to its competitiveness in terms of economic benefits and environmental friendliness [1,2,3]. In the last decade, the industry has experienced tremendous growth in various aspects, for instance, in terms of the technology to harvest the wind power effectively, the increasing number of wind farms and installed wind turbines, and the extension of wind farms from onshore to remote locations and offshore, among others. This growth has been partially influenced by the increasing demand for wind power in the most recent years. Meeting these high demands requires a massive penetration of “wind power” in the power grid system. However, the volatile nature of wind causes uncertainty and significant challenges in the “energy management systems” (EMS) in terms of scheduling and dispatching, which consequently impact the “reliability” of the power grid system [4,5,6]. This problem has drawn the attention of researchers towards the state of the art to develop appropriate solutions. A large portion of the developed solutions are established through the wind turbine “power curve”. In fact, the power curve has proven to be a key condition indicator of the wind turbine as it represents the relationship between the “wind speed” at the hub height and the corresponding “active power” to be generated [7,8,9,10].

According to the reviewed literature, most research articles in the power curve framework can be classified into four major categories: (i) the first category refers to research works which are focused on the aspects related to wind farm development, such as wind turbine selection, capacity factor estimation, wind energy assessment and forecasting, among others; these studies do not necessarily use historical data of wind turbines, as they are based on measurements from the test site; (ii) the second category refers to research works focused on condition monitoring with regard to anomaly and fault detection, along with diagnosis and prognosis. Such studies indeed require historical data which include the “wear and tear” of the wind turbine; the data are often attained through a standalone “condition monitoring system” (CMS) or the “supervisory control and data acquisition” (SCADA) system; (iii) the third category refers to research works exploring data preprocessing and correction schemes to guarantee the integrity of the historical dataset prior to modeling; in general, these methods are based on filtering, clustering, and isolation approaches, among others; (iv) lastly, the fourth category refers to research works based on modeling techniques in which a wide range of techniques from parametric to non-parametric algorithms have been explored.

Although this research field continues to evolve, most review articles currently available in literature (see brief descriptions in Table 1) only focus on the latter category of the aforementioned power curve-based research works, that is, the modeling techniques. Consequently, most of the other key aspects involved in the process of power curve modeling and recent findings are ignored, for instance:

A large portion of the modeling techniques requires optimal (normal) historical data to be available in advance for model learning purposes. Yet, how to attain optimal (normal) historical data to train and validate the models is completely ignored in the currently available review papers.
Recent findings have reported a wide range of power curve based anomalies and fault signatures, which are very effective for root cause analysis, diagnosis and prognosis. However, none of the currently available review papers has addressed those recent findings.

In order to fill these gaps identified in the literature, the main contributions of this research article are as follows:

The applications of the wind turbine power curve are explored, including their involvement in the process of wind turbine selection, capacity factor estimation, wind energy assessment and forecasting, and condition monitoring.
The most common types of power curve based anomaly and fault signatures are investigated and analyzed from a diagnostic standpoint. That includes a wide range of issues, such as those caused by “damaged power measuring instrument”, “communication equipment fault”, “imposed control action”, “load sensor failure”, and “harsh environmental conditions”, among others.
Data preprocessing and correction schemes, which are usually performed prior to modelling the power curve of a wind turbine in order to attain the optimal (normal) historical data, are explored. That includes methods in the framework of filtering, clustering, isolation, and other approaches.
An updated review of the modeling techniques including parametric and non-parametric algorithms is presented along with the most common performance metrics.

The remainder of this paper is presented as follows: Section 2 is dedicated to introducing the wind turbine power curve, followed by its applications in Section 3. In Section 4, power curve based anomaly and fault signatures are explored. In Section 5, data preprocessing and correction schemes are presented. Section 6 provides an updated review of the power curve modelling techniques, followed by an overall assessment in Section 7, and discussion and prospects in Section 8. Finally, Section 9 provides final conclusions.

2. Wind Turbine Power Curve

According to the reviewed literature, the wind turbine power curve represents the relationship between the “wind speed” at the hub height and the corresponding “active power” to be generated [13,14,16]. A general approximation of this “nonlinear relationship” can be expressed as:

$$P = \frac{1}{2} \rho A C_{P}(\lambda, \beta) v^{3} \qquad (1)$$

where $\rho$ represents the “air density”, $A$ indicates the “swept area”, $C_{P}$ denotes the “power coefficient” of the wind turbine, and $v$ designates the “hub-height wind speed” (m/s). It is important to mention that the “power coefficient is a function of tip-speed ratio $\lambda$ and blade pitch angle $\beta$”. Through the power curve, one may notice that wind turbines operate in different operating modes, partially dependent on the wind speed ranges.

In Figure 1, the appointed modes of operation of a wind turbine, assigned as “regions”, are (ideally) shown:

From a low-level wind speed, the wind turbine does not generate power; power generation commences at the “cut-in speed” $v_{\text{cut-in}}$;
When the wind speed rises above the “cut-in speed”, the wind turbine generates power at an increasing rate, up to the “rated speed” $v_{\text{rated}}$;
Having reached the “rated speed”, the wind turbine generates power at a constant rate, which is the maximum rated power, up to the “cut-off speed” $v_{\text{cut-off}}$;
Beyond the “cut-off speed”, the wind turbine is generally turned off as a preventive measure, in order to safeguard it from higher wind speeds, which may pose a danger and seriously damage the wind turbine.
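The four operating regions can be illustrated with a minimal Python sketch. The function name `ideal_power`, the default speeds, and the rated power are hypothetical values chosen for illustration only. The cubic interpolation in the second region is one simple assumption motivated by the $v^{3}$ dependence in Equation (1), not a manufacturer curve.

```python
def ideal_power(v, v_cut_in=3.0, v_rated=12.0, v_cut_off=25.0, p_rated=2000.0):
    """Idealized power output (kW) for a hub-height wind speed v (m/s).

    All default parameter values are illustrative assumptions.
    """
    if v < v_cut_in or v >= v_cut_off:
        # Region 1 (below cut-in) and region 4 (at or above cut-off): no generation
        return 0.0
    if v >= v_rated:
        # Region 3: constant rated power
        return p_rated
    # Region 2: cubic growth, scaled to be continuous at cut-in and rated speed
    return p_rated * (v**3 - v_cut_in**3) / (v_rated**3 - v_cut_in**3)


if __name__ == "__main__":
    for speed in (2.0, 8.0, 15.0, 26.0):
        print(speed, round(ideal_power(speed), 1))
```

Real power curves are smoother around the rated speed, which is one reason the data-driven modeling techniques reviewed later are preferred in practice.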

2.1. Ideal Power Curve

Essentially, the power curve is certified by the builder, and the certification process is established by the International Electrotechnical Commission (IEC) [14,16]. The procedure consists of “simultaneous measurements” of the “wind speed” and “output power”, taken at the “test site” over a substantial duration in order to collect information under different “atmospheric conditions”, from which the observed power curve may be defined. In general, this curve may be obtained by performing the “method of binning” [16,18], which may be expressed as:

$$P_{i} = \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} P_{ij} \qquad (2)$$

where $P_{ij}$ denotes the output power of the $j$-th data point in bin $i$, while $n_{i}$ represents the number of data points in bin $i$. Note that in the demonstrated “plain version” of the “binning method”, all of the variables representing environmental factors, except the wind speed, are ignored. Nevertheless, the research community has realized the importance of other factors such as the “air density” as part of the calculation for the power curve; to do so, a procedure, namely “air density correction”, is performed. Hence, from Equation (1), where $v$ denotes the “raw wind speed”, the adjustment by the “air density correction” procedure is performed based on the “measured air density” $\rho$, and can be expressed as:

$$v_{C} = v_{M} \left( \frac{\rho}{\rho_{0}} \right)^{\frac{1}{3}} \qquad (3)$$

where $v_{C}$ and $v_{M}$ represent the “corrected” and “measured” wind speeds, respectively, and $\rho_{0}$ is the “dry air density” at “sea level”, as per the “international standard atmosphere” ($\rho_{0} = 1.225$ kg/m³) [19].
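The binning method and the air density correction can be combined as in the following minimal sketch. The function names and the bin width are illustrative assumptions, and each bin is keyed by its centre speed.

```python
from collections import defaultdict

RHO_0 = 1.225  # reference sea-level air density (kg/m^3)


def density_corrected_speed(v_measured, rho):
    """Air density correction: v_C = v_M * (rho / rho_0)^(1/3)."""
    return v_measured * (rho / RHO_0) ** (1.0 / 3.0)


def binned_power_curve(speeds, powers, bin_width=0.5):
    """Method of binning: mean power of the samples in each wind speed bin.

    Returns a dict mapping bin centre (m/s) to mean power.
    The bin width is an illustrative assumption.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, p in zip(speeds, powers):
        i = int(v // bin_width)  # index of the bin containing v
        sums[i] += p
        counts[i] += 1
    return {(i + 0.5) * bin_width: sums[i] / counts[i] for i in sorted(counts)}


if __name__ == "__main__":
    measured = [5.1, 5.3, 6.2]
    densities = [1.20, 1.21, 1.19]
    power = [400.0, 500.0, 800.0]
    corrected = [density_corrected_speed(v, r) for v, r in zip(measured, densities)]
    print(binned_power_curve(corrected, power, bin_width=1.0))
```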

The “ideal” power curve is usually employed for a wide range of applications, such as “wind energy prediction” and “choice of wind turbines”, to mention a few. In particular, it may also be employed as a “reference” for purposes such as “wind turbine performance assessment” to evaluate the significance and effectiveness of the upgrades on wind turbines as they deteriorate through time.

2.2. Actual Power Curve

It should be mentioned that the “ideal power curve”, given by the wind turbine builder and curated through the standard procedure set by the IEC, doesn’t consider the “wear and tear” of the wind turbine. Hence, it may result in a curve deviation when analyzed in comparison with the “actual power curve” attained through data from the SCADA system [20,21]. The main factor for this deviation may be attributed to the variation in power curve values for equal wind speeds. Therefore, it may not be appropriate to directly apply the so-called standard approach for purposes such as “anomaly and fault detection”. Instead, a more delicate approach may be required.

3. Applications of Power Curve

Accurate power curve based models are essentially useful in a number of applications in the wind energy industry. Such applications mainly include wind turbine selection, capacity factor estimation, wind energy assessment and forecasting, and condition monitoring. This section is dedicated to providing a brief review on the aforementioned applications.

3.1. Wind Turbine Selection

When developing a particular wind farm, choosing the characteristics of the wind turbines to invest in becomes an essential task, especially at the beginning of the project. The power curve of the wind turbine may serve as a critical indicator in a generic comparison between wind turbine types and can eventually assist in the process of choosing the appropriate wind turbine, considering the options available [14,16]. Optimization of wind farm system efficiency can be ensured by careful assessment and selection of wind turbine properties matching the wind regime of the site [22,23,24,25]. Several aspects need to be taken into consideration, for example, the wind turbine size, compatibility of the wind turbine type and the site, its history in terms of availability and reliability, and its warranty, among others. In general, the size of a wind turbine has a close relationship with the amount of power to be produced. This fact is further illustrated in Figure 2. Two main constraints need to be taken into consideration:

On one hand, one may notice that, although larger and taller wind turbines are able to produce more power compared with the smaller and shorter ones, larger wind turbines can cause added expenses and delays in maintenance when replacing major components. One of the major challenges is the lack of facilities to lift heavy loads to the top of tall towers.
On the other hand, smaller wind turbines are apparently easier to maintain; however, they could provide lower production revenue due to shorter towers or lower efficiency in general.

There are several other constraints that may play a role in deciding which of the aforementioned trade-offs to take on. For instance, the area available for the project (wind farm development) and the chosen site (onshore or offshore) are important factors. Offshore projects often require the use of larger wind turbines, whereas onshore wind farms may have a very limited area for the project and, therefore, there are often limited choices.

3.2. Capacity Factor Estimation

The capacity factor of a wind turbine “is the ratio of the average output power of the turbine over a period of time to its potential output if it had operated at rated capacity the entire time” [26,27,28,29,30,31]. This is mathematically expressed as:

$$CF = \frac{P_{avg}}{P_{r}} \qquad (4)$$

where $P_{avg}$ and $P_{r}$ denote the average and the rated power, respectively. In such studies, the “wind speed” is often estimated by employing the Weibull distribution. Such information, which requires the wind turbine power curve, is essential for various purposes such as the “sizing and cost optimization studies”, “optimum turbine-site matching”, and “ranking of potential sites”, among others.
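As a hedged illustration of Equation (4), the sketch below estimates $P_{avg}$ by numerically integrating a power curve against a Weibull wind speed density. The shape `k`, scale `c`, the example curve, and the integration settings are all assumptions made for this example only.

```python
import math


def weibull_pdf(v, k, c):
    """Weibull probability density with shape k and scale c (m/s)."""
    if v < 0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def capacity_factor(power_curve, p_rated, k, c, v_max=40.0, dv=0.01):
    """CF = P_avg / P_r, with P_avg approximated by a midpoint-rule integral."""
    steps = round(v_max / dv)
    p_avg = 0.0
    for s in range(steps):
        v = (s + 0.5) * dv  # midpoint of the current integration step
        p_avg += power_curve(v) * weibull_pdf(v, k, c) * dv
    return p_avg / p_rated


def example_curve(v):
    """Hypothetical 2000 kW power curve (cut-in 3, rated 12, cut-off 25 m/s)."""
    if v < 3.0 or v >= 25.0:
        return 0.0
    if v >= 12.0:
        return 2000.0
    return 2000.0 * (v**3 - 3.0**3) / (12.0**3 - 3.0**3)


if __name__ == "__main__":
    print(round(capacity_factor(example_curve, 2000.0, k=2.0, c=8.0), 3))
```

Swapping in a different site distribution or a different turbine curve is what makes this calculation useful for turbine-site matching and site ranking.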

3.3. Wind Energy Assessment and Forecasting

The identification of effective and suitable areas for wind energy development requires the assessment of wind resources such as “wind speed” and “active power”, including the forecasting of output power around the prospective area. In particular, accurate power forecasting is essential to overcome issues such as underestimation and overestimation; such issues can affect the reliability of power delivery. The current and ongoing increase of wind energy penetration in the “power grid systems” has highlighted the issues caused by the volatile nature of “wind speed”, and the challenges it imposes on the energy management systems (EMS) in terms of scheduling and dispatching [32]. In recent years, this problem has drawn attention in the wind energy industry and in academia, and, as a result, considerable research effort has been made in this regard. State of the art methods, which are broadly explored in Section 6, have been employed for modeling in order to enable accurate power forecasting. Note that power production often relies on the health state of the wind turbine, which will deteriorate over time, and, thus, service and/or upgrades are often required. The effectiveness of upgrades can be assessed through the power curve, as illustrated in Figure 3. Note that a deteriorating wind turbine will produce power below the expected margin, whereas the upgrades will enable the wind turbine effectiveness to either be restored as intended by the manufacturer or surpass its initial potential, depending on the quality of the exercised upgrades [18].

3.4. Condition Monitoring

Wind power is on a strong growth path in Europe and around the world. Operation and maintenance costs still contribute about 25–30% of the “levelized cost of electricity” (LCOE) in on- and offshore wind turbines [1]. Therefore, many operators wish to further cut the maintenance costs and increase the uptime of their wind farms [33,34,35]. Advanced wind farms are continuously and remotely monitored to further reduce the maintenance costs by detecting operation faults and developing damages early, thereby enabling an early response and informed decision making. Automated condition monitoring, including diagnostics and prognostics, is the prerequisite for implementing effective condition-based maintenance strategies.

Modern wind turbines may be equipped with several hundreds of sensors that monitor the turbine subsystems and the environmental conditions. The sensors can acquire hundreds of gigabytes of condition data every day. Those data typically contain information about power production, thermal and electromechanical state variables, vibration responses, oil quality, and ambient conditions [36]. In addition to the dedicated sensing systems, comprehensive data from the “supervisory control and data acquisition” (SCADA) system are usually available to complement the monitoring process. SCADA data have been proposed for fault detection tasks to act as inexpensive proxies for dedicated sensing and fault detection systems [37,38,39,40,41]. Such data typically provide the active power generation and rotor speed, along with components’ temperatures, ambient conditions, and other variables at 10 min mean values. SCADA systems usually log the temperatures of critical drivetrain elements such as gearbox bearings. This is because damage processes are often associated with excessive component temperatures that may originate from, for example, abnormal friction or undesired electrical currents. Thus, SCADA-based condition monitoring often involves monitoring of component temperatures. That said, SCADA data are also used to monitor the active power generation of wind turbines [42].

Several approaches for fault detection established through SCADA data have been proposed, including normal behavior modeling and the analysis of trends, clusters and log data [41]. Normal behavior modeling is among the most relevant approaches in practical application in condition monitoring centers. Normal behavior models (NBMs) characterize the normal operation of a subsystem as expected under fault-free operation conditions. Significant deviations from the expected normal operation may indicate developing fault conditions. Normal behavior modeling based on wind farm SCADA has been reviewed by [41]. NBMs of the power generation have been presented by multiple studies including, for example, [4,13,14,38,40,43,44,45,46,47]. In their simplest form, NBMs of the power generation $P$ correspond to nonlinear functions $P = f(v)$ of the wind speed $v$. Those functions are called power curves. Turbine specific power curves can be estimated with statistical or machine learning regression algorithms from historical SCADA data of the 10 min mean power generation $P$ and wind velocity $v$ as measured by a nacelle anemometer [48,49]. The power generation usually depends on variables other than wind speed, for example, on air density $\rho$ and wind direction $\alpha$ at the turbine. Therefore, more complex power curves can be expressed as functions $P = f(v, \alpha, \rho)$ that can be empirically estimated from the available condition monitoring and SCADA data from each turbine’s past operation. Moreover, power generation can also be estimated as part of multiple target/output variables in a multi-target model [47]. For example, if it is desired to monitor the gearbox and the generator bearing temperatures besides power, then an NBM $f$ can be trained, e.g., based on the “wind speed, direction and air density measurements”, where $[P, T_{gear}, T_{generator}] = f(v, \alpha, \rho)$. One of the main advantages of the resulting NBMs (power curves) is that they can be estimated based on SCADA data from the past operation of the monitored turbine, which enables an accurate estimation that accounts for turbine-specific conditions, such as wake effects from neighboring turbines. More accurate models reduce the delay involved in detecting a fault and can therefore enable an earlier detection of developing faults [50]. The resulting empirical regression models are in use in wind farm condition monitoring centers to detect any underperformance or power-related faults at an early stage.

In particular, power curve based condition monitoring is essential for wind farm operators. In fact, anomalies and fault signatures detectable via the power curve can play a significant role in early detection as the “first indication” that something is wrong [44,51,52]. When applied in the context of NBMs as described above, such a model is capable of detecting unusual operation by calculating the difference between the actual and the predicted values, also known as the residuals [18,21,47,53], which can be described as:

$$R_{e} = y_{i} - \hat{y}_{i} \qquad (5)$$

where $y_{i}$ and $\hat{y}_{i}$ denote the actual and the predicted values, respectively. Subsequently, a “control chart” is employed for the detection of abnormal behavior. For instance, as shown in Figure 4, a common rule is applied as follows:

If the attained residuals are continuously fluctuating between the established threshold limits, then the wind turbine is considered to be functioning according to the norm;
Otherwise, if the attained residuals rise above or fall below the established threshold limits, the wind turbine is considered to be experiencing an unusual event. In such cases, the moment when the residual crosses the threshold limit may be considered as the first detection time.
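The residual rule above can be sketched in a few lines of Python. Setting the threshold limits at three standard deviations around the mean of residuals from a known healthy period is a common convention assumed here for illustration; the text does not prescribe it, and the function names are hypothetical.

```python
import statistics


def control_limits(healthy_residuals, n_sigma=3.0):
    """Lower and upper limits from residuals of a fault-free reference period.

    The n-sigma rule is an assumption made for this example.
    """
    mu = statistics.mean(healthy_residuals)
    sigma = statistics.stdev(healthy_residuals)
    return mu - n_sigma * sigma, mu + n_sigma * sigma


def first_detection(actual, predicted, lower, upper):
    """Index of the first residual outside [lower, upper], or None if none."""
    for i, (y, y_hat) in enumerate(zip(actual, predicted)):
        residual = y - y_hat  # actual minus predicted, as in Equation (5)
        if residual < lower or residual > upper:
            return i
    return None


if __name__ == "__main__":
    low, high = control_limits([-4.0, 2.0, 1.0, -1.0, 3.0, -2.0])
    measured = [1000.0, 1003.0, 998.0, 940.0]
    expected = [1000.0, 1000.0, 1000.0, 1000.0]
    print(first_detection(measured, expected, low, high))
```

In practice a persistence condition, such as several consecutive crossings, is often added to avoid alarms on isolated outliers.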

Condition monitoring can be performed in two ways [54], namely, online and offline.

Online Condition Monitoring: Online condition monitoring refers to real-time inspections. In this approach, the wind turbine is continuously under observation, which often involves automatic systems.
Offline Condition Monitoring: Offline condition monitoring refers to periodic inspections. In this approach, the wind turbine is required to be “shut down”, and the operator’s intervention is often required.

According to the reviewed literature, power curve based condition monitoring plays a significant role in wind farm operations and maintenance, as it enables an overall assessment of the wind turbine.

4. Anomaly and Fault Signatures

4.1. Indications of Suboptimal Performance

In general, historical data collected from a wind farm’s SCADA system usually contain large numbers of abnormal data points, which are caused by anomalies or faults. There are numerous causes that could result in power curve data deviation. Depending on the root cause, the power curve exhibits a statistically different shape compared to the norm. In most cases, anomalies are mainly caused by sensor accuracy degradation or malfunction, communication equipment errors, environmental conditions, or an imposed control action, whereas faults are mainly caused by blade, yaw, or pitch system faults [55].

4.2. Most Common Anomaly and Fault Types

Based on the reviewed literature and according to the statistical characteristics shown in Figure 5, the most common power curve based anomaly and fault types are described in Table 2 and further illustrated in Figures 6–10, with the normal power curve shown in Figure 11.

In addition to the types outlined in Table 2, there are two other types of anomalies and faults which are not often discussed. They include:

Type 6: Negative power curve values. Such events are often identifiable through the observed data points below zero.

Root cause: From the “cut-in speed”, the wind turbine may be required to “draw power from the electrical grid to start up or maintain rotation during short lulls, which can result into negative power production” [55,56].

Type 7: Overrating. Such events are indicated by data points found above the rated power.

Root cause(s): A variety of reasons may trigger such events, including “wind speed sensor malfunction/failure” or “control system issues”. However, to the best of the authors’ knowledge, the root causes of this particular event have not been broadly documented.
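Both of these types can be identified directly from the power signal. The following minimal sketch labels each sample accordingly; the label strings and the optional tolerance above rated power are hypothetical choices for illustration.

```python
def flag_out_of_range(powers, p_rated, tolerance=0.0):
    """Label samples as Type 6 (negative power), Type 7 (overrating), or in range.

    tolerance is a hypothetical relative margin above rated power.
    """
    labels = []
    for p in powers:
        if p < 0.0:
            labels.append("type6_negative")
        elif p > p_rated * (1.0 + tolerance):
            labels.append("type7_overrating")
        else:
            labels.append("in_range")
    return labels


if __name__ == "__main__":
    print(flag_out_of_range([-12.0, 850.0, 2000.0, 2105.0], p_rated=2000.0))
```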

According to [55], most possible faults causing “suboptimal performance” in wind turbines can be subdivided into the following three fault modes:

Blade Fault: Rotor blade surface degradation results in a reduced aerodynamic efficiency, which will cause reduction in power generation of the wind turbine.
Yaw System Fault: In general, the “yaw system” aims to maintain the wind turbine pointing towards the wind direction properly. Misalignments may result in “lowered airflow” through the wind turbine, and hence, lower power generation. This may take place across the “wind speed” range.
Pitch System Fault: For pitch regulated wind turbines, below the “rated wind speed”, the blades generally pitch towards the angle that enables a greater aerodynamic efficiency. At “wind speeds” above the “rated speed”, the blades pitch to limit the fraction of power extracted from the wind and to maintain the rated power. Faulty pitch mechanisms are likely to appear as greater variability at all “wind speeds”, or can result in over-production or under-production of power at higher “wind speeds”.

It should be noted that the variety of fault and failure modes in wind turbines is not restricted to the aforementioned three modes. For more detailed information on wind turbine faults and failure modes, interested readers are referred to [54,57,58]. Also, several research papers have specifically addressed different types of anomalies and faults on the wind turbine power curve. Brief descriptions of these papers are provided in Table 3.

5. Data Preprocessing and Correction

As introduced in the previous sections, there are various issues that may trigger variations in the wind turbine power curve data. Such issues may involve a damaged power measurement instrument, communication equipment failure, power curtailment, sensor failure, turbulence caused by wake effects or loading ramps, sensor accuracy degradation, instrument faults, noise in the processing systems, and environmental conditions, among others. As a result, direct use of the obtained data without prior preprocessing and correction may not be appropriate for power curve modeling. Indeed, data correction prior to modeling is vital to ensure the integrity of a future model. Note that if the collected data are used directly without correction, the obtained model will represent a power curve with distorted statistical characteristics [60]. Therefore, several approaches such as filtering (e.g., “moving mean”, and “locally weighted regression” (LOESS)), clustering (e.g., “k-means clustering”, “density-based spatial clustering of applications with noise” (DBSCAN), and “self-organizing map” (SOM)), and isolation (e.g., “isolation forest” (iForest), and “local outlier factor” (LOF)), have been explored in the literature to mitigate the impact of “abnormal data” in the subsequent modeling phase. This section is dedicated to investigating data preprocessing and correction schemes employed in the process of power curve modeling.

5.1. Filtering Approach

Given a dataset with minor or local anomalies, a filtering method can help to reduce the impact of abnormal data in the subsequent modeling phase. There are many filtering methods; however, the most widely used are the statistical “moving mean” and “locally weighted regression” (LOESS).

(a): Moving Mean: Extracting a statistical moving mean feature from both signals that essentially constitute the power curve (i.e., “wind speed” and “active power”) can often help to reduce minor stochastic effects found in the dataset. The “moving mean” feature can be obtained by the following mathematical description:

$$\bar{x}_{i} = \frac{x_{i} + x_{i - 1} + \dots + x_{i - (n - 1)}}{n}, \quad i = n, n + 1, \dots, l \qquad (6)$$

where $n$ denotes the size of the moving window, $x_{i}$ represents the value of the signal at time step $i$, and $l$ designates the length of the considered signal. This approach (moving mean) has been used in [70,71] to reduce the stochastic effects in SCADA data.
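A minimal sketch of the moving mean over a trailing window of $n$ samples is given below. The function name is hypothetical, and the running-sum update is an implementation choice that avoids re-summing every window.

```python
def moving_mean(x, n):
    """Trailing moving mean with window size n.

    Returns one value per complete window, so the output has
    len(x) - n + 1 entries (indices i = n, ..., l in 1-based notation).
    """
    if n < 1 or n > len(x):
        raise ValueError("window size must satisfy 1 <= n <= len(x)")
    window_sum = sum(x[:n])
    out = [window_sum / n]
    for i in range(n, len(x)):
        window_sum += x[i] - x[i - n]  # slide the window forward by one sample
        out.append(window_sum / n)
    return out


if __name__ == "__main__":
    wind_speed = [5.0, 5.4, 9.8, 5.2, 5.1, 5.3]
    print(moving_mean(wind_speed, 3))
```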

(b): Locally Weighted Regression (LOESS): Essentially, LOESS is a smoothing algorithm. In LOESS, every smoothened value is determined using the neighboring data points within a given span. The regression weight for each data point in the span is calculated using the following expression:

$$w_{i} = \left( 1 - \left| \frac{x - x_{i}}{d(x)} \right|^{3} \right)^{3} \qquad (7)$$

where $x$ denotes the predictor sample related to the sample to be “smoothened”, $x_{i}$ represents the closest neighbors of $x$ in the “span”, and $d(x)$ denotes the “horizontal distance” between $x$ and the furthest predictor sample in the span. LOESS employs “locally quadratic regression” as a “weighted linear least-squares regression”, determining the smoothened sample through “weighted regression” at the predictor sample. This approach (LOESS) was used in [70] to attain a robust power curve reference and in [72] for smoothing purpose.
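The tricube weights of Equation (7) and their use in a local fit can be sketched as follows. For brevity, this example fits a local linear model rather than the local quadratic model mentioned above, so it should be read as a simplified variant. The function names and the default span are assumptions.

```python
def tricube_weights(x0, xs):
    """Tricube weights of the neighbors xs relative to the query point x0."""
    d = max(abs(x0 - xi) for xi in xs)  # distance to the furthest neighbor
    if d == 0.0:
        return [1.0] * len(xs)
    return [(1.0 - (abs(x0 - xi) / d) ** 3) ** 3 for xi in xs]


def local_linear_smooth(x0, x, y, span=0.5):
    """Smoothed value at x0 from a tricube-weighted linear least-squares fit.

    span is the fraction of samples used as neighbors (an assumed default).
    """
    k = max(2, int(round(span * len(x))))
    nearest = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))[:k]
    xs = [x[i] for i in nearest]
    ys = [y[i] for i in nearest]
    w = tricube_weights(x0, xs)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    if sxx == 0.0:
        return my  # degenerate neighborhood: fall back to the weighted mean
    slope = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys)) / sxx
    return my + slope * (x0 - mx)


if __name__ == "__main__":
    speeds = [float(v) for v in range(10)]
    power = [2.0 * v + 1.0 for v in speeds]
    print(local_linear_smooth(4.0, speeds, power))
```

Note that the furthest neighbor in the span always receives a weight of zero, which is why the span must contain more than one point for the fit to be meaningful.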

Remarks (on filtering approach): It is important to mention that although filtering approaches are generally simple and effective, the moving window value (in case of moving mean) and the span value (in case of LOESS) are critical parameters to take into consideration. Large values for such parameters may result in loss of important information, whereas small values may not be as effective. Therefore, finding an optimal value is essential for an effective implementation of a typical filtering approach.

5.2. Clustering Approach

One of the most widely used approaches to detect and remove abnormal data is clustering. In most cases, the employed clustering approaches are unsupervised, and, thus, the algorithm is able to investigate and identify rare patterns in the dataset, and subsequently perform an appropriate labeling. The most frequently used algorithms are “k-means clustering”, “density-based spatial clustering of applications with noise” (DBSCAN), and “self-organizing map” (SOM).

(a): K-Means Clustering: The “k-means clustering” classifies data by separating samples in $k$ clusters of equal variance by applying minimization through a criterion referred to as the “inertia” or “within-cluster sum-of-squares” (WCSS). The number of clusters, however, needs to be specified. In general, it performs well in a large size dataset, and it has been adopted in various application domains. In particular, the “k-means clustering” tends to divide a set of $n$ samples $x$ into $k$ disjoint clusters $C$ , each described by the mean $μ_{j}$ of the samples in the cluster. The means of each cluster are generally considered as the centroids of the clusters. The algorithm usually chooses centroids that minimize the WCSS based on the expression:

WCSS = \sum_{i = 0}^{n} \min_{μ_{j} \in C} \left( \| x_{i} - μ_{j} \|^{2} \right)

(8)

Note that this criterion may be considered as “a measure of how internally coherent clusters are”. For instance, this method (k-means clustering) was used in [61,73] for data correction.
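
As a minimal illustration of Equation (8), the following Python sketch implements the WCSS criterion together with a basic Lloyd-style iteration. The function names (`sq_dist`, `wcss`, `k_means`), the random initialization, and the toy (wind speed, power) samples are assumptions made for illustration only; they are not taken from [61,73].

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def wcss(points, centroids):
    """Within-cluster sum of squares: each sample contributes its squared
    distance to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def k_means(points, k, n_iter=50, seed=0):
    """Basic Lloyd iteration: assign samples to the nearest centroid,
    then move each centroid to the mean of its assigned samples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization scheme
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda idx: sq_dist(p, centroids[idx]))
            clusters[j].append(p)
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids

# Hypothetical (wind speed [m/s], power [kW]) samples forming two groups.
samples = [(3.0, 100.0), (3.2, 110.0), (2.8, 90.0),
           (12.0, 2000.0), (12.5, 2010.0), (11.5, 1990.0)]
centers = sorted(k_means(samples, k=2))
```

In practice, wind speed and power would typically be scaled to comparable ranges before clustering, since the squared distance is otherwise dominated by the power axis.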

(b): Density-Based Spatial Clustering of Applications with Noise (DBSCAN): “DBSCAN” regards clusters as regions of high density separated by regions of low density. Owing to this rather generic view, the clusters may take on any shape. The central concept in DBSCAN is the “core sample”, that is, a sample located in a region of “high density”. A cluster is, thus, a set of core samples located near each other (as measured by a distance metric, such as the Euclidean distance) together with a set of non-core samples that are near a core sample. Two parameters, namely the radius and the minimum number of sample points, need to be specified in order to define the density. The procedure is shown in Algorithm 1. For instance, this approach (DBSCAN) has been used in [21,74,75] for data correction.

Algorithm 1: Pseudocode of the original DBSCAN.

Input: DB (database), ε (radius), dist(distance function), and minPts(density threshold)

Output: labels
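
A minimal Python sketch consistent with the inputs and output of Algorithm 1 is given below. It assumes the commonly cited formulation of the original DBSCAN with a linear-scan range query; the names `range_query`, `dbscan`, and `NOISE`, as well as the toy data, are illustrative assumptions.

```python
import math

NOISE = -1  # assumed label for samples that belong to no cluster

def euclidean(a, b):
    return math.dist(a, b)

def range_query(db, dist, p, eps):
    """Indices of all samples within radius eps of sample p (p included)."""
    return [q for q in range(len(db)) if dist(db[p], db[q]) <= eps]

def dbscan(db, eps, min_pts, dist=euclidean):
    labels = [None] * len(db)          # None means "not yet visited"
    cluster = -1
    for p in range(len(db)):
        if labels[p] is not None:
            continue
        neighbors = range_query(db, dist, p, eps)
        if len(neighbors) < min_pts:   # not a core sample
            labels[p] = NOISE
            continue
        cluster += 1                   # start a new cluster at core sample p
        labels[p] = cluster
        seeds = [q for q in neighbors if q != p]
        i = 0
        while i < len(seeds):
            q = seeds[i]
            i += 1
            if labels[q] == NOISE:     # former noise becomes a border sample
                labels[q] = cluster
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = range_query(db, dist, q, eps)
            if len(q_neighbors) >= min_pts:  # q is core: expand the cluster
                seeds.extend(n for n in q_neighbors if n not in seeds)
    return labels

# Hypothetical 2-D samples: two dense groups and one isolated sample.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(data, eps=1.5, min_pts=3)
```

Samples labeled `NOISE` are the candidates for removal in a data correction step.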

(c): Self-Organizing Map (SOM): The SOM is an “unsupervised technique” which is usually employed to produce a low-dimensional representation of a higher-dimensional dataset while preserving the topological structure of the data. It is widely used for clustering and dimensionality reduction. The SOM is an artificial neural network composed of an input layer, an output layer, and connection weights. It is trained in an iterative process including competition and convergence. In the $t$ th iterative step, the SOM finds the winner of the competition, which is the neuron $c$ closest to the input sample $x (t)$ . Subsequently, the convergence procedure adjusts the SOM model towards the expected order by updating the weight vectors based on their neighborhood relationships with the winner neuron. The neighborhood function, which determines the neighbor update scheme and underlies the topology-preserving nature of the SOM, usually takes the form of a Gaussian function:

h_{ci}(t) = α(t) \exp\left(- \frac{\mathrm{sqdist}(c, i)}{2 σ^{2}(t)}\right)

(9)

where $α (t)$ is the learning rate, which monotonically decreases with step $t$ , $\mathrm{sqdist}(c, i)$ represents the squared distance between neuron $c$ and neuron $i$ on the plane grid map, and $σ (t)$ denotes the kernel radius that determines the range of the neighborhood relationships. During the training process, the weights are adjusted according to the input until the maximum number of iterations is reached. This approach (SOM), for instance, was used in [76] to minimize the squared error generated by local interpolation applicable to datasets with outliers, in [77] for imaging wind turbine fault signatures based on the power curve for image-based fault diagnosis, and in [78] due to its ability to cluster data in an unsupervised manner.
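
The neighborhood function of Equation (9) can be sketched in a few lines of Python. The weight update shown here, $w_{i}(t+1) = w_{i}(t) + h_{ci}(t) (x(t) - w_{i}(t))$, is the standard Kohonen rule and is an assumption, since the update is only described verbally above; the function names and grid coordinates are likewise illustrative.

```python
import math

def neighborhood(c, i, alpha_t, sigma_t):
    """Gaussian neighborhood h_ci(t) between winner c and neuron i,
    both given as (row, column) positions on the grid map."""
    sqdist = (c[0] - i[0]) ** 2 + (c[1] - i[1]) ** 2
    return alpha_t * math.exp(-sqdist / (2.0 * sigma_t ** 2))

def update_weight(w_i, x, h):
    """Assumed Kohonen update: move weight vector w_i towards input x
    by the fraction h returned by the neighborhood function."""
    return [w + h * (xk - w) for w, xk in zip(w_i, x)]

# The winner itself receives the full learning rate; distant neurons less.
h_winner = neighborhood((0, 0), (0, 0), alpha_t=0.5, sigma_t=1.0)
h_far = neighborhood((0, 0), (1, 1), alpha_t=0.5, sigma_t=1.0)
new_w = update_weight([0.0, 0.0], [1.0, 2.0], h_winner)
```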

Remarks (on clustering approach): According to the reviewed literature, “DBSCAN” has proven more suitable for the clustering task than “k-means clustering”. For instance, the number of clusters in “DBSCAN” is determined by the algorithm itself, whereas in “k-means clustering” it is specified by the operator; thus, attaining an optimal number of clusters with “k-means clustering” can be challenging. “SOMs” have not yet been widely explored on power curve based data; however, the results of the implementations in [76,77,78] are promising.

5.3. Isolation Approach

Isolation approaches, also referred to as anomaly detection algorithms, are typically unsupervised; essentially, such algorithms aim at isolating the anomalous data from the normal data. According to the reviewed literature, the most frequently used methods are the “isolation forest” (iForest) and the “local outlier factor” (LOF).

(a): Isolation Forest (iForest): The iForest is an ensemble of trees; it uses an unsupervised learning approach to detect unusual data points, which can subsequently be removed from the training data. Essentially, it isolates samples by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the chosen feature. Since “recursive partitioning” can be represented by a tree structure, the number of splits needed to isolate a sample is equal to the path length from the “root node” to the “terminating node”. This “path length”, averaged over a forest of such random trees, is a “measure of normality” and serves as the decision function. Random partitioning produces noticeably shorter paths for anomalies. Therefore, when a forest of random trees collectively produces shorter path lengths for specific samples, those samples are most probably anomalies. iForest derives the “anomaly score” for sample $x$ from its “averaged path length”, $h (x)$ . The “anomaly score” for sample $x$ , in the presence of a set of $n$ samples, may be expressed as:

s(x, n) = 2^{- \frac{E(h(x))}{c(n)}}

(10)

where $c (n)$ indicates the mean of $h (x)$ for a given $n$ , and $E (h (x))$ denotes the “average path length” of $x$ across all the trees. If $s$ is close to 1, it is an indication that the instance is likely to be an anomaly, and if $s$ is less than 0.5, the instance is likely not an anomaly. For instance, this approach (iForest) was used in [69,79] to perform anomaly detection and removal.
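
The following Python sketch evaluates the anomaly score of Equation (10). It assumes the normalization commonly used for isolation forests, $c(n) = 2 H(n-1) - 2(n-1)/n$ with the harmonic number approximated as $H(i) \approx \ln(i) + 0.5772156649$; this expression for $c(n)$ is not stated in the text above, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Assumed normalization: average path length of an unsuccessful
    search in a binary search tree built from n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-avg_path_length / c(n))

# A sample whose average path length equals c(n) scores exactly 0.5;
# shorter paths push the score towards 1.
score_typical = anomaly_score(c(256), 256)
score_short = anomaly_score(2.0, 256)
```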

(b): Local Outlier Factor (LOF): The LOF is essentially a “density-based” technique, which is concerned with assigning a degree of outlier-ness to an instance. As the name suggests, the anomaly score of each sample is called the Local Outlier Factor. It measures the local deviation of the density of a given sample with respect to its neighbors. It is local in a way that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by k-nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers. The LOF method requires specification of the number of nearest neighbors (minimum points). Subsequently, the LOF of a point $x$ is given by the following expression:

\mathrm{LOF}_{minPts}(x) = \frac{\sum_{o \in N_{minPts}(x)} \frac{\mathrm{lrd}_{minPts}(o)}{\mathrm{lrd}_{minPts}(x)}}{| N_{minPts}(x) |}

(11)

where $minPts$ denotes the number of nearest neighbors (minimum points), $N_{minPts}(x)$ is the set of those neighbors of $x$ , and $\mathrm{lrd}$ is the local reachability density. The LOF of an instance is, thus, the ratio of the average $\mathrm{lrd}$ of its $minPts$ neighbors to the instance’s own $\mathrm{lrd}$ ; values well above 1 indicate an outlier. For instance, the authors in [69,80] used this approach (LOF) for anomaly detection and removal.
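
Equation (11) can be illustrated with a small brute-force Python sketch. It assumes Euclidean distance, no duplicate samples, and exactly $minPts$ neighbors per sample (ties at the $k$-distance are ignored for simplicity); the function names and the toy data are illustrative assumptions.

```python
import math

def k_nearest(data, idx, k):
    """The k nearest neighbors of sample idx as (distance, index) pairs."""
    others = sorted((math.dist(data[idx], data[j]), j)
                    for j in range(len(data)) if j != idx)
    return others[:k]

def k_distance(data, idx, k):
    """Distance from sample idx to its k-th nearest neighbor."""
    return k_nearest(data, idx, k)[-1][0]

def lrd(data, idx, k):
    """Local reachability density: inverse of the mean reachability
    distance from sample idx to its neighbors."""
    neigh = k_nearest(data, idx, k)
    reach = [max(k_distance(data, o, k), d) for d, o in neigh]
    return len(neigh) / sum(reach)

def lof(data, idx, k):
    """Average lrd of the neighbors divided by the lrd of the sample."""
    neigh = k_nearest(data, idx, k)
    own = lrd(data, idx, k)
    return sum(lrd(data, o, k) for _, o in neigh) / (len(neigh) * own)

# Hypothetical samples: a tight square plus one isolated sample.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
lof_inlier = lof(data, 0, k=2)
lof_outlier = lof(data, 4, k=2)
```

An LOF close to 1 means the sample is as dense as its neighbors, whereas the isolated sample obtains a clearly larger value.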

Remarks (on isolation approach): Based on the analysis and comparison of iForest and LOF provided in [69], the LOF appears to be conservative, eliminating only the most isolated instances, whereas the iForest appears to deal well with only a specific type of anomaly.

5.4. Other Approaches

Other approaches found in the currently available literature are based on a combination of the aforementioned methods. Although most of the aforementioned algorithms can attain useful results in terms of mitigating the impact of abnormal data and detecting and removing anomalous and faulty data, the main challenge is defining their optimal parameters in order to attain the best result. The complex nonlinear shape of the power curve itself often makes it difficult to achieve a perfect result. Therefore, in order to enhance the data integrity, it is often necessary to employ a combination of different algorithms to tackle different problems. An example of a procedure that combines different methods is found in [21], where a four-step method, as illustrated in Figure 12, is introduced as follows: (i) Anomaly Detection: Firstly, the clustering algorithm DBSCAN is used to mitigate anomalies in the dataset; (ii) Robust Power Curve: Secondly, LOESS is used to attain a robust power curve; (iii) Upper-Lower Limits: Thirdly, the robust power curve is used to derive an “upper-lower bound” envelope; (iv) Optimal Power Curve Data: Lastly, the dataset within the “upper-lower bound” limits is extracted as the optimal power curve data required for accurate power curve modeling. Several research articles have proposed different signal processing schemes for data correction, using the aforementioned methods, a combination of them, or an entirely different approach. The reviewed data preprocessing and correction schemes are shown in Figure 13; additional details regarding the advantages and limitations of the reviewed algorithms are provided in Table 4, and a brief summary of the most relevant research articles addressing this particular problem is given in Table 5.

6. Modeling Techniques

6.1. State of the Art Methods

State of the art techniques for “wind turbine power curve modeling” are mainly classified into the following categories: (i) parametric algorithms and (ii) non-parametric algorithms. This subsection provides a brief review of these techniques, followed by the modeling performance metrics.

6.1.1. Parametric Algorithms

The “wind turbine power curve” can be approximated by a parametric algorithm. According to the reviewed literature, the most frequently used methods include the linear, quadratic, and cubic methods, along with the logistic function.

The linear, quadratic, and cubic methods are basically a set of polynomial expressions. They are essentially a “natural extension of the linear regression”, expressed as:

p_{i} = β_{0} + β_{1} v_{i} + ε_{i}, i = 1, \dots, n

(12)

where the “linear dependency” of $p$ on $v$ is substituted by a “polynomial function”, expressed as:

p_{i} = β_{0} + β_{1} v_{i} + β_{2} v_{i}^{2} + \dots + β_{k} v_{i}^{k} + ε_{i}, i = 1, \dots, n

(13)

where $ε_{i}$ is regarded as a “sequence of independent and identically distributed random variables with zero mean and finite variance”. Moreover, $n$ represents the number of sample points, $β_{j}, j = 0, \dots, k$ denote the “unknown parameters”, and $k$ indicates the “degree of the polynomial regression”.

(a): Linear: The linear model approximates the power curve with a first-degree polynomial. It is the simplest approach, approximating the power curve by a straight line.
(b): Quadratic: The quadratic model approximates the power curve with a second-degree polynomial, that is, by a parabola with a single bend.
(c): Cubic: The cubic model approximates the power curve with a third-degree polynomial, which can change curvature once and, thus, follow the shape of the power curve more closely.

The aforementioned polynomial expressions were explored in [4,92,93,94,95,96] for power curve modeling.
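
As a hedged illustration of Equation (13), the sketch below estimates the parameters $β_{j}$ by ordinary least squares, solving the normal equations with Gaussian elimination. The helper names `polyfit` and `polyval` and the toy data are assumptions for illustration; for high degrees or large wind speed values, a numerically more robust solver would be preferable.

```python
def polyfit(v, p, degree):
    """Least-squares estimate of beta_0..beta_k for
    p = beta_0 + beta_1 v + ... + beta_k v^k."""
    m = degree + 1
    # Normal equations A beta = b, with A[r][c] = sum(v^(r+c)).
    A = [[sum(x ** (r + c) for x in v) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(v, p)) for r in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * m
    for r in reversed(range(m)):
        s = sum(A[r][c] * beta[c] for c in range(r + 1, m))
        beta[r] = (b[r] - s) / A[r][r]
    return beta

def polyval(beta, x):
    """Evaluate the fitted polynomial at wind speed x."""
    return sum(bj * x ** j for j, bj in enumerate(beta))

# Toy data generated from p = 1 + 2 v + 3 v^2 (noise-free, for checking).
v = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [1.0, 6.0, 17.0, 34.0, 57.0]
beta = polyfit(v, p, degree=2)
```

Setting `degree` to 1, 2, or 3 gives the linear, quadratic, and cubic models, respectively.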

(d): Logistic Function: Since the logistic curve is, to a certain extent, similar in shape to the power curve, it can be used to approximate the power curve. For instance, the four-parameter logistic function can be described as:

P(u; θ) = a \frac{1 + m \exp(- u / τ)}{1 + n \exp(- u / τ)}

(14)

where $P$ denotes the power generated, $u$ denotes the wind speed, and $θ = (a, m, n, τ)$ represents the vector of parameters that determines the shape of the curve. This approach (logistic function) was used in [14,38,93,94,97,98] to model the power curve of wind turbines.
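
Equation (14) can be evaluated directly, as in the following minimal sketch. The parameter values are hypothetical and chosen only so that the curve starts at zero and saturates at $a$ ; they are not fitted values from the cited works.

```python
import math

def logistic4(u, a, m, n, tau):
    """Four-parameter logistic power curve P(u; theta), theta = (a, m, n, tau)."""
    e = math.exp(-u / tau)
    return a * (1.0 + m * e) / (1.0 + n * e)

# Hypothetical parameters: a = 2000 kW plateau, m = -1 so that P(0) = 0.
theta = dict(a=2000.0, m=-1.0, n=500.0, tau=1.3)
curve = [logistic4(u, **theta) for u in range(0, 26)]
```

For large wind speeds the exponential terms vanish and the output approaches $a$ , which therefore plays the role of the rated power in this sketch.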

Remarks (on parametric algorithms): “Parametric models” generally define the relationship between the input and the output based on a “mathematical expression” along with a fixed number of parameters. According to the reviewed literature, with regards to the polynomial-based equations (linear, quadratic, and cubic), the linear method seems to be the least accurate since the power curve itself is not entirely linear. The results attained with the logistic function, in particular in [38,97,98], appear promising.

6.1.2. Non-Parametric Algorithms

The wind turbine power curve can be approximated by a non-parametric approach. According to the reviewed literature, the most frequently used methods include the “k-nearest neighbor” (KNN), “decision tree regression” (DTR), “random forest regression” (RFR), “support vector regression” (SVR), “artificial neural network” (ANN), “Copulas”, “Gaussian process” (GP), “Markov process” (MP), fuzzy-based algorithms such as “clustering center fuzzy logic” (CCFL) and the “adaptive neuro-fuzzy inference system” (ANFIS), and “ensemble learning”, among other algorithms.

(a): K-Nearest Neighbor (KNN): The “KNN” model is the simplest non-parametric model, and it has found success in many applications, including wind turbine power curve modeling. The fundamental principle of the “nearest neighbor” approach is to find a predetermined number of training samples closest (in terms of distance) to the considered sample, and subsequently estimate the label from them. The “number of samples” can be a “user-defined” constant (“k-nearest neighbor learning”) or vary with the local density of points (“radius-based neighbor learning”). In general, the distance metric is the standard Euclidean distance. According to the reviewed literature, this approach (KNN) is one of the most widely used techniques due to its simplicity. It was employed in [38,44,99,100] for power curve modeling.
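
A minimal sketch of KNN regression on a single input (wind speed) is shown below. It assumes that the label is estimated as the unweighted mean of the $k$ nearest training powers, which is one common choice; the function name and the training samples are hypothetical.

```python
def knn_predict(train_v, train_p, v_query, k=3):
    """Predict power at v_query as the mean power of the k training
    samples whose wind speed is closest to v_query."""
    nearest = sorted(zip(train_v, train_p),
                     key=lambda s: abs(s[0] - v_query))[:k]
    return sum(p for _, p in nearest) / len(nearest)

# Hypothetical training data: wind speed [m/s] and power [kW].
train_v = [3.0, 4.0, 5.0, 6.0, 7.0]
train_p = [50.0, 150.0, 300.0, 500.0, 750.0]
pred_k3 = knn_predict(train_v, train_p, 5.1, k=3)
pred_k1 = knn_predict(train_v, train_p, 5.1, k=1)
```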

(b): Decision Tree Regression (DTR): The “DTR” is a “supervised learning” method which can be used for regression problems such as power curve modeling. It predicts the value of a target variable by learning simple “decision rules” deduced from the dataset. A “tree” may be considered as a “piecewise constant approximation”. Given the training data, a “decision tree” recursively partitions the feature space in such a way that samples with similar target values are grouped together. The “quality of a candidate split” $θ$ of node $m$ is calculated through a so-called “impurity function” or “loss function” $H (\cdot)$ :

G(Q_{m}, θ) = \frac{N_{m}^{left}}{N_{m}} H(Q_{m}^{left}(θ)) + \frac{N_{m}^{right}}{N_{m}} H(Q_{m}^{right}(θ))

(15)

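where $Q_{m}^{left} (θ)$ and $Q_{m}^{right} (θ)$ are the partition subsets of the data $Q_{m}$ at node $m$ produced by the candidate split $θ$ , containing $N_{m}^{left}$ and $N_{m}^{right}$ of the $N_{m}$ samples, respectively. A regression criterion is then defined. In general, its extended version, namely the random forest regressor (RFR), is more widely used.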
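
The split quality of Equation (15) can be sketched as follows. The sketch assumes the mean squared error as the impurity function $H$ and a threshold on wind speed as the candidate split $θ$ ; both choices, as well as the function names and data, are illustrative assumptions.

```python
def mse_impurity(targets):
    """Assumed impurity H: mean squared deviation from the subset mean."""
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)

def split_quality(samples, threshold):
    """G(Q_m, theta): size-weighted impurity of the left and right subsets
    obtained by splitting (wind speed, power) samples at the threshold."""
    left = [p for v, p in samples if v <= threshold]
    right = [p for v, p in samples if v > threshold]
    n = len(samples)
    g = 0.0
    if left:
        g += len(left) / n * mse_impurity(left)
    if right:
        g += len(right) / n * mse_impurity(right)
    return g

def best_split(samples):
    """Pick the midpoint threshold that minimizes G."""
    speeds = sorted({v for v, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(speeds, speeds[1:])]
    return min(candidates, key=lambda t: split_quality(samples, t))

# Hypothetical node data with a clear jump in power between 5 and 10 m/s.
node = [(3, 0.0), (4, 0.0), (5, 0.0), (10, 100.0), (11, 100.0), (12, 100.0)]
threshold = best_split(node)
```

Applying `best_split` recursively to the resulting subsets yields the piecewise constant approximation described above.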

(c): Random Forest Regression (RFR): The “RFR” is essentially a “meta estimator” which fits several “decision trees” on several subsets of the dataset and employs averaging to enhance the “predictive accuracy” and control over-fitting. It is often necessary to specify the number of estimators (trees in the forest), the criterion (the function that measures the quality of a split), and the maximum depth of the trees. This approach (RFR) was used in [100,101].

(d): Support Vector Regression (SVR): The “SVR” is a model from the family of the “support vector machine” (SVM) used for regression tasks. It is appropriate for modeling from a small dataset owing to its strong generalization ability. It is often necessary to specify the “kernel” type to be used, for instance, “linear”, “polynomial”, “radial basis function” (RBF), or “sigmoid”. The free parameters of such a model include the “regularization parameter” and “epsilon”; the latter specifies the “epsilon-tube” within which no “penalty” is associated in the training “loss function” with samples predicted within a distance epsilon from the actual value. The approximation function is expressed as:

f (x) = \sum_{i = 1}^{l} (- α_{i} + α_{i}^{*}) K (x_{i}, x) + b

(16)

where $α_{i}$ and $α_{i}^{*}$ represent the Lagrange multipliers, $K (x_{i}, x)$ denotes the kernel function, and $b$ denotes the bias. This approach (SVR) was used in [101,102].

(e): Artificial Neural Network (ANN): The “ANN” generally comprises four parts, namely, the “input layer”, the “hidden layers”, the “activation function”, and the “output layer”. The input layer receives the data and transfers it to the hidden layers, where the information is transformed into a higher-level representation through a nonlinear transformation expressed as $h_{i} = σ (w_{i} x + b_{i})$ , where $x$ and $h_{i}$ denote the vectors of input and hidden representations, respectively, and $w_{i}$ and $b_{i}$ represent the weight matrices and bias vectors, respectively. Furthermore, $σ$ represents the nonlinear activation function; the softmax function calculates the output, expressed as:

Y_{j} = \frac{\exp(h_{s, j})}{\sum_{k = 1}^{n_{h_{s}}} \exp(h_{s, k})}

(17)

where $h_{s}$ represents the output of the last “hidden layer” and $n_{h_{s}}$ is its number of units. This approach (ANN) was used in [44,103].
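
The hidden transformation $h_{i} = σ (w_{i} x + b_{i})$ and the softmax output of Equation (17) can be sketched as below. The use of the hyperbolic tangent as the activation $σ$ and the toy weights are assumptions; the maximum is subtracted inside the softmax purely for numerical stability and does not change the result.

```python
import math

def hidden_layer(x, w, b):
    """h = sigma(w x + b) with tanh as an assumed activation function.
    w is a list of weight rows, one per hidden unit."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

def softmax(h):
    """Y_j = exp(h_j) / sum_k exp(h_k), computed in a stable form."""
    m = max(h)
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-unit hidden layer applied to a two-feature input.
h = hidden_layer([1.0, 2.0], w=[[0.5, -0.25], [0.1, 0.2]], b=[0.0, 0.1])
y = softmax(h)
```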

(f): Copula: The copula model is a probabilistic approach for modeling the wind turbine power curve. It applies when the power curve is regarded as a “bivariate joint distribution”. The function to estimate the copula can be expressed as:

C (u, v) = H (F_{x}^{- 1} (u), F_{y}^{- 1} (v))

(18)

where $F_{x}$ and $F_{y}$ denote the marginal distributions and $H$ denotes the full bivariate distribution. A detailed treatment of this approach can be found in [55,78,104,105].

(g): Gaussian Process (GP): The GP model is a “Bayesian non-linear regression” approach widely used to deal with probabilistic regression problems. This approach has also been explored to model the wind turbine power curve. The GP is completely specified by its mean function $m (x)$ and covariance function $k (x, x^{'})$ , which can be expressed as:

f(x) \sim \mathcal{GP}(m(x), k(x, x^{'}))

(19)

A detailed explanation of this approach (GP) can be found in [101,106,107,108,109,110,111,112]. According to the reviewed literature, the GP is one of the most widely used approaches for wind turbine power curve modeling.

(h): Markov Process (MP): The “MP” is a stochastic model whose defining property is that the dynamics of the process have no memory; the next state depends solely on the previous state, and not on the entire history. This can be described in terms of its conditional probability density function (PDF), expressed as:

W(P_{n}, t_{n} | P_{n - 1}, t_{n - 1}; P_{n - 2}, t_{n - 2}; \dots; P_{1}, t_{1}) = W(P_{n}, t_{n} | P_{n - 1}, t_{n - 1})

(20)

where $P_{n}$ denotes the expected power at $t_{n}$ . Note that in this description, the “left-hand side” expresses the “PDF” at time $t_{n}$ under the condition that the “stochastic variable” was in the state $P_{n - 1}$ at the time $t_{n - 1} < t_{n}$ , in the state $P_{n - 2}$ at the time $t_{n - 2} < t_{n - 1}$ , and so on. Further explanation and detailed analysis of this approach (MP) can be found in [113], where it was employed to model the power curve.

(i): Clustering Center Fuzzy Logic (CCFL): The “CCFL” is a “fuzzy logic-based” model which has also been explored for modeling the power curve of wind turbines. In this application, the dataset is initially clustered and the “cluster centers” are established; these “cluster centers” are subsequently employed to represent the power curve of the wind turbine. This approach was used in [44,114,115], where, in the former, the authors found that four or five “cluster centers” were sufficient to represent the power curve of a wind turbine.
(j): Adaptive Neuro-Fuzzy Inference System (ANFIS): Similar to the “CCFL”, the “ANFIS” is a “fuzzy logic-based” model, which contains a typical fuzzy inference system structure, membership functions, and a set of rules; the approach requires fewer parameters, which generally results in faster training. This approach was used in [44] for power curve modeling.
(k): Ensemble Learning: An ensemble model combines different models to improve the overall accuracy to a level that cannot be attained by any individual (single) model. Such methods have become increasingly popular in recent years. An example of such a model can be mathematically expressed as:

v (x) = v^{'} (v_{1} (x), v_{2} (x), \dots, v_{n} (x))

(21)

where $v (x)$ is the combined output of the constituent models $v_{1} (x), \dots, v_{n} (x)$ . The main advantage of such an ensemble model is that different mechanisms can be employed simultaneously to solve the same problem; hence, a better solution can be achieved. Examples of such an approach can be found in [21,66], where typical ensemble methods were used for modeling and condition monitoring of wind turbines.

Remarks (on non-parametric algorithms): According to the reviewed literature, “non-parametric” methods are suitable for deriving the power curve from collected historical data, after the required data preprocessing and correction. Moreover, “non-parametric” algorithms are flexible, as they enable the incorporation of wind turbine condition parameters other than the “wind speed” and “active power”. With regards to the reviewed algorithms, among the “machine learning-based” algorithms (i.e., KNN, RFR, and SVR), the KNN seems to be the most popular due to its simplicity. The results attained by the “deep learning-based” algorithms (i.e., ANN) and the “probability-based” algorithms (Gaussian process and Copula) are promising. “Ensemble learning-based” algorithms have also been gaining attention in recent years, since they can enhance a model’s performance by employing multiple individual algorithms simultaneously to function as one. The Markov process and the “fuzzy-based” algorithms (i.e., CCFL and ANFIS) seem to be the least used algorithms, although the reported results are just as promising.

6.1.3. Other Algorithms

Besides the aforementioned “parametric” and “non-parametric” algorithms for power curve modeling, several other algorithms have been widely explored in the currently available literature, namely, the “method of binning”, “Weibull”, “maximum likelihood estimation” (MLE), and “Monte Carlo simulation” (MCS).

(a): Method of Binning: The binning method has already been introduced in Section 2.1. It was adopted as the standard by the International Electrotechnical Commission (IEC) and is, thus, widely used in wind energy technology, for instance, by wind turbine manufacturers for wind turbine certification, and also in academic research. Examples can be found in [100] and in particular in [116], where the power curve reference was attained through the “method of binning”.
(b): Weibull: The “Weibull” distribution is often employed in a typical “probabilistic” power curve estimation model. Assuming that the “wind speed” variable in the dataset follows the two-parameter “Weibull distribution”, the “probability density function” (PDF) for the “wind speed”, $u$ , is mathematically described as:

f(u; β, η) = \frac{β}{η} \left(\frac{u}{η}\right)^{β - 1} \exp\left[- \left(\frac{u}{η}\right)^{β}\right]

(22)

where $β$ is the “shape parameter” and $η$ is the “scale parameter”. The most likely PDF for the “wind speed” dataset may be found by estimating the values of the “shape and scale parameters” using the “maximum likelihood method”. A detailed treatment of this approach can be found in [31,95,97,117].

(c): Maximum Likelihood Estimation (MLE): The “MLE” is often used to determine the parameters of power curve models. For instance, in [97] the equations for the “scale and shape parameters” of the “Weibull” wind speed distribution were obtained by maximizing the log-likelihood function, $\ln [L (u)]$ :

\ln L(u; β, η) = N \ln\left(\frac{β}{η}\right) + \sum_{i = 1}^{N} \ln\left[\left(\frac{u_{i}}{η}\right)^{β - 1}\right] - \sum_{i = 1}^{N} \left(\frac{u_{i}}{η}\right)^{β}

(23)

with respect to $β$ and $η$ . The resulting “MLE” estimates, which define the most likely “PDF” for the “wind speed”, are obtained from

η = \left(\frac{1}{N} \sum_{i = 1}^{N} u_{i}^{β}\right)^{\frac{1}{β}}

and

\frac{1}{β} = \frac{\sum_{i = 1}^{N} u_{i}^{β} \ln (u_{i})}{\sum_{i = 1}^{N} u_{i}^{β}} - \frac{1}{N} \sum_{i = 1}^{N} \ln (u_{i})

Since $β$ appears on both sides of the second expression, it has to be solved numerically. This approach (MLE) was used in [38,97].
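
A hedged Python sketch of this estimation is given below. The shape equation is solved by bisection, which is an assumed (not source-specified) numerical scheme chosen because the left-minus-right side of that equation increases monotonically with $β$ ; the bracket $[0.01, 50]$ , the function name, and the synthetic sample are illustrative assumptions.

```python
import math
import random

def weibull_mle(speeds, lo=0.01, hi=50.0, tol=1e-10):
    """Estimate Weibull shape (beta) and scale (eta) from positive wind
    speeds by solving the implicit shape equation with bisection."""
    n = len(speeds)
    mean_log = sum(math.log(u) for u in speeds) / n

    def g(beta):
        # Root of g is the MLE of beta:
        # sum(u^b ln u) / sum(u^b) - mean(ln u) - 1/b = 0
        num = sum(u ** beta * math.log(u) for u in speeds)
        den = sum(u ** beta for u in speeds)
        return num / den - mean_log - 1.0 / beta

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = (sum(u ** beta for u in speeds) / n) ** (1.0 / beta)
    return beta, eta

# Synthetic check: draw speeds from a Weibull with shape 2 and scale 8.
# Note that random.weibullvariate takes (scale, shape) in that order.
rng = random.Random(0)
sample = [rng.weibullvariate(8.0, 2.0) for _ in range(2000)]
sample = [u for u in sample if u > 0]
beta_hat, eta_hat = weibull_mle(sample)
```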

(d): Monte Carlo Simulation (MCS): The “MCS” is a probabilistic technique capable of modeling a system under uncertainty. The “Monte Carlo simulation” relies on historical “wind speed” data from wind farm sites. This approach has been used in [117,118]. In particular, it was used in [118] to simulate data in order to supplement an insufficient real-world dataset and to perform analysis in terms of “long-term assessment”.
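
The idea can be sketched as follows, assuming that wind speeds are drawn from a Weibull distribution (for example, one fitted to historical data) and passed through a simplified piecewise power curve. The cut-in, rated, and cut-out speeds, the rated power, and the cubic ramp are hypothetical values chosen for illustration and do not reproduce the setups of [117,118].

```python
import random

def power_curve(u, cut_in=3.0, rated_speed=12.0, cut_out=25.0,
                rated_power=2000.0):
    """Simplified, assumed power curve in kW for wind speed u in m/s."""
    if u < cut_in or u >= cut_out:
        return 0.0
    if u >= rated_speed:
        return rated_power
    # Illustrative cubic ramp between cut-in and rated speed.
    return rated_power * (u ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)

def simulate_mean_power(shape, scale, n_draws=20000, seed=0):
    """Monte Carlo estimate of the mean power: sample Weibull wind
    speeds and average the corresponding power curve outputs."""
    rng = random.Random(seed)
    total = sum(power_curve(rng.weibullvariate(scale, shape))
                for _ in range(n_draws))
    return total / n_draws

mean_low_wind = simulate_mean_power(shape=2.0, scale=6.0)
mean_high_wind = simulate_mean_power(shape=2.0, scale=10.0)
```

Dividing the simulated mean power by the rated power gives a rough capacity factor estimate for the assumed site conditions.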

Remarks (on other algorithms): The aforementioned algorithms have been widely used in the wind energy industry, especially in early research articles. As already mentioned, the method of binning has been established as the standard approach (per IEC standards), whereas the Weibull, MLE, and MCS have all played a major role when real data are lacking. These methods have been explored extensively for power curve modeling.
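
For reference, the method of binning mentioned above can be sketched as follows: the wind speed range is divided into bins of fixed width, and the mean wind speed and mean power are computed per bin. The 0.5 m/s bin width follows common IEC practice, whereas the left-aligned bin edges, the function name, and the data are simplifying assumptions.

```python
from collections import defaultdict

def binned_power_curve(speeds, powers, bin_width=0.5):
    """Return a list of (mean wind speed, mean power) per non-empty bin,
    ordered by wind speed. Bins are [k * width, (k + 1) * width)."""
    bins = defaultdict(list)
    for v, p in zip(speeds, powers):
        bins[int(v // bin_width)].append((v, p))
    curve = []
    for b in sorted(bins):
        vs, ps = zip(*bins[b])
        curve.append((sum(vs) / len(vs), sum(ps) / len(ps)))
    return curve

# Hypothetical measurements falling into two adjacent bins.
speeds = [5.1, 5.3, 5.6, 5.9]
powers = [300.0, 320.0, 400.0, 420.0]
curve = binned_power_curve(speeds, powers)
```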

In summary, the reviewed modeling techniques are shown in Figure 14 according to their respective categories. Furthermore, the advantages and limitations along with the most essential parameters of the reviewed methods are given in Table 6. Lastly, a brief summary of the most relevant research articles addressing this particular problem (power curve modeling) is given in Table 7. In the next section (Section 6.2), the most widely used “performance metrics” for modeling the wind turbine power curve are discussed.

6.2. Performance Metrics

After building the power curve model, whether based on parametric, non-parametric, or other types of algorithms, it is necessary to verify whether the model is capable of appropriately representing the behavior of the normal power curve. Therefore, performance metrics are usually employed to measure model accuracy and enable appropriate model selection. Various metrics can be used for this purpose. However, according to the reviewed literature, the most widely used performance metrics are the “mean absolute error” (MAE), “root mean squared error” (RMSE), “mean absolute percentage error” (MAPE), and “R-Squared Score” $(R^{2})$ . The mathematical expressions of the aforementioned performance metrics are presented in Table 8.

(a): Mean Absolute Error (MAE): The MAE is a risk metric corresponding to the expected value of the absolute error loss. Assuming that $y_{i}$ is the actual value and ${\hat{y}}_{i}$ is the predicted value of the $i$ th sample, and $n$ denotes the number of samples, then the MAE is defined as in Table 8.
(b): Root Mean Squared Error (RMSE): The RMSE is an extension of the “mean squared error” (MSE), obtained by taking the square root of the MSE. The MSE is a risk metric corresponding to the expected value of the squared (quadratic) error or loss. Assuming that $y_{i}$ is the actual value and ${\hat{y}}_{i}$ is the predicted value of the $i$ th sample, and $n$ denotes the number of samples, then the MSE estimated over $n$ samples is defined as $MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}$ ; hence, the RMSE is defined as in Table 8.
(c): Mean Absolute Percentage Error (MAPE): The MAPE is also known as the “mean absolute percentage deviation” (MAPD). The idea of this metric is to be sensitive to relative errors; for example, it is unchanged by a global scaling of the target variable. Assuming that $y_{i}$ is the actual value and ${\hat{y}}_{i}$ is the predicted value of the $i$ th sample, and $n$ denotes the number of samples, then the MAPE is defined as in Table 8.
(d): R-Squared $(R^{2})$ Score: The R-squared score computes the coefficient of determination. It provides an indication of goodness of fit and, therefore, a measure of how well unseen samples are likely to be predicted by the model, through the proportion of explained variance. Assuming that $y_{i}$ is the actual value and ${\hat{y}}_{i}$ is the predicted value of the $i$ th sample, $\bar{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$ , and $n$ denotes the number of samples, then $R^{2}$ is defined as in Table 8.
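
The four metrics can be sketched in Python as follows. The sketch assumes the standard textbook definitions, which may differ in scaling from the expressions in Table 8; in particular, the MAPE is returned here as a fraction rather than a percentage, and all actual values are assumed to be non-zero. The sample values are hypothetical.

```python
import math

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Mean absolute percentage error as a fraction (assumes y_i != 0)."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted power values in kW.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred),
          mape(y_true, y_pred), r2_score(y_true, y_pred))
```

Because power is close to zero below the cut-in speed, the MAPE can become unstable in that region, which is one reason why several metrics are usually reported together.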

7. Overall Assessment: Past, Present, and Future

Based on the reviewed literature, this section presents a summary of the past and present, and insights into the future of this active field (see Figure 15 and Table 9). It is important to mention that the timeline and events only reflect the research articles reviewed in this paper and not necessarily those of the entire wind energy field. With that said, we may expect SCADA data to continue to play a major role, not only at the research level, but also in industry. The robustness, accuracy, and efficiency of the models are also expected to improve.

8. Discussion and Prospects

This paper presents an extensive study on wind turbine power curves. With regards to wind turbine configurations, vertical axis wind turbines have also gained attention in recent years and claim their relevance in the market, in particular with a recently renewed interest for offshore use. Nevertheless, most of the papers reviewed in this article refer to horizontal axis wind turbines, which defines the scope of this paper; the main reasons are that vertical axis wind turbines are considered less efficient than conventional horizontal axis wind turbines and that none of their variant designs has been launched with great commercial success [129]. For a better understanding of what the aforementioned terms entail, Figure 16 shows a simplified illustration of both configurations.

The applications and modeling techniques of wind turbine power curves for wind farms have been reviewed exhaustively. The drawn inferences, deficiencies, and general suggestions are described as follows:

With regards to applications, several key applications of wind turbine power curves, including wind turbine selection, capacity factor estimation, wind energy assessment and forecasting, and condition monitoring, have been discussed in Section 3. According to the reviewed literature, the power curve plays a critical role in wind farm technology from the very beginning, when selecting the type of wind turbine to invest in. Capacity factor estimation and wind energy assessment and forecasting provide useful information for operators and wind energy management systems; in particular, an accurate implementation of wind energy forecasting can improve services in terms of scheduling and dispatching. Power curve based condition monitoring can provide a general assessment of the entire wind turbine with regards to a wide range of anomaly and fault types, which is very effective for root cause analysis and diagnosis.
With regards to modeling techniques, a wide range of algorithms, mostly including parametric and non-parametric algorithms, have been explored in Section 6. According to the reviewed literature, most parametric algorithms are based on a typical polynomial expression which is essentially a natural extension of the linear regression (i.e., linear, quadratic, and cubic); a logistic function with four parameters, for instance, is also frequently used. Although they are simple to implement, remarks also indicate that most parametric models do not consider several important factors, hence, they may often result in large errors. Alternatively, non-parametric algorithms including KNN, DTR, RFR, SVR, ANN, Copula, GP, MP, CCFL, ANFIS, and ensemble learning algorithms, among others, are derived from actual wind farm data and mostly attain from the SCADA system; hence, they seem to minimize the prediction error. However, they also are not exempt from limitations; in particular, some non-parametric algorithms can suffer from time-cost due to the required data processing and model training procedures.
Research in this active field is moving towards robustness; the volatile nature of the wind causes challenges not only for utilities and wind energy management systems, but also for developing an accurate model. With the recent development of data acquisition systems, researchers are working towards solutions such as multivariate and multi-target models; that is, models that use other important condition indicators in addition to “wind speed” as input variables, and multiple indicators in addition to “active power” as target variables. In addition, the wind energy industry is expanding from onshore to offshore, and extensive studies tailored to specific offshore scenarios are expected to appear more frequently. The technology to harvest wind energy is also improving and advanced wind turbines are being manufactured; hence, there is still room for studies, analysis, and investigations on specific aspects.

The following section (Section 9) provides the major conclusions of this study, along with takeaways from the findings of a wide range of sources. Areas for future studies are also suggested.

9. Conclusions

The wind turbine power curve plays a significant role in several key applications in the wind industry. These applications include wind turbine selection, capacity factor estimation, wind energy assessment and forecasting, and condition monitoring, among others. Implementing these applications effectively mostly requires a modeling technique that best approximates the normal characteristics of wind turbine operation on a particular wind farm.

Two modeling techniques (parametric and non-parametric) and a wide range of algorithms were reviewed to investigate the state of the art of power curve modeling. The first takeaway is that, although the reviewed algorithms employ different mechanisms, most of the models currently available in the literature can be classified as normal behavior models (NBMs), since they require normal (fault-free) data to be available in advance for model training and validation. This emphasizes the importance of data integrity; hence, data preprocessing and correction are crucial in the power curve modeling process. Three approaches (filtering, clustering, and isolation) and a wide range of algorithms have been investigated for data preprocessing and correction prior to modeling. The second takeaway is that, although each of the reviewed preprocessing and correction algorithms can be used on its own, a combination of methods may often result in a better outcome. The third takeaway is that, through years of extensive research, a wide range of power curve based anomaly and fault types have been identified along with their root causes. Since the power curve is a condition indicator that reflects the health state of the entire wind turbine, these findings have important implications for applications such as power curve based fault diagnosis, and future research can focus more on this task. Such anomaly and fault signatures, identifiable through the power curve, can serve as the first indication that something is wrong, so that effective decisions can be made before they escalate into major problems.
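As one hedged illustration of the filtering approach, the sketch below groups samples into wind speed bins and discards points whose power lies outside the interquartile fences of their bin. The bin width, the 1.5 multiplier, the minimum bin size, and the sample data are assumptions; this is not the specific procedure of any reviewed study.

```python
import statistics
from collections import defaultdict

def iqr_filter(samples, bin_width=0.5, k=1.5, min_points=4):
    """Keep (wind_speed, power) samples whose power lies within
    [Q1 - k*IQR, Q3 + k*IQR] of their wind speed bin.
    Bins with fewer than min_points samples are kept unchanged."""
    bins = defaultdict(list)
    for v, p in samples:
        bins[int(v // bin_width)].append((v, p))

    kept = []
    for index in sorted(bins):
        members = bins[index]
        if len(members) < min_points:
            kept.extend(members)
            continue
        q1, _, q3 = statistics.quantiles([p for _, p in members], n=4)
        spread = q3 - q1
        low, high = q1 - k * spread, q3 + k * spread
        kept.extend((v, p) for v, p in members if low <= p <= high)
    return kept

# Hypothetical samples near 8 m/s; the 0 kW reading mimics a stopped turbine.
data = [(8.0, 980.0), (8.1, 1010.0), (8.2, 1025.0), (8.3, 1040.0),
        (8.4, 1060.0), (8.1, 0.0)]
clean = iqr_filter(data)
print(len(data), "->", len(clean))
```

A filter of this kind could be followed by a clustering or isolation step, in line with the observation that combined methods may often give a better outcome.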

Power curve based condition monitoring distinguishes itself from other wind turbine condition monitoring techniques by providing a simplified overall assessment of the entire wind turbine, since most faults that occur in wind turbine components will eventually be reflected in the power curve. Thus, the advantages of this approach include simplicity in overall assessment and a clear indication of what type of anomaly or fault is causing suboptimal performance. That said, the main challenge in most power curve based applications is ensuring the model’s robustness, as wind is stochastic and intermittent. In terms of condition monitoring, the main limitation is that several anomaly and fault types may not be directly isolated solely from the information provided by the power curve; in such cases, further analysis may be required. Some researchers have already worked towards a solution to this problem by developing multi-target models, which support the simultaneous monitoring of multiple state variables with a single model. For instance, such a model can monitor not only the power curve but also several other critical condition indicators, without compromising accuracy. As a result, its implementation in industry could have a significant impact. Moreover, such a model offers two potential advantages that have not yet been fully demonstrated in wind turbine condition monitoring: (i) higher accuracy, and (ii) improved interpretability. Further studies are required to investigate under which circumstances these attributes may be achieved.
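The multi-target idea can be sketched, under strong simplifying assumptions, with a lookup-style normal behavior model that stores the mean of several condition indicators per wind speed bin and flags each indicator separately. The choice of rotor speed as a second target, the bin width, the tolerance values, and all data are hypothetical; published multi-target models are considerably more elaborate than bin averages.

```python
from collections import defaultdict

def fit_binned_nbm(records, bin_width=1.0):
    """Fit a multi-target normal behavior model from fault-free records.
    Each record is (wind_speed, {target_name: value}) and is assumed to
    contain every target; the model stores the per-bin mean of each target."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for v, targets in records:
        b = int(v // bin_width)
        counts[b] += 1
        for name, value in targets.items():
            sums[b][name] += value
    return {b: {name: total / counts[b] for name, total in sums[b].items()}
            for b in sums}

def flag_anomalies(model, record, tolerances, bin_width=1.0):
    """Return the targets whose absolute residual exceeds their tolerance."""
    v, targets = record
    expected = model.get(int(v // bin_width))
    if expected is None:
        return []  # no reference behavior learned for this wind speed
    return [name for name, value in targets.items()
            if abs(value - expected[name]) > tolerances[name]]

# Hypothetical fault-free training records (wind speed in m/s).
normal = [
    (8.2, {"power_kw": 1000.0, "rotor_rpm": 14.0}),
    (8.6, {"power_kw": 1080.0, "rotor_rpm": 14.4}),
    (9.1, {"power_kw": 1300.0, "rotor_rpm": 15.0}),
]
model = fit_binned_nbm(normal)
tolerances = {"power_kw": 150.0, "rotor_rpm": 1.0}

# A new observation with low power but normal rotor speed.
flags = flag_anomalies(model, (8.4, {"power_kw": 700.0, "rotor_rpm": 14.1}), tolerances)
print(flags)
```

Checking each target separately is what allows a deviation to be attributed to a specific indicator instead of only to the power curve as a whole.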

Author Contributions

Conceptualization, investigation, formal analysis, and visualization, F.B.; writing—original draft preparation, F.B. and A.M.; writing—review and editing, F.B. and H.B.; data curation, F.B. and P.C.; resources, N.L., H.B. and B.J.; supervision, N.L. and H.B.; project administration, N.L. and B.J.; funding acquisition, N.L., H.B. and F.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61873122 and Grant 62003166, in part by the Research Fund of State Key Laboratory of Mechanics and Control of Mechanical Structures, Nanjing University of Aeronautics and Astronautics under Grant MCMS-I-0521G02, and in part by China Scholarship Council under Grant 2019ZFY014342.

Data Availability Statement

Restrictions apply to the availability of these data.

Conflicts of Interest

The authors declare that there is no conflict of interest.

Nomenclature

EMS	Energy Management System
CMS	Condition Monitoring System
SCADA	Supervisory Control and Data Acquisition
IEC	International Electrotechnical Commission
LCOE	Levelized Cost of Electricity
NBMs	Normal Behavior Models
LOESS	Locally Weighted Regression
DBSCAN	Density-Based Spatial Clustering of Applications with Noise
SOM	Self-Organizing Map
iForest	Isolation Forest
LOF	Local Outlier Factor
WCSS	Within Cluster Sum-of-Squares
KNN	K-Nearest Neighbor
DTR	Decision Tree Regression
RFR	Random Forest Regression
SVR	Support Vector Regression
ANN	Artificial Neural Network
GP	Gaussian Process
MP	Markov Process
CCFL	Clustering Center Fuzzy Logic
ANFIS	Adaptive Neuro-Fuzzy Inference System
MLE	Maximum Likelihood Estimation
MCS	Monte Carlo Simulation
PDF	Probability Density Function
MAE	Mean Absolute Error
RMSE	Root Mean Squared Error
MSE	Mean Squared Error
MAPE	Mean Absolute Percentage Error

References

IRENA. Renewable Power Generation Costs in 2021; International Renewable Energy Agency: Abu Dhabi, United Arab Emirates, 2022.
REN21. Renewables 2022 Global Status Report. In Renewable Energy Policy Network for the 21st Century (REN21); REN21: Paris, France, 2022.
Li, K.; Bian, H.; Liu, C.; Zhang, D.; Yang, Y. Comparison of geothermal with solar and wind power generation systems. Renew. Sustain. Energy Rev. 2015, 42, 1464–1474.
Shokrzadeh, S.; Jozani, M.J.; Bibeau, E. Wind Turbine Power Curve Modeling Using Advanced Parametric and Nonparametric Methods. IEEE Trans. Sustain. Energy 2014, 5, 1262–1269.
Tascikaraoglu, A.; Uzunoglu, M. A review of combined approaches for prediction of short-term wind speed and power. Renew. Sustain. Energy Rev. 2014, 34, 243–254.
Mehrjoo, M.; Jozani, M.J.; Pawlak, M.; Bagen, B. A Multilevel Modeling Approach Towards Wind Farm Aggregated Power Curve. IEEE Trans. Sustain. Energy 2021, 12, 2230–2237.
Bilendo, F.; Badihi, H.; Lu, N. Wind Turbine Anomaly Detection Based on SCADA Data. Handb. Smart Energy Syst. 2022, 1–24.
Long, H.; Wang, L.; Zhang, Z.; Song, Z.; Xu, J. Data-Driven Wind Turbine Power Generation Performance Monitoring. IEEE Trans. Ind. Electron. 2015, 62, 6627–6635.
Cambron, P.; Masson, C.; Tahan, A.; Pelletier, F. Control chart monitoring of wind turbine generators using the statistical inertia of a wind farm average. Renew. Energy 2018, 116, 88–98.
Helbing, G.; Ritter, M. Improving wind turbine power curve monitoring with standardisation. Renew. Energy 2019, 145, 1040–1048.
Thapar, V.; Agnihotri, G.; Sethi, V.K. Critical analysis of methods for mathematical modelling of wind turbines. Renew. Energy 2011, 36, 3166–3177.
Carrillo, C.; Montaño, A.O.; Cidrás, J.; Díaz-Dorado, E. Review of power curve modelling for wind turbines. Renew. Sustain. Energy Rev. 2013, 21, 572–581.
Lydia, M.; Selvakumar, A.I.; Kumar, S.S.; Kumar, G.E.P. Advanced Algorithms for Wind Turbine Power Curve Modeling. IEEE Trans. Sustain. Energy 2013, 4, 827–835.
Lydia, M.; Kumar, S.S.; Selvakumar, A.I.; Kumar, G.E.P. A comprehensive review on wind turbine power curve modeling techniques. Renew. Sustain. Energy Rev. 2014, 30, 452–460.
Yan, J.; Liu, Y.; Han, S.; Wang, Y.; Feng, S. Reviews on uncertainty analysis of wind power forecasting. Renew. Sustain. Energy Rev. 2015, 52, 1322–1330.
Sohoni, V.; Gupta, S.C.; Nema, R.K. A Critical Review on Wind Turbine Power Curve Modelling Techniques and Their Applications in Wind Based Energy Systems. J. Energy 2016, 2016, 1–18.
Wang, Y.; Hu, Q.; Li, L.; Foley, A.; Srinivasan, D. Approaches to wind power curve modeling: A review and discussion. Renew. Sustain. Energy Rev. 2019, 116, 109422.
Lee, G.; Ding, Y.; Xie, L.; Genton, M.G. A kernel plus method for quantifying wind turbine performance upgrades. Wind Energy 2014, 18, 1207–1219.
Al-Quraan, A.; Al-Masri, H.; Al-Mahmodi, M.; Radaideh, A. Power curve modelling of wind turbines—A comparison study. IET Renew. Power Gener. 2021, 16, 362–374.
Manobel, B.; Sehnke, F.; Lazzús, J.A.; Salfate, I.; Felder, M.; Montecinos, S. Wind turbine power curve modeling based on Gaussian Processes and Artificial Neural Networks. Renew. Energy 2018, 125, 1015–1020.
Bilendo, F.; Badihi, H.; Lu, N.; Cambron, P.; Jiang, B. A Normal Behavior Model Based on Power Curve and Stacked Regressions for Condition Monitoring of Wind Turbines. IEEE Trans. Instrum. Meas. 2022, 71, 1–13.
Stanley, A.P.; Roberts, O.; Lopez, A.; Williams, T.; Barker, A. Turbine scale and siting considerations in wind plant layout optimization and implications for capacity density. Energy Rep. 2022, 8, 3507–3525.
Hu, S.-Y.; Cheng, J.-H. Performance evaluation of pairing between sites and wind turbines. Renew. Energy 2007, 32, 1934–1947.
Liu, Y.; Fu, Y.; Huang, L.-L.; Zhang, K. Reborn and upgrading: Optimum repowering planning for offshore wind farms. Energy Rep. 2022, 8, 5204–5214.
Pallabazzer, R. Parametric analysis of wind siting efficiency. J. Wind Eng. Ind. Aerodyn. 2003, 91, 1329–1352.
Song, D.; Yang, Y.; Zheng, S.; Tang, W.; Yang, J.; Su, M.; Yang, X.; Joo, Y.H. Capacity factor estimation of variable-speed wind turbines considering the coupled influence of the QN-curve and the air density. Energy 2019, 183, 1049–1060.
Albadi, M.H.; El-Saadany, E.F. Wind Turbines Capacity Factor Modeling—A Novel Approach. IEEE Trans. Power Syst. 2009, 24, 1637–1638.
Ayodele, T.; Jimoh, A.; Munda, J.; Agee, J. Wind distribution and capacity factor estimation for wind turbines in the coastal region of South Africa. Energy Convers. Manag. 2012, 64, 614–625.
Ditkovich, Y.; Kuperman, A.; Yahalom, A.; Byalsky, M. A Generalized Approach to Estimating Capacity Factor of Fixed Speed Wind Turbines. IEEE Trans. Sustain. Energy 2012, 3, 607–608.
Yeh, T.-H.; Wang, L. A Study on Generator Capacity for Wind Turbines Under Various Tower Heights and Rated Wind Speeds Using Weibull Distribution. IEEE Trans. Energy Convers. 2008, 23, 592–602.
Souloukngaa, M.H.; Coban, H.H. Determination of Feasibility Analysis of Wind Turbines Using Weibull Parameter for Chad. J. Smart Sci. Technol. 2022, 2, 1–15.
Zeng, J.; Qiao, W. Short-Term Wind Power Prediction Using a Wavelet Support Vector Machine. IEEE Trans. Sustain. Energy 2012, 3, 255–264.
Faulstich, S.; Hahn, B.; Tavner, P.J. Wind turbine downtime and its importance for offshore deployment. Wind Energy 2011, 14, 327–337.
Carroll, J.; McDonald, A.; McMillan, D. Failure rate, repair time and unscheduled O&M cost analysis of offshore wind turbines. Wind Energy 2016, 19, 1107–1119.
Pfaffel, S.; Faulstich, S.; Rohrig, K. Performance and Reliability of Wind Turbines: A Review. Energies 2017, 10, 1904.
Garcia Márquez, F.P.; Tobias, A.M.; Pérez, J.M.P.; Papaelias, M. Condition monitoring of wind turbines: Techniques and methods. Renew. Energy 2012, 46, 169–178.
Zaher, A.; McArthur, S.; Infield, D.; Patel, Y. Online wind turbine fault detection through automated SCADA data analysis. Wind Energy 2009, 12, 574–593.
Kusiak, A.; Zheng, H.; Song, Z. On-line monitoring of power curves. Renew. Energy 2009, 34, 1487–1493.
Kusiak, A.; Li, W. The prediction and diagnosis of wind turbine faults. Renew. Energy 2011, 36, 16–23.
Schlechtingen, M.; Santos, I.F.; Achiche, S. Wind turbine condition monitoring based on SCADA data using normal behavior models. Part 1: System description. Appl. Soft Comput. 2013, 13, 259–270.
Tautz-Weinert, J.; Watson, S. Using SCADA data for wind turbine condition monitoring—A review. IET Renew. Power Gener. 2016, 11, 382–394.
Maldonado-Correa, J.; Martín-Martínez, S.; Artigao, E.; Gómez-Lázaro, E. Using SCADA Data for Wind Turbine Condition Monitoring: A Systematic Literature Review. Energies 2020, 13, 3132.
Marvuglia, A.; Messineo, A. Monitoring of wind farms’ power curves using machine learning techniques. Appl. Energy 2012, 98, 574–583.
Schlechtingen, M.; Santos, I.F.; Achiche, S. Using Data-Mining Approaches for Wind Turbine Power Curve Monitoring: A Comparative Study. IEEE Trans. Sustain. Energy 2013, 4, 671–679.
Schlechtingen, M.; Santos, I.F. Wind turbine condition monitoring based on SCADA data using normal behavior models. Part 2: Application examples. Appl. Soft Comput. 2014, 14, 447–460.
Meyer, A.; Brodbeck, B. Data-driven Performance Fault Detection in Commercial Wind Turbines. In Proceedings of the 5th European Conference of the Prognostics and Health Management Society (PHME20), Turin, Italy, 1–3 July 2020.
Meyer, A. Multi-target normal behaviour models for wind farm condition monitoring. Appl. Energy 2021, 300, 117342.
Stetco, A.; Dinmohammadi, F.; Zhao, X.; Robu, V.; Flynn, D.; Barnes, M.; Keane, J.; Nenadic, G. Machine learning methods for wind turbine condition monitoring: A review. Renew. Energy 2019, 133, 620–635.
Helbing, G.; Ritter, M. Deep Learning for fault detection in wind turbines. Renew. Sustain. Energy Rev. 2018, 98, 189–198.
Meyer, A. Early Fault Detection with Multi-target Neural Networks. In Computational Science and Its Applications—ICCSA 2021; Lecture Notes in Computer Science; Gervasi, O., Ed.; Springer: Berlin/Heidelberg, Germany, 2021; Volume 12951.
Li, Y.; Liu, S.; Shu, L. Wind turbine fault diagnosis based on Gaussian process classifiers applied to operational data. Renew. Energy 2018, 134, 357–366.
Xu, Q.; Fan, Z.; Jia, W.; Jiang, C. Fault detection of wind turbines via multivariate process monitoring based on vine copulas. Renew. Energy 2020, 161, 939–955.
Aziz, U.; Charbonnier, S.; Bérenguer, C.; Lebranchu, A.; Prevost, F. Critical comparison of power-based wind turbine fault-detection methods using a realistic framework for SCADA data simulation. Renew. Sustain. Energy Rev. 2021, 144, 110961.
Badihi, H.; Zhang, Y.; Jiang, B.; Pillay, P.; Rakheja, S. A Comprehensive Review on Signal-Based and Model-Based Condition Monitoring of Wind Turbines: Fault Diagnosis and Lifetime Prognosis. Proc. IEEE 2022, 110, 754–806.
Gill, S.; Stephen, B.; Galloway, S. Wind Turbine Condition Assessment Through Power Curve Copula Modeling. IEEE Trans. Sustain. Energy 2011, 3, 94–101.
Shen, X.; Fu, X.; Zhou, C. A Combined Algorithm for Cleaning Abnormal Data of Wind Turbine Power Curve Based on Change Point Grouping Algorithm and Quartile Algorithm. IEEE Trans. Sustain. Energy 2018, 10, 46–54.
Badihi, H.; Zhang, Y.; Pillay, P.; Rakheja, S. Fault-Tolerant Individual Pitch Control for Load Mitigation in Wind Turbines with Actuator Faults. IEEE Trans. Ind. Electron. 2020, 68, 532–543.
Badihi, H.; Jadidi, S.; Zhang, Y.; Pillay, P.; Rakheja, S. Fault-Tolerant Cooperative Control in a Wind Farm Using Adaptive Control Reconfiguration and Control Reallocation. IEEE Trans. Sustain. Energy 2019, 11, 2119–2129.
Park, J.-Y.; Lee, J.-K.; Oh, K.-Y.; Lee, J.-S. Development of a Novel Power Curve Monitoring Method for Wind Turbines and Its Field Tests. IEEE Trans. Energy Convers. 2014, 29, 119–128.
Ye, X.; Lu, Z.; Qiao, Y.; Min, Y.; O’Malley, M. Identification and Correction of Outliers in Wind Farm Time Series Power Data. IEEE Trans. Power Syst. 2016, 31, 4197–4205.
Yesilbudak, M. Implementation of novel hybrid approaches for power curve modeling of wind turbines. Energy Convers. Manag. 2018, 171, 156–169.
Yuan, T.; Sun, Z.; Ma, S. Gearbox Fault Prediction of Wind Turbines Based on a Stacking Model and Change-Point Detection. Energies 2019, 12, 4224.
Long, H.; Sang, L.; Wu, Z.; Gu, W. Image-Based Abnormal Data Detection and Cleaning Algorithm via Wind Power Curve. IEEE Trans. Sustain. Energy 2019, 11, 938–946.
Han, S.; Qiao, Y.; Yan, P.; Yan, J.; Liu, Y.; Li, L. Wind turbine power curve modeling based on interval extreme probability density for the integration of renewable energies and electric vehicles. Renew. Energy 2020, 157, 190–203.
Zhang, S.; Lang, Z.-Q. SCADA-data-based wind turbine fault detection: A dynamic model sensor method. Control Eng. Pract. 2020, 102, 104546.
Moreno, S.R.; Coelho, L.D.S.; Ayala, H.V.; Mariani, V.C. Wind turbines anomaly detection based on power curves and ensemble learning. IET Renew. Power Gener. 2020, 14, 4086–4093.
Wang, Y.; Hu, Q.; Pei, S. Wind Power Curve Modeling with Asymmetric Error Distribution. IEEE Trans. Sustain. Energy 2019, 11, 1199–1209.
Liang, G.; Su, Y.; Chen, F.; Long, H.; Song, Z.; Gan, Y. Wind Power Curve Data Cleaning by Image Thresholding Based on Class Uncertainty and Shape Dissimilarity. IEEE Trans. Sustain. Energy 2020, 12, 1383–1393.
Morrison, R.; Liu, X.; Lin, Z. Anomaly detection in wind turbine SCADA data for power curve cleaning. Renew. Energy 2022, 184, 473–486.
Bilendo, F.; Badihi, H.; Lu, N.; Cambron, P.; Jiang, B. Power Curve-Based Fault Detection Method for Wind Turbines. IFAC-PapersOnLine 2022, 55, 408–413.
Zeng, X.; Yang, M.; Bo, Y. Gearbox oil temperature anomaly detection for wind turbine based on sparse Bayesian probability estimation. Int. J. Electr. Power Energy Syst. 2020, 123, 106233.
Duong, B.P.; Khan, S.A.; Shon, D.; Im, K.; Park, J.; Lim, D.-S.; Jang, B.; Kim, J.-M. A Reliable Health Indicator for Fault Prognosis of Bearings. Sensors 2018, 18, 3740.
Kusiak, A.; Verma, A. Monitoring Wind Farms with Performance Curves. IEEE Trans. Sustain. Energy 2012, 4, 192–199.
Yan, J.; Zhang, H.; Liu, Y.; Han, S.; Li, L. Uncertainty estimation for wind energy conversion by probabilistic wind turbine power curve modelling. Appl. Energy 2019, 239, 1356–1370.
Zhao, Y.; Ye, L.; Wang, W.; Sun, H.; Ju, Y.; Tang, Y. Data-Driven Correction Approach to Refine Power Curve of Wind Farm Under Wind Curtailment. IEEE Trans. Sustain. Energy 2017, 9, 95–105.
Souza, L.G.M.; Santos, D.C. A Performance Comparison of Robust Models in Wind Turbines Power Curve Estimation: A Case Study. Neural Process. Lett. 2022, 54, 3375–3400.
Bilendo, F.; Badihi, H.; Lu, N.; Cambron, P.; Jiang, B. Imaging Wind Turbine Fault Signatures Based on Power Curve and Self-Organizing Map for Image-Based Fault Diagnosis. In Proceedings of the 2022 IEEE International Symposium on Advanced Control of Industrial Processes, Vancouver, BC, Canada, 7–9 August 2022; pp. 204–209.
Wang, Y.; Infield, D.G.; Stephen, B.; Galloway, S.J. Copula-based model for wind turbine power curve outlier rejection. Wind Energy 2013, 17, 1677–1688.
Li, T.; Liu, X.; Lin, Z.; Morrison, R. Ensemble offshore Wind Turbine Power Curve modelling—An integration of Isolation Forest, fast Radial Basis Function Neural Network, and metaheuristic algorithm. Energy 2022, 239.
Zheng, L.; Hu, W.; Min, Y. Raw Wind Data Preprocessing: A Data-Mining Approach. IEEE Trans. Sustain. Energy 2014, 6, 11–19.
Sainz, E.; Llombart, A.; Guerrero, J. Robust filtering for the characterization of wind turbines: Improving its operation and maintenance. Energy Convers. Manag. 2009, 50, 2136–2147.
Bangalore, P.; Letzgus, S.; Karlsson, D.; Patriksson, M. An artificial neural network-based condition monitoring method for wind turbines, with application to the monitoring of the gearbox. Wind Energy 2017, 20, 1421–1438.
Zhao, Y.; Li, D.; Dong, A.; Kang, D.; Lv, Q.; Shang, L. Fault Prediction and Diagnosis of Wind Turbine Generators Using SCADA Data. Energies 2017, 10, 1210.
Javadi, M.; Malyscheff, A.M.; Wu, D.; Kang, C.; Jiang, J.N. An algorithm for practical power curve estimation of wind turbines. CSEE J. Power Energy Syst. 2018, 4, 93–102.
Teng, W.; Cheng, H.; Ding, X.; Liu, Y.; Ma, Z.; Mu, H. DNN-based approach for fault detection in a direct drive wind turbine. IET Renew. Power Gener. 2018, 12, 1164–1171.
Hu, Y.; Qiao, Y.; Liu, J.-Z.; Zhu, H. Adaptive Confidence Boundary Modeling of Wind Turbine Power Curve Using SCADA Data and Its Application. IEEE Trans. Sustain. Energy 2018, 10, 1330–1341.
Hu, Y.; Xi, Y.; Pan, C.; Li, G.; Chen, B. Daily condition monitoring of grid-connected wind turbine via high-fidelity power curve and its comprehensive rating. Renew. Energy 2020, 146, 2095–2111.
Trizoglou, P.; Liu, X.; Lin, Z. Fault detection by an ensemble framework of Extreme Gradient Boosting (XGBoost) in the operation of offshore wind turbines. Renew. Energy 2021, 179, 945–962.
Xiang, L.; Yang, X.; Hu, A.; Su, H.; Wang, P. Condition monitoring and anomaly detection of wind turbine based on cascaded and bidirectional deep learning networks. Appl. Energy 2021, 305, 117925.
Yao, Q.; Hu, Y.; Liu, J.; Zhao, T.; Qi, X.; Sun, S. Power Curve Modeling for Wind Turbine using Hybrid-Driven Outlier Detection Method. J. Mod. Power Syst. Clean Energy 2022, 1–11.
Luo, Z.; Fang, C.; Liu, C.; Liu, S. Method for Cleaning Abnormal Data of Wind Turbine Power Curve Based on Density Clustering and Boundary Extraction. IEEE Trans. Sustain. Energy 2021, 13, 1147–1159.
Mehrjoo, M.; Jozani, M.J.; Pawlak, M. Wind turbine power curve modeling for reliable power prediction using monotonic regression. Renew. Energy 2019, 147, 214–222.
Taslimi-Renani, E.; Modiri-Delshad, M.; Elias, M.F.M.; Rahim, N.A. Development of an enhanced parametric model for wind turbine power curve. Appl. Energy 2016, 177, 544–552.
Chen, J.; Wang, F.; Stelson, K.A. A mathematical approach to minimizing the cost of energy for large utility wind turbines. Appl. Energy 2018, 228, 1413–1422.
Chang, T.-P.; Liu, F.-J.; Ko, H.-H.; Cheng, S.-P.; Sun, L.-C.; Kuo, S.-C. Comparative analysis on power curve models of wind turbine generator in estimating capacity factor. Energy 2014, 73, 88–95.
Liu, X. An Improved Interpolation Method for Wind Power Curves. IEEE Trans. Sustain. Energy 2012, 3, 528–534.
Seo, S.; Oh, S.-D.; Kwak, H.-Y. Wind turbine power curve modeling using maximum likelihood estimation method. Renew. Energy 2018, 136, 1164–1169.
Villanueva, D.; Feijóo, A.E. Reformulation of parameters of the logistic function applied to power curves of wind turbines. Electr. Power Syst. Res. 2016, 137, 51–58.
Kusiak, A.; Zheng, H.; Song, Z. Models for monitoring wind farm power. Renew. Energy 2009, 34, 583–590.
Janssens, O.; Noppe, N.; Devriendt, C.; Van de Walle, R.; Van Hoecke, S. Data-driven multivariate power curve modeling of offshore wind turbines. Eng. Appl. Artif. Intell. 2016, 55, 331–338.
Pandit, R.K.; Infield, D.; Kolios, A. Comparison of advanced non-parametric models for wind turbine power curves. IET Renew. Power Gener. 2019, 13, 1503–1510.
Ouyang, T.; Kusiak, A.; He, Y. Modeling wind-turbine power curve: A data partitioning and mining approach. Renew. Energy 2017, 102, 1–8.
Pelletier, F.; Masson, C.; Tahan, A. Wind turbine power curve modelling using artificial neural network. Renew. Energy 2016, 89, 207–214.
Stephen, B.; Galloway, S.J.; McMillan, D.; Hill, D.C.; Infield, D.G. A Copula Model of Wind Turbine Performance. IEEE Trans. Power Syst. 2010, 26, 965–966.
Wei, D.; Wang, J.; Li, Z.; Wang, R. Wind Power Curve Modeling with Hybrid Copula and Grey Wolf Optimization. IEEE Trans. Sustain. Energy 2021, 13, 265–276.
Pandit, R.K.; Infield, D. SCADA-based wind turbine anomaly detection using Gaussian process models for wind turbine condition monitoring purposes. IET Renew. Power Gener. 2018, 12, 1249–1255.
Pandit, R.K.; Infield, D. Comparative analysis of Gaussian Process power curve models based on different stationary covariance functions for the purpose of improving model accuracy. Renew. Energy 2019, 140, 190–202.
Pandit, R.; Infield, D.; Kolios, A. Gaussian process power curve models incorporating wind turbine operational variables. Energy Rep. 2020, 6, 1658–1669.
Virgolino, G.C.; Mattos, C.L.; Magalhães, J.A.F.; Barreto, G.A. Gaussian processes with logistic mean function for modeling wind turbine power curves. Renew. Energy 2020, 162, 458–465.
Guo, P.; Infield, D. Wind Turbine Power Curve Modeling and Monitoring with Gaussian Process and SPRT. IEEE Trans. Sustain. Energy 2018, 11, 107–115.
Rogers, T.; Gardner, P.; Dervilis, N.; Worden, K.; Maguire, A.; Papatheou, E.; Cross, E. Probabilistic modelling of wind turbine power curves with application of heteroscedastic Gaussian Process regression. Renew. Energy 2020, 148, 1124–1136.
Bull, L.; Gardner, P.; Rogers, T.; Dervilis, N.; Cross, E.; Papatheou, E.; Maguire, A.; Campos, C.; Worden, K. Bayesian modelling of multivalued power curves from an operational wind farm. Mech. Syst. Signal Process. 2021, 169, 108530.
Anahua, E.; Barth, S.; Peinke, J. Markovian power curves for wind turbines. Wind Energy 2007, 11, 219–232.
Ustuntas, T.; Şahin, A.D. Wind turbine power curve estimation based on cluster center fuzzy logic modeling. J. Wind Eng. Ind. Aerodyn. 2008, 96, 611–620.
De la Hermosa González, R.R. Wind farm monitoring using Mahalanobis distance and fuzzy clustering. Renew. Energy 2018, 123, 526–540.
Cambron, P.; Lepvrier, R.; Masson, C.; Tahan, A.; Pelletier, F. Power curve monitoring using weighted moving average control charts. Renew. Energy 2016, 94, 126–135.
Yun, E.; Hur, J. Probabilistic estimation model of power curve to enhance power output forecasting of wind generating resources. Energy 2021, 223, 120000.
Villanueva, D.; Feijoo, A. Normal-Based Model for True Power Curves of Wind Turbines. IEEE Trans. Sustain. Energy 2016, 7, 1005–1011.
Albadi, M.; El-Saadany, E. Optimum turbine-site matching. Energy 2010, 35, 3593–3602.
Marčiukaitis, M.; Žutautaitė, I.; Martišauskas, L.; Jokšas, B.; Gecevičius, G.; Sfetsos, A. Non-linear regression model for wind turbine power curve. Renew. Energy 2017, 113, 732–741.
You, M.; Liu, B.; Byon, E.; Huang, S.; Jin, J. Direction-Dependent Power Curve Modeling for Multiple Interacting Wind Turbines. IEEE Trans. Power Syst. 2017, 33, 1725–1733.
Wang, Y.; Hu, Q.; Srinivasan, D.; Wang, Z. Wind Power Curve Modeling and Wind Power Forecasting with Inconsistent Data. IEEE Trans. Sustain. Energy 2018, 10, 16–25.
Saint-Drenan, Y.-M.; Besseau, R.; Jansen, M.; Staffell, I.; Troccoli, A.; Dubus, L.; Schmidt, J.; Gruber, K.; Simoes, S.; Heier, S. A parametric model for wind turbine power curves incorporating environmental conditions. Renew. Energy 2020, 157, 754–768.
Karamichailidou, D.; Kaloutsa, V.; Alexandridis, A. Wind turbine power curve modeling using radial basis function neural networks and tabu search. Renew. Energy 2020, 163, 2137–2152.
Xu, K.; Yan, J.; Zhang, H.; Zhang, H.; Han, S.; Liu, Y. Quantile based probabilistic wind turbine power curve model. Appl. Energy 2021, 296, 116913.
Zou, R.; Yang, J.; Wang, Y.; Liu, F.; Essaaidi, M.; Srinivasan, D. Wind turbine power curve modeling using an asymmetric error characteristic-based loss function and a hybrid intelligent optimizer. Appl. Energy 2021, 304, 117707.
Wang, Y.; Li, Y.; Zou, R.; Foley, A.M.; Al Kez, D.; Song, D.; Hu, Q.; Srinivasan, D. Sparse Heteroscedastic Multiple Spline Regression Models for Wind Turbine Power Curve Modeling. IEEE Trans. Sustain. Energy 2020, 12, 191–201.
Yang, L.; Wang, L.; Zhang, Z. Generative Wind Power Curve Modeling Via Machine Vision: A Deep Convolutional Network Method with Data-Synthesis-Informed-Training. IEEE Trans. Power Syst. 2022, 1.
Breeze, P. (Ed.) The Anatomy of a Wind Turbine. In Wind Power Generation; Academic Press: Cambridge, MA, USA, 2016; pp. 19–27.

Figure 1. The ideal power curve of the wind turbine and appointed modes of operation.

Figure 2. Wind turbine types and the corresponding power curves.

Figure 3. Wind turbine production efficiency.

Figure 4. A typical control chart for power curve-based condition monitoring.

Figure 5. Wind turbine “actual power curve” marked with anomaly and fault signatures [21].

Figure 6. Type 1: Lower Stacked Data.

Figure 7. Type 2: Down-Rating.

Figure 8. Type 3: Wind Speed Under-Reading.

Figure 9. Type 4: Dispersive (Spread) Data.

Figure 10. Type 5: Icing/Debris Build-up on Blades.

Figure 11. Normal Power Curve.

Figure 12. Combined methods to attain optimal (normal) power curve [21].

Figure 13. Reviewed data preprocessing and correction schemes.

Figure 14. Reviewed power curve modeling techniques.

Figure 15. Evolution and important milestones.

Figure 16. Wind turbine configurations [129]: (a) horizontal axis, and (b) vertical axis.

Table 1. Available reviews on power curve modeling and applications.

Ref.	Author(s) and Year	Brief Description
[11]	(Thapar, et al., 2011)	A critical analysis on “mathematical modelling” methods for power curve.
[12]	(Carrillo, et al., 2013)	A review on the equations for power curve modeling; the “polynomial, exponential, and cubic”.
[13]	(Lydia, et al., 2013)	An overview on several “parametric” and “non-parametric” algorithms for power curve modeling.
[5]	(Tascikaraoglu & Uzunoglu, 2014)	A review on combined methods for wind power forecasting.
[14]	(Lydia, et al., 2014)	A review on the parametric and nonparametric techniques for power curve modeling.
[15]	(Yan, et al., 2015)	A review on “uncertainty analysis” of wind power forecasting; a probabilistic approach.
[16]	(Sohoni, et al., 2016)	A critical overview on power curve modelling techniques and applications for wind turbines.
[17]	(Wang, et al., 2019)	An overview on power curve modelling; analysis in distinct seasons, including wind farms.

Table 2. Commonly identified anomaly and fault types on power curves.

Types	Anomaly and Fault Description [21]
Type 1	Lower Stacked Data: Anomaly and fault signatures can be identified through the “lower-horizontal data”, in which the recorded power value is approximately zero, even when the wind speed is above the “rated wind speed”. Root cause: according to the reviewed literature, such events are typically caused by a “damaged power measuring instrument” or a “communication equipment fault”.
Type 2	Down-Rating: Also known as “Power Curtailment” or “Derating”, such events are identifiable by their signature in the “middle-horizontal data”; in those situations, the power output stays mostly constant at a specified level and does not change with wind speed. Root cause 1: according to the reviewed literature, down-rating may occur due to an imposed “control action” that restricts power production to a level below its maximum efficiency. Reasons for activating such an action include excess wind power capacity with insufficient storage resources; wind speed instability, in particular “huge fluctuations”, can also play a role. Root cause 2: Alternatively, such events may also be caused by “load sensor failure”.
Type 3	Wind Speed Under-Reading: The “stacked-vertical data” beyond the power curve reference, where the power value increases without a corresponding increase in the recorded wind speed, can often indicate such an event. Root cause(s): A variety of causes may trigger such events, for instance “wind speed sensor malfunction/failure”, “communication equipment error”, and/or “power measuring instrument failure”. However, to the best of the authors’ knowledge, specific details in this regard have not been broadly reported.
Type 4	Dispersive (Spread) Data: It refers to the “randomly distributed data points” all around the curve. Root cause 1: According to the reviewed literature, the “turbulence produced by the wake effects” or “wind turbine loading ramp” that occurs while starting or stopping the wind turbine, are reportedly some of the root-causes. Root cause 2: Alternatively, it may be caused by “sensor accuracy degradation”, “instrument fault”, and “noise during signal processing by the system”.
Type 5	Icing/Debris Build-up on Blades: The shift of a cluster of data points when compared with the reference power curve, often indicates events such as icing and/or debris build-up on blades. Root cause: Such events are mostly dependent on the environmental conditions and geographic location of the wind farm.
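
As an illustration of how the Type 1 signature in Table 2 might be screened in SCADA data, the following minimal Python sketch flags samples whose power is near zero although the wind speed lies in the producing range. The function name `flag_lower_stacked`, the cut-in and cut-out speeds (3 and 25 m/s), and the power tolerance are illustrative assumptions, not values taken from the reviewed literature; suitable thresholds depend on the turbine model.

```python
def flag_lower_stacked(wind_speeds, powers, cut_in=3.0, cut_out=25.0, power_tol=1.0):
    """Return indices of samples that look like Type 1 (lower stacked) data.

    A sample is flagged when the wind speed is strictly between the assumed
    cut-in and cut-out speeds while the recorded power is at or below
    power_tol (same unit as `powers`, e.g. kW).
    """
    flagged = []
    for i, (v, p) in enumerate(zip(wind_speeds, powers)):
        in_producing_range = cut_in < v < cut_out
        if in_producing_range and p <= power_tol:
            flagged.append(i)
    return flagged


# Hypothetical 10-min averages: wind speed in m/s, power in kW.
wind = [2.0, 5.0, 8.0, 12.0, 14.0]
power = [0.0, 150.0, 0.5, 0.0, 2000.0]
suspect = flag_lower_stacked(wind, power)
```

In this toy data the first sample is not flagged because zero power below the cut-in speed is expected behavior. A rule of this kind cannot separate instrument faults from planned stops, so in practice it would be combined with turbine status information.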

Table 3. Research articles addressing anomalies and faults.

Ref.	Author(s) and Year	Brief Description
[43]	(Marvuglia & Messineo, 2012)	A discussion on the presence of outliers and abnormal values on power curve, including root-causes.
[55]	(Gill, et al., 2012)	A discussion on three “fault modes” that provide a subdivision of all possible faults.
[59]	(Park, et al., 2014)	Visual illustration of seven types of power curve deviations, and the related anomaly or fault types.
[60]	(Ye, et al., 2016)	Addressing four anomaly and fault types, including the root-causes.
[61]	(Yesilbudak, 2018)	Addressing several root-causes of different anomaly and fault types on power curve.
[62]	(Yuan, et al., 2019)	A discussion on three common anomaly and fault types on power curve.
[56]	(Shen, et al., 2019)	Addressing four types of anomaly and fault, according to the data distribution characteristics.
[63]	(Long, et al., 2020)	A brief discussion on three common anomaly and fault types.
[64]	(Han, et al., 2020)	Addressing three common anomaly and fault types on power curve, including root-causes.
[65]	(Zhang & Lang, 2020)	A brief discussion on power curve-based curtailment outliers in the SCADA data.
[66]	(Moreno, et al., 2020)	Addressing three modes of degradation through power curve.
[67]	(Wang, et al., 2020)	Briefly addressing and illustrating three types of representative outliers on power curve.
[68]	(Liang, et al., 2021)	Briefly addressing three categories of abnormal data on wind turbine power curve.
[21]	(Bilendo, et al., 2022)	Addressing five types of most common anomalies and faults on power curve, including root-causes.
[69]	(Morrison, et al., 2022)	Briefly addressing three common anomaly and fault types on power curve.

Table 4. Advantages and limitations (data preprocessing methods).

Method	Essential Parameters	Advantages and Limitations (Pros and Cons)
Moving Mean	The moving window size	Pros: The method is simple and easy to implement. Cons: A large size moving window can lead to loss of information.
LOESS	The span size	Pros: The method is simple and easy to implement. Cons: A large span value can lead to loss of information.
K-Means Clustering	Number of clusters	Pros: The method is simple and easy to implement. Cons: It is challenging to attain an optimal number of clusters.
DBSCAN	Maximum distance; Number of samples in neighborhood; Distance metric	Pros: The model does not require specification of number of clusters. Cons: May suffer from time-cost when used on large-size datasets.
Self-Organizing Map	Grid size; Neighborhood radius; Learning rate	Pros: The method enables time-series and image-based approaches. Cons: Difficult to determine input weights.
Isolation Forest	Number of estimators; Contamination	Pros: Works well on small sample size, but also capable of rescaling. Cons: The model can suffer from bias.
Local Outlier Factor	Number of neighbors; Distance metric; Contamination	Pros: The model is density-based. Cons: Challenging to understand the decisions based on score.
Combined Methods	Depends on the constituents	Pros: Such methods may attain better results than a single method. Cons: May suffer from time-cost depending on its complexity.
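
To make the window-size trade-off listed for the Moving Mean in Table 4 concrete, the sketch below implements a centered moving mean in plain Python and applies it with two window sizes. The function name `moving_mean`, the shrinking-window treatment at the series edges, and the toy signal are assumptions made for illustration only.

```python
def moving_mean(values, window):
    """Centered moving mean with an odd window size.

    Near the edges the window shrinks to the samples that exist, which is
    one of several possible edge conventions.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Toy power signal with a single short peak at the center.
signal = [0.0, 0.0, 9.0, 0.0, 0.0]
narrow = moving_mean(signal, 3)  # peak value at the center becomes 3.0
wide = moving_mean(signal, 5)    # peak value at the center becomes 1.8
```

The wider window flattens the peak more strongly, which is the loss of information that the table attributes to a large moving window.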

Table 5. Research articles addressing power curve data correction.

Ref.	Author(s) and Year	Brief Description
[81]	(Sainz, et al., 2009)	A robust statistical filtering method based on “Least Median Squares” combined with random search.
[73]	(Kusiak & Verma, 2013)	A multivariate approach for outlier detection based on “k-means clustering” and “Mahalanobis distance”.
[59]	(Park, et al., 2014)	An algorithm for power curve modeling, which automatically calculates the power curve limits.
[78]	(Wang, et al., 2014)	A data clustering procedure based on SOM.
[80]	(Zheng, et al., 2015)	Data preprocessing and filtering approach involving LOF.
[60]	(Ye, et al., 2016)	Outlier detection and correction method.
[82]	(Bangalore, et al., 2017)	A methodology for data preprocessing and post-processing including anomaly detection.
[83]	(Zhao, et al., 2017)	SCADA data processing including a data cleaning procedure for diagnosis and prognosis purposes.
[20]	(Manobel, et al., 2018)	An automatic filtering, by means of Gaussian Process, is employed prior to modeling through ANN.
[61]	(Yesilbudak, 2018)	“Clustering, filtering and modeling”; using the “k-means-based Smoothing Spline hybrid” model.
[84]	(Javadi, et al., 2018)	A model to minimize the modeling error while bias error is reduced by recursive cleaning of outliers.
[85]	(Teng, et al., 2018)	A preprocessing method to get rid of the outliers in SCADA dataset.
[75]	(Zhao, et al., 2018)	Data-driven outlier elimination method combining quartile and DBSCAN.
[74]	(Yan, et al., 2019)	A data cleaning method using a control strategy based on intuitive rules and DBSCAN.
[56]	(Shen, et al., 2019)	A method for cleaning the outliers using the “change point grouping”, and the “quartile algorithm”.
[86]	(Hu, et al., 2019)	A “confidence boundary” modeling process for power curve to detect and eliminate abnormal data.
[63]	(Long, et al., 2020)	An “image-based” abnormal data identification and cleaning algorithm for wind turbine power curve.
[87]	(Hu, et al., 2020)	A “stepwise data cleaning” procedure based on “irregular space-division and nonlinear space-mapping”.
[68]	(Liang, et al., 2021)	Power curve data cleaning by “image thresholding” using “class uncertainty and shape dissimilarity”.
[88]	(Trizoglou, et al., 2021)	Data preprocessing and resampling, anomaly detection and treatment for a normal behavior model.
[70]	(Bilendo, et al., 2022)	A signal processing scheme using Moving Mean, DBSCAN, LOESS, and Upper-Lower limits.
[89]	(Xiang, et al., 2022)	A quartile method for SCADA data cleaning.
[90]	(Yao, et al., 2022)	A hybrid-driven outlier detection method.
[69]	(Morrison, et al., 2022)	The impact of filtering by a comparison of performances of different methodologies with-without filtering.
[79]	(Li, et al., 2022)	Anomaly detection and removal on power curve data based on iForest.
[91]	(Luo, et al., 2022)	A set of procedures to detect and remove outliers by the framework of classification processing.
[77]	(Bilendo, et al., 2022)	“Imaging” wind turbine fault signatures using power curve and “SOM” for fault diagnosis purpose.
[76]	(Souza & Santos, 2022)	Using the SOM to minimize the squared error generated by the local interpolation for data with outliers.

Table 6. Advantages and limitations (modeling methods).

Method	Essential Parameters	Advantages and Limitations (Pros and Cons)
Linear	The degree of polynomial regression; (first-degree)	Pros: The method is based on a simple mathematical expression which is easy to implement. Cons: The fit is linear, whereas the power curve is not linear over its whole range.
Quadratic	The degree of polynomial regression; (second-degree)	Pros: Similar to linear, the method is based on a simple mathematical expression which is easy to implement. Cons: The curve may not fully fit the actual power curve.
Cubic	The degree of polynomial regression; (third-degree)	Pros: Like the linear and quadratic models, the method is based on a simple mathematical expression which is easy to implement. Cons: The curve may not be properly aligned with the actual power curve.
Logistic Function	The vector parameters that determine its shape	Pros: The logistic function is a simple mathematical expression. Cons: On high dimensional data, it may suffer from over-fitting.
K-Nearest Neighbor	Number of neighbors; Distance metric	Pros: The method is simple and easy to implement. Cons: Neighbors-based methodologies are known as “non-generalizing” methods, as they simply “remember” all the data learned.
Decision Tree Regression	Criterion; Strategy to split node; Depth of the tree	Pros: The model learns simple “decision rules”. Cons: The deeper the tree, the more complex the “decision rules”.
Random Forest Regression	Number of estimators; Criterion; Depth of the tree	Pros: The model learns simple “decision rules”. Cons: The deeper the tree, the more complex the “decision rules”.
Support Vector Regression	Kernel; Gamma; Regularization; Epsilon	Pros: The model is suitable for small size data. Cons: Not suitable for large size data; also, the model may underperform on noisy data.
Artificial Neural Network	Hidden layers size; Activation function; Optimizer; Learning rate	Pros: The model accuracy can be enhanced by adding layers (deep neural network). Cons: The deeper the neural network, the more parameters to be processed which may result in time-cost concerns.
Copulas	Parameters based on correlations	Pros: Allows the “marginal distributions” and the “dependency structure” to be specified separately. Cons: In higher dimensions, the copula model may lose some useful details.
Gaussian Process	The kernel specifying the covariance function	Pros: The prediction interpolates the observations. Cons: It may lose efficiency in high dimensional spaces when the number of features exceeds a few dozen.
Markov Process	The previous value prior to the current	Pros: Simplicity due to its memoryless property. Cons: Does not consider historical events.
Clustering Center Fuzzy Logic	The number of clusters	Pros: The methodology is simple and easy to implement. Cons: Fuzzy-based applications require extreme human expertise.
ANFIS	Fuzzy rules; Membership function	Pros: Faster training. Cons: Fuzzy-based applications require extreme human expertise.
Ensemble Method	Depends on the constituents	Pros: Different approaches are employed to solve the same problem. Cons: May suffer from time-cost issues.
Method of Binning	Bin size	Pros: The method is simple and easy to implement. Cons: May cause loss of information.
Weibull	Shape; Scale	Pros: The ability to assume the characteristics of many different types of distributions. Cons: May not be able to produce anomaly and fault signatures.
Maximum Likelihood Estimation	The unknown value that maximizes the likelihood function	Pros: The model enables a consistent technique for parameter estimation problems. Cons: It can be heavily “biased” in cases of small samples. The optimality properties may not be applicable for small samples.
Monte Carlo Simulation	Probability weights	Pros: The model can simulate data for long term assessment. Cons: Ineffective parameters and constraints may lead to poor results.
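
As a simple instance of the techniques in Table 6, the Method of Binning can be sketched in a few lines of Python: wind speeds are grouped into bins of equal width and the power is averaged within each bin. The function name `binned_power_curve`, the default bin width of 0.5 m/s, and the sample data are assumptions for illustration; this is not presented as the procedure of any particular reviewed study.

```python
import math


def binned_power_curve(wind_speeds, powers, bin_width=0.5):
    """Average the power within wind speed bins of equal width.

    Returns a dict that maps each bin center (m/s) to the mean power of
    the samples that fall into that bin. Empty bins are omitted.
    """
    sums = {}
    counts = {}
    for v, p in zip(wind_speeds, powers):
        k = math.floor(v / bin_width)  # index of the bin [k*w, (k+1)*w)
        sums[k] = sums.get(k, 0.0) + p
        counts[k] = counts.get(k, 0) + 1
    return {(k + 0.5) * bin_width: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical samples: wind speed in m/s, power in kW.
wind = [5.1, 5.2, 5.6, 5.9]
power = [100.0, 120.0, 200.0, 220.0]
curve = binned_power_curve(wind, power)
```

Here the four samples collapse into two points of the curve, which shows both the simplicity of the method and the loss of information noted in the table: the spread of the samples inside each bin is discarded.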

Table 7. Research articles on parametric and non-parametric models.

Ref.	Author(s) and Year	Modeling Techniques
[113]	(Anahua, et al., 2008)	Power curve modeling using “Markov Process”.
[114]	(Üstüntaş & Şahin, 2008)	Power curve estimation based on “Clustering Center Fuzzy Logic”.
[99]	(Kusiak, et al., 2009)	Combining “KNN” and “Principal Component Analysis” for modeling power curve.
[38]	(Kusiak, et al., 2009)	“Least Squares”, “Maximum Likelihood Estimation”, and “KNN” for modeling.
[119]	(Albadi & El-Saadany, 2010)	Power curve modeling through “linear, cubic, and quadratic models”.
[104]	(Stephen, et al., 2011)	Power curve modeling based on “Copula”.
[43]	(Marvuglia & Messineo, 2012)	Generalized Mapping, Multi-Layer Perceptron, General Regression Neural Network.
[55]	(Gill, et al., 2012)	A probabilistic model of a power curve based on “Copulas”.
[96]	(Liu, 2012)	Power curve model with interpolation formula. Linear, Quadratic, and Cubic models.
[44]	(Schlechtingen, et al., 2013)	“Clustering Center Fuzzy Logic”, “ANN”, “KNN”, and “ANFIS”.
[4]	(Shokrzadeh, et al., 2014)	Polynomial, Locally Weighted Polynomial, Spline, and Penalized Spline Regression.
[95]	(Chang, et al., 2014)	Linear, quadratic, cubic, and general model with Weibull distribution of wind speed.
[78]	(Wang, et al., 2014)	Power curve modeling based on “Copula”.
[18]	(Lee, et al., 2015)	A Kernel Plus method for power curve model.
[98]	(Villanueva & Feijóo, 2016)	“Logistic Function” (with 4-parameters) for power curve modeling.
[118]	(Villanueva & Feijóo, 2016)	“Monte Carlo-based simulation” to reproduce data following the normal power curve.
[93]	(Taslimi-Renani, et al., 2016)	A parametric model using modified hyperbolic tangent to characterize power curve.
[103]	(Pelletier, et al., 2016)	Power curve modeling based on “ANN”.
[100]	(Janssens, et al., 2016)	Non-parametric techniques, such as stochastic gradient boosting, random forest, KNN, and binning.
[116]	(Cambron, et al., 2016)	Exponentially and Generally Weighted Moving Average control charts, and binning.
[120]	(Marčiukaitis, et al., 2017)	A non-linear regression model for wind turbine power curve approximation.
[102]	(Ouyang, et al., 2017)	Power curve modeling based on centers of data partitions and SVR.
[94]	(Chen, et al., 2018)	A mathematical approach: “linear, quadratic, cubic, logistic function”.
[121]	(You, et al., 2018)	A “Bayesian hierarchical” framework to model power curves.
[115]	(González-Carrato, 2018)	Using “fuzzy clustering” and “Mahalanobis” distance for modeling.
[106]	(Pandit & Infield, 2018)	A “Gaussian process” algorithm for power curve modeling.
[107]	(Pandit & Infield, 2019)	A study on “Gaussian Process” power curve models.
[101]	(Pandit, et al., 2019)	Gaussian Process, Random Forest Regression, and SVM for power curve modeling.
[97]	(Seo, et al., 2019)	Power curve modeling using “Weibull”, “Logistic Function” and “MLE”.
[122]	(Wang, et al., 2019)	Heteroscedastic spline and robust spline regression models for power curve.
[109]	(Virgolino, et al., 2020)	A probabilistic semi-parametric model using “Gaussian process” for power curve.
[92]	(Mehrjoo, et al., 2020)	Non-parametric techniques: tilting and monotonic spline regression methodology.
[110]	(Guo & Infield, 2020)	A multivariable power curve model with Cholesky decomposition Gaussian process.
[108]	(Pandit, et al., 2020)	A “Gaussian process” power curve model, incorporating operational variables.
[64]	(Han, et al., 2020)	Power curve modeling method based on interval extreme probability density.
[66]	(Moreno, et al., 2020)	An ensemble learning approach for anomaly detection based on power curves.
[111]	(Rogers, et al., 2020)	A probabilistic modelling of power curve using a heteroscedastic Gaussian Process.
[67]	(Wang, et al., 2020)	Asymmetric Error, mixture of asymmetric Gaussian and asymmetric exponential.
[123]	(Saint-Drenan, et al., 2020)	A parametric model for power curves incorporating environmental conditions.
[124]	(Karamichailidou, et al., 2021)	Power curve modeling using radial basis function neural networks and tabu search.
[117]	(Yun & Hur, 2021)	Weibull, Monte-Carlo simulation, and spatial interpolation based on Ordinary Kriging.
[125]	(Xu, et al., 2021)	A quantile model, which generates a series of power curves under any confidence level.
[126]	(Zou, et al., 2021)	Asymmetric error characteristic-based loss function and hybrid intelligent optimizer.
[127]	(Wang, et al., 2021)	Multiple spline regression with Gaussian and Student’s t-distribution for modeling.
[105]	(Wei, et al., 2022)	Power curve modeling based on hybrid Copula and Grey Wolf optimization algorithm.
[21]	(Bilendo, et al., 2022)	Power curve modeling based on “stacking regressions”.
[112]	(Bull, et al., 2022)	A mixture of Gaussian Processes which infers multivalued wind-power relationships.
[76]	(Souza & Santos, 2022)	Modeling using a “Self-Organizing Map” with K winners and Local Linear Mapping.
[128]	(Yang, et al., 2022)	Power curve modeling based on “Deep convolutional network”.

Table 8. Commonly used performance metrics.

Method	Equation
Mean absolute error (MAE)	$MAE = \frac{1}{n} \sum_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|$
Root mean squared error (RMSE)	$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}$
Mean absolute percentage error (MAPE)	$MAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{\left| y_{i} - \hat{y}_{i} \right|}{\left| y_{i} \right|} \times 100\%$
R-Squared $(R^{2})$ score	$R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}$
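
The four metrics in Table 8 can be computed directly from their definitions. The following plain-Python sketch does so for observed values y and predictions ŷ; the function names and the sample values are illustrative assumptions. Raising an error for zero observations in MAPE is also a choice made here, since the ratio is undefined when an observed power value is zero (for example, below the cut-in wind speed).

```python
import math


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in y):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


def r2_score(y, y_hat):
    """Coefficient of determination (R-squared)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted power values in kW.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R2": r2_score(observed, predicted),
}
```

MAE and RMSE carry the unit of the power signal, whereas MAPE and R² are dimensionless, which makes the latter two easier to compare across turbines with different rated power.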

Table 9. Main trends over the years.

Years	Main Trends
–2008	Onshore wind farm development studies: Site efficiency, capacity factor estimation, power estimation, and matching between wind turbine models and site characteristics. An extensive use of classical methods such as Weibull in most wind farm development studies.
2009–2010	Power curve modeling (based on linear, quadratic, and cubic method). Wind turbine condition monitoring, in particular, online condition monitoring of power curve and online fault detection through SCADA data and classical machine learning methods. The need for filtering SCADA data.
2011–2012	Offshore wind farm development studies. Wind turbine performance assessment, prediction, and diagnosis of faults. Techniques and methods for condition monitoring, mathematical modeling, machine learning-based techniques, and probabilistic models. Short-term wind power prediction.
2013–2014	Investigations on the most appropriate and advanced algorithms for power curve modeling. Analysis on parametric and nonparametric power curve modeling techniques. Investigations on power curve deviations and identification of the type of anomaly and fault indication. Combined approaches for short-term power prediction. Monitoring wind farms through power curve. Condition monitoring based on SCADA data using normal behavior model.
2015–2016	Quantification of performance upgrades for wind turbines and uncertainty analysis of wind power forecasting. Investigation of wind performance through SCADA data along with raw wind data preprocessing and normal behavior models. Offshore power monitoring, failure rate, repair time, and unscheduled O&M cost analysis. Power curve monitoring through control charts, and modeling using artificial neural networks. Identification and correction of outliers in wind farm time series power data.
2017–2018	Deep learning, control charts, and hybrid wind turbine power monitoring. Extensive use of SCADA data for modeling, performance analysis, monitoring, fault prediction, and diagnosis. Data correction approaches to refine power curve data.
2019–2020	Data cleaning with combined algorithms, abnormal data detection, and power curve modeling and forecasting with inconsistent data. High-fidelity and reliable power curve modeling. Multivariate process monitoring along with incorporating environmental conditions in modeling power curve. Using probabilistic and ensemble models.
2021–2022	Power curve modeling methods mostly based on SCADA data. Data cleaning and outlier detection along with an extensive use of advanced machine learning, deep learning, probabilistic, ensemble, and hybrid algorithms to improve models’ accuracy. Implementation of power curve-based anomaly and fault detection, and advanced approaches such as multi-target models to improve model efficiency.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

