A Review of Approaches for the Detection and Treatment of Outliers in Processing Wind Turbine and Wind Farm Measurements
Abstract
1. Introduction
2. Wind Turbine and Wind Farm Measurements
- Analysis of whole WF performance, using aggregation of the measurements of all WTs in the same WF, i.e., not using directly measured single-sensor WF data [14,15]. For example, Ref. [14] applies principal component analysis (PCA, [54]) to the measurements from 89 individual WTs in a WF, which are processed to reduce dimensionality and obtain an equivalent WS for the whole WF, while equivalent WF Pout values are calculated from the Pout measurements of the 89 WTs. In [15], average values of Pout, WS, rotor speed and blade pitch angle are collected from 22 individual WTs and then used to obtain three performance curve models for the whole WF: (i) power curve model (Pout-WS relationship), (ii) rotor curve model (rotor speed-WS relationship), and (iii) blade pitch curve model (blade pitch angle-WS relationship);
- Analysis of both WT and WF performance [20,50]. For example, a data set consisting of 20 WTs in a WF is used in [20] to propose an outlier elimination approach for WT measurements, which is then applied to the WF’s Pout-WS scatter-plot data, without specifying the method for obtaining the WF measurements (an example illustrating WF’s Pout-WS scatter-plots can be found in Section 3.3). The measurements of both 274 individual WTs (Pout, WS and turbine availability binary data, which are sampled every 10 min) and of the whole WF (WS, WD, Pout, etc.) are used in [50], with a focus on the whole WF performance. Individual WT availability data are used to help identify outliers in the WF measurements, underlining the difference between the aggregated WF Pout (from the individual WTs’ Pout data) and the single-point measured whole WF Pout, which is attributed to losses.
- Measurements of additional variables are used for the transformation of Pout or WS data, e.g., based on guidance in [21] for data normalisation: in [24], temperature, humidity and air pressure are used to obtain the air density and then to transform Pout data; a similar approach is applied in [50] to transform WS data. In [23,34], the same WS or Pout corrections as in [24,50] are used to eliminate the impact of the variance of WS fluctuations by expanding the PC into a Taylor series with the higher-order terms neglected. Another example is the use of turbulence intensity, as defined in [21], which has a significant impact on ten-minute averaged WS and Pout measurements (see Section 3.2.2). Typically, increased turbulence intensity leads to an increase in power output at the lower wind speeds (“ankle” of the power curve), while at the higher wind speeds (“knee” of the power curve), increased turbulence intensity results in a decrease in power output. Accordingly, a procedure for normalising PC data to a reference turbulence intensity from [21] is used in [55]. In that paper, a simulation-based approach is used instead of the Taylor-series approach of [23,34] (both approaches are presented in [55,56]); it assumes a normal distribution of WS values within a 10-min window (location parameter: the 10-min average wind speed; scale parameter: the 10-min standard deviation of the wind speed), together with some other assumptions, for example, that at each instant the WT follows a zero-turbulence PC.
- Examples of use of additional data where efforts are also made to detect potential outliers in these measurements are: [7], where the importance of ambient temperature and wind direction is emphasised, as temperature has the biggest influence on air density and, in turn, Pout (up to 20%) and [15,39,47], where various multivariate outlier detection approaches were applied on all considered variables (Pout, WS, rotor speed, blade pitch angle, etc.), as well as [27,50], which tried to detect outliers by analysing them separately in one dimension and simultaneously in multi-dimensional space. Finally, Ref. [53] does not explicitly present the outlier detection method, but it employs several parameters (Pout, WS, WD, rotor speed, gear temperature, blade pitch angle) and treats the outliers in these parameters as missing values.
- The outliers are analysed in the measured high frequency vibration data (sampling frequency equal to 25.6 kHz) for health condition monitoring in [33], where they are collected via an accelerometer installed on the drivetrain structure of a WT.
- The outliers are considered for cyber-security analysis of false data injection attacks for two WFs (each with 18 WTs) using real-world hourly WS information in [44], but Pout values of WTs are calculated with a deterministic piecewise linear power curve.
- Robust statistical techniques that may reduce the effects of outliers and could be used as nominal baselines are proposed in [30,32,52], but the specific outlier detection methods are neither stated nor used. Ref. [30] uses 12 kHz vibration data of WT bearings; Ref. [32] analyses 10-min average WS, Pout and blade bending moment data of a WT; Ref. [52] builds models using 5-min average WF Pout data. These models are reviewed in more detail in Section 4.
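The density and turbulence corrections mentioned in the list above can be illustrated with a short Python sketch. It is not the procedure of [21], [24], [50] or [55]: the dry-air density (humidity ignored), the reference density of 1.225 kg/m^3, the cubic `zero_turbulence_pc` and all function and parameter names are assumptions made for illustration. The two density corrections are the commonly used forms (WS scaled by the cube root of the density ratio, Pout scaled by the inverse density ratio), and the Monte Carlo step assumes normally distributed WS within the 10-min window, as described for [55].

```python
import numpy as np

R_DRY_AIR = 287.05  # J/(kg K), gas constant of dry air
RHO_REF = 1.225     # kg/m^3, assumed reference air density


def dry_air_density(temperature_k, pressure_pa):
    """Ideal-gas air density; humidity is ignored (simplifying assumption)."""
    t = np.asarray(temperature_k, dtype=float)
    p = np.asarray(pressure_pa, dtype=float)
    return p / (R_DRY_AIR * t)


def normalise_wind_speed(ws, rho, rho_ref=RHO_REF):
    """WS correction: v_n = v * (rho / rho_ref) ** (1/3)."""
    ratio = np.asarray(rho, dtype=float) / rho_ref
    return np.asarray(ws, dtype=float) * ratio ** (1.0 / 3.0)


def normalise_power(pout, rho, rho_ref=RHO_REF):
    """Pout correction: P_n = P * rho_ref / rho."""
    return np.asarray(pout, dtype=float) * rho_ref / np.asarray(rho, dtype=float)


def zero_turbulence_pc(ws, cut_in=3.0, rated_ws=12.0, cut_out=25.0):
    """Hypothetical zero-turbulence PC in pu (cubic between cut-in and rated WS)."""
    ws = np.asarray(ws, dtype=float)
    ramp = (ws**3 - cut_in**3) / (rated_ws**3 - cut_in**3)
    power = np.clip(ramp, 0.0, 1.0)
    # No generation below cut-in or above cut-out.
    return np.where((ws < cut_in) | (ws > cut_out), 0.0, power)


def mean_power_in_window(mean_ws, turbulence_intensity, n=200_000, seed=0):
    """Monte Carlo estimate of the 10-min average Pout (pu).

    WS inside the window is drawn from a normal distribution with
    location = mean_ws and scale = turbulence_intensity * mean_ws, and the
    WT is assumed to follow the zero-turbulence PC at every instant.
    """
    rng = np.random.default_rng(seed)
    sigma = turbulence_intensity * mean_ws
    samples = rng.normal(mean_ws, sigma, n)
    return float(zero_turbulence_pc(samples).mean())
```

Under these assumptions, `mean_power_in_window(4.0, 0.2)` exceeds the zero-turbulence value at 4 m/s (ankle), while `mean_power_in_window(12.0, 0.1)` stays below 1 pu (knee), in line with the behaviour described above.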
3. Definitions, Characteristics and Causes of Outliers
3.1. Definitions of Outliers
3.2. Common Causes of Outliers in WT and WF Measurements
- WT downtime, offline operation, or outage: [48]
- Data management failure: [20]
- WT blade damage, or excessive deposition of dirt, icing or insects: [20]
- Wind shear: [50]
- Extreme weather conditions, harsh environment: [40]
- Wind speed (around the cut-in or cut-out wind speeds): [14]
- WT data availability issues: [50]
3.2.1. Outliers Related to Data Acquisition, Data Transfer and Data Management Errors
3.2.2. Outliers Determined by the Operational Logs and Event-Logging Systems
3.2.3. Outliers due to Applied Period of Averaging (Averaging Window)
3.2.4. Outliers due to Cut-Out Effects
3.3. Characteristics and Causes of Outliers in Processing WT and WF Measurement Data
- Low Pout-High WS Outliers: At WT-level, these outliers are characterised by zero, very low, or even negative Pout values when the corresponding (synchronous) WS values are between the cut-in and cut-out wind speeds, i.e., when the WT is expected to generate non-negligible output power. These outliers are denoted as “bottom-curve stacked outliers” and “stacked data at the bottom of the curve”, and typically appear as a dense horizontal data band in the PC-based model [29,39]. Their causes include WT failure or outage, unplanned WT maintenance, and faults of the WT measurement and/or communication systems [49]. Similarly, [24] stipulates that outliers representing Pout data close to, lower than, or equal to zero are exclusively due to wrong measurements or WT malfunctions (see Figure 2a and outliers marked with A). If the WT does not rotate, Pout should be zero, but if the WT’s control system is kept energised, the Pout measurements might be negative [26]. At WF-level, when data are obtained directly, these outliers are most likely true outliers due to measurement errors or communication system errors [48], or disconnection of the whole WF due to activation of the protection system (see Figure 1, outliers marked with ①). When WF-level data are obtained from the aggregation of individual WT measurements, these outliers are highly unlikely to be present (see Figure 2b, outliers marked with A). Regardless of WT-level or WF-level analysis, some of these outliers may be hidden, e.g., for low WS values.
- High Pout-Low WS Outliers: At WT-level, these outliers are typically manifested as horizontally stacked data with a narrow range of variations of relatively high Pout values (e.g., close to 1 pu), which are clearly visible when they are above the upper boundary of the expected reference WT PC values, i.e., for given WS values, these outliers represent higher than expected and relatively constant Pout values. These outliers are denoted as “top-curve stacked outliers” and “stacked data at the top of the curve”, which appear as one or more dense horizontal data bands, located above and to the left of the regular PC-based WT model (see Figure 1, outliers marked with ③) [29,39]. These outliers are usually caused by communication errors or wind speed sensor failures ([29] states that faults of WS sensors happen frequently), e.g., when lower than actual WS values are recorded. When WF-level data are obtained directly, these outliers are similar to WT-level outliers [40,48], but may have different Pout values, based on the number of operational WTs in the WF. When WF-level data are obtained from the aggregation of individual WT measurements, these outliers may or may not be visible, based on the operating point of the WT(s) with a faulted measurement/communication system, as well as the (aggregated) operating point of the whole WF.
- Low Pout_max Outliers: These outliers are again represented by horizontal bands/ranges of relatively constant Pout values, which are below the lower boundary of the expected range of PC values (see outliers marked as B in Figure 2a,b). These outliers are denoted as “mid-curve stacked outliers” and “stacked data in the middle of the curve” [29,39]. Reported causes include curtailment, but also down-rating of WTs and data acquisition and communication system errors [20,49]. In some cases, improper WT operation, for example due to damage to a WT gearbox bearing [27], may also limit the maximum WT Pout (e.g., restricted to 60% of the rated maximum power). When WF-level data are obtained directly, these outliers are similar to WT-level outliers [48] (see outliers marked by ② in Figure 1), but may have different Pout_max values, based on the number of operational or curtailed WTs in the WF [50], and may also include communication and measurement errors of single Pout and WS sensors [48]. For aggregated WF-level data, measurement and communication errors of individual WTs will most likely be hidden, but there may be many horizontal bands, based on the actual number of WTs that are curtailed or down-rated. An important feature of this type of outlier, distinguishing it from the type discussed next, is that there are no big differences in WF power outputs up to the point of curtailment. In terms of true/false outlier analysis, if curtailed (or down-rated/derated) operation should be included in the WT/WF model, these outliers are false outliers.
- Shifted PC Outliers: At WF-level, these outliers are not a consequence of curtailment or down-rating of individual WTs, but are due to the outages or faults of individual WTs in a WF, reflected in a shift of the PC to the right, i.e., corresponding to situations in which both the maximum power output is reduced and higher WS values are required to produce Pout values close to those obtained when all WTs are in operation (see outliers marked as C in Figure 2b). A WT outage is defined in [48] as the tripping of n WTs, e.g., for protection reasons or unplanned maintenance. Similar outliers may be recorded for an individual WT, when the PC is shifted to the left, indicating improper operation, a problem with the WT control system, a damaged WT, or inaccurate speed sensor readings, as illustrated in [31].
- Linear Pout-WS Outliers: These WT-level outliers appear as linearly related Pout-WS recordings, possibly occurring during the data processing phase, when linear interpolation is applied to populate missing recordings (see outliers marked as C in Figure 2a). Similar outliers are reported due to malfunction of the pitch-control system and dirt deposits on the blades [31]. These outliers may occur at WF-level when data are obtained directly (data recording errors), but they are highly unlikely to be visible when WF-level data are obtained from the aggregation of a larger number of individual WT measurements.
- Scattered Outliers: These outliers are related to irregular or random values around the expected/reference PC range (Figure 1, marked as ④), which may be due to faults and errors, but also due to statistical processing and the averaging window. These outliers are called “around-curve outliers” in [29] (in [39], “scattered data around the curve”) and can be caused by random factors, like signal propagation noise, sensor failure and extreme weather conditions. An alternative term is “sparse outliers” (due to random noise) [20]. Also, [37] shows these outliers together with other types of outliers in WF measurements. The reasons for these abnormal data are sensor failure, sensor noise and some uncontrolled random factors [49]. At WT-level and when WF-level data are obtained directly, these outliers have similar causes. When WF-level data are obtained from the aggregation of individual WT measurements, their dominant causes are the averaging window and similar statistical processing-based origins (see Section 3.2.3). These outliers are termed “unnatural data” in [40], where they are analysed separately from other outliers (these sparsely located data can also be seen in Figure 3).
- Constant WS-Variable Pout Outliers: These outliers are reported at WF level and are manifested as vertical bands of constant WS values (or with only small WS variations) from the MET towers, recorded over a longer period together with synchronously recorded, relatively large Pout variations, occurring due to errors in data acquisition and data transmission (polluted data) [40,50] (see Figure 3). These constant WS outliers are more strongly pronounced at WT-level and when WF-level data are obtained directly, but may be less pronounced when WF-level data are obtained from the aggregation of individual WT measurements. A variant of the constant outliers is denoted as “slender” in [40], related to an approximately vertical band around the cut-out wind speed, explained by the wake effects, i.e., that the WTs within a WF do not cut out together near the cut-out wind speed, because the wind speeds at each WT vary from the measured WS at the MET mast [40]. It is emphasised that these data should be categorised as valid data, as they reflect the natural Pout fluctuation property of the WF around the cut-out WS, which is important to the system operators. However, it is not clear why these outliers are not scattered within an oblique band with a negative slope (as shown in Figure 2b for outliers marked with E), but are clustered in a relatively narrow vertical band near the cut-out wind speed (wake effect outliers as discussed in [40]). When the WF-level single-sensor measured WS is around the cut-out WS, the most likely situation is that some of the WTs in the WF will be operating (their WS is below the cut-out wind speed), while some WTs will be stopped (as their WS is above the cut-out wind speed), and therefore the 10-min average Pout values for the whole WF will be between rated power and zero.
According to [40], it is necessary to distinguish “(invalid) unnatural” outliers from these “valid data”, but this can be a difficult task, since both types of data may be caused by WT cut-out effects. The methods and results of [40] are further discussed and analysed in Section 4 and Section 5.
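Some of the outlier types described above can be screened with very simple rules before any statistical method is applied. The following Python sketch is only an illustration under stated assumptions, not a method from any of the reviewed papers: the function names, the thresholds (`pout_eps`, `min_run`, `bin_width`, `min_share`) and the per-unit scaling of Pout are all hypothetical choices. It flags low Pout-high WS samples, runs of constant WS values, and dense horizontal Pout bands below rated power.

```python
import numpy as np


def flag_low_pout_high_ws(ws, pout, cut_in, cut_out, pout_eps=0.01):
    """True where Pout is ~zero or negative although cut_in < WS < cut_out."""
    ws = np.asarray(ws, dtype=float)
    pout = np.asarray(pout, dtype=float)
    return (ws > cut_in) & (ws < cut_out) & (pout <= pout_eps)


def flag_constant_ws_runs(ws, min_run=6, tol=1e-6):
    """True for samples inside runs of >= min_run consecutive samples whose
    WS stays within tol of the first value of the run."""
    ws = np.asarray(ws, dtype=float)
    flags = np.zeros(ws.size, dtype=bool)
    start = 0
    for i in range(1, ws.size + 1):
        # Close the current run at the end of the data or when WS changes.
        if i == ws.size or abs(ws[i] - ws[start]) > tol:
            if i - start >= min_run:
                flags[start:i] = True
            start = i
    return flags


def stacked_pout_levels(pout, rated=1.0, bin_width=0.02, min_share=0.05):
    """Centres of Pout histogram bins that hold an unusually large share of
    the samples, excluding the regions near zero and near rated power."""
    pout = np.asarray(pout, dtype=float)
    edges = np.arange(0.0, rated + bin_width, bin_width)
    counts, edges = np.histogram(pout, bins=edges)
    share = counts / max(pout.size, 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    mid_range = (centres > 0.05 * rated) & (centres < 0.95 * rated)
    return centres[mid_range & (share >= min_share)]
```

Such flags only mark candidates. Whether a flagged band is a true or a false outlier (e.g., curtailment that should be kept in the model) still has to be decided from operational information, as discussed above.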
3.4. Data Rejection Requirements in IEC Standard 61400-12-1
- When external conditions, other than wind speed, are out of the specified WT operating range;
- If WT cannot operate because of fault condition;
- When WT is manually shut down, or it is in a test, or maintenance operating modes;
- If there is a failure or degradation (e.g., due to icing) of measurement equipment;
- When WD is outside the measurement sector(s), which generally exclude WDs with significant obstacles and other wind turbines, as seen from both the WT and measurement equipment;
- When WDs are outside the valid (complete) site calibration sectors;
- For any special atmospheric condition filtered during the site calibration, which shall also be filtered during the power curve test.
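The rejection criteria listed above can be combined into a single boolean mask over the 10-min records. The Python sketch below is a minimal illustration under stated assumptions: the argument names are hypothetical, each status flag is assumed to be already available as one boolean per record, and the measurement and site calibration sectors are merged into one list of valid wind direction sectors. It is not taken from the standard [21].

```python
import numpy as np


def rejection_mask(wd_deg, valid_sectors, wt_fault, manual_stop_or_test,
                   equipment_failure, external_out_of_range):
    """Return True for records that should be rejected.

    wd_deg        : wind direction per record, in degrees
    valid_sectors : list of (start, end) sectors in degrees; a sector with
                    start > end wraps through north, e.g. (330, 30)
    other inputs  : one boolean per record (hypothetical status flags)
    """
    wd = np.mod(np.asarray(wd_deg, dtype=float), 360.0)
    in_sector = np.zeros(wd.shape, dtype=bool)
    for start, end in valid_sectors:
        if start <= end:
            in_sector |= (wd >= start) & (wd <= end)
        else:
            in_sector |= (wd >= start) | (wd <= end)

    return (
        ~in_sector
        | np.asarray(wt_fault, dtype=bool)
        | np.asarray(manual_stop_or_test, dtype=bool)
        | np.asarray(equipment_failure, dtype=bool)
        | np.asarray(external_out_of_range, dtype=bool)
    )
```

A record is kept only if none of the conditions applies, so the retained data set is `~rejection_mask(...)`. Special atmospheric conditions filtered during site calibration would need an additional flag of the same kind.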
4. Detection of Outliers in Processing Wind Turbine and Wind Farm Measurements
4.1. Statistical Methods for Outlier Detection
4.1.1. Wind Turbine Measurements
4.1.2. Wind Farm Measurements
4.2. Physical Constraint-Based Methods for Outlier Detection
4.3. Outlier Detection by Combinations of Statistical and Physical Constraints-Based Methods
4.4. Robust Statistical Models for Reducing Outlier Effects
5. Approaches for Testing Success of Outlier Detection and Removal/Treatment Procedures
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
ANFIS | Adaptive neuro-fuzzy inference system |
DBSCAN | Density-based spatial clustering of applications with noise |
f(X|Y) | The deterministic relationship between predictor X and response Y |
FCM | Fuzzy c-means |
FN | False negative |
FP | False positive |
GARCH | Generalised autoregressive conditional heteroskedasticity |
GMCM | Gaussian mixture copula model |
GW | Gigawatt |
HMM | Hidden Markov model |
IQR | Interquartile range |
k-CV | k-fold cross-validation |
KLOF | Kernel-based local outlier factor |
KNN | k-nearest neighbour |
kW | Kilowatt |
LMedS | Least median of squares |
LOF | Local outlier factor |
LS-SVR | Least squares support vector regression |
MAD | Median absolute deviation |
MAR | Missing at random |
MCAR | Missing completely at random |
MET | Meteorological |
Mfr-PC | Manufacturer’s power curve |
MLP | Multilayer perceptron |
MMO | Mathematical morphology operation |
MNAR | Missing not at random |
NI | Nonignorable |
OSTAR | Outlier smooth transition autoregressive |
PC | Power curve |
PCA | Principal component analysis |
Pout | Power output |
Pout_max | Maximum power output |
Prob(X, Y) | The joint probability of X and Y |
Prob(X|Y) | The conditional probability of X for given Y |
SDAE | Stacked denoising autoencoder |
STW | Sliding time window |
TN | True negative |
TP | True positive |
WD | Wind direction |
WF | Wind farm |
WS | Wind speed |
WT | Wind turbine |
References
- Papaefthymiou, G.; Kurowicka, D. Using Copulas for Modeling Stochastic Dependence in Power System Uncertainty Analysis. IEEE Trans. Power Syst. 2009, 24, 40–49.
- Fang, D.; Zou, M.; Djokic, S. Probabilistic OPF Incorporating Uncertainties in Wind Power Outputs and Line Thermal Ratings. In Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), Boise, ID, USA, 24–28 June 2018.
- Zou, M.; Fang, D.; Djokic, S.; Di Giorgio, V.; Langella, R.; Testa, A. Evaluation of Wind Turbine Power Outputs with and without Uncertainties in Input Wind Speed and Wind Direction Data. IET Renew. Power Gener. 2020.
- Watson, S.J.; Xiang, B.J.; Yang, W.; Tavner, P.J.; Crabtree, C.J. Condition Monitoring of the Power Output of Wind Turbine Generators Using Wavelets. IEEE Trans. Energy Convers. 2010, 25, 715–721.
- Zhang, Z.; Sun, Y.; Gao, D.W.; Lin, J.; Cheng, L. A Versatile Probability Distribution Model for Wind Power Forecast Errors and Its Application in Economic Dispatch. IEEE Trans. Power Syst. 2013, 28, 3114–3125.
- Ghofrani, M.; Arabali, A.; Etezadi-Amoli, M.; Fadali, M.S. Energy Storage Application for Performance Enhancement of Wind Integration. IEEE Trans. Power Syst. 2013, 28, 4803–4811.
- Schlechtingen, M.; Santos, I.F.; Achiche, S. Using Data-Mining Approaches for Wind Turbine Power Curve Monitoring: A Comparative Study. IEEE Trans. Sustain. Energy 2013, 4, 671–679.
- Lydia, M.; Kumar, S.S.; Selvakumar, A.I.; Prem Kumar, G.E. A comprehensive review on wind turbine power curve modeling techniques. Renew. Sustain. Energy Rev. 2014, 30, 452–460.
- Carrillo, C.; Obando, M.A.F.; Cidrás, J.; Díaz-Dorado, E. Review of power curve modelling for wind turbines. Renew. Sustain. Energy Rev. 2013, 21, 572–581.
- Hayes, B.P.; Ilie, I.; Porpodas, A.; Djokic, S.Z.; Chicco, G. Equivalent power curve model of a wind farm based on field measurement data. In Proceedings of the 2011 IEEE Trondheim PowerTech, Trondheim, Norway, 19–23 June 2011.
- Ye, L.; Zhao, Y.; Zeng, C.; Zhang, C. Short-term wind power prediction based on spatial model. Renew. Energy 2017, 101, 1067–1074.
- Xu, M.; Pinson, P.; Lu, Z.; Qiao, Y.; Min, Y. Adaptive robust polynomial regression for power curve modeling with application to wind power forecasting. Wind Energy 2016, 19, 2321–2336.
- Gill, S.; Stephen, B.; Galloway, S. Wind Turbine Condition Assessment Through Power Curve Copula Modeling. IEEE Trans. Sustain. Energy 2012, 3, 94–101.
- Kusiak, A.; Zheng, H.; Song, Z. Models for monitoring wind farm power. Renew. Energy 2009, 34, 583–590.
- Kusiak, A.; Verma, A. Monitoring Wind Farms With Performance Curves. IEEE Trans. Sustain. Energy 2013, 4, 192–199.
- Sarkar, S.; Ajjarapu, V. MW Resource Assessment Model for a Hybrid Energy Conversion System With Wind and Solar Resources. IEEE Trans. Sustain. Energy 2011, 2, 383–391.
- Dhungana, D.; Karki, R. Data Constrained Adequacy Assessment for Wind Resource Planning. IEEE Trans. Sustain. Energy 2015, 6, 219–227.
- Taslimi-Renani, E.; Modiri-Delshad, M.; Elias, M.F.M.; Rahim, N.A. Development of an enhanced parametric model for wind turbine power curve. Appl. Energy 2016, 177, 544–552.
- Wang, S.; Zhang, X.; Ge, L.; Wu, L. 2-D Wind Speed Statistical Model for Reliability Assessment of Microgrid. IEEE Trans. Sustain. Energy 2016, 7, 1159–1169.
- Zhao, Y.; Ye, L.; Wang, W.; Sun, H.; Ju, Y.; Tang, Y. Data-Driven Correction Approach to Refine Power Curve of Wind Farm Under Wind Curtailment. IEEE Trans. Sustain. Energy 2018, 9, 95–105.
- BS EN 61400-12-1:2017. Wind Energy Generation Systems. Part 12-1: Power Performance Measurements of Electricity Producing Wind Turbines; BSI Standards Publication: London, UK, 2017.
- Kim, H.; Singh, C.; Sprintson, A. Simulation and Estimation of Reliability in a Wind Farm Considering the Wake Effect. IEEE Trans. Sustain. Energy 2012, 3, 274–282. [Google Scholar] [CrossRef]
- Sainz, E.; Llombart, A.; Guerrero, J.J. Robust filtering for the characterization of wind turbines: Improving its operation and maintenance. Energy Convers. Manag. 2009, 50, 2136–2147. [Google Scholar] [CrossRef]
- Villanueva, D.; Feijóo, A. Normal-Based Model for True Power Curves of Wind Turbines. IEEE Trans. Sustain. Energy 2016, 7, 1005–1011. [Google Scholar] [CrossRef]
- Wang, S.; Huang, Y.; Li, L.; Liu, C. Wind turbines abnormality detection through analysis of wind farm power curves. Measurement 2016, 93, 178–188. [Google Scholar] [CrossRef]
- Javadi, M.; Malyscheff, A.M.; Wu, D.; Kang, C.; Jiang, J.N. An algorithm for practical power curve estimation of wind turbines. CSEE J. Power Energy Syst. 2018, 4, 93–102. [Google Scholar] [CrossRef]
- Schlechtingen, M.; Ferreira Santos, I. Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection. Mech. Syst. Signal Process. 2011, 25, 1849–1875. [Google Scholar] [CrossRef] [Green Version]
- Stephen, B.; Galloway, S.J.; McMillan, D.; Hill, D.C.; Infield, D.G. A Copula Model of Wind Turbine Performance. IEEE Trans. Power Syst. 2011, 26, 965–966. [Google Scholar] [CrossRef] [Green Version]
- Shen, X.; Fu, X.; Zhou, C. A Combined Algorithm for Cleaning Abnormal Data of Wind Turbine Power Curve Based on Change Point Grouping Algorithm and Quartile Algorithm. IEEE Trans. Sustain. Energy 2019, 10, 46–54. [Google Scholar] [CrossRef]
- Zhao, Y.; Liu, Y.; Wang, R. Fuzzy scalar quantisation based on hidden Markov model and application in fault diagnosis of wind turbine. J. Eng. 2017, 2017, 2685–2689. [Google Scholar] [CrossRef]
- Park, J.; Lee, J.; Oh, K.; Lee, J. Development of a Novel Power Curve Monitoring Method for Wind Turbines and Its Field Tests. IEEE Trans. Energy Convers. 2014, 29, 119–128. [Google Scholar] [CrossRef]
- Yampikulsakul, N.; Byon, E.; Huang, S.; Sheng, S.; You, M. Condition Monitoring of Wind Power System With Nonparametric Regression Analysis. IEEE Trans. Energy Convers. 2014, 29, 288–299. [Google Scholar]
- Xu, X.; Lei, Y.; Li, Z. An Incorrect Data Detection Method for Big Data Cleaning of Machinery Condition Monitoring. IEEE Trans. Ind. Electron. 2020, 67, 2326–2336. [Google Scholar] [CrossRef]
- Cambron, P.; Lepvrier, R.; Masson, C.; Tahan, A.; Pelletier, F. Power curve monitoring using weighted moving average control charts. Renew. Energy 2016, 94, 126–135. [Google Scholar] [CrossRef]
- Kusiak, A.; Zheng, H.; Song, Z. On-line monitoring of power curves. Renew. Energy 2009, 34, 1487–1493. [Google Scholar] [CrossRef]
- Pinson, P.; Madsen, H. Adaptive modelling and forecasting of offshore wind power fluctuations with Markov-switching autoregressive models. J. Forecast. 2012, 31, 281–313. [Google Scholar] [CrossRef] [Green Version]
- Mangalova, E.; Agafonov, E. Wind power forecasting using the k-nearest neighbors algorithm. Int. J. Forecast. 2014, 30, 402–406. [Google Scholar] [CrossRef]
- Lowery, C.; Malley, M.O. Impact of Wind Forecast Error Statistics Upon Unit Commitment. IEEE Trans. Sustain. Energy 2012, 3, 760–768. [Google Scholar] [CrossRef]
- Liu, Y.; Liu, Q.; Han, S.; Yan, J.; Li, L. The detection method of wind turbine operation outliers based on muti-dimensional cluster. In Proceedings of the 8th Renewable Power Generation Conference (RPG 2019), Shanghai, China, 24–25 October 2019. [Google Scholar]
- Zheng, L.; Hu, W.; Min, Y. Raw Wind Data Preprocessing: A Data-Mining Approach. IEEE Trans. Sustain. Energy 2015, 6, 11–19. [Google Scholar] [CrossRef]
- Wang, Y.; Infield, D.G.; Stephen, B.; Galloway, S.J. Copula based model for wind turbine power curve outlier rejection. Wind Energy 2014, 17, 1677–1688. [Google Scholar] [CrossRef] [Green Version]
- Hu, Y.; Qiao, Y.; Liu, J.; Zhu, H. Adaptive Confidence Boundary Modeling of Wind Turbine Power Curve Using SCADA Data and Its Application. IEEE Trans. Sustain. Energy 2019, 10, 1330–1341. [Google Scholar] [CrossRef]
- Ouyang, T.; Kusiak, A.; He, Y. Modeling wind-turbine power curve: A data partitioning and mining approach. Renew. Energy 2017, 102, 1–8. [Google Scholar] [CrossRef]
- Mohammadpourfard, M.; Sami, A.; Weng, Y. Identification of False Data Injection Attacks With Considering the Impact of Wind Generation and Topology Reconfigurations. IEEE Trans. Sustain. Energy 2018, 9, 1349–1364. [Google Scholar] [CrossRef]
- Yesilbudak, M. Partitional clustering-based outlier detection for power curve optimization of wind turbines. In Proceedings of the 2016 IEEE International Conference on Renewable Energy Research and Applications (ICRERA), Birmingham, UK, 20–23 November 2016. [Google Scholar]
- Zhou, Y.; Hu, W.; Min, Y.; Zheng, L.; Liu, B.; Yu, R.; Dong, Y. A semi-supervised anomaly detection method for wind farm power data preprocessing. In Proceedings of the 2017 IEEE Power & Energy Society General Meeting, Chicago, IL, USA, 16–20 July 2017. [Google Scholar]
- Sun, Z.; Sun, H. Stacked Denoising Autoencoder With Density-Grid Based Clustering Method for Detecting Outlier of Wind Turbine Components. IEEE Access 2019, 7, 13078–13091. [Google Scholar] [CrossRef]
- Ye, X.; Lu, Z.; Qiao, Y.; Min, Y.; Malley, M.O. Identification and Correction of Outliers in Wind Farm Time Series Power Data. IEEE Trans. Power Syst. 2016, 31, 4197–4205. [Google Scholar] [CrossRef]
- Long, H.; Sang, L.; Wu, Z.; Gu, W. Image-Based Abnormal Data Detection and Cleaning Algorithm via Wind Power Curve. IEEE Trans. Sustain. Energy 2020, 11, 938–946. [Google Scholar] [CrossRef]
- Wan, Y.-H.; Ela, E.; Orwig, K. Development of an equivalent wind plant power curve. In Proceedings of the Windpower 2010 Conference & Exhibition, Dallas, TX, USA, 23–26 May 2010. [Google Scholar]
- Paiva, L.T.; Veiga, R.C.; Palma, J.M.L.M. Determining wind turbine power curves based on operating conditions. Wind Energy 2014, 17, 1563–1575. [Google Scholar] [CrossRef]
- Chen, H.; Li, F.; Wang, Y. Wind power forecasting based on outlier smooth transition autoregressive GARCH model. J. Mod. Power Syst. Clean Energy 2018, 6, 532–539. [Google Scholar] [CrossRef] [Green Version]
- Morshedizadeh, M.; Kordestani, M.; Carriveau, R.; Ting, D.S.; Saif, M. Power production prediction of wind turbines using a fusion of MLP and ANFIS networks. IET Renew. Power Gener. 2018, 12, 1025–1033. [Google Scholar] [CrossRef]
- Härdle, W.K.; Simar, L. Principal Components Analysis. In Applied Multivariate Statistical Analysis; Springer: Berlin/Heidelberg, Germany, 2015; pp. 319–358. [Google Scholar]
- Albers, A. Turbulence and Shear Normalisation of Wind Turbine Power Curve. In Proceedings of the European Wind Energy Conference and Exhibition (EWEC), Warsaw, Poland, 20–23 April 2010. [Google Scholar]
- Albers, A.; Jakobi, T.; Rohden, R.; Stoltenjohannes, J. Influence of meteorological variables on measured wind turbine power curves. In Proceedings of the European Wind Energy Conference and Exhibition 2007 (EWEC), Milan, Italy, 7–10 May 2007. [Google Scholar]
- King, G.; Honaker, J.; Joseph, A.; Scheve, K. Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation. Am. Political Sci. Rev. 2001, 95, 49–69. [Google Scholar] [CrossRef] [Green Version]
- Zou, M.; Fang, D.; Djokic, S.; Hawkins, S. Assessment of wind energy resources and identification of outliers in on-shore and off-shore wind farm measurements. In Proceedings of the 3rd International Conference on Offshore Renewable Energy (CORE), Glasgow, UK, 29–30 August 2018. [Google Scholar]
- Gayo, J.B. Reliability focused research on optimizing Wind Energy Systems Design, Operation and Maintenance: Tools, Proof of Concepts, Guidelines & Methodologies for a New Generation (Reliawind); Gamesa Innovation and Technology: Madrid, Spain, 2011. [Google Scholar]
- Vestas. Vestas Wind Systems A/S, Report 950010.R1: General Specification V90–3.0 MW; Vestas: Aarhus, Denmark, 2013. [Google Scholar]
- Gupta, M.; Gao, J.; Aggarwal, C.C.; Han, J. Outlier Detection for Temporal Data: A Survey. IEEE Trans. Knowl. Data Eng. 2014, 26, 2250–2267. [Google Scholar] [CrossRef]
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 15. [Google Scholar] [CrossRef]
- Tukey, J.W. Exploratory Data Analysis; Reading, Mass.; Addison-Wesley Pub. Co.: Boston, MA, USA, 1977. [Google Scholar]
- Lange, M. On the Uncertainty of Wind Power Predictions—Analysis of the Forecast Accuracy and Statistical Distribution of Errors. J. Sol. Energy Eng. 2005, 127, 177–184. [Google Scholar] [CrossRef]
- Zhou, Y.; Wan, A.T.K.; Xie, S.; Wang, X. Wavelet analysis of change-points in a non-parametric regression with heteroscedastic variance. J. Econom. 2010, 159, 183–201. [Google Scholar] [CrossRef]
- Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; AAAI Press: Portland, Oregon, 1996; pp. 226–231. [Google Scholar]
- Gao, J.; Hu, W.; Li, W.; Zhang, Z.; Wu, O. Local Outlier Detection Based on Kernel Regression. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010. [Google Scholar]
- Breunig, M.M.; Kriegel, H.-P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, Renton, WA, USA, 19–21 October 2020; ACM: Dallas, TX, USA, 2000; pp. 93–104. [Google Scholar]
- Ruiyu, J.; Li, Y. K-means algorithm of clustering number and centers self-determination. Comput. Eng. Appl. 2018, 54, 152–158.
- Khodayar, M.; Kaynak, O.; Khodayar, M.E. Rough Deep Neural Architecture for Short-Term Wind Speed Forecasting. IEEE Trans. Ind. Inform. 2017, 13, 2770–2779.
- Lu, J.; Zhu, Q. An Effective Algorithm Based on Density Clustering Framework. IEEE Access 2017, 5, 4991–5000.
- Wu, B.; Wilamowski, B.M. A Fast Density and Grid Based Clustering Method for Data With Arbitrary Shapes and Noise. IEEE Trans. Ind. Inform. 2017, 13, 1620–1628.
- Witten, I.H.; Frank, E.; Hall, M.A. Chapter 3. Output: Knowledge Representation. In Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Witten, I.H., Frank, E., Hall, M.A., Eds.; Morgan Kaufmann: Boston, MA, USA, 2011; pp. 61–83.
- Mitra, A. Fundamentals of Quality Control and Improvement, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2008.
- Sklar, A. Fonctions de Répartition à n Dimensions et Leurs Marges [N-Dimensional Joint and Marginal Distribution Functions]; Université Paris 8: Paris, France, 1959; pp. 229–231.
- Tewari, A.; Giering, M.J.; Raghunathan, A. Parametric Characterization of Multimodal Distributions with Non-gaussian Modes. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops, Vancouver, BC, Canada, 11 December 2011.
- Brown, R.G.; Meyer, R.F. The Fundamental Theorem of Exponential Smoothing. Oper. Res. 1961, 9, 673–685.
- Markowski, C.A.; Markowski, E.P. Conditions for the Effectiveness of a Preliminary Test of Variance. Am. Stat. 1990, 44, 322–326.
- Bezdek, J.C.; Ehrlich, R.; Full, W. FCM: The fuzzy c-means clustering algorithm. Comput. Geosci. 1984, 10, 191–203.
- Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 2013, 49, 764–766.
- Haralick, R.M.; Sternberg, S.R.; Zhuang, X. Image Analysis Using Mathematical Morphology. IEEE Trans. Pattern Anal. Mach. Intell. 1987, 9, 532–550.
- Hu, M.-K. Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory 1962, 8, 179–187.
- Lloyd, S.P. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–136.
- Suykens, J.A.K.; De Brabanter, J.; Lukas, L.; Vandewalle, J. Weighted least squares support vector machines: Robustness and sparse approximation. Neurocomputing 2002, 48, 85–105.
- David, H.A. Early Sample Measures of Variability. Stat. Sci. 1998, 13, 368–377.
- Lin, Q.; Wang, J. Vertically Correlated Echelon Model for the Interpolation of Missing Wind Speed Data. IEEE Trans. Sustain. Energy 2014, 5, 804–812.
- Yang, Z.; Liu, Y.; Li, C. Interpolation of missing wind data based on ANFIS. Renew. Energy 2011, 36, 993–998.
- Zhang, Y.; Kim, S.; Giannakis, G.B. Short-term wind power forecasting using nonnegative sparse coding. In Proceedings of the 2015 49th Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 18–20 March 2015.
- Efron, B. Missing Data, Imputation, and the Bootstrap. J. Am. Stat. Assoc. 1994, 89, 463–475.
- Takagi, H.; Hayashi, I. NN-driven fuzzy reasoning. Int. J. Approx. Reason. 1991, 5, 191–212.
- Jang, J.-S.R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
Ref. | Outlier-Related Research Focus | Measurement Data Considered in Outlier Analysis | Measurement Resolution |
---|---|---|---|
[29] | WT-level | WS, Pout | 10-min |
[41] | WT-level | WS, Pout | 10-min |
[42] | WT-level | WS, Pout, rotor speed | 10-min |
[7] | WT-level | WS, Pout, WD, ambient temperature | 10-min |
[35] | WT-level | WS, Pout | 10-min |
[43] | WT-level | WS, Pout | 10-min |
[53,57] | WT-level | WS, Pout, WD, rotor speed, gear temperature, blade pitch angle | Unspecified |
[24] | WT-level | WS, Pout, ambient temperature, relative humidity, air pressure | 10-min |
[28] | WT-level | WS, Pout | 10-min |
[13] | WT-level | WS, Pout | 5-min |
[26] | WT-level | WS, Pout | 5-min |
[49] | WT-level | WS, Pout | 10-min |
[45] | WT-level | WS, Pout | 10-min |
[25] | WT-level | WS, Pout | 5-min |
[23] | WT-level | WS, Pout, WD, ambient temperature, relative humidity, air pressure | 10-min |
[34] | WT-level | WS, Pout, ambient temperature, relative humidity, air pressure | 10-min |
[18] | WT-level | WS, Pout | 5-min |
[31] | WT-level | WS, Pout | 5-min |
[51] | WT-level | WS, Pout | 10-min |
[47] | WT-level | WS, Pout, ambient temperature, rotor speed, generator speed, blade pitch angle, etc. | Unspecified |
[39] | WT-level | WS, Pout, blade pitch angle | 10-min |
[27] | WT-level | WS, Pout, generator speed, nacelle temperature, etc. | 10-min |
[40] | WF-level | WS, Pout | 15-min |
[48] | WF-level | WS, Pout | 15-min |
[37] | WF-level | WS, Pout | Unspecified |
[46] | WF-level | WS, Pout | Unspecified |
[14] | WF-level | WS, Pout | 10-min |
[15] | WF-level | WS, Pout, rotor speed, blade pitch angle | 10-min |
[20] | WT and WF-level | WS, Pout | Unspecified |
[50] | WT and WF-level | WS, Pout, ambient temperature, relative humidity, air pressure | 10-min |
Causes | Ref. (WT-Level) | Ref. (WF-Level) |
---|---|---|
WT downtime, offline operation, or outage | [29] | [48] |
WT curtailment, down-rating, or derating | [20,29,31,39,49] | [20,46,48] |
(Manually set) WT constrained operation | [7,35] | - |
Data acquisition failure | [24,29,32,33,41,45] | [14,40,46] |
Inaccurate measurements caused by improper measuring location (usually for WS), or WS under-reading | [48] | - |
Data transmission (communication system) failure | [20,23,29,33,41,45,47] | [20,40,46,48] |
Data processing failure | [23,45] | - |
Data management failure | [20,23,33] | [20] |
Electromagnetic interference | [23,29] | - |
WT control system problem, including pitch control malfunction, blade pitch angle error, control program problem, yaw and pitch misalignment, incorrect or unsuitable controller setting and similar | [7,18,23,31,35,39,45,49] | - |
WT blade damage, or excessive deposition of dirt, ice or insects | [18,20,29,31,35,45] | [20] |
Wind shear | [50] | [50] |
Shading or wake effects from neighbouring WTs, terrain or physical obstacles | [7] | - |
Extreme weather conditions, harsh environment | [29,33,45,47,49] | [40] |
Air density fluctuations | [7] | - |
Issues related to averaging period | [7,41] | - |
WT cut-out effects | [41] | - |
Wind speed (around the cut-in or cut-out wind speeds) | - | [14] |
WT data availability issues | [50] | [50] |
Other, including WT malfunctions, alarms in WT, low level of gearbox oil, worn-out generator brushes, sensor accuracy, and various errors in sensors | [18,20,23,24,32,35,45,47,49] | - |
Other, including WT malfunctions, fluctuations in WT performance, multiple non-meteorological factors and various errors in sensors | - | [14,15,20,40,46,48] |
Outlier Characteristics | Possible Causes | Ref. (WT-Level) | Ref. (WF-Level) |
---|---|---|---|
Low Pout-high WS | WT failure or outage, unplanned WT maintenance, or faults of measurement/communication system | [24,26,29,39,49] | [48] |
High Pout-low WS | Measurement/communication errors, or wind speed sensor failures | [29,39] | [40,48] |
Low Pout_max | Curtailment, damage of WT gearbox bearing, or measurement/communication system errors | [20,27,29,39,49] | [48,50] |
Shifted PC | Outages (caused by protection or unplanned maintenance) or faults of individual WTs in a WF | - | [48] |
Linear Pout-WS | Linear interpolation, malfunction of pitch-control system, or dirt deposits on the WT blades | [31] | - |
Scattered | Signal propagation noise, sensor failure, extreme weather conditions, averaging window, or uncontrolled random factors | [20,29,39,49] | [37,40] |
Constant WS-variable Pout | Errors in data acquisition and data transmission, wake effects, or Pout fluctuation around cut-out | - | [40,50] |
Method Type | Method Name | Possible Problems or Limitations | Refs. |
---|---|---|---|
Classification | KNN | Requires plenty of normal data as samples; computational burden | [14,35] |
Clustering | k-means | The parameter k is usually difficult to determine | [15,45] |
Clustering | Improved k-means | Does not work well for abundant outliers | [39] |
Clustering | DBSCAN | Sensitive to manually set parameters; does not work well for abundant outliers | [20] |
Clustering | (Semi-supervised) DBSCAN | The limited number of labelled data may not have the same statistical properties as the test data | [46] |
Clustering | LOF | Does not work for stacked outliers; does not work well for abundant outliers | [40] |
Clustering | KLOF | Does not work well for abundant outliers | [33] |
Clustering | SDAE and density-grid-based clustering | Does not work well for abundant outliers | [47] |
Clustering | FCM | Does not work well for abundant outliers | [44] |
Correlation | Copula | Stacked outliers cannot be effectively filtered | [41,42,48] |
Distance | Normal distribution | Assumption of normal distribution is not always suitable [42] | [18,24,43] |
Distance | Quartile algorithm | Not effective when the proportion of outliers is large | [25,29,51] |
Distance | MAD | The selected MAD range may not be applicable in specific situations [80] | [44] |
Change point | Change point grouping | Parameters must be set manually; ignores the overall distribution; does not work well for abundant stacked outliers | [29] |
Recursive or iterative data removal | - | Only works for data sets with a large number of recordings, so that the algorithms can converge; difficult to determine a stopping criterion | [26,31,34] |
Statistically robust data fitting | - | Computational burden [23] | [23] |
Data smoothing and censoring | - | Parameters must be carefully tuned to adapt to different cases [20] | [37] |
Computer vision | - | Cannot distinguish false outliers near the cut-out wind speed | [49] |
Hypothesis test | F-test | Assumes that data are normally distributed and that samples are independent [78] | [44] |
Physical | Various consecutive WS | Cannot detect outliers caused by Pout | [50] |
Physical | Expected ranges and mutual consistency | Cannot prove the correctness of selected ranges | [7,27] |
Physical | Betz’s law | Can only determine the theoretical upper PC bound | [51] |
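
As a rough illustration of three of the simpler entries in the table above (the quartile algorithm, MAD and Betz's law), the following Python sketch flags suspicious Pout values within a single WS bin and computes a theoretical upper power bound. It is not the implementation used in any of the reviewed papers. The function names, the fence multiplier of 1.5, the MAD threshold of 3, the consistency constant 1.4826, the air density of 1.225 kg/m³, the 90 m rotor diameter and the sample values are all assumptions chosen for illustration only.

```python
import math
import statistics


def quartile_outlier_mask(values, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR] (k is an assumed setting)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def mad_outlier_mask(values, threshold=3.0):
    """Flag values whose distance from the median exceeds threshold * scaled MAD."""
    med = statistics.median(values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate bin (at least half of the values identical): flag nothing.
        return [False] * len(values)
    return [abs(v - med) / mad > threshold for v in values]


def betz_power_limit_kw(wind_speed, rotor_diameter, air_density=1.225):
    """Theoretical upper bound on extracted power in kW: (16/27) * 0.5 * rho * A * v^3."""
    swept_area = math.pi * (rotor_diameter / 2) ** 2
    return (16 / 27) * 0.5 * air_density * swept_area * wind_speed ** 3 / 1000.0


# Hypothetical Pout values (kW) recorded within one narrow WS bin.
bin_pout = [1480.0, 1500.0, 1510.0, 1495.0, 1505.0, 1490.0, 300.0]
print(quartile_outlier_mask(bin_pout))
print(mad_outlier_mask(bin_pout))

# Hypothetical (WS in m/s, Pout in kW) pairs checked against the Betz bound.
for ws, pout in [(4.0, 500.0), (10.0, 2000.0)]:
    limit = betz_power_limit_kw(ws, rotor_diameter=90.0)
    print(ws, pout, round(limit, 1), pout > limit)
```

In this toy bin both distance-based rules isolate the single low-Pout record, because normal data clearly dominate. That matches the limitation listed in the table: once outliers make up a large share of a bin, the quartiles and the median shift towards them and the fences stop separating the two groups. The Betz check only rejects points above a physical ceiling, so it can catch "high Pout-low WS" records but says nothing about points below the power curve.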
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zou, M.; Djokic, S.Z. A Review of Approaches for the Detection and Treatment of Outliers in Processing Wind Turbine and Wind Farm Measurements. Energies 2020, 13, 4228. https://doi.org/10.3390/en13164228