Article
Peer-Review Record

Research on Anomaly Detection of Wind Farm SCADA Wind Speed Data

Energies 2022, 15(16), 5869; https://doi.org/10.3390/en15165869
by Wu Wen 1, Yubao Liu 2,*, Rongfu Sun 3 and Yuewei Liu 4
Reviewer 1:
Reviewer 3: Anonymous
Submission received: 11 July 2022 / Revised: 7 August 2022 / Accepted: 10 August 2022 / Published: 12 August 2022

Round 1

Reviewer 1 Report

I find this paper interesting and appropriate for this journal. The topic is well presented and may contribute to a better understanding and improvement of wind farm operation and maintenance. The authors present a deep knowledge in this field and it is difficult for me to find some points to be improved. Therefore I suggest publishing it as it is.

Author Response

Thank you for taking the time to review our manuscript and for your positive assessment of it.

Author Response File: Author Response.docx

Reviewer 2 Report

In this paper, the authors develop a multi-approach based abnormal data detection method for SCADA wind speed data quality control. It is a well written paper. However, some points should be mentioned and should be included within the manuscript in order to improve the publication.

§  The authors need to clarify and explain the difference between the current study with the available literature.

§  The reasons for writing the paper or the aims of the research should be given in the abstract.

§  The discussions in sections 3 and 4 are not enough to explain the superiority of the proposed scheme well. It is suggested that the authors give more analysis of the results.

§  I believe that the flowchart of figure 1 needs more details and explanations in the text.

Author Response

Thank you very much for your time coordinating our manuscript and your advice to improve it. We have carefully revised the manuscript and incorporated all comments you and all reviewers kindly offered. Our point-to-point responses are given below.

 

Point 1: The authors need to clarify and explain the difference between the current study with the available literature.

Response 1:

          This is explained in the Introduction of the paper. The relevant text is as follows:

At present, the prevailing methods for anomaly data detection are based on statistical correlation, distance relationship, deviation from physical relationship, and deviation from prediction. Anomaly detection based on statistical relationship mainly tests the inconsistency of each point in the sample set [10-11], finding abnormal behavioral relationships between a single data point and the data set. The distance relationship method detects anomalies through the distance between a single data point and the center of the data set [12]. These two methods are not effective for the complex data of abnormal conditions. Anomaly detection based on deviation relation establishes a group of data subsets of a data set; by calculating the dissimilarity between subsets, one can determine the outliers. In actual data processing, this approach is complex and difficult to deploy [13-14]. The prediction-based method learns from a large amount of historical data, feeds the data into the prediction model, and compares the test data with the predicted data to confirm abnormal characteristics [15-17]. This method sometimes flags normal abrupt changes in the data as abnormal. SCADA data contain both abnormal information caused by changes in wind turbine performance and sudden abnormal information caused by sudden severe weather. It is difficult to comprehensively detect anomalies by relying on any one of the above methods alone. Therefore, this study refines and integrates three anomaly detection methods into a comprehensive detection method for filtering abnormal SCADA data.

  10. Yu S.; Li X. et al. Exploring the Intrinsic Probability Distribution for Hyperspectral Anomaly Detection; Remote Sensing, 2021.
  11. Zong B.; Song Q. et al. Deep autoencoding gaussian mixture model for unsupervised anomaly detection; In ICLR, 2018.
  12. He Z.; Xu X. et al. Discovering cluster-based local outliers; Pattern Recognition Letters, 2003; Volume 24, pp. 1641–1650.
  13. Guansong P.; Cheng Y. et al. Self-Trained Deep Ordinal Regression for End-to-End Video Anomaly Detection; 2020.
  14. Weiming H.; Jun G.; Bing L. et al. Maybank; Anomaly Detection Using Local Kernel Density Estimation and Context-Based Regression; IEEE Transactions on Knowledge and Data Engineering, 2020.
  15. Wu X.; Shi B.; Dong Y. et al. Restful: Resolution-aware forecasting of behavioral time series data; In CIKM, 2018; pp. 1073–1082.
  16. Ya S.; Youjian Z.; Chenhao N. et al. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network; KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2019.
  17. Zong B.; Song Q.; Min M. R. et al. Deep autoencoding gaussian mixture model for unsupervised anomaly detection; In ICLR, 2018.

Point 2: The reasons for writing the paper or the aims of the research should be given in the abstract.

Response 2:

The objectives of the study are described in the abstract. The description is as follows:

Abstract: Supervisory control and data acquisition (SCADA) systems are critical for wind power grid integration and wind farm operation and maintenance. However, wind turbines are affected by regulation, severe weather factors, and mechanical failures, resulting in abnormal SCADA data that seriously affect the usage of SCADA systems. Thus, strict and effective quality control of the SCADA data is crucial. The traditional anomaly detection methods based on either the “power curve” or statistical evaluation cannot comprehensively detect abnormal data. In this study, a multi-approach based abnormal data detection method for SCADA wind speed data quality control is developed. It is mainly composed of an EEMD (Ensemble Empirical Mode Decomposition)-BiLSTM network model, wind speed correlation between adjacent wind turbines, and a deviation detection model based on dynamic power curve fitting. The proposed abnormal data detection method is tested on SCADA data from a real wind farm, and statistical analysis of the results verifies that this method can effectively detect abnormal SCADA wind data. The proposed method can be readily applied in real-time operation to support effective use of SCADA data for wind turbine control and wind power prediction.

 

Point 3: The discussions in sections 3 and 4 are not enough to explain the superiority of the proposed scheme well. It is suggested that the authors give more analysis of the results.

 

Response 3:

We computed the accuracy, recall, and F1 score to verify the proposed scheme using two months of data from a medium-sized wind farm in 2020. The results show that this scheme can effectively improve the detection accuracy of abnormal data. Monitoring at the wind farm scale over this period indicates that the scheme is effective.
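As an illustration of the statistics named in this response, the following minimal Python sketch computes accuracy, precision, recall, and F1 score from binary anomaly labels. The labels are invented for the example and are not the wind farm results; `detection_scores` is a hypothetical helper name, not a function from the manuscript.

```python
def detection_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary labels (1 = abnormal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # abnormal, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # abnormal, missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical ground truth and detector output for eight samples.
y_true = [0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
accuracy, precision, recall, f1 = detection_scores(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Because abnormal samples are usually rare in SCADA records, accuracy alone can look high even when many anomalies are missed, which is why recall and F1 score are reported alongside it.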

 

Point 4: I believe that the flowchart of figure 1 needs more details and explanations in the text.

 

Response 4:

However, a time series model alone cannot effectively reveal the detailed information in the data, so the ensemble empirical mode decomposition (EEMD) method is adopted. When splitting the original signal, the decomposed components automatically match their own scale [21]. If a decomposed component can still be split, decomposition continues until no further decomposition is possible. At this point, all components of the original signal decomposed by the EEMD method are obtained [22]. This decomposition method can extract finer details of the signal and is well suited to non-stationary data. EEMD has two steps (Figure 1):

Step 1: add normally distributed noise to the wind speed time series to enhance EMD performance (Huang et al. [22]).

Step 2: apply the EMD (Empirical Mode Decomposition) method to obtain N IMF (intrinsic mode function) components and a residual. This process includes obtaining the local extrema of the data, finding the upper and lower envelopes of the waveform, and obtaining each IMF by subtracting the mean of the upper and lower envelopes from the original time series.

  21. Kuja B.; Bck A. Decomposition-based hybrid wind speed forecasting model using deep bidirectional LSTM networks; Energy Conversion and Management, 2021; 234.
  22. Huang N. E. et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis; Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 1998; Volume 454, pp. 903–995.

Author Response File: Author Response.docx

Reviewer 3 Report

1. The introduction of the article should be extended and refer in more detail to the current research works of other authors on the subject. In the proposed version of the article, the introduction is too general.

2. The article does not comprehensively describe the correlation analysis. This is because it is not clear whether the article analyzes the most popular Pearson's linear correlation coefficient. The Pearson coefficient can be calculated when both variables have a distribution close to normal and when the relationship is linear. In the paper, it is worth presenting whether both variables have a distribution close to normal. It is also worth presenting the R2 coefficient, significance coefficient and test probability value. If the assumptions of a normal distribution and a linear relationship are not met, then you need to use the nonparametric equivalents of correlation (nonparametric correlation). For example, these include Spearman correlation or gamma statistics, etc. The chapter relating to correlation analysis needs significant improvement.

3. In formula 8, the covariance matrix is missing (probably an editing error?).

4. Chapter 2.3 does not describe the algorithm used to determine the parameters. Their values were also not listed in the relevant tables.

5. The axis description in Figure 6 is missing.

6. In order to improve readability, it is worth describing (at least briefly) the deep learning techniques used in the article.

7. The literature is not formatted according to the journal's guidelines posted on the website. The introduction also contains various interlines of the text.

Author Response

Thank you very much for your time coordinating our manuscript and your advice to improve it. We have carefully revised the manuscript and incorporated all comments you and all reviewers kindly offered. Our point-to-point responses are given below.

 

 

Point 1: The introduction of the article should be extended and refer in more detail to the current research works of other authors on the subject. In the proposed version of the article, the introduction is too general.

 

Response 1:

According to estimates by the World Wind Energy Association, by 2020, approximately 12% of the world's electricity will be generated by wind power (GLOBAL WIND REPORT 2019). Wind energy has become one of the fastest-growing energy sources. The construction technology for large- and medium-sized wind farms is becoming increasingly mature. Supervisory control and data acquisition (SCADA) systems, as comprehensive monitoring systems that remotely connect each wind turbine with the main control room, have been widely used in wind power grid connection and wind farm operation and maintenance. SCADA systems provide many functions, such as remote control and parameter adjustment, data collection and storage, and alarms, so they have become important components of wind farms[1]. During the operation of a wind turbine, a SCADA system will typically sample wind turbine data at a high frequency (e.g. every second). Due to the high sampling frequency, SCADA data is not fully understood or utilized[2-5].

SCADA systems record numerous types of operating data, including historical operating status, and some data can be converted into characteristic curves reflecting the performance of the wind turbine, which has great utilization value. In the past few years, the wind turbine research community has used SCADA data for wind turbine control[6-8] and wind power prediction[9]. High-quality SCADA data are the basis of data assimilation and post-processing of model forecasts for error correction, which can improve the accuracy of wind power forecasting and ensure the reliability and stability of wind power grid connection. However, there are often abnormal data in SCADA data, including abnormal wind turbine status information, abnormal data collection, human intervention, and abnormal weather conditions. These anomalies sometimes destroy the data trends in the normal state of the wind turbine and complicate the use of data, especially for wind power prediction. Therefore, it is very important to detect and analyze anomalies in SCADA data.

At present, the prevailing methods for anomaly data detection are based on statistical correlation, distance relationship, deviation from physical relationship, and deviation from prediction. Anomaly detection based on statistical relationship mainly tests the inconsistency of each point in the sample set [10-11], finding abnormal behavioral relationships between a single data point and the data set. The distance relationship method detects anomalies through the distance between a single data point and the center of the data set [12]. These two methods are not effective for the complex data of abnormal conditions. Anomaly detection based on deviation relation establishes a group of data subsets of a data set; by calculating the dissimilarity between subsets, one can determine the outliers. In actual data processing, this approach is complex and difficult to deploy [13-14]. The prediction-based method learns from a large amount of historical data, feeds the data into the prediction model, and compares the test data with the predicted data to confirm abnormal characteristics [15-17]. This method sometimes flags normal abrupt changes in the data as abnormal. SCADA data contain both abnormal information caused by changes in wind turbine performance and sudden abnormal information caused by sudden severe weather. It is difficult to comprehensively detect anomalies by relying on any one of the above methods alone. Therefore, this study refines and integrates three anomaly detection methods into a comprehensive detection method for filtering abnormal SCADA data.

The rest of this article is organized as follows. In Section 2, three detection methods are introduced: the EEMD-BiLSTM network, wind speed correlation detection between adjacent wind turbines, and dynamic power curve fitting deviation detection. In Section 3, the novel design of a comprehensive detection method utilizing the three detection methods is presented, and the feasibility of the method is verified based on historical data of several wind turbines. In Section 4, the results of running the SCADA wind speed anomaly detection method in real time at a medium-sized wind farm are analyzed to determine its effectiveness for real-time detection. Finally, in Section 5, the experimental results are summarized and conclusions are drawn. The major finding of this study is that the proposed detection method is capable of effectively filtering abnormal SCADA data. This method can be used for cleaning historical data records, and it can also be used for real-time SCADA data quality control, which can effectively ensure that SCADA data are suitable for wind turbine control and wind power prediction.

  1. GWEC, GWEC|GLOBAL WIND REPORT 2019 (2020).
  2. Zaher A.; Mcarthur S. D. J. et al. Patel Y. Online wind turbine fault detection through automated SCADA data analysis; Wind Energy, 2010; Volume 12, pp. 574–593.
  3. Schlechtingen M.; Santos IF. et al. Wind turbine condition monitoring based on SCADA data using normal behavior models. Part 1: System description; Applied Soft Computing, 2013; pp. 259–270.
  4. Uluyol O.; Parthasarathy G. et al. Power curve analytic for wind turbine performance monitoring and prognostics; Proceedings of annual conference of the prognostics and health management society, 2011; pp. 1–8.
  5. Wenxian Y.; Richard C. et al. Wind turbine condition monitoring by the approach of SCADA data analysis; Renewable Energy, 2013; pp. 365–376.
  6. Yingying Z.; Dongsheng L. et al. Fault Prediction and Diagnosis of Wind Turbine Generators Using SCADA Data; Energies, 2017; pp. 1–17.
  7. Ziqian K.; Baoping T. et al. Condition monitoring of wind turbines based on spatio-temporal fusion of SCADA data by convolutional neural networks and gated recurrent units; Renewable Energy, 2020; pp. 760–768.
  8. Qiu Y.; Feng Y. et al. Fault Diagnosis of Wind Turbine with SCADA Alarms Based Multidimensional Information Processing Method; Renewable Energy, 2019; pp. 1923–1931.
  9. Lin Z.; Liu X. et al. Wind power prediction based on High-frequency SCADA data along with isolation forest and deep learning neural networks; International Journal of Electrical Power & Energy Systems 2020; Volume 118.
  10. Yu S.; Li X. et al. Exploring the Intrinsic Probability Distribution for Hyperspectral Anomaly Detection; Remote Sensing, 2021.
  11. Zong B.; Song Q. et al. Deep autoencoding gaussian mixture model for unsupervised anomaly detection; In ICLR. 2018.
  12. He Z.; Xu X. et al. Discovering cluster-based local outliers; Pattern Recognition Letters, 2003; Volume 24, pp. 1641–1650.
  13. Guansong P.; Cheng Y. et al. Self-Trained Deep Ordinal Regression for End-to-End Video Anomaly Detection; 2020.
  14. Weiming H.; Jun G.; Bing L. et al. Maybank; Anomaly Detection Using Local Kernel Density Estimation and Context-Based Regression; IEEE Transactions on Knowledge and Data Engineering. 2020.
  15. Wu X.; Shi B.; Dong Y. et al. Restful: Resolution-aware forecasting of behavioral time series data; In CIKM, 2018; pp. 1073–1082.
  16. Ya S.; Youjian Z.; Chenhao N. et al. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network; KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2019.
  17. Zong B.; Song Q.; Min M. R. et al. Deep autoencoding gaussian mixture model for unsupervised anomaly detection, In ICLR. 2018.

 

Point 2: The article does not comprehensively describe the correlation analysis. This is because it is not clear whether the article analyzes the most popular Pearson's linear correlation coefficient. The Pearson coefficient can be calculated when both variables have a distribution close to normal and when the relationship is linear. In the paper, it is worth presenting whether both variables have a distribution close to normal. It is also worth presenting the R2 coefficient, significance coefficient and test probability value. If the assumptions of a normal distribution and a linear relationship are not met, then you need to use the nonparametric equivalents of correlation (nonparametric correlation). For example, these include Spearman correlation or gamma statistics, etc. The chapter relating to correlation analysis needs significant improvement.

 

Response 2:

Under the conditions of global atmospheric circulation and weather system circulation, the near-surface airflow of wind farms is determined by the local topography and other underlying surfaces, and it has a high degree of correlation over several hundred meters to several kilometers. Thus, the correlation of wind speed between two adjacent wind turbines contains crucial information of the anomalies in one or both wind turbines. The correlation analysis of wind speed between adjacent wind turbines refers to the analysis of two or more correlated variable elements to measure the closeness of the correlation between the two variable factors. The correlation coefficient reflects the direction and degree of the change trend between two variables. Its value ranges from −1 to +1, where 0 means that the two variables are not correlated. A positive value means a positive correlation, and a negative value means a negative correlation. The larger the value, the stronger the correlation.

The correlation coefficient, one of the first statistical indicators designed by the statistician Karl Pearson, measures the degree of linear correlation between variables and is usually denoted by the letter r. Depending on the subject of study, the correlation coefficient can be defined in several ways. Among them, the Pearson correlation coefficient is the most commonly used.

 

 

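$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} \qquad (2)$$

where $\mathrm{Cov}(X, Y)$ is the covariance of $X$ and $Y$, $\mathrm{Var}[X]$ is the variance of $X$, and $\mathrm{Var}[Y]$ is the variance of $Y$.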

Pearson correlation analysis is widely used in the field of wind power. Sunder [23] applied the correlation method to analyze the reliability of wind turbine components. Mostafa [24] applied wind load correlation analysis to wind farm reliability assessment. Shin [25] applied a structural correlation evaluation method for wind farms to estimate wind farm reliability.
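To make the reviewer's point about parametric and nonparametric correlation concrete, the sketch below computes the Pearson coefficient, its R² value and p-value, the Spearman rank correlation, and a Shapiro-Wilk normality check for two wind speed series. The data are synthetic, the two turbines are hypothetical, and none of the numbers come from the manuscript.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic wind speeds (m/s) for two hypothetical adjacent turbines.
# Turbine B follows turbine A closely, plus small measurement noise.
wind_a = rng.weibull(2.0, size=500) * 8.0
wind_b = 0.95 * wind_a + rng.normal(0.0, 0.5, size=500)

# Pearson coefficient (linear relationship) and its two-sided p-value.
pearson_r, pearson_p = stats.pearsonr(wind_a, wind_b)
r_squared = pearson_r ** 2

# Spearman rank correlation: a nonparametric alternative that does not
# assume normally distributed variables or a linear relationship.
spearman_rho, spearman_p = stats.spearmanr(wind_a, wind_b)

# Shapiro-Wilk test: a small p-value suggests the series is not normal.
_, normality_p_a = stats.shapiro(wind_a)
_, normality_p_b = stats.shapiro(wind_b)

print(f"Pearson r = {pearson_r:.3f}, R^2 = {r_squared:.3f}, p = {pearson_p:.3g}")
print(f"Spearman rho = {spearman_rho:.3f}, p = {spearman_p:.3g}")
print(f"Shapiro-Wilk p-values: A = {normality_p_a:.3g}, B = {normality_p_b:.3g}")
```

Wind speed is often closer to a Weibull than a normal distribution, so reporting the rank-based coefficient next to the Pearson coefficient is one way to show that the conclusion does not hinge on the normality assumption.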

 

Point 3: In formula 8, the covariance matrix is missing (probably an editing error?).

Response 3:

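If $X$ and $Y$ are two groups of samples of the data set, $U$ is the mean of vector $X$, $V$ is the mean of vector $Y$, and $\Sigma$ is the covariance of $X$ and $Y$, the Mahalanobis distance between $X$ and $Y$ is:

$$D(X, Y) = \sqrt{(X - Y)^{T}\, \Sigma^{-1}\, (X - Y)} \qquad (9)$$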
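The role of the covariance matrix in the Mahalanobis distance can be sketched in a few lines of Python. This is an illustration on synthetic (wind speed, power) pairs, not the authors' implementation; the helper name `mahalanobis`, the sample statistics, and the threshold of 3.0 are all assumptions made for the example. Here each sample is compared with the sample mean, a common variant of the two-vector form.

```python
import numpy as np


def mahalanobis(x, y, cov):
    """Mahalanobis distance between vectors x and y for covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


rng = np.random.default_rng(1)

# Synthetic (wind speed in m/s, power in kW) samples with correlated columns.
samples = rng.multivariate_normal(
    mean=[8.0, 1200.0],
    cov=[[4.0, 500.0], [500.0, 90000.0]],
    size=1000,
)

center = samples.mean(axis=0)
cov = np.cov(samples, rowvar=False)  # 2 x 2 sample covariance matrix

# Distance of every sample from the center; the threshold is an assumed value.
distances = np.array([mahalanobis(s, center, cov) for s in samples])
flagged = distances > 3.0
print(f"{flagged.sum()} of {len(samples)} samples exceed the threshold")
```

Without the covariance term the expression reduces to the Euclidean distance, which would let the power axis, with its much larger numerical range, dominate the result.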

 

Point 4: Chapter 2.3 does not describe the algorithm used to determine the parameters. Their values were also not listed in the relevant tables.

 

Response 4:

 

(7)

Here, Equation (7) relates the power value to the wind speed value through a set of fitting parameters. The parameters are obtained by fitting Equation (7) to the wind farm data with the Python curve_fit function: given the objective function of formula 7 and the historical data as input, the function searches for the best-fitting parameters. The fitted curve parameters obtained from 1 year of historical data reflect the curve trend well.
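A minimal sketch of this kind of parameter search with `scipy.optimize.curve_fit` is shown below. The objective function is an assumption: a three-parameter logistic curve stands in for formula 7, whose exact form is not reproduced in this response. The data are synthetic, and the parameter names `a`, `b`, and `c` are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_curve(v, a, b, c):
    """Assumed logistic power curve: a = plateau power, b = slope, c = midpoint."""
    return a / (1.0 + np.exp(-b * (v - c)))


rng = np.random.default_rng(2)

# Synthetic "historical" data: wind speed (m/s) and noisy power output (kW).
true_params = (2000.0, 0.8, 9.0)
wind_speed = rng.uniform(0.0, 20.0, size=2000)
power = power_curve(wind_speed, *true_params) + rng.normal(0.0, 30.0, size=2000)

# Least-squares search for the best-fitting parameters from a rough initial guess.
initial_guess = (power.max(), 1.0, 10.0)
params, covariance = curve_fit(power_curve, wind_speed, power, p0=initial_guess)

# Standard errors of the parameters and standard deviation of the residuals,
# both of which can be tabulated as measures of fit quality.
param_std = np.sqrt(np.diag(covariance))
residual_std = np.std(power - power_curve(wind_speed, *params))

for name, value, err in zip("abc", params, param_std):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
print(f"residual standard deviation = {residual_std:.1f} kW")
```

Listing the fitted values together with their standard errors and the residual standard deviation gives the kind of table a reader needs to judge the fit.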

 

Point 5: The axis description in Figure 6 is missing.

Response 5:

Figure 6 has been modified. See Figure 6

 

Point 6: In order to improve readability, it is worth describing (at least briefly) the deep learning techniques used in the article.

Response 6:

(Chapter 2.1 has introduced the relevant knowledge of deep learning)

In recent years, long short-term memory (LSTM), a deep learning technique, has been widely used for wind power prediction in the field of wind energy [18-20]. LSTM is a recurrent neural network specially designed to solve the long-term dependence problem that exists in general RNNs (Recurrent Neural Networks) and CNNs (Convolutional Neural Networks).

However, a time series model alone cannot effectively reveal the detailed information in the data, so the ensemble empirical mode decomposition (EEMD) method is adopted. When splitting the original signal, the decomposed components automatically match their own scale [21]. If a decomposed component can still be split, decomposition continues until no further decomposition is possible. At this point, all components of the original signal decomposed by the EEMD method are obtained [22]. This decomposition method can extract finer details of the signal and is well suited to non-stationary data. EEMD has two steps (Figure 1):

Step 1: add normally distributed noise to the wind speed time series to enhance EMD performance (Huang et al. [22]).

Step 2: apply the EMD (Empirical Mode Decomposition) method to obtain N IMF (intrinsic mode function) components and a residual. This process includes obtaining the local extrema of the data, finding the upper and lower envelopes of the waveform, and obtaining each IMF by subtracting the mean of the upper and lower envelopes from the original time series.
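The two steps can be sketched in Python as follows. This is a deliberately reduced illustration, not the authors' code: it performs a single sifting pass, whereas a full EMD repeats the pass until a stopping criterion is met and then extracts further IMFs from the remainder, and EEMD averages the IMFs over many noisy copies. The cubic-spline envelopes, the noise amplitude, and the synthetic signal are assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def sift_once(t, x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    maxima = argrelextrema(x, np.greater)[0]  # indices of local maxima
    minima = argrelextrema(x, np.less)[0]     # indices of local minima
    if len(maxima) < 2 or len(minima) < 2:
        return x.copy()  # too few extrema to build envelopes
    # Envelopes through the extrema; values near the ends are extrapolated,
    # so the first and last samples are less reliable (end effect).
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return x - 0.5 * (upper + lower)


rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 1000)
fast = np.sin(2 * np.pi * 20 * t)  # fast oscillation
wind = fast + 2.0 * t              # synthetic series: oscillation plus slow trend

# Step 1: add normally distributed noise (amplitude is an assumed value).
noisy = wind + rng.normal(0.0, 0.05 * wind.std(), size=wind.size)

# Step 2 (one pass only): remove the local mean defined by the envelopes.
candidate = sift_once(t, wind)         # close to the fast oscillation
candidate_noisy = sift_once(t, noisy)  # dominated by the added fine-scale noise
```

On the clean series the pass strips the slow trend and leaves the fast oscillation, which is the behavior the first IMF is meant to capture.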

 

Point 7: The literature is not formatted according to the journal's guidelines posted on the website. The introduction also contains various interlines of the text.

Response 7:

We have made changes. See References

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

In section 2.3.1, please provide tables with the parameter values obtained from the calculations for the polynomial fitting curve, the exponential fitting curve, etc. In addition, it is worth including in this section the values of the standard deviation, which is a measure of the quality of curve fitting.

The introduction still contains various interlines in the text.

Author Response

Thank you very much for your time coordinating our manuscript and your advice to improve it. We have carefully revised the manuscript and incorporated all comments you and all reviewers kindly offered. Our point-to-point responses are given below.

 

 

Point 1: In section 2.3.1, please provide tables with the parameter values obtained from the calculations for the polynomial fitting curve, the exponential fitting curve, etc. In addition, it is worth including in this section the values of the standard deviation, which is a measure of the quality of curve fitting.

 

Response 1:

Contents have been added, see Table 7

Point 2: The introduction still contains various interlines in the text.

Response 2:

  1. Introduction

Wind energy has become one of the fastest-growing energy sources. According to the estimate by the World Wind Energy Association, by 2020, approximately 12% of the world's electricity will be generated by wind power (GLOBAL WIND REPORT 2019).  Supervisory control and data acquisition (SCADA) systems, as comprehensive monitoring systems that remotely connect each wind turbine with the main control room, have been widely used in wind power grid connection, power prediction, and wind farm operation[6-8] and maintenance[1]. During the operation of a wind turbine, a SCADA system typically samples wind turbine data at a high frequency (e.g. every second). Due to the high sampling frequency, SCADA data is not fully understood or utilized[2-5].

SCADA systems record numerous types of operating data, including historical operating status, and some data can be converted into characteristic curves reflecting the performance of the wind turbine, which has great utilization value[9]. High-quality SCADA data are the basis of data assimilation and post-processing of model forecasts for error correction. However, there are often abnormal data in SCADA data, including abnormal wind turbine status information, abnormal data collection, human intervention, and abnormal weather conditions. These anomalies sometimes destroy the data trends in the normal state of the wind turbine and complicate the use of data, especially for wind power prediction. Therefore, it is very important to detect and analyze anomalies in SCADA data.

At present, the prevailing methods for anomaly data detection are based on statistical correlation, distance relationship, deviation from physical relationship, and deviation from prediction. Anomaly detection based on statistical relationship mainly tests the inconsistency of each point in the sample set [10-11], finding abnormal behavioral relationships between an individual sample and the data set. The distance relationship method detects anomalies through the distance between a single data sample and the center of the data set [12]. These two methods are not effective for some other complex abnormal conditions. Anomaly detection based on deviation relation establishes a group of data subsets of a data set and determines the outliers by calculating the dissimilarity between subsets. In actual data processing, this is complex and difficult to deploy [13-14]. The prediction-based method learns from a large amount of historical data, feeds the data into the prediction model, and compares the test data with the predicted data to confirm abnormal characteristics [15-16]. This method sometimes flags normal abrupt changes as abnormal. For example, the data contain both abnormal information caused by changes in turbine blade performance and sudden changes caused by severe weather. It is difficult to comprehensively detect anomalies by relying on any one of the above methods alone. Therefore, this study refines and integrates three anomaly detection methods into a comprehensive detection method for filtering abnormal SCADA data.

The rest of this article is organized as follows. In Section 2, three detection methods are introduced: the EEMD-BiLSTM network, wind speed correlation detection between adjacent wind turbines, and dynamic power curve fitting deviation detection. In Section 3, the novel design of a comprehensive detection method utilizing the three detection methods is presented, and the feasibility of the method is verified based on historical data of several wind turbines. In Section 4, the results of running the SCADA wind speed anomaly detection method in real time at a medium-sized wind farm are analyzed to determine its effectiveness for real-time detection. Finally, in Section 5, the experimental results are summarized and conclusions are drawn. The major finding of this study is that the proposed detection method is capable of effectively filtering abnormal SCADA data. This method can be used for cleaning historical data records, and also for real-time SCADA data quality control, effectively ensuring suitable use of SCADA data.

 

Author Response File: Author Response.docx
