Intelligent Condition Monitoring of Wind Power Systems: State of the Art Review

Abstract: Modern wind turbines operate in continuously transient conditions, with varying speed, torque, and power based on the stochastic nature of the wind resource. This variability affects not only the operational performance of the wind power system, but can also affect its integrity under service conditions. Condition monitoring continues to play an important role in achieving reliable and economic operation of wind turbines. This paper reviews the current advances in wind turbine condition monitoring, ranging from conventional condition monitoring and signal processing tools to machine-learning-based condition monitoring and usage of big data mining for predictive maintenance. A systematic review is presented of signal-based and data-driven modeling methodologies using intelligent and machine learning approaches, with the view to providing a critical evaluation of the recent developments in this area, and their applications in diagnosis, prognosis, health assessment, and predictive maintenance of wind turbines and farms.


Introduction
The combination of ever-increasing global electricity demand and growing carbon emissions has, in recent decades, firmly positioned renewable energy generation as key to securing future energy supply. As an effectively free and clean energy source, renewables have rapidly captured the attention of power generation companies, resulting in strong global growth [1]. Among renewable energy resources, wind power occupies a prominent place and is generally accepted as a leading contributor with strong future growth projections [2]. To ensure the much-needed continuity and expansion of wind power generation, it is imperative that its productivity and reliability are further improved and its cost further reduced.
Onshore and offshore wind turbines (WTs) often operate in harsh environments [3]. This invariably imposes a requirement for sophisticated and powerful real-time condition monitoring (CM) systems that are capable of adapting to any environmental or operational condition during the conversion of kinetic energy into electricity. In this context, an accurate modeling process will always be the primary link between an accurate health assessment and a well-planned maintenance policy. Various modeling methods, including model-based techniques as well as data-driven and hybrid modeling procedures, have been applied to this task [4]. Moreover, advances in sensing technologies make it easier to collect the relevant operating history, pushing CM research towards a better understanding and more reliable characterization of the diagnostic features captured in CM signals, in an effort to enable more reliable diagnosis and prognosis of subassembly failures and lifetime consumption. A particularly attractive methodology that holds great potential to enable advances in this area is machine learning (ML), especially when physics-based modeling becomes intractable due to the complexity of the system.
Energies 2021, 14, 5967
Modern WTs are able to continuously extract vast amounts of kinetic energy from the wind flow and convert it into useful electricity, owing to the effective aerodynamic design of their blades, advanced turbine system operation, and sophisticated performance enhancement equipment [5]. Understanding WT CM requires a clear understanding of WT operating principles. To this end, Figure 1 gives an overview of the most critical WT components that any CM software/system should consider under operating conditions. The illustration focuses on the horizontal axis WT design, which has today become the standard configuration for modern multi-megawatt (MW) scale variable speed WTs connected to the power grid.
WT CM, along with ML tools, has itself undergone many developments and improvements over the decades [4]. This evolution is driven by the nature of WT operation and the multitude of environmental and physical variables characterizing it. The continual change in the physical state of WT components means that the monitored samples are highly dynamic. This time-varying behavior can be affected by several factors, including fatigue loading on faulty components, damage propagation, aging, and environmental conditions [4]. Therefore, considerable research is aimed at moving towards advanced, dynamic (e.g., online) ML-based learning that better suits the nature of this process than ordinary offline learning [6]. Likewise, for some modes of operation it is difficult to collect enough patterns for the prediction process, which has led to the use of knowledge from different sources, ranging from prior hypotheses obtained from pre-trained learners (as in transfer learning, TL) or experts to generative models such as generative adversarial networks (GANs) [7].
On the one hand, the multitude of WT failure modes in several components (e.g., gearbox, yaw system, blades, and alternator) and the nature of their occurrence (gradual, as in degradation, or fleeting and frequent) under different conditions make the data collected from dissimilar events highly heterogeneous (i.e., of high cardinality). This requires special care in data processing and feature extraction. The need to extract significant features increases the complexity of the learning models, pushing them towards more robust extraction techniques such as denoising or convolutional mapping. On the other hand, the way in which a failure mode occurs determines the type of application. For example, progressive damage propagation, as in bearings, requires prognostic algorithms, which depend mainly on clustering and regression. Conversely, other failure types are purely diagnostic problems that lead directly to classification.
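The regression-based prognostic case mentioned above can be illustrated with a deliberately simple sketch. It assumes a scalar health indicator that grows as damage progresses and a fixed failure threshold, both hypothetical here. The function fits a least-squares line to the indicator and extrapolates it to the threshold to estimate remaining useful life (RUL). Practical prognostic models are usually nonlinear and probabilistic, so this shows only the basic idea.

```python
import math


def estimate_rul(times, health_index, failure_threshold):
    """Estimate remaining useful life by linear extrapolation.

    Fits health_index ~ intercept + slope * t by ordinary least squares and
    returns the time from the last sample until the fitted line reaches
    failure_threshold. Assumes the indicator increases as damage grows.
    """
    n = len(times)
    if n < 2 or n != len(health_index):
        raise ValueError("need at least two paired samples")
    t_mean = sum(times) / n
    h_mean = sum(health_index) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - t_mean) * (h - h_mean) for t, h in zip(times, health_index))
    slope = sxy / sxx
    intercept = h_mean - slope * t_mean
    if slope <= 0:
        return math.inf  # no upward degradation trend: no finite estimate
    t_fail = (failure_threshold - intercept) / slope
    return max(t_fail - times[-1], 0.0)


# Hypothetical daily health-indicator readings for a degrading bearing.
days = [0, 1, 2, 3, 4]
hi = [1.0, 1.2, 1.4, 1.6, 1.8]
print(round(estimate_rul(days, hi, failure_threshold=3.0), 2))  # 6.0
```

Linear extrapolation keeps the example transparent. The same fit-then-extrapolate structure carries over to exponential or stochastic degradation models.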
In the recent literature reviews, and in the ML modeling context, many details about the breadth and depth of WT CM problems are missing. For instance, the review by Stetco et al. [1] studied ML models as single entities aimed at classification or regression, and the differences in complexity, such as between simple and deep architectures, were not discussed in detail. In addition, models that provide prior assumptions, such as TL and generative adversarial models, were discussed within the same data-driven frameworks rather than as knowledge-driven approaches. The review by Liu et al. [8] touched only briefly on ML tools, without providing enough detail, because it focused on failure types and classification. Another review, by Rezamand et al. [4], studied only the most critical components of WTs and provided general views on both physics-based and data-based modeling methods. The authors concentrated only on prognostics, with remaining useful life (RUL) as the main health evaluation metric. They also treated ML methods with different architectures as single classes of data-driven methods, or as black boxes, without examining their architectures and learning procedures in depth.
In general, CM systems comprise sensors, data acquisition, information processing, feature extraction, pattern recognition, and decision-making units. Most available CM systems measure vibration, requiring a range of sensors for different frequencies. Other systems measure parameters such as blade stress and the temperatures of the nacelle, coolant, oil, gearbox, and generator. Monitoring data may be stored locally or transferred to a central computer for further diagnosis. Commercial wind farms usually employ a SCADA (supervisory control and data acquisition) system, which contains valuable online information about the performance and operational history of the turbines. SCADA data have therefore also been widely used by researchers as a basis for CM. Typically, around 200 signals, each with a different sampling rate, are required to monitor an MW-scale turbine continuously through SCADA and CM systems [9]. The large amount of data generated requires smart mining techniques to reveal the salient patterns that indicate the nature, form, and extent of any faults in the system.
To address the limitations of existing reviews, this paper presents a systematic review of recent developments in this area and their applications in diagnosis, prognosis, health assessment, and predictive maintenance of WTs and farms. It reviews signal-based and data-driven modeling methodologies using intelligent and ML approaches, focusing on their relative advantages, capabilities, and limitations. For reviews of model-based fault detection for WTs, which requires a more accurate mathematical model of the turbine, the reader is referred to the literature [10].
The paper is organized as follows. Section 2 presents a succinct review of conventional signature-analysis-based CM systems and advanced sensing CM applications for health monitoring and fault diagnosis of WTs. Section 3 introduces the main recent ML contributions, classifies different ML tools in terms of their evolution and prediction complexity, and reviews their application to the most prominent WT failure modes. Section 4 reviews data mining techniques that address the challenges arising from big data collection and analytics, as well as predictive maintenance based on health condition and RUL estimation. The discussion and future work in this area are given in Section 5, and the conclusions in Section 6.

Wind Turbine Condition Monitoring
The key CM objective is to reduce operation and maintenance (O&M) expenditure, currently estimated to account for up to 20% and 30% of total lifetime costs for onshore and offshore farms, respectively, with the turbine drivetrain as a major contributor [9]. This section therefore reviews health monitoring and fault diagnosis of WT drivetrain components.
The conventional approach to CM of WT drivetrains relies principally on externally monitoring the vibration of the individual drivetrain components [11,12]. Contemporary drivetrain-dedicated CM systems invariably employ an array of accelerometers distributed along the drivetrain structure (e.g., generator, gearbox). The vibration sensors are operated via appropriate signal conditioning and charge amplifier acquisition devices to enable continuous, high-rate (kHz range) monitoring of the vibration signals at relevant positions in the drivetrain [13,14]. The inclusion of vibration monitoring platforms in WT systems is formally stipulated by the relevant turbine CM certification criteria, with clear specifications on the minimum measuring point requirements [15]. Other drivetrain signals that can be captured by WT CM platforms include the generator electrical signals, the gearbox and generator thermal signals, acoustic signals, and signals related to gearbox oil condition, among others [16]. The underlying aim of monitoring a selection of appropriate drivetrain signals and their distinct diagnostic features is to enable reliable fault identification and fault propagation trending online, i.e., during WT operation [17].

Vibration Monitoring
Vibration monitoring (VM) is presently the most commonly used commercial CM technique implemented on WTs for online drivetrain monitoring [18,19]. This largely stems from the fact that VM for diagnostics of rotating machinery is a well-researched and well-understood concept, with extensive transferable expertise available from other industries [20].
VM is chiefly based on identifying changes in the vibration signal that are related to mechanical faults in the drivetrain, which provide information about the mode and location of a potential fault. VM is an online technique, and the relevant standards [21] define the position and implementation of the vibration sensors on a given device. There are three main types of vibration sensors: distance sensors, including displacement and proximity sensors, which operate between 1 and 100 Hz; velocity sensors (10 Hz to 1 kHz); and accelerometers (1 to 30 kHz) [22]. Examples of vibration sensor implementation in the drivetrain include low-frequency accelerometers for the main bearing, high-frequency accelerometers for the gearbox and generator bearings, and proximity sensors such as inductive distance sensors on other parts of the drivetrain [22]. The most commonly used accelerometer type is the piezoelectric accelerometer, due to its wider bandwidth, robustness, lower cost, and general availability in a broad range of sizes and configurations [23]. An example of the implementation of vibration sensors on a geared drivetrain configuration is presented in Figure 2 [24].

Figure 2. Example of vibration sensor positions on a drivetrain. Reproduced from [24], Elsevier.
The measured time-domain vibration signals are converted to the frequency domain, where fault-related frequency components can be identified and isolated. The frequency-domain analysis is generally achieved by processing the monitored signals with the fast Fourier transform (FFT) [25]. However, a number of advanced signal processing methods, such as wavelet transforms, have also been researched to increase the diagnostic capability of vibration spectral analysis under variable load and speed operating conditions, where conventional FFT analysis is challenged [26]. While these methods generally enable more effective extraction of diagnostic information in transient conditions, they are complex and computationally intensive to implement, especially for operational WTs [27].
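The FFT step can be sketched in plain Python using the radix-2 Cooley–Tukey recursion. The sketch converts a sampled vibration signal into a single-sided amplitude spectrum and locates the dominant peak. The sampling rate and the 50 Hz and 120 Hz test tones are hypothetical. In practice, an optimized library routine such as `numpy.fft.rfft` would normally be used instead of this hand-written version.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def amplitude_spectrum(signal, fs):
    """Single-sided amplitude spectrum as a list of (frequency_hz, amplitude)."""
    n = len(signal)
    spec = fft(signal)
    result = []
    for k in range(n // 2 + 1):
        amp = abs(spec[k]) / n
        if 0 < k < n // 2:
            amp *= 2  # fold the mirrored negative-frequency half into this bin
        result.append((k * fs / n, amp))
    return result


fs = 1024  # Hz, hypothetical sampling rate
n = 1024   # one second of data gives 1 Hz bin spacing
signal = [math.sin(2 * math.pi * 50 * i / fs) + 0.3 * math.sin(2 * math.pi * 120 * i / fs)
          for i in range(n)]
spectrum = amplitude_spectrum(signal, fs)
peak_freq, peak_amp = max(spectrum[1:], key=lambda fa: fa[1])
print(peak_freq, round(peak_amp, 3))  # 50.0 1.0
```

Both test tones fall exactly on FFT bins, so there is no spectral leakage. With real, speed-varying WT data, the fault lines smear across bins, which is the limitation that motivates the wavelet and order-tracking approaches mentioned above.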
Current commercial VM systems are found to be the most effective CM technique for the early detection of faults in mechanical components [28]. In addition, the severity of a fault can be recognized through the magnitude of the observed vibration signal component [11]. Gearbox faults (e.g., tooth damage, breakage or fracturing of gear teeth), rotor faults, shaft faults (e.g., misalignment, cracked shaft, or coupling failure), faults in the mechanical brake (e.g., cracked disk), main bearing faults (e.g., bearing pitting or cracking), and generator faults (e.g., short circuit, rotor electrical imbalance) are some of the drivetrain faults that have been shown to be identifiable through VM [8,13,22,25,26,29,30].
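Severity is read from the magnitude of a known fault-related component, so it is often enough to evaluate the spectrum at a single frequency. The sketch below uses the Goertzel algorithm, which is a swapped-in technique not named in the text. It computes the amplitude at one DFT bin and expresses its growth relative to a healthy baseline in decibels. The 87 Hz fault frequency and both amplitudes are hypothetical.

```python
import math


def goertzel_amplitude(signal, fs, target_hz):
    """Single-sided amplitude of the DFT bin nearest target_hz (Goertzel).

    The 2/n scaling is valid for bins strictly between DC and Nyquist.
    """
    n = len(signal)
    k = round(n * target_hz / fs)          # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in signal:                       # second-order recursive filter
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2  # |X[k]|^2
    return 2.0 * math.sqrt(max(power, 0.0)) / n


def severity_db(current_amp, baseline_amp):
    """Growth of a fault component relative to a healthy baseline, in dB."""
    return 20.0 * math.log10(current_amp / baseline_amp)


fs, n, f_fault = 1000, 1000, 87.0  # hypothetical values
baseline = [0.1 * math.sin(2 * math.pi * f_fault * i / fs) for i in range(n)]
current = [0.4 * math.sin(2 * math.pi * f_fault * i / fs) for i in range(n)]
a0 = goertzel_amplitude(baseline, fs, f_fault)
a1 = goertzel_amplitude(current, fs, f_fault)
print(round(severity_db(a1, a0), 1))  # 12.0
```

Goertzel costs O(n) per tracked frequency, so it is attractive when only a handful of fault frequencies must be trended rather than the whole spectrum.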
VM systems are unable to detect faults in specific electrical units such as the converter, since these have no moving parts [31]. In addition, VM requires not only the installation of vibration sensors and the associated signal conditioning and data acquisition equipment, but also advanced signal processing techniques to extract useful information from the vibration data. Therefore, VM-based CM is generally deemed to be a relatively costly monitoring method [32]. Furthermore, installing vibration sensors on the surface or in the body of drivetrain components is a specialist process [22]. VM is not highly efficient at detecting incipient faults, as the vibration signals typically have a low signal-to-noise ratio (SNR) [22]. The application of VM systems in WTs is generally complicated by vibration data collection requirements and by variable-speed WT operating conditions, characterized by continuous variation of load and thus of drivetrain speed. Effective transfer of VM-based diagnostics and systems from other rotating machinery industries to the wind industry is also challenging, as WT rotor speeds are relatively low. A reliable and consistent interpretation of the vast amount of vibration data obtained from individual turbine and farm VM-based CM systems is required to obtain a dependable diagnosis [11].

Oil Debris Analysis
The oil debris analysis technique has been used effectively for fault detection in gearboxes, generators, and bearings, as lubrication is used at a number of locations in WTs [33,34]. Oil debris analysis is principally used to monitor the lubrication status of rolling components in order to detect oil degradation and contamination [35]. Dirt, wear debris, water, incorrect oil, depletion of additives, oxidation, and base stock breakdown are some of the causes of lubricant degradation and contamination [36]. In addition, oil debris analysis is important for achieving maximum service life, especially for the gearbox [37].
The condition of the lubricant carries useful information about the health of the rolling components. For example, the amount, size, shape, and composition of particles can be monitored to determine faults without disassembling the entire system. Oil debris analysis is also used to monitor lubricant quality, which is important for the operation of rolling components. The lubricant can be affected by temperature, oxidation, contaminants, moisture, and time in service, and more effective maintenance action can be achieved by monitoring its quality [38]. The parameters generally monitored to characterize lubricant quality are acid content, viscosity, water content, oxidation level, and temperature [39].
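The following minimal sketch shows how the five lubricant-quality parameters listed above might be screened against alarm limits. The field names, units, and every numeric limit are hypothetical placeholders chosen for illustration. As this section notes, real requirements are set by the turbine manufacturer or oil supplier and differ between them.

```python
from dataclasses import dataclass


@dataclass
class OilSample:
    acid_number_mgkoh_g: float   # acid content (total acid number)
    viscosity_change_pct: float  # change relative to new oil
    water_ppm: float             # water content
    oxidation_index: float       # hypothetical dimensionless oxidation level
    temperature_c: float         # oil temperature


# HYPOTHETICAL alarm limits for illustration only; not manufacturer values.
LIMITS = {
    "acid_number_mgkoh_g": 2.0,
    "viscosity_change_pct": 10.0,
    "water_ppm": 500.0,
    "oxidation_index": 25.0,
    "temperature_c": 80.0,
}


def check_sample(sample):
    """Return the names of parameters that exceed their alarm limit."""
    flags = []
    for name, limit in LIMITS.items():
        value = getattr(sample, name)
        if name == "viscosity_change_pct":
            value = abs(value)  # both thinning and thickening are abnormal
        if value > limit:
            flags.append(name)
    return flags


print(check_sample(OilSample(0.8, -12.5, 650.0, 10.0, 65.0)))
# ['viscosity_change_pct', 'water_ppm']
```

Keeping the limits in a table separate from the checking logic reflects the point that requirements are manufacturer- and supplier-specific. Only the table would change between fleets.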
Currently, the dominant oil debris analysis approach is that of offline oil debris analysis [23]. Monitoring the relevant diagnostic parameters of oil in commercial WTs is generally conducted via laboratory techniques by means of special reagents, instruments, and equipment, such as a viscometer and an optical emission spectrometer [22]. The typical recommended interval for oil debris analysis if there are no abnormal operating conditions is once every six months [36]. The analysis results provide information about the status of tested samples, as well as recommendations to the owner/operator of the WTs.
Research is ongoing into effective, online, real-time oil debris analysis that would remove the current restrictions of oil debris analysis-based CM techniques and potentially further increase the reliability of WTs [40]. For this purpose, several sensors, such as particle counting sensors and oil condition sensors, are generally installed in the gearbox lubrication loop [36]. However, the additional sensors needed to enable online monitoring increase the cost of oil debris analysis. Furthermore, the proposed online methods can be limited in detecting certain gearbox failures [28]. Interpreting online oil debris data can also be challenging due to its dependency on operating conditions, such as temperature. Combined with the lack of a universal oil debris analysis for all WTs (the requirements are specific to particular WT manufacturers or lubrication oil suppliers and generally differ between them), this has limited the commercial application of this technique [36].
The main drivers for using offline oil debris analysis are to monitor parameters that are not monitored by other online CM techniques, to identify the failed parts of components and the root cause of a failure, and to detect incipient faults. Oil debris analysis is generally implemented in combination with vibration analysis, both to detect a wider variety of faults and to make the diagnosis more reliable than with oil debris analysis alone. While it has been shown to be effective in CM of lubricated mechanical components, the accuracy of oil debris analysis is highly dependent on the type, number, and location of the sensors used, and it is generally challenging to establish a cost-effective and universal oil debris analysis technique for gearboxes due to their configuration complexity [36].

Acoustic Emission (AE)
Acoustic emission (AE) monitoring is available in commercial CM systems for WT drivetrains as an online monitoring technique. AE monitoring employs AE sensors to acquire and analyze acoustic information. It exploits the release of strain energy in the form of transient elastic waves within or on the surface of a material, caused by deformation or damage. In practice, this means that observing and trending particular frequencies of the waves emitted by the drivetrain can enable effective mechanical fault diagnosis [28]. AE analysis is thus used to detect gearbox, bearing, generator, shaft, and rotor faults, for example shaft misalignment or gear damage [22,27,41,42].
AE monitoring can be combined with vibration analysis to increase the accuracy of fault detection and to reduce the number of false alarms [43]. AE monitoring of WT drivetrains generally uses two types of AE sensors: piezoelectric transducers and optical fiber displacement sensors. AE signals can exhibit a high SNR and contain high-frequency content ranging from 50 kHz to 1 MHz, which is not the case with conventional VM [44]. As a result, AE monitoring can be more efficient in detecting early-stage faults than other established CM techniques [23].
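Much conventional AE processing starts by segmenting the waveform into "hits" with a fixed amplitude threshold, and this can be sketched as follows. A hit opens at the first threshold crossing and closes once no further crossing occurs for a hit definition time (HDT). Hit timing and peak amplitude can then be trended. The threshold, HDT, sampling rate, and synthetic bursts are hypothetical. Commercial systems also apply additional timing parameters, such as peak definition and hit lockout times, which are omitted here.

```python
def detect_ae_hits(signal, fs, threshold, hdt_s):
    """Threshold-crossing AE hit detection.

    A hit starts at the first sample with |x| >= threshold and ends once no
    further crossing occurs for longer than the hit definition time hdt_s.
    Returns a list of (first_index, last_crossing_index, peak_amplitude).
    """
    hdt = max(1, int(round(hdt_s * fs)))  # hit definition time in samples
    hits = []
    start = last_cross = None
    peak = 0.0
    for i, x in enumerate(signal):
        a = abs(x)
        if a >= threshold:
            if start is None:
                start, peak = i, 0.0
            last_cross = i
            peak = max(peak, a)
        elif start is not None and i - last_cross > hdt:
            hits.append((start, last_cross, peak))
            start = None
    if start is not None:  # close a hit still open at the end of the record
        hits.append((start, last_cross, peak))
    return hits


# Hypothetical record: 1 MHz sampling with two short bursts.
fs = 1_000_000
sig = [0.0] * 1000
for i in range(100, 110):
    sig[i] = 1.0 if i % 2 == 0 else -1.0
for i in range(500, 505):
    sig[i] = 0.5 if i % 2 == 0 else -0.5
print(detect_ae_hits(sig, fs, threshold=0.2, hdt_s=50e-6))
# [(100, 109, 1.0), (500, 504, 0.5)]
```

Using the absolute value makes the detector respond to both polarities of the oscillating burst. The HDT prevents a single ringing burst from being split into many short hits.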
The wider use of AE monitoring for WT drivetrains is, however, impeded by some of its inherent drawbacks [11,28,33]:
• AE sensors must be placed in close proximity to the fault source to detect a fault accurately.
• Accurate AE measurements require the installation of a large number of AE sensors, all of which require dedicated data acquisition equipment for sensing, analysis, and data transfer.
• AE measurement and analysis are expensive due to the cost of the data acquisition system and the high sampling rates required for signal processing.
• WT nacelles are not particularly suitable for AE sensor application due to the high level of operational and ambient noise, which can complicate the identification of target signal components.
• The attenuation of AE signals during propagation can also limit the implementation of this technique.
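The cost of high sampling rates can be quantified with simple arithmetic. Capturing content up to the 1 MHz quoted earlier requires sampling at more than 2 MHz (the Nyquist criterion). The sketch below assumes 16-bit (2-byte) samples and continuous acquisition. Both are assumptions, and real systems usually sample above the bare Nyquist rate, which only increases the figures.

```python
def ae_raw_data_bytes(f_max_hz, bytes_per_sample, channels, duration_s, margin=1.0):
    """Raw data volume for continuous AE acquisition.

    The sampling rate is taken as margin * 2 * f_max_hz; margin=1.0 is the
    bare Nyquist rate, and practical systems use margin > 1.
    """
    fs = margin * 2.0 * f_max_hz
    return fs * bytes_per_sample * channels * duration_s


one_day = 24 * 3600
per_channel = ae_raw_data_bytes(1e6, 2, 1, one_day)  # 1 MHz content, 16-bit samples
print(per_channel / 1e9)  # 345.6 (GB per channel per day)
```

At roughly 0.35 TB per channel per day before any oversampling margin, a multi-sensor AE installation quickly becomes a data acquisition, storage, and transfer problem. This is consistent with the cost drawbacks listed above.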

Temperature Monitoring
Temperature monitoring (TM) is based on detecting unexpected temperature changes in WT drivetrain components, which can indicate increased heat originating from component degradation caused by a developing fault. It is a commonly used CM method due to its maturity, cost efficiency, and reliability [33], and its application to various power equipment is regulated by the relevant standards (e.g., IEEE 1310-2012 [45], IEEE 1718-2012 [46], ISO 17359-2006 [47], and others) [23]. The temperatures of the main bearing, the gearbox, the generator bearings and windings, and the lubrication and hydraulic oil are monitored for thermal changes arising from an underlying fault. Such faults include mechanical damage of bearings and gears, insufficient lubricant properties, loose or bad electrical connections, faults in the mechanical brake (e.g., cracked disk), generator winding faults, and rotor overspeed [22,23]. Optical pyrometers, resistance thermometers, and thermocouples are some of the common temperature sensors used in this approach [28].
Temperature sensors can, however, be highly invasive and can fail in harsh environments. They can also struggle to identify fine thermal changes in devices, which may be typical of incipient fault stages [22,23]. Furthermore, thermal diagnosis of WT drivetrains can be complicated by the difficulty of reliably identifying the cause of an observed component temperature rise, as the temperature of different WT components can be affected by their surroundings [28]. As a result, temperature monitoring is generally used in combination with other CM techniques to achieve more accurate fault diagnosis.
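One simple way to reduce the influence of the surroundings is to trend a component's temperature rise above the nacelle ambient rather than its absolute temperature. The sketch also smooths that rise with a rolling mean before comparing it with a limit. The component, the 30 K limit, the window length, and the readings are hypothetical illustrations, not values from the cited standards.

```python
def temperature_rise_alarms(component_c, ambient_c, limit_k, window=6):
    """Indices where the rolling mean of (component - ambient) exceeds limit_k.

    Subtracting the nacelle ambient temperature removes part of the influence
    of the surroundings; the rolling mean suppresses single-sample noise.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rise = [c - a for c, a in zip(component_c, ambient_c)]
    alarms = []
    for i in range(window - 1, len(rise)):
        mean_rise = sum(rise[i - window + 1:i + 1]) / window
        if mean_rise > limit_k:
            alarms.append(i)
    return alarms


# Hypothetical generator-bearing and nacelle temperatures (deg C), 30 K limit.
bearing = [60, 61, 62, 75, 78, 80]
nacelle = [35] * 6
print(temperature_rise_alarms(bearing, nacelle, limit_k=30, window=3))  # [3, 4, 5]
```

Ambient subtraction does not remove the effect of load, which also drives component temperatures. That is one reason for the model-based SCADA approaches discussed later in this section.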

Electrical Signal Analysis
Electrical signal analysis (ESA) has been gaining prominence as a CM technique to monitor WT drivetrains and identify faults, due to its relatively simple implementation, efficiency, lower hardware complexity, and cost effectiveness [22,48]. ESA is based on signature analysis techniques in which the spectra of the generator electrical signals are analyzed with the view of identifying fault-specific signatures that can be employed for reliable diagnosis purposes. The magnitudes of these fault signatures provide information about the severity of a fault and can be used to detect faults at an early stage [22]. The biggest advantage of ESA is that it is non-invasive and is relatively straightforward to implement and install on WTs, as the electrical signals are already monitored during WT operation via the control and protection systems (such as SCADA). Furthermore, the electrical signals are easily accessible without needing direct access to a WT nacelle to install measurement sensors. Therefore, no additional sensors or data acquisition devices are generally required for establishment of ESA-based CM schemes [23]. In addition, ESA is more cost effective than other CM techniques that require mechanical signal measurements, as electrical measurements are generally cheaper to obtain than mechanical measurements [32].
Voltage, current, power, flux, and control signals are some of the electrical signals investigated for monitoring the faults of WT drivetrain components. These signals are used to monitor components such as the gearbox, bearings, and generator and are used to identify electrical and mechanical faults such as bearing faults, air gap eccentricity, misalignment, electrical imbalances, winding faults, and rotor mass imbalance [22,23,28,49–53]. As an example, utilizing current signal analysis for the identification of faults and the calculation of fault-specific changes implemented on a real operational WT is presented in [54]. While generally promising, the method is highly device-design-specific, and the identification of signatures specific to particular fault types can be a significant challenge. Its application is further complicated in WT drivetrains due to their inherent variable speed operation which can impose considerable complications in extraction and trending of the nonstationary target fault signatures for diagnosis purposes [55].
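The sketch below illustrates why variable speed complicates ESA. It computes the stator-current sideband frequencies commonly associated with mixed air-gap eccentricity in induction machines, f_s ± k·f_r with f_r = f_s(1 − s)/p. Here f_s is the supply frequency, s the slip, and p the number of pole pairs. This is the standard motor-current-signature relation, used here as an illustrative assumption rather than a method taken from the cited works. The machine parameters are hypothetical.

```python
def eccentricity_sidebands(f_supply_hz, slip, pole_pairs, harmonics=2):
    """Return (k, lower_hz, upper_hz) sidebands at f_s -/+ k * f_r.

    f_r = f_s * (1 - slip) / pole_pairs is the mechanical rotation frequency;
    slip < 0 corresponds to generating above synchronous speed.
    """
    f_rot = f_supply_hz * (1.0 - slip) / pole_pairs
    return [(k, abs(f_supply_hz - k * f_rot), f_supply_hz + k * f_rot)
            for k in range(1, harmonics + 1)]


# Same 50 Hz grid, two hypothetical operating points of a 4-pole machine.
for s in (0.02, -0.2):
    (_, lo, hi), = eccentricity_sidebands(50.0, s, pole_pairs=2, harmonics=1)
    print(f"slip={s}: lower={lo:.1f} Hz, upper={hi:.1f} Hz")
# slip=0.02: lower=25.5 Hz, upper=74.5 Hz
# slip=-0.2: lower=20.0 Hz, upper=80.0 Hz
```

Even with a fixed grid frequency, the target signature moves by several hertz as slip changes with wind speed. This nonstationarity makes extraction and trending of such signatures difficult in WT drivetrains.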
ESA is not yet widely implemented in commercial CM systems due to the lack of experience in the wind power industry [33]. In addition, one of the disadvantages of ESA is the relatively low SNR of the electrical signals, which can reduce the observability of the relevant diagnostic content [32]. Furthermore, it is important to reliably identify the relevant fault signature and choose an appropriate signal processing technique to obtain suitable results; otherwise, there is considerable likelihood of false alarms and unreliable fault-detection processes [32].

Torque Measurement
Torque measurement has also been used for monitoring and fault detection of WT drivetrains [56]. It is based on identifying a torsional oscillation or a disruption in the torque–speed ratio caused by the presence of electrical and/or mechanical faults [56]. There are generally three approaches used for practical torque measurement: using a rotary torque sensor, which measures the torque signal; using a reaction torque sensor, which measures the bending moment signal; and using an estimated torque signal calculated from the electrical signals of the WT generator [22]. Signature analysis techniques have to be applied to the measured or calculated torque signal to extract fault signatures, which are then used to identify a fault in a WT drivetrain. The general premise of the diagnostic application is identical to that used for ESA; here, however, diagnostic information is inferred from the torque signal.
Torque sensors are generally placed in line with the rotating drivetrain shafts to sense the torque signal; this is generally only practical for smaller devices. Torque measurement has been researched for detecting main shaft, bearing, and gearbox faults, mass imbalance, and generator faults such as winding faults and unbalances [12,22,56,57].
As a WT drivetrain CM system, torque measurement is very challenging to implement due to practical installation issues and the resulting cost implications [33]. In addition, the dominant components in the spectrum of the torque signal are load-dependent, so more complicated signal processing techniques are needed to investigate the torque signal than those used in vibration signal analysis [23]. Therefore, torque measurement has found very limited use in commercial applications for WT drivetrain monitoring [56].
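The third approach, estimating torque from generator electrical signals, can be sketched with the power balance T = P/(η·ω). Here P is the electrical power, η an assumed drivetrain/generator efficiency, and ω the shaft angular speed in rad/s. A simple peak-to-peak ripple ratio is also computed as a crude indicator of torsional oscillation. The operating point and efficiency are hypothetical, and the sketch ignores dynamics and the rotor-side power path of doubly fed machines.

```python
import math


def estimate_torque_nm(power_w, speed_rpm, efficiency=1.0):
    """Shaft torque estimate T = P / (eta * omega), with omega in rad/s."""
    omega = 2.0 * math.pi * speed_rpm / 60.0
    if omega <= 0 or efficiency <= 0:
        raise ValueError("speed and efficiency must be positive")
    return power_w / (efficiency * omega)


def torque_ripple(torques):
    """Peak-to-peak torque variation relative to the mean torque."""
    mean = sum(torques) / len(torques)
    return (max(torques) - min(torques)) / mean


print(round(estimate_torque_nm(2.0e6, 1500.0)))  # 12732 (N m, hypothetical 2 MW at 1500 rpm)
```

An increase in ripple at a fixed operating point could then be examined spectrally, in the same way as the ESA signatures above.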

SCADA Signals
Commercial WTs are equipped as standard with a SCADA system for performance monitoring, remote supervision, and control, and the use of SCADA signals for diagnostic purposes has attracted considerable research interest. A SCADA system typically records more than 200 signals from a WT at 10 min intervals and builds historical datasets, which can then be used for CM through appropriate data analysis [33,58]. For each interval, the recorded SCADA values are generally the mean, maximum, minimum, and standard deviation of temperature, current, voltage, power, rotor speed, wind speed, and various other WT signals [23]. These signals invariably contain information related to WT health and can therefore be exploited for CM. SCADA data collected from healthy WTs is usually used as a reference to model the fault-free behavior of a WT under operating conditions; a fault can then be detected by comparing the monitored operational data with this reference. The generator, main shaft, and gearbox are among the drivetrain components whose fault diagnosis using SCADA signal analysis has been researched [18,23,59].
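The reference-model idea described above (learn fault-free behavior from healthy SCADA data, then flag deviations) can be sketched with a simple linear regression and a residual threshold. This is a minimal sketch, not a method from the cited works: the linear model, the mean + 3 standard deviation threshold, the choice of inputs (power and ambient temperature), and the bearing-temperature target are all illustrative assumptions, and the data are synthetic.

```python
import numpy as np

def fit_normal_behavior(X_healthy, y_healthy, k=3.0):
    """Fit a linear normal-behavior model on healthy SCADA records.

    Returns the regression coefficients and a one-sided residual alarm
    threshold (mean + k * standard deviation of the healthy residuals).
    """
    A = np.column_stack([np.ones(len(X_healthy)), X_healthy])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y_healthy, rcond=None)
    residuals = y_healthy - A @ coef
    return coef, residuals.mean() + k * residuals.std()

def detect_anomalies(X, y, coef, threshold):
    """Flag records whose measured value exceeds the model prediction by more than the threshold."""
    A = np.column_stack([np.ones(len(X)), X])
    return (y - A @ coef) > threshold

rng = np.random.default_rng(0)
# Hypothetical 10 min SCADA inputs: active power (MW) and ambient temperature (degC)
X = np.column_stack([rng.uniform(0.2, 2.0, 500), rng.uniform(-5.0, 25.0, 500)])
# Hypothetical healthy gearbox bearing temperature (degC) with measurement noise
bearing_temp = 30.0 + 12.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0.0, 0.5, 500)

coef, threshold = fit_normal_behavior(X, bearing_temp)
monitored = bearing_temp.copy()
monitored[-50:] += 5.0  # simulated overheating in the last 50 records
flags = detect_anomalies(X, monitored, coef, threshold)
print(flags[:450].sum(), flags[-50:].sum())
```

Real normal-behavior models are usually nonlinear (e.g., neural networks) and must cope with the false-alarm issues discussed in the next paragraph; the fixed threshold here is only the simplest possible choice.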
An advantage of SCADA-based CM is that no additional sensors or data acquisition equipment are required [60]. In addition, a SCADA system can monitor the status of the alarms raised in a WT, and a number of researchers have investigated the use of these alarms for CM of WT drivetrains [60]. However, the low sampling rate of SCADA signals is not sufficient for timely and highly accurate fault detection, as the most useful diagnostic information for most drivetrain failure modes can be lost [23]. Furthermore, a SCADA system can raise false alarms due to the varying operational nature of a WT, so it cannot presently be relied on as the sole CM system in commercial WTs [33]. Moreover, since SCADA systems were not originally designed for CM, they do not collect all the information required for full CM of a WT [28].
A summary of the conventional CM methods for WT drivetrains is presented in Table 1.

Thermography Analysis
Thermography analysis (TA) captures heat patterns and thermal images of components using temperature transmitters and high-resolution thermographic (infrared) cameras; components emit infrared radiation according to their temperature and emissivity, and this emission changes when a component starts to fail [62]. TA requires no physical contact and is considered a highly noninvasive measurement technique, which minimizes the problems associated with sensor location and proximity.
At present, TA is used commercially only as an offline CM technique in operating WTs (generally via periodic manual inspection), although infrared cameras and diagnostic software are available for online CM [63]. This is largely due to the high cost of thermographic monitoring systems, and also to the challenges of using TA in practice, such as the dependency of the results on camera resolution and on the image processing techniques used. Furthermore, as it is predominantly based on external thermal imaging of devices, TA is not capable of incipient fault detection, since the change in surface temperature caused by internal fault development is a slow process [28]. Finally, the results obtained from thermographic cameras are interpreted visually, and they must be interpreted correctly for reliable diagnosis.
Although it has yet to find widespread use, TA has previously been used to identify cracks and damage in the main shaft, bearings, and gearboxes. The technique is also considered promising for monitoring generators and power electronics [28,64].
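Because TA results depend on the image processing applied to the thermal images, a very simple automated step is sketched below: pixels hotter than the image median by a fixed margin are flagged as hot spots. This is a minimal illustration under assumed conditions, not a technique from the cited works; the 10 degC margin, the function name, and the synthetic 8x8 image are hypothetical, and a practical system would also need to account for emissivity, ambient conditions, and camera resolution.

```python
import numpy as np

def hot_spot_mask(thermal_image, delta_t=10.0):
    """Return a boolean mask of pixels hotter than the image median by more than delta_t (degC)."""
    img = np.asarray(thermal_image, dtype=float)
    return img > np.median(img) + delta_t

# Hypothetical 8x8 thermal image (degC) of a component with a 2x2 hot region
img = np.full((8, 8), 40.0)
img[3:5, 5:7] = 65.0
mask = hot_spot_mask(img)
print(mask.sum(), np.argwhere(mask)[0])  # 4 [3 5]
```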

Shock Pulse Method
The shock pulse method (SPM) is a quantitative online CM method that has been used for monitoring rolling element bearings in WTs. SPM detects the short-duration shock waves generated by impacts in the bearings using a shock pulse transducer and a piezoelectric accelerometer probe [65]. Piezoelectric accelerometers use the piezoelectric effect to convert the mechanical strain created by shock waves into electric signals. For CM, they are operated at their resonant frequency (~32 kHz), where weak shock pulses excite damped oscillations and therefore generate large output signals [66,67].
SPM can measure both the magnitudes of the shock wave peaks and the signal level between the peaks. Furthermore, analysis of a normalized shock value provides information about bearing condition [68]. Correct interpretation of SPM results requires knowledge of the bearing geometry, its operating conditions, and the shock values under different operating conditions. When SPM is used, low-frequency vibrations in the nacelle originating from sources other than the bearings are filtered out electronically [67]. Although SPM is generally used to monitor bearing condition, it can also provide information about lubricant film thickness, which can inform the preventive maintenance schedule and help to implement corrective action within the most suitable time frame.
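The two quantities described above, the peak magnitude of the shock pulses and the signal level between the peaks, can be illustrated with a simplified sketch. This is not the commercial SPM algorithm or its normalization: here low-frequency vibration is removed with a crude FFT high-pass filter (a stand-in for the electronic filtering mentioned in the text), the between-peak level is approximated by the median of the rectified signal, and their ratio is expressed in dB. The cutoff, sampling rate, impact model, and function names are hypothetical assumptions.

```python
import numpy as np

def highpass_fft(signal, fs, cutoff_hz):
    """Crude high-pass filter: zero all FFT bins below cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def shock_levels(signal, fs, cutoff_hz=5000.0):
    """Return (peak level, between-peak level, peak-to-background ratio in dB)."""
    x = np.abs(highpass_fft(signal, fs, cutoff_hz))
    peak = x.max()
    background = np.median(x)  # simple proxy for the level between shock peaks
    return peak, background, 20.0 * np.log10(peak / background)

fs = 100_000.0
t = np.arange(10_000) / fs
rng = np.random.default_rng(1)
# Low-frequency (50 Hz) vibration from other sources plus broadband noise
base = np.sin(2 * np.pi * 50.0 * t) + 0.01 * rng.normal(size=t.size)

# Hypothetical bearing impacts: damped 32 kHz bursts every 10 ms
impacts = np.zeros_like(t)
tau = t[:200]
burst = 0.5 * np.exp(-tau / 2e-4) * np.sin(2 * np.pi * 32_000.0 * tau)
for start in range(0, t.size, 1000):
    impacts[start:start + 200] += burst

healthy_db = shock_levels(base, fs)[2]
damaged_db = shock_levels(base + impacts, fs)[2]
print(round(healthy_db, 1), round(damaged_db, 1))
```

A clearly higher peak-to-background ratio for the impacted signal is the kind of contrast SPM exploits; as the text notes, interpreting absolute values still requires knowledge of bearing geometry and operating conditions.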

X-ray Micro-Tomography
X-ray micro-tomography is a high-resolution 3D monitoring technique that enables internal structures to be investigated without opening or cutting through the sample. It has been reported to identify incipient gearbox-bearing failures such as white structure flaking (WSF) or white etching cracks [69]. X-ray micro-tomography is based on identifying initiators caused by surface flaws/cracks, microstructural discontinuities, and non-metallic inclusions. Although early research results are promising, this technique is costly and new to WT drivetrain monitoring, and it is therefore not yet used commercially [70].

Fiber Bragg Grating Sensors Measurement
Fiber Bragg grating (FBG) sensor measurement has increasingly been researched as a promising alternative CM technique for WTs due to its advantages, such as a high signal-to-noise ratio, immunity to electromagnetic interference, small sensor size, flexibility, multiplexing, and multi-physical sensing capability [28,71-73]. An FBG sensor consists of a specially fabricated optical fiber, which is thin, flexible, and transparent and reflects particular wavelengths of light from distinct fiber locations exposed to physical excitation (e.g., temperature or strain). FBG sensing is an electrically passive technology that, with appropriate design, can acquire a range of multi-physical measurands and is most often employed as a thermal and/or strain-sensing solution [71]. The measured physical quantity is transformed into a distinct wavelength of light, which is then analyzed by a specialized interrogator device to extract the physical measurand [74].
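The wavelength encoding described above can be made concrete with the standard Bragg condition, in which the reflected wavelength equals twice the effective refractive index times the grating period, and with the usual linear strain sensitivity of the relative wavelength shift. The sketch below assumes that temperature effects are already compensated and uses an effective photo-elastic coefficient of about 0.22, a typical value for silica fiber; the refractive index, grating period, and measured shift are illustrative numbers, not values from the cited works.

```python
def bragg_wavelength(n_eff, grating_period_m):
    """Bragg condition: reflected wavelength = 2 * n_eff * grating period (m)."""
    return 2.0 * n_eff * grating_period_m

def strain_from_shift(delta_lambda_m, lambda_b_m, p_e=0.22):
    """Axial strain from a Bragg wavelength shift, assuming temperature effects are compensated.

    Uses delta_lambda / lambda_B = (1 - p_e) * strain, where p_e is the effective
    photo-elastic coefficient (~0.22 is a typical value for silica fiber).
    """
    return delta_lambda_m / (lambda_b_m * (1.0 - p_e))

lam = bragg_wavelength(1.447, 535.6e-9)  # illustrative effective index and grating period
print(round(lam * 1e9, 1))  # 1550.0 (nm)
micro_strain = strain_from_shift(12.09e-12, lam) * 1e6  # a 12.09 pm shift
print(round(micro_strain))  # 10 (microstrain)
```

Under these assumptions a grating near 1550 nm shifts by roughly 1.2 pm per microstrain, which is why a high-resolution interrogator is needed.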
FBG measurement is used commercially in WTs as a leading solution for monitoring blade stress [71]. Due to its inherent advantages, the technology has also recently received research attention for drivetrain CM and power devices in general, and it has shown promising potential to enable advanced in situ CM solutions for generators and power electronic components [72,75-82]. FBG monitoring is currently not commercially used for WT drivetrain CM. While promising, the technology requires specialized installation procedures and sensor design, and its wider adoption will largely depend on whether it transitions from a niche, high-value sensing technology to a more widely adopted, lower-cost solution [83].

Machine Learning for Wind Turbine Condition Monitoring
ML is at the forefront of diagnostic research in many disparate areas of health assessment. This section provides a dedicated review of ML applications in WTs, covering current research trends and the ML diagnostic solutions proposed for key WT subassemblies. Section 3.1 presents the fundamentals of ML-based CM, the ML tools used, and their classification and usage in WT CM. Section 3.2 reviews the application of ML techniques to CM of failure modes in key individual WT components. Section 3.3 summarizes the selection of appropriate ML models for WT CM.

Machine-Learning-Based Condition Monitoring
Generally, ML-based WT CM follows three main steps: data acquisition, data analysis, and health status assessment [22], as illustrated in the flow diagram of Figure 3.

Data Acquisition
In data acquisition, the samples intended to convey health patterns take the form of signals collected by various types of sensors. Particular sensor types may be used in specific diagnostic applications; for example, accelerometers are generally used to collect vibration signals from the WT drivetrain, including bearings, gearbox, and shafts [84-86]. Similarly, microphones can be used to record acoustic emissions in harsh environments where accelerometers are difficult to install [87], and thermocouples can also be used for the same monitoring purpose as accelerometers [6,88-90]. Finally, cameras can be used to record images of metal deformation [91,92]. To centralize the CM system in a single processing system rather than in individually installed ones, and to simplify its implementation, wireless sensors can send measurements to a central data analysis unit for less complex processing [93-95]. Further detection methods, such as ultrasonic, thermographic, and radiographic testing, can also be found (e.g., see García Márquez et al. [96]). Tables 2-5 summarize some of the important ones used in recent years.

Data Analysis
Data analysis is a key stage of ML-based WT CM, since the reliability of a CM system depends directly on the accuracy of the prediction model it employs. In ML-based CM, incoming signals are generally unlabeled, and ground-truth labels usually cannot be obtained from experts. Consequently, most WT CM applications fundamentally depend on clustering [85,88,97-99] or on signal processing techniques [89,100,101]. Whether the aim is detection, diagnosis, or prognosis, the first step consists of differentiating between operating behaviors (for diagnosis) or health stages (for prognosis). For performance evaluation where the real remaining useful life (RUL) is unknown, experts can label the data by associating degradation functions (e.g., linear or exponential degradation models) with the samples of the measured life cycles, so that at least some knowledge of the current physical condition is obtained [102]. Figure 4 summarizes the most important applications of ML in WT CM.
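The labeling idea above, associating an exponential degradation model with the samples of a life cycle, can be sketched as a log-linear least-squares fit followed by extrapolation to a failure threshold. This is a minimal sketch, not the procedure of [102]: the health indicator, its time unit, the failure threshold of 2.0, and the function names are hypothetical, and the indicator is noise-free so the result is easy to check. A linear degradation model would be fitted the same way, without the logarithm.

```python
import numpy as np

def fit_exponential(t, hi):
    """Fit hi = a * exp(b * t) by linear least squares on log(hi); hi must be positive."""
    b, log_a = np.polyfit(t, np.log(hi), 1)
    return np.exp(log_a), b

def remaining_useful_life(t_now, a, b, threshold):
    """Time left until the fitted model reaches the failure threshold (assumes b > 0)."""
    t_fail = np.log(threshold / a) / b
    return max(t_fail - t_now, 0.0)

t = np.arange(0, 50.0)            # e.g., days of operation (hypothetical)
hi = 0.1 * np.exp(0.05 * t)       # noise-free illustrative health indicator
a, b = fit_exponential(t, hi)
labels = a * np.exp(b * t)        # fitted degradation values usable as sample labels
print(round(remaining_useful_life(t[-1], a, b, threshold=2.0), 1))  # 10.9
```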

In recent literature, after careful selection of patterns for the training process, an approximation function must be selected for the assessment process. Accordingly, the approaches developed on these criteria have different architectures, ranging from traditional ML (TML) through hybrid models to deep and complex networks with advanced training procedures. The new generation of ML analysis for WTs mostly relies on deep learning techniques, including CNNs (convolutional neural networks) and LSTM (long short-term memory) networks. Recent training procedures also involve generative models, which supply prior assumptions to learning models by providing new, enhanced representations. Generative adversarial networks (GANs) and transfer learning (TL) are very popular in recent studies, as they extend ML from purely data-driven toward knowledge-driven learning by providing different prior assumptions.

Common Failure Modes of Turbine Components
In a WT, as shown in Figure 1, the wind creates a lift force on the airfoil cross-sections of the root-to-tip twisted blades, which makes the rotor turn. The blades, connected to a single central hub, are adjusted by a pitch controller to capture the maximum amount of energy from the wind [3]. A low-speed shaft connects the hub to the gearbox and transmits the mechanical rotational energy. Because the rotor turns slowly, the planetary gear arrangement of the gearbox steps up the rotational speed to the level needed to drive the generator efficiently [8,84]. All these components are housed in a single chamber called the nacelle. The nacelle sits on top of a tower, and its orientation is controlled by a yaw motor, using sensors that measure wind speed and direction, so that the turbine rotor always faces directly into the wind. Brakes are also installed in the nacelle to stop the rotor at excessive rotational speeds and to stop the yaw motor in windy conditions that could damage the system [4].
Since WTs operate in extremely harsh environments, the working conditions can inherently compromise their integrity. Extreme wind speeds can be too severe for the rotating equipment, and even for the entire turbine, in situations where the brakes may not be effective. In addition, extreme cold can cause malfunction of, and damage to, important equipment, including the blades. The function of CM is therefore to provide a monitoring system capable of detecting, diagnosing, and predicting such failures, so that the necessary maintenance operations can be planned at appropriate times and energy production continues.
Since CM with ML is the main topic of this review, we have collected the important contributions from recent literature, mostly published during the last two years. The developed methods of detection, diagnosis, and prognosis have been classified according to the main types of significant failure generally encountered by WTs. A list of works addressing the common failure modes of the gearbox, yaw system, blades, and generator is therefore provided, together with the ML techniques applied in each case.

Gearbox
The gearbox is an essential part of the WT drivetrain, transmitting the rotor's mechanical energy to the generator. It increases the low rotational speed of the blade rotor to the higher speed required by the generator to produce electricity. Generally speaking, a WT gearbox has four main parts arranged in planetary form: the sun gear, planet gears, bearings, and planet carrier (Figure 5a). This planetary arrangement allows the gearbox to achieve the required speed increase.
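The speed increase of a planetary stage follows from the tooth counts. For the common configuration with a fixed ring gear, the carrier driven by the rotor, and the sun gear as output, the speed-up ratio is one plus the ratio of ring to sun teeth. The sketch below assumes that configuration; the tooth counts and the 15 rpm rotor speed are hypothetical, and practical WT gearboxes often combine several stages, whose ratios multiply.

```python
def planetary_speed_ratio(z_sun, z_ring):
    """Speed-up ratio of a planetary stage with fixed ring gear, carrier input, and sun output."""
    return 1.0 + z_ring / z_sun

rotor_rpm = 15.0  # hypothetical low-speed rotor speed
ratio = planetary_speed_ratio(z_sun=20, z_ring=100)  # hypothetical tooth counts
print(ratio, rotor_rpm * ratio)  # 6.0 90.0
```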
Under harsh operating conditions, each of these components can be affected by the high rotational speed of the gearbox's high-speed shaft, and many types of failure can consequently appear. According to [103,104], several gear health states can be distinguished based on tooth condition: healthy, cracked, chipped, missing root, and surface defect (Figure 5b). Additionally, bearing faults such as inner race faults can affect the mechanical transmission of the drivetrain (Figure 5c) [105].
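Gear and bearing faults of this kind are commonly detected from statistical features of accelerometer signals. The sketch below computes a few widely used time-domain features for one vibration segment; the feature set and function name are illustrative assumptions, not the exact features of any cited work. Impulsive fault signatures tend to raise kurtosis and crest factor, which the synthetic example shows.

```python
import numpy as np

def time_domain_features(x):
    """Common statistical features of one vibration segment."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    std = centred.std()
    return {
        "rms": rms,
        "peak_to_peak": x.max() - x.min(),
        "crest_factor": np.max(np.abs(x)) / rms,
        "skewness": np.mean(centred ** 3) / std ** 3,
        "kurtosis": np.mean(centred ** 4) / std ** 4,  # about 3 for Gaussian noise, 1.5 for a sine
    }

t = np.arange(1000) / 1000.0
healthy = np.sin(2 * np.pi * 25.0 * t)   # hypothetical mesh-related vibration
faulty = healthy.copy()
faulty[::100] += 5.0                      # hypothetical periodic impacts from a tooth defect
print(round(time_domain_features(healthy)["kurtosis"], 2))  # 1.5
print(time_domain_features(faulty)["kurtosis"] > time_domain_features(healthy)["kurtosis"])  # True
```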
Several examples dealing with these types of failure can be found in the literature. For instance, Cao et al. [103] studied how to detect different health states of the sun gear of the WT gearbox (healthy, cracked, chipped, missing root, and surface defect). They mainly used multiple time-domain features extracted from three accelerometers installed at different positions on the bearings (vertical, horizontal, and radial) and fed these features into a bidirectional long short-term memory (Bi-LSTM) network designed for sequence-to-sequence classification. In the deep learning approach proposed by Cheng et al. [104], a new learning path for fault classification (diagnosis) of gearboxes in doubly fed induction generator WTs was designed based on current signal processing. As a new contribution to gearbox fault diagnosis, Corley et al. [105] coupled a thermal modeling method with an ML technique to strengthen the CM system of the WT. Fu et al. [106] adopted an efficient approach to select gearbox temperature measurements using an elastic neural network; the obtained learning features were then fed into a hybrid convolutional LSTM network, whose approximation and generalization capabilities were used to raise over-temperature fault warnings. Hu et al. [100] mainly used signal processing techniques to detect failure thresholds of the WT gearbox under operating conditions. After determining the learning classes from signal processing frames, they fed the training samples into an extreme learning machine (ELM) network with randomly assigned hidden weights, enhanced with the particle swarm optimization (PSO) technique, for fully supervised fault detection. Inturi et al. [88] addressed fault classification for health state evaluation of the WT gearbox at different speed stages by developing a hybrid of fuzzy logic and ML, namely the adaptive neuro-fuzzy inference system (ANFIS). Jiang et al. [107] used an end-to-end CNN that directly processes raw vibration signals recorded by sensors installed on the rotating planetary elements of the gearbox, without any signal processing, and showed that it can detect different health stage patterns of the gearbox. Other examples of gearbox fault types are listed in Table 2.
Most of the recently cited works on gearbox transmission anomalies address classification problems, used either to detect different health stages or to classify different failure modes, and they rely on powerful deep learning techniques for sequential or ordinary multiclass classification. In contrast, comparatively little work has addressed regression problems, which typically take the form of prognostic remaining useful life (RUL) prediction, even though RUL estimation is crucial for CM.
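The ELM classifier mentioned for Hu et al. [100] has a simple core: the hidden-layer weights are drawn at random and only the output weights are solved for, by least squares. The sketch below shows that core only; the PSO tuning used in [100] is omitted, and the tanh activation, the 20 hidden units, and the two-class synthetic data are illustrative assumptions.

```python
import numpy as np

class ExtremeLearningMachine:
    """Single-hidden-layer ELM: random hidden layer, least-squares output layer."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # least-squares output weights
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]

rng = np.random.default_rng(42)
# Hypothetical 2-feature data for a healthy (0) and a faulty (1) gearbox class
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(3.0, 0.5, (100, 2))])
y = np.repeat([0, 1], 100)
model = ExtremeLearningMachine(n_hidden=20).fit(X, y)
accuracy = (model.predict(X) == y).mean()
print(accuracy)
```

Because only a linear system is solved, training is very fast; the price is sensitivity to the random hidden layer, which is one motivation for tuning it with optimizers such as PSO.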

Yaw System
The yaw system rotates the nacelle about the tower axis to point the rotor towards the incoming wind, maximizing power tracking and energy capture. As shown in Figure 6a, the yaw system consists of mechanical equipment broadly similar in function to the gearbox. It can therefore experience the same bearing and gear failure modes, in addition to failures of the yaw motors, as illustrated in Figure 6b. However, its working conditions differ, because the yaw system is subjected to the loads acting on the entire WT in addition to those caused by blade rotation [87].
CM techniques aiming to detect multiple yaw system faults have been reported in the literature. Reder et al. [112] integrated semi-supervised data mining approaches to process meteorological and fault data. The study mainly focused on k-means clustering to extract groups of patterns corresponding to both healthy and unhealthy operation of several WT components, including the yaw system. Chen et al. [87] presented an automatic damage detection algorithm for the yaw system of WTs: a classification procedure based entirely on the analysis of acoustic signals. Unlike approaches requiring vibration and temperature sensors, this diagnostic system records acoustic signals using only a regular microphone installed next to the yaw system. The signals are thoroughly preprocessed before being fed to a Bayesian network fault classifier. In another work, Chen et al. [113] used unsupervised sequential autoencoders for feature extraction, combined with an approximation neural network, to obtain an accurate performance evaluation model. The reconstruction and approximation networks were dynamically trained with LSTM for the detection of multiple WT faults, including yaw system faults, using real SCADA data. The results were passed to a support vector machine (SVM) with an adaptive threshold algorithm to label healthy and faulty patterns.
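The k-means clustering step used by Reder et al. [112] to separate groups of operating patterns can be sketched with plain Lloyd iterations in NumPy. This is only the generic clustering algorithm, not their full data mining pipeline; the two synthetic feature groups (standing in for, e.g., healthy and abnormal yaw operation) are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means clustering (Lloyd's algorithm); returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each record to its nearest centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(3)
# Hypothetical 2-feature records from two operating regimes
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))])
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels))
```

Since the clusters carry no labels, deciding which cluster corresponds to unhealthy operation still requires fault logs or expert input, as discussed in the data analysis section.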
The contributions mentioned above indicate that most of the algorithms designed were based on both deep learning and feature extraction. Multiple signal recording techniques (e.g., acoustic and vibration signals) have been involved, where the accuracy of the detection process primarily depends on a clustering process that aims to identify the degree of damage spread (for more details, see Table 3).

Blades
Blades are a key WT component, exposed to considerable stress in operation. They are aerodynamically designed as twisted blades with airfoil cross-sections that gradually decrease from root to tip. Blades can be affected by high wind speed or turbulence, or, for example, by cold weather conditions, where blade ice can be particularly challenging and lead to breakdown of the system [91,114]. Ice forms on the blade surface (Figure 7a) because of water particles carried by the wind stream. Sand- or particle-contaminated wind streams can also erode and cause considerable damage to the blade material, as shown in Figure 7b.
Fault detection in blades can generally be performed via several methods, including ultrasonic waves, resonance frequency measurement, vibration measurement, or optical measurement [115]. Aiming to detect blade icing in WTs with ML-based CM, Yi et al. [97] addressed the detection of WT icing from field SCADA data as an imbalanced classification problem. They proposed a synthetic technique combining minority grouping and oversampling to separate the recorded data into specific clusters related to the icing stages. The resulting clusters were preprocessed using a linear interpolation algorithm before being fed to a conventional ML classifier. Yang et al. [92] designed a pattern recognition algorithm to classify images of WT blades obtained by an unmanned aerial vehicle. The main objective was to detect blade damage using three main learning mechanisms: (i) a CNN to extract the best features, (ii) TL algorithms to improve generalization, and (iii) a random forest ensemble to improve the blade defect detection process. To predict the gradual formation of ice on the rotor blades of WTs, Kreutz et al. [116] developed a data-based ice prediction approach using two different ML methods, namely the SVM and the DNN (deep neural network). The analyzed data were collected from the SCADA system, with the help of specific sensors installed in the WTs, at a wind farm in Germany with around 10 WTs. In a related work [91], the same subject was studied using a CNN that learns patterns from RGB (red, green, blue) images obtained with a camera installed on the nacelle.
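The class imbalance tackled by Yi et al. [97] (few icing records among many normal ones) is often addressed by synthetic minority oversampling. The sketch below shows only the standard SMOTE interpolation step, in which new samples are placed at random points between a minority sample and one of its nearest minority neighbors; it is not the exact clustering-based variant of [97], and the feature values, sample counts, and function name are hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation (needs len(X_min) > k)."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # a sample is not its own neighbor
    neighbours = np.argsort(d, axis=1)[:, :k]     # k nearest minority neighbors per sample
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # random position along each segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])

rng = np.random.default_rng(7)
X_icing = rng.normal([2.0, -3.0], 0.3, size=(20, 2))  # hypothetical minority (icing) records
X_synthetic = smote(X_icing, n_new=180)
print(X_synthetic.shape)  # (180, 2)
```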
Blade icing is an entirely environmentally driven problem; it differs from bearing and gear faults, which can arise from a combination of physical and environmental factors. Detection techniques can therefore be challenged by the unpredictable dynamics of the underlying events. Recent work employs measurements recorded by different sensors, including images, and analyzes them with various learning tools to capture the key health patterns of interest (see Table 4). ML structures and architectures similar to those mentioned above have been applied in this field. Typically, they involve a preprocessing unit and deep, conventional, ensemble, or hybrid learning algorithms to solve classification problems. For instance, in the work of Chen et al. [85], because the health CM data were unlabeled, a self-setting health threshold was used to solve the health stage splitting problem by training a GAN, here in the form of an autoencoder trained via adversarial learning. Zhang et al. [123] also developed a semi-automatic learning approach based on generative adversarial learning for bearing fault classification using incomplete datasets (i.e., a small amount of unlabeled vibration signals). Chang et al. [89], on the other hand, developed a parallel CNN with multi-scale kernels for the classification of health stages. A main advantage of their contribution is that the network ingests raw signals without any preprocessing, which reduces human intervention. Work on generator CM is similar to that on gearbox CM in both detection and processing (see Table 5).

Selection of Machine Learning Models
The selection of an appropriate ML model depends on several important factors: the nature of the application (feature extraction, classification, regression, or clustering), the nature of the available data (complete, balanced, labeled data; unbalanced data; or incomplete data with missing labels), and the nature of the input samples (time series or images). For example, LSTM networks are well suited to sequence modeling and can be applied to both classification and regression, while CNNs are very helpful for pattern detection tasks such as image segmentation. Tables 2-5 survey most of the important work performed so far in the CM of WTs, covering the training algorithms, extraction techniques, learning architectures, learning behavior, and applications.
According to the pie charts in Figure 9, deep learning algorithms are growing rapidly in WT CM, accounting for about 39% of the techniques used, only about 10% less than TML tools. Most of the deep architectures are hierarchical architectures built on CNNs. Furthermore, most of the work (45%) has focused on signal processing extraction techniques rather than ML tools (only 29%). In fact, WT CM applications are mainly based on fault classification, whereas the extension to GANs and TL is still in its infancy.

Big Data Problems and Challenges
A tremendous amount of data, referred to as big data, has been generated in recent years by advances in science and technology, particularly ICT (information and communication technology) for CM. Gartner [124] defines big data as data characterized by high volume, velocity, and variety. New processing paradigms make it possible to optimize decision-making and data processing procedures. However, because of the high volume, velocity, and variety of the data, conventional CM technologies may not be able to exploit the full potential of big data. Developing big data applications that extract information from vast amounts of data has therefore become a challenge.
Four Vs are used to describe the characteristics of big data: volume, variety, velocity, and veracity [125]. The first and best-known characteristic is volume, which describes the amount, size, and scale of the data. For CM systems, the data acquired from the sensors have a major impact on the system. An effective WT CM system generally requires a large number of sensors with high sampling frequencies, especially for the electrical components within the turbine, thus generating a large amount of data. However, using a large number of sensors may reduce the overall reliability of the sensor system [126]. Moreover, processing and interpreting large amounts of data acquired from a sensor system can be a complex task even for an experienced data analyst [127].
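To make the volume argument concrete, the short calculation below compares the raw (uncompressed) daily output of a high-frequency vibration CM system with that of 10-min SCADA records. All channel counts, sampling rates, and sample sizes are hypothetical values chosen for illustration, not figures from the reviewed studies.

```python
def daily_bytes(n_channels, sample_rate_hz, bytes_per_sample):
    """Raw (uncompressed) bytes produced per day by n_channels sampled at sample_rate_hz."""
    seconds_per_day = 24 * 60 * 60
    return n_channels * sample_rate_hz * bytes_per_sample * seconds_per_day


# Hypothetical configurations, not taken from the reviewed studies:
# 8 accelerometers at 25.6 kHz with 16-bit samples (high-resolution CM system) ...
cms = daily_bytes(n_channels=8, sample_rate_hz=25_600, bytes_per_sample=2)
# ... versus 100 SCADA channels stored as 10-min values (1 sample / 600 s), 4-byte floats.
scada = daily_bytes(n_channels=100, sample_rate_hz=1 / 600, bytes_per_sample=4)

print(f"CM system: {cms / 1e9:.1f} GB/day")    # CM system: 35.4 GB/day
print(f"SCADA:     {scada / 1e3:.1f} kB/day")  # SCADA:     57.6 kB/day
```

Under these assumptions a single turbine's vibration channels produce more than five orders of magnitude more data than its SCADA records, before multiplying by the number of turbines in a farm.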
The second characteristic, variety, defines the structural variation of the dataset and the data types of the big data [128]. Two major challenges are associated with the variety of big data in CM: data heterogeneity, and incomplete and noisy data. Data heterogeneity refers to the syntactic and semantic characteristics of the data, i.e., the diversity of data types and the different interpretations of the data. A WT SCADA system, for example, includes various types of data, such as mechanical, temperature, and electrical data. Data integration can be problematic since the data may come from different sources with different physical meanings; solving the data heterogeneity problem has therefore attracted renewed attention in recent years [129]. The data acquired from the sensors may contain various types of measurement errors, missing values, outliers, and noise [130], and the noise can accumulate, especially in the high-dimensional datasets typical of big data. It is therefore important to extract valid data from noisy data after data collection and integration [131].
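A minimal sketch of extracting valid data from a noisy, incomplete sensor channel is shown below. It assumes a single uniformly sampled channel, fills missing samples by linear interpolation, and then applies a Hampel filter (sliding median plus a scaled median absolute deviation) to replace spikes. This is a common generic choice, not the specific cleaning method of [130] or [131]; the window size, the 3-sigma limit, and the temperature readings are hypothetical.

```python
import math
import statistics


def clean_series(values, window=3, n_sigmas=3.0):
    """Fill missing samples (None/NaN) by linear interpolation, then replace outliers with
    the local median using a Hampel filter (sliding median +/- n_sigmas scaled MADs)."""
    x = [None if v is None or (isinstance(v, float) and math.isnan(v)) else float(v)
         for v in values]
    known = [i for i, v in enumerate(x) if v is not None]
    if not known:
        raise ValueError("series contains no valid samples")
    # 1) Missing values: interpolate interior gaps, hold the nearest valid value at the edges.
    for i in range(len(x)):
        if x[i] is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                x[i] = x[right]
            elif right is None:
                x[i] = x[left]
            else:
                x[i] = x[left] + (i - left) / (right - left) * (x[right] - x[left])
    # 2) Outliers: Hampel filter on the gap-filled series.
    scale = 1.4826  # makes the MAD a consistent estimate of the std for Gaussian noise
    cleaned = list(x)
    for i in range(len(x)):
        win = x[max(0, i - window): i + window + 1]
        med = statistics.median(win)
        mad = statistics.median([abs(v - med) for v in win])
        if abs(x[i] - med) > n_sigmas * scale * mad:
            cleaned[i] = med
    return cleaned


# Hypothetical bearing-temperature readings (degC) with two gaps and one spike
raw = [41.0, 41.2, None, 41.6, 95.0, 41.9, 42.0, float("nan"), 42.4]
print(clean_series(raw))
```

Median-based statistics are used because a single large spike barely moves them, whereas a mean and standard deviation computed over the same window would be dragged towards the outlier.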
The third dimension is velocity, which describes not only how the data are generated but also how fast they are sampled, i.e., the sampling rate. In real-time data streaming, new data are continuously generated, which causes nonstationary behavior of big data; thus, it is impossible to acquire the entire dataset before processing [132]. This makes it challenging to acquire the necessary datasets for real-time processing.
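Because the full stream is never available at once, statistics used for monitoring have to be updated incrementally. The sketch below uses Welford's online algorithm to maintain a running mean and variance one sample at a time; the class name `RunningStats` and the streamed RMS values are hypothetical illustrations, not part of any reviewed system.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance updated one sample at a time, so the
    statistics are available without storing or re-reading the whole data stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the updated mean

    @property
    def variance(self):
        """Sample variance (NaN until at least two samples have arrived)."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def std(self):
        return math.sqrt(self.variance)


stats = RunningStats()
for rms in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical streamed vibration RMS values
    stats.update(rms)
print(stats.n, round(stats.mean, 6), round(stats.variance, 6))
```

This version weights all past samples equally, which suits a stationary stream; for the nonstationary behavior described above, a sliding-window or exponentially weighted variant would be the more appropriate design choice.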
The last important characteristic of big data is veracity. Because data sources are inherently unreliable, veracity is defined jointly by the provenance and the quality of big data [133]. As with variety, the challenges of veracity often stem from the data sources. The original dataset can be very large in the context of big data, so the extra computational cost can become overwhelming [134]. Moreover, the veracity of a dataset can be affected by the uncertainty of the data source. The noise contained in the data is not of a single type, which makes noise in a large dataset more difficult to handle.

Data Mining Condition Monitoring
A WT CM system combines sensors and signal processing units [135]. CM techniques comprise statistical analysis, signal processing, and, increasingly, data-driven and data mining techniques, which are used to diagnose and predict the health status of major WT subassemblies (e.g., blades, nacelle, gearbox, generator, and power electronic converter). Monitoring can be online or offline: online monitoring provides real-time data that reflect the instantaneous operating condition, while offline monitoring collects data at regular time intervals for analysis, based on different data acquisition systems [136]. With appropriate CM techniques, maintenance actions can be planned to prevent further damage to the turbine while it remains operational, thus reducing downtime and O&M costs [137].
Data mining techniques have been designed to solve big data problems such as variable selection, dimension reduction, feature extraction, and online processing. Data mining techniques, especially ML-based CM methods, have drawn increasing attention in recent years. ML approaches are commonly referred to as data-driven CM, since they do not require prior knowledge of the turbine.
Because of the large amount of data and the untraceable data sources, raw data may be messy and contain a lot of noise. Incomplete and incorrect data lead to misjudgments in CM, so data cleaning is necessary before processing. The kernel-based local outlier factor (KLOF) has been proposed for data cleaning [138]. In this method, the data are first divided into several segments; features such as the mean, maximum, and peak-to-peak value are then extracted from each segment and used with KLOF to evaluate the degree to which each segment consists of incorrect data. A suitable threshold was set to distinguish incorrect data from correct data. The results demonstrated that the method could effectively identify incorrect data and abnormal segments. A method based on the minimization of dissimilarity- and uncertainty-based energy (MDUE) has also been proposed for data cleaning [139]. This method transforms scattered data into a grey-scale digital image and then determines an optimum threshold based on intensity-based class uncertainty and shape dissimilarity. The abnormal data are finally marked by image thresholding.
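The segment-feature-plus-outlier-score pipeline described for [138] can be sketched as follows. Note the substitution: this sketch uses the standard (non-kernel) local outlier factor rather than the kernel-based KLOF of [138], and the segment length, `k`, the threshold of 2.0, and the synthetic signal are all hypothetical.

```python
import math
import random


def segment_features(signal, seg_len):
    """Non-overlapping segments -> (mean, maximum, peak-to-peak) feature vectors."""
    feats = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        feats.append((sum(seg) / seg_len, max(seg), max(seg) - min(seg)))
    return feats


def lof_scores(points, k=3):
    """Standard (non-kernel) local outlier factor; values well above 1 suggest outliers."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # indices of the k nearest neighbours of each point, excluding the point itself
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k] for i in range(n)]
    k_dist = [d[i][knn[i][-1]] for i in range(n)]  # distance to the k-th neighbour
    # local reachability density: inverse of the mean reachability distance
    lrd = [1.0 / (sum(max(k_dist[j], d[i][j]) for j in knn[i]) / k + 1e-12) for i in range(n)]
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical signal: six healthy segments of noise, then one segment stuck at a constant value
rng = random.Random(1)
signal = [rng.gauss(0.0, 1.0) for _ in range(600)] + [50.0] * 100
scores = lof_scores(segment_features(signal, seg_len=100), k=3)
flagged = [i for i, s in enumerate(scores) if s > 2.0]  # threshold chosen for illustration only
print(flagged)
```

In practice the features would usually be standardized before computing distances, since a feature with a large numeric range would otherwise dominate the neighbourhood structure.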
Dimension reduction techniques have been widely applied to reduce the complexity of the original dataset and thus the computational load of processing large amounts of data. Principal component analysis (PCA) is a well-established data mining technique that extracts principal components from various types of variables and has often been used for dimension reduction and feature extraction. Adopting PCA can significantly reduce the computational load. Wang et al. proposed a PCA-based method to select, among all variables, those related to a target fault. The method reduced the dimensions of two different datasets to 51.7% (15 out of 29 variables) for simulation data and 45.4% (35 out of 77 variables) for SCADA data. The average correlation and information entropy after dimension reduction were kept at 99.81%, 0.0082, and 81.32% for simulation data, and 99%, 0.162, and 88.88% for SCADA data, respectively. These results indicate that the method can detect faults efficiently and effectively while reducing the number of variables used for CM [9]. Other data mining techniques, such as parallel factor analysis, k-means clustering, autoencoders, and deep belief networks, have also shown their capability in dimension reduction and feature extraction [140][141][142].
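The generic PCA step can be sketched in pure Python as below: principal components are found by power iteration on the sample covariance matrix with deflation, and the number of retained components is chosen from the cumulative explained-variance ratio. This is not the variable-selection procedure of Wang et al. [9]; the 95% target and the synthetic SCADA-like variables are assumptions, and a production implementation would normally rely on a linear algebra library instead.

```python
import math
import random


def pca(data, n_components, n_iter=500, seed=0):
    """Principal components of `data` (rows = samples, columns = variables) via power
    iteration with deflation. Returns (components, explained_variance_ratios)."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in data]
    # sample covariance matrix (m x m)
    C = [[sum(r[a] * r[b] for r in X) / (n - 1) for b in range(m)] for a in range(m)]
    total_var = sum(C[i][i] for i in range(m))
    rng = random.Random(seed)
    comps, ratios = [], []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(m)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        # Rayleigh quotient = variance captured along v
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(m)) for a in range(m))
        comps.append(v)
        ratios.append(lam / total_var)
        # deflation: remove this component before searching for the next one
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    return comps, ratios


def n_components_for(ratios, target=0.95):
    """Smallest number of leading components whose cumulative ratio reaches `target`."""
    cumulative = 0.0
    for i, r in enumerate(ratios, start=1):
        cumulative += r
        if cumulative >= target:
            return i
    return len(ratios)


# Hypothetical data: three strongly correlated variables plus one independent variable
rng = random.Random(42)
rows = []
for _ in range(200):
    speed = rng.uniform(5.0, 15.0)
    rows.append([speed, 0.9 * speed + rng.gauss(0, 0.1), 0.5 * speed + rng.gauss(0, 0.1),
                 rng.gauss(20.0, 1.0)])
comps, ratios = pca(rows, n_components=4)
print([round(r, 3) for r in ratios], n_components_for(ratios, 0.95))
```

When several SCADA channels are driven by the same underlying quantity, as in this synthetic example, most of their variance collapses onto one component, which is what makes the reduction in variables possible.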
Challenges remain in dealing with big data for CM, particularly for online processing. With streaming/online data, ML algorithms may not fulfil such tasks because they are trained on historical data [143]. In this scenario, an incremental learning approach based on support vector regression and the Karush-Kuhn-Tucker (KKT) conditions was proposed to avoid retraining the previous model [144]. The size of the training dataset changes as each new sample arrives; however, the weights can be updated automatically without retraining on the data. Online monitoring can thus be achieved without building new models. Online monitoring also needs to consider data uploading problems. To address this, a hierarchical extreme learning machine embedded with cloud computing was proposed to reduce the quantity of uploaded data [145]. The results showed that the uploaded data volume could be reduced to 12.5% of the original size, while data transmission security was also improved, since the model parameters and the original input data were compressed in the first hidden layer.
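The core idea of incremental learning, updating model weights with each new sample instead of retraining on the full history, can be illustrated with recursive least squares (RLS). This is a deliberately simpler stand-in, not the SVR/KKT method of [144]; the class name, the forgetting factor, the initial `delta`, and the synthetic stream are hypothetical.

```python
class RecursiveLeastSquares:
    """Online linear regression y ~ w.x, updated one sample at a time (no retraining)."""

    def __init__(self, n_features, forgetting=1.0, delta=1000.0):
        self.w = [0.0] * n_features
        # P approximates the inverse input-correlation matrix; a large delta is a weak prior
        self.P = [[delta if i == j else 0.0 for j in range(n_features)] for i in range(n_features)]
        self.lam = forgetting  # < 1 discounts old samples (useful under drifting conditions)

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        n = len(self.w)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [p / denom for p in Px]
        error = y - self.predict(x)  # a-priori prediction error
        self.w = [self.w[i] + gain[i] * error for i in range(n)]
        # P <- (P - gain * x^T P) / lambda
        xP = [sum(x[i] * self.P[i][j] for i in range(n)) for j in range(n)]
        self.P = [[(self.P[i][j] - gain[i] * xP[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return error


# Hypothetical stream: target = 2*x1 - x2 + 0.5, with a constant 1.0 input acting as bias term
model = RecursiveLeastSquares(n_features=3)
for t in range(50):
    x = [float(t % 7), float((3 * t) % 5), 1.0]
    model.update(x, 2.0 * x[0] - x[1] + 0.5)
print([round(w, 3) for w in model.w])
```

Each update costs a fixed amount of work that depends only on the number of features, not on how many samples have been seen, which is the property that makes this kind of model suitable for streaming CM data; the weights here should approach [2, -1, 0.5].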

Condition-Based Predictive Maintenance
Conventional WT maintenance is often divided into corrective and scheduled maintenance. Corrective maintenance is performed after a system failure, which can be caused by, e.g., component fatigue, unreliable design, or environmental and operational factors. Engineers often perform corrective maintenance during WT inspection or when the WT shuts down due to a fault. Consequently, corrective maintenance incurs the highest O&M cost among all maintenance strategies. In contrast, scheduled maintenance, also known as periodic or preventive maintenance, is carried out at fixed time intervals, usually recommended by the supplier, so that fatigued components can be replaced before failure [146,147]. Scheduled maintenance can indeed reduce unscheduled downtime; however, scheduling maintenance tasks more frequently than necessary increases the O&M cost, since the replaced components may not yet have reached the end of their useful life. A more advanced policy, called opportunistic maintenance, has also been developed by combining corrective with preventive maintenance: when a WT component reaches its critical degradation state, the opportunity is taken to perform preventive maintenance on the other components, thus reducing the losses due to accidental failures [148]. An optimal opportunistic maintenance policy was proposed for a deteriorating multi-bladed offshore WT subject to stress corrosion cracking and environmental shocks, using field failure data from the SCADA system [149].
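The trade-off between too-frequent scheduled maintenance and costly corrective maintenance can be quantified with the classical age-replacement model: a component is replaced preventively at age T (cost c_p) or correctively if it fails first (cost c_f), and the long-run cost per unit time is [c_p R(T) + c_f (1 - R(T))] divided by the integral of R(t) from 0 to T. The sketch below assumes a Weibull lifetime; the shape, scale, and cost values are hypothetical, and the model is a textbook illustration rather than a method from the cited works.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that the component survives beyond age t."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(T, c_prev, c_fail, shape, scale, steps=2000):
    """Long-run cost per unit time of an age-replacement policy: replace preventively at
    age T (cost c_prev) or correctively on failure before T (cost c_fail)."""
    R_T = weibull_reliability(T, shape, scale)
    # expected cycle length = integral of R(t) from 0 to T (trapezoidal rule, R(0) = 1)
    h = T / steps
    interior = sum(weibull_reliability(i * h, shape, scale) for i in range(1, steps))
    cycle = h * (0.5 * (1.0 + R_T) + interior)
    return (c_prev * R_T + c_fail * (1.0 - R_T)) / cycle


# Hypothetical component: Weibull(shape 2.5, scale 8 years); preventive 20, corrective 100 (k-units)
shape, scale, c_p, c_f = 2.5, 8.0, 20.0, 100.0
candidates = [0.5 + 0.1 * i for i in range(146)]  # 0.5 ... 15.0 years
best_T = min(candidates, key=lambda T: cost_rate(T, c_p, c_f, shape, scale))
# cost rate if components are only ever replaced on failure: c_f / mean time to failure
corrective_only = c_f / (scale * math.gamma(1.0 + 1.0 / shape))
print(round(best_T, 1), round(cost_rate(best_T, c_p, c_f, shape, scale), 2), round(corrective_only, 2))
```

Very small T values give a high cost rate because healthy components are discarded early, while very large T values approach the corrective-only cost; the minimum lies in between. Condition-based strategies aim to do better still by using the observed health state instead of age alone.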
Condition-based predictive maintenance, by contrast, takes into account the health condition of the turbine to mitigate major component failures, and intelligent approaches have become a promising solution for it [150]. This strategy comprises data acquisition, data processing and analysis, and fault diagnosis and prognosis in order to determine optimal maintenance actions [151,152]. By adopting this strategy, unscheduled and unnecessary maintenance tasks are avoided, significantly reducing the O&M cost.

Decision-Making Framework
Data-driven CM approaches have recently attracted increasing attention in predictive maintenance. According to the Energy Roadmap 2050, wind energy is expected to supply between 31.6% and 48.7% of European electricity [153]. Offshore wind farms are now being deployed in deeper waters to access richer wind resources, which makes maintenance activities more difficult [154]. It is therefore vital for wind farm operators to perform predictive maintenance in order to extend the useful lifetime of WTs [155]. Using historical and real-time data from various components, WT CM can support more reliable predictive maintenance. The data acquired from WTs are multi-dimensional time series, which require precise modeling methods to predict faults [156]. Condition-based predictive maintenance gathers the necessary information from the CM and SCADA systems to analyze the operational status of WT components and prevent major failures [61,157].
Decision-making for condition-based predictive maintenance can be implemented by two methods: current condition evaluation-based (CCEB) and future condition prediction-based (FCPB) [158]. The major difference between them is that CCEB focuses on the current state (i.e., diagnosis), while FCPB focuses on the future state (i.e., prognosis). Figure 10 shows the framework of these two decision-making methods, both of which rely heavily on CM techniques. Maintenance activities can be scheduled once the estimated health condition exceeds a certain threshold [112,149,159,160].
The implementation of CCEB and FCPB strategies can be challenging in real industrial practice. When implementing CCEB, there may not be enough time for maintenance planning if the health condition shows that the components have already reached the fault limit. FCPB can solve this problem, since it predicts the future health condition of the components; however, short-term predictions are more reliable than long-term ones, so FCPB may not be precise enough for long-term prediction. To provide a reliable maintenance decision, CCEB and FCPB must therefore be chosen carefully.
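The difference between the two decision rules can be sketched as follows, assuming a scalar health indicator that increases with degradation. The CCEB rule triggers only when the latest value crosses the threshold, whereas the FCPB rule fits a linear trend (an assumed degradation model) and triggers when the predicted crossing falls within the lead time needed to plan maintenance. The function names, threshold, lead time, and data are hypothetical.

```python
def fit_line(t, h):
    """Ordinary least-squares fit h ~ a + b*t; returns (a, b)."""
    n = len(t)
    tm, hm = sum(t) / n, sum(h) / n
    b = sum((ti - tm) * (hi - hm) for ti, hi in zip(t, h)) / sum((ti - tm) ** 2 for ti in t)
    return hm - b * tm, b


def cceb_decision(h, threshold):
    """Current-condition rule: maintain only once the latest health indicator exceeds the threshold."""
    return h[-1] >= threshold


def fcpb_decision(t, h, threshold, lead_time):
    """Future-condition rule: extrapolate the degradation trend and maintain if the predicted
    threshold crossing falls within the lead time needed to plan and mobilise maintenance."""
    a, b = fit_line(t, h)
    if b <= 0:  # no upward degradation trend: no crossing predicted
        return False, float("inf")
    t_cross = (threshold - a) / b
    time_left = max(0.0, t_cross - t[-1])
    return time_left <= lead_time, time_left


days = list(range(10))
health = [1.0 + 0.1 * d for d in days]  # hypothetical health indicator, rising with degradation
print(cceb_decision(health, threshold=2.5))  # False
print(fcpb_decision(days, health, threshold=2.5, lead_time=14.0))
```

Here CCEB reports no action because the threshold has not yet been reached, while FCPB predicts a crossing in about six days and requests maintenance planning now. The linear extrapolation also illustrates the caveat above: the further ahead the crossing lies, the more the decision depends on the assumed trend model.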

Remaining Useful Life Estimation
Condition-based maintenance research has also focused on fault prognosis and remaining useful life (RUL) estimation. Cheng et al. proposed a fault prognosis and RUL prediction method for the WT gearbox [161], in which an ANFIS was used to learn the state transition function of the fault features. A particle filtering algorithm was then employed to predict the RUL of the gearbox via the learned state transition function. The effectiveness of this method was demonstrated by run-to-failure tests. Another case study [162] has shown that, for a wind farm managed under a power purchase agreement, incorporating WT RUL estimation can enable predictive maintenance, thus avoiding corrective maintenance and reducing cost and downtime. Zhang et al. proposed a blade fatigue prediction model to reproduce the evolution of fatigue damage in composite blades subjected to aerodynamic loading by cyclic winds. The lifetime probability of blade fatigue failure was then investigated by stochastic deterioration modeling, and a cost-benefit model was finally built to optimize the maintenance cost [163]. Zhu et al. investigated new importance measures for evaluating the maintenance value of WT components in terms of increasing the mean RUL and the mean residual system profit over the RUL. Their study showed that the proposed importance measures were suitable and effective for selecting which components to inspect and maintain [164].
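The particle-filter stage of such a prognosis scheme can be sketched as below. Instead of the ANFIS-learned transition function of Cheng et al. [161], this sketch assumes an exponential degradation model whose unknown growth rate is estimated jointly with the state; the noise levels, particle count, threshold, and synthetic run-to-failure trend are all hypothetical.

```python
import math
import random


def pf_rul(measurements, threshold, n_particles=2000, meas_std=0.05, seed=0, max_steps=500):
    """Particle-filter RUL sketch. State per particle: [degradation level x, growth rate b],
    with an ASSUMED exponential transition x_next = x * exp(b) + noise."""
    rng = random.Random(seed)
    particles = [[measurements[0], rng.uniform(0.0, 0.2)] for _ in range(n_particles)]
    for z in measurements[1:]:
        # 1) predict each particle one step ahead with the assumed transition model
        for p in particles:
            p[1] += rng.gauss(0.0, 0.002)  # slow random walk on the growth rate
            p[0] = p[0] * math.exp(p[1]) + rng.gauss(0.0, 0.01)
        # 2) weight particles by the Gaussian likelihood of the new measurement
        weights = [math.exp(-0.5 * ((z - p[0]) / meas_std) ** 2) for p in particles]
        if sum(weights) == 0.0:  # every particle implausible: fall back to uniform weights
            weights = [1.0] * n_particles
        # 3) multinomial resampling (copies so particles do not share state)
        particles = [list(p) for p in rng.choices(particles, weights=weights, k=n_particles)]
    # 4) extrapolate each particle (noise-free) until it crosses the failure threshold
    ruls = []
    for x, b in particles:
        steps = 0
        while x < threshold and steps < max_steps:
            x *= math.exp(b)
            steps += 1
        ruls.append(steps)
    ruls.sort()
    # median RUL and a 90% interval (5th and 95th percentiles), in time steps
    return ruls[n_particles // 2], ruls[int(0.05 * n_particles)], ruls[int(0.95 * n_particles)]


# Synthetic health indicator growing exponentially (hypothetical run-to-failure trend)
history = [math.exp(0.05 * k) for k in range(40)]
median, lo, hi = pf_rul(history, threshold=math.exp(0.05 * 60))
print(f"RUL estimate: {median} steps (90% interval {lo}-{hi})")
```

For this synthetic series the true remaining life is about 21 steps. The spread of the particle RULs gives an uncertainty interval directly, which is one reason particle filters are attractive for prognosis.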
To estimate the RUL of a WT, prognostics and health management (PHM) techniques can be adopted. A turbine with PHM was studied using a stochastic jump-diffusion model to represent the random evolution of the deterioration process and the production output. Monte Carlo simulation was performed to find the optimal maintenance date and the lowest maintenance cost [160]. RUL estimation is necessary not only for the mechanical components of a WT but also for its electrical components. A Gaussian process regression technique was proposed to estimate the RUL of degraded high-power IGBTs (insulated-gate bipolar transistors) [165]. The method was shown to be consistent with an accelerated ageing database of real devices subjected to thermal overstress using a direct current at the gate.
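A minimal Monte Carlo sketch in the spirit of the jump-diffusion approach is given below. It simulates a deterioration process made of a drift, Brownian noise, and randomly arriving exponential shocks, records when each path first crosses a failure threshold, and then picks the replacement age with the lowest long-run cost rate. All parameters, the cost model, and the function names are hypothetical assumptions and do not reproduce the formulation of [160].

```python
import math
import random


def poisson(rng, mean):
    """Knuth's method for a Poisson-distributed count (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_failure_times(n_paths=2000, dt=0.25, horizon=50.0, drift=0.02, vol=0.01,
                           jump_rate=0.05, jump_mean=0.1, threshold=1.0, seed=0):
    """First-passage times of X(t+dt) = X(t) + drift*dt + vol*sqrt(dt)*N(0,1) + shocks,
    where shocks arrive as a Poisson process with exponentially distributed sizes."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    times = []
    for _ in range(n_paths):
        x, t = 0.0, 0.0
        while x < threshold and t < horizon:
            x += drift * dt + vol * sqrt_dt * rng.gauss(0.0, 1.0)
            for _ in range(poisson(rng, jump_rate * dt)):
                x += rng.expovariate(1.0 / jump_mean)
            t += dt
        times.append(t)  # paths that never fail are censored at the horizon
    return sorted(times)


def best_replacement_age(times, c_prev, c_fail, candidates):
    """Renewal-reward cost rate for each candidate age T; returns (best T, its cost rate)."""
    best = None
    for T in candidates:
        cost = sum(c_fail if tf <= T else c_prev for tf in times)
        length = sum(min(tf, T) for tf in times)
        rate = cost / length
        if best is None or rate < best[1]:
            best = (T, rate)
    return best


times = simulate_failure_times()
print("median time to failure:", round(times[len(times) // 2], 2))
T, rate = best_replacement_age(times, c_prev=20.0, c_fail=100.0,
                               candidates=[2.0 * i for i in range(1, 25)])
print("replacement age:", T, "cost rate:", round(rate, 3))
```

Simulation is used here because first-passage times of a process with both diffusion and jumps rarely have convenient closed forms; the same sampled failure times can then be reused to compare any number of candidate maintenance dates.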
As the literature shows, both diagnostic and prognostic/RUL estimation strategies can provide valuable information for condition-based preventive maintenance. A number of studies have also investigated scheduling optimization. Garcia et al. proposed a maintenance system for the WT gearbox, called the intelligent system for predictive maintenance (SIMAP), and showed that SIMAP can adapt the maintenance calendar of a WT to its real needs and operating times [166]. Zhong et al. proposed a maintenance scheduling optimization model as a two-phase solution framework integrating fuzzy arithmetic operations and the non-dominated sorting genetic algorithm; the schedules were derived from trade-offs between maximum reliability and minimum cost [167]. Beyond the CM methods for WT components, labor cost and production loss have also been used as objective functions in maintenance scheduling decisions. By analyzing historical weather data with a statistical model for weather description, the maintenance problem was formulated compactly as a mixed-integer linear programming model. Compared with periodic preventive maintenance, the expected labor cost and production loss were reduced by approximately 30% and 20%, respectively [168]. Other parameters, such as maintenance vessel allocation, electricity price, and dynamic safe-access prerequisites for WTs and cranes, also play an important role in maintenance scheduling optimization [169,170].
Condition-based predictive maintenance also suffers from a lack of detail in existing data collection systems. RAMS (reliability, availability, maintainability, and safety) databases have therefore been constructed to provide more detailed information for maintenance planning, scheduling optimization, and life cycle cost minimization [171]. Another concern is data reliability, since data can be lost, corrupted by noise, or hacked during transmission. To improve CM accuracy and reliability, data encryption has therefore often been taken into account.

Discussion and Future Work
Conventional WT CM is implemented by signal-processing-based approaches, through the detection and analysis of pre-learned signal features that are specific to particular fault modes. These features are commonly time- and/or spectral-domain artefacts in the monitored signals and are generally referred to as the fault signature. There is a general requirement to keep the CM process as low-cost as possible and, ideally, as minimally invasive to the device hardware as is practical, while retaining diagnostic capability. In principle, this imposes a trade-off between the device operating features that can feasibly and practically be sensed and those that could contain an inherently higher density of diagnostic information, such as device-embedded stress in the vicinity of known failure points. The sensing technology underpinning a given CM method thus also plays an important role in the diagnostic process, and its advancement remains the subject of continuous research.
In addition to improved diagnostic reliability, more accurate maintenance planning is needed to achieve the more profound impact on O&M cost that the sector requires. The relatively limited body of work on prognosis, and especially on RUL prediction, reviewed in this paper indicates a strong need to intensify research efforts in this area. Additionally, WT CM is generally based on data acquisition, in particular vibration analysis, which yields largely unlabeled data; this can lead to poor generalization caused by inconsistency between newly imposed labels and the learning inputs. Furthermore, differences in distribution between training and testing samples caused by dynamic working conditions can lead to mispredictions (false alarms) of a CM system. In addition, for some bearing problems, data have been generated from accelerated life tests that provide an incomplete and unlabeled set of patterns. Future work in this area should therefore attempt to fill these gaps by incorporating more knowledge from pretrained models through GANs and TL.
Sensing for WT CM provides the principal source of diagnostic information and as such has a profound impact. As stated earlier, the general desire is to add a minimum number of sensing points to those inherent to core system operating functionality and to rely on system-contained signals for diagnosis where possible. However, this level of non-invasiveness is generally difficult to attain and can restrict diagnostic and prognostic capability. Increasing sensor numbers or adopting alternative, more advanced sensing methodologies can improve the diagnostic relevance and coverage of measurements, but the cost and complexity of the CM system need to be carefully considered in this process. Sensor failures or misreporting are highly undesirable, as they increase the risk of CM system unreliability, resulting in unnecessary maintenance or downtime. Deployment of advanced sensing techniques could, however, lead to much improved characterization of subassembly failure and degradation processes, and carries the potential to be used strategically either for developing higher-fidelity, validated diagnostic models or for dedicated, high-value component-specific monitoring solutions. Strong interest remains in employing readily available low-resolution SCADA data, either alone or in combination with high-resolution CM data, to improve CM system accuracy. However, achieving highly reliable diagnosis and prognosis remains a challenge. Future work is therefore required to develop new CM methods based on artificial intelligence and ML to improve CM robustness and accuracy, also considering advanced, strategically placed sensor inputs where pertinent. Moreover, deploying CM systems to WTs at the farm level would lead to new insights into predictive maintenance strategies; the performance and reliability of the CM system itself are therefore crucial [172]. Future work is also required to develop more accurate and reliable CM systems for the corresponding condition-based maintenance opportunities, using a multi-system approach that considers dependencies among WTs and optimizes operational decisions.

Conclusions
This paper reviews the general state of the art and upcoming advances in WT CM systems based on intelligent and ML approaches. The review covers recent developments ranging from conventional signal-based CM tools and data-driven ML-based CM to big data mining and predictive maintenance. The general focus of WT CM research has been found to remain largely on classification, driven by the application of ML and big data techniques and aimed at underpinning more effective diagnosis. CM systems should detect, diagnose, and eliminate hidden faults rapidly and predict system failures with as little human intervention as possible, particularly given the rapidly growing size of wind farms and their move further offshore. System-level automation of this process is highly desirable yet remains a challenge for the existing state of the art. The intelligent and ML approaches reviewed in this paper have the potential to provide a viable and efficient solution for improving CM capabilities, and hence the reliability and availability of WTs, ultimately reducing O&M costs. However, considerable further research is needed to achieve this goal.

Figure 1. Important components of a horizontal wind turbine.

Figure 3. General solution of machine learning problems for wind turbine condition monitoring.

to guess to give prior assumptions for learning models by providing new enhanced representation. The training models known as GANs and TL are very popular in recent studies; by providing different prior assumptions, they help extend ML from purely data-driven towards knowledge-driven learning.

Figure 4. Machine learning application for condition monitoring.

Figure 5. Gearbox components and fault types. (a) Components and rotation mechanism of gears of the planetary gearbox. (b) Gearbox gear failure types. Reproduced from [84], Elsevier and from [103], IEEE. (c) Internal race faults in high-speed shaft. Reproduced from [102], MDPI.
Examples dealing with these types of failures can be found in the literature. For instance, Cao et al. [103] studied how to detect different health states of the sun gear of the WT gearbox (cracked, chipped, missing root, surface defect, and healthy gears). They mainly used multiple time-domain features extracted from three different accelerometers installed at different positions on the bearings (verti-

Figure 6. Yaw system structure and failure illustration. (a) Graphical illustration of yaw system components. (b) Some failure mode types of the yaw system. Reproduced from [87], Elsevier and from [111], Elsevier.
Common serious problems in WT generators remain in rolling elements such as bearings, similar to the examples of inner race defects shown in Figure 8.

Figure 8. Common defects of inner race generator rolling bearings of wind turbines. Reproduced from [85], Elsevier.

Figure 9. Pie chart analysis of the used machine learning methods in wind turbine condition monitoring.

Table 2. Gearbox condition monitoring state of the art review.

Table 3. Yaw system condition monitoring state of the art review.

Table 4. Blade condition monitoring state of the art review.

Table 5. Generator condition monitoring state of the art review.