An Application of Instantaneous Spectral Entropy for the Condition Monitoring of Wind Turbines

: For economic and environmental reasons, the use of renewable energy sources is a key aspect of the ongoing transition to a sustainable industrialised society. Wind energy represents a major player among these natural, carbon-neutral sources. Nevertheless, wind turbines are often subject to mechanical faults, especially due to ageing. To alleviate Operation and Maintenance costs, Vibration-Based Inspection and Condition Monitoring have been proposed in recent times. This research proposes Instantaneous Spectral Entropy and Continuous Wavelet Transform for anomaly detection and fault diagnosis, departing from gearbox vibration time histories. The approach is validated on experimental data recorded from a turbine suffering bearing failure and total gearbox replacement. From a computational point of view, the proposed algorithm was found to be efﬁcient and therefore even potentially applicable for real-time monitoring.


Introduction
According to the 2020 New Energy Outlook (NEO) released by BloombergNEF, wind and solar energy are expected to grow up to 56% of global electricity demand by 2050, with wind energy retaking the lead from photovoltaic [1]. By way of example, Denmark is intended to achieve 100% non-fossil-based power generation by the same year, mostly thanks to wind power [2].
This represents a unique opportunity for transitioning from classic, polluting fuels to renewable and sustainable resources. In this regard, wind turbines are, nowadays, a well-established technology and cost-efficient, especially when grouped in wind farms (both on-and off-shore). Worldwide, the wind power capacity increased from about 13 Giga Watts in 1999 to >760 GW in 2020 [3]. Furthermore, if compared to other alternatives such as large hydropower plants, solar photovoltaic, or nuclear energy, wind power had the most stable growth in the 2005-2016 period, being less subject to market fluctuations [4].
However, a major issue for the rapidly increasing market of wind power systems is the relatively high probability of mechanical faults in wind turbines' gearboxes. In turn, this generates high Operation and Maintenance (O&M) costs. In detail, it has been estimated that fixed and variable O&M costs account for between 11% and 30% of the Levelized Cost Of Energy (LCOE, €/MWh) for onshore wind farms [5][6][7]. The costs for offshore installations are much greater, due to the accessibility constraints and the hostile environment (the topic is analysed in-depth in [8]).

The Instantaneous (Shannon) Spectral Entropy
In this research, the Instantaneous Spectral Entropy is proposed as a damage-sensitive index for condition monitoring. Specifically, the Shannon Spectral Entropy is utilised for this aim. The rationale for entropy measurements in SHM is quite straightforward. It derives from the eighth (and last) axiom of Structural Health Monitoring [28], which states that: "damage increases the complexity of a system" [29] where the definition of 'complexity' (intentionally left vague by the original authors) can be intended from both a geometrical and signal processing standpoint. In the first case, the damage is intended as a localised inhomogeneity, which can be detected from a certain time instant (the 'damage event') onwards. From the vibrational perspective, this is rather intended as the occurring of additional signal components, previously inexistent in the undamaged baseline. A classic example would be the insertion of super-and sub-harmonics due to the presence of a breathing crack in an otherwise linearly behaving structure [30]. In this sense, the concept of an entropic framework for signal analysis has been recently further detailed in [31].
Noteworthy, the concept of an 'increase' in complexity naturally incorporates the pre-existing conditions, that is to say, randomly distributed manufacturing defects in the structure under investigation and measurement noise in the recordings derived from its analysis [32]. These aspects do not affect the entropy variation when damage is inserted in the system; therefore, this framework fits well in the general context of SHM as an outlier (anomaly) detection problem, i.e., as a deviation from a known baseline [33].
As mentioned earlier, this work deals with the instantaneous definition of entropy, aiming at damage event detection. In this sense, it is necessary to recall that vibration time series can be analysed with different signal processing approaches, depending on the intended purposes. These fall into two main categories: 1.
real-time analysis, using a small moving window of recent history over the data stream.

2.
retrospective analysis, where the time series data are fully available and analysed a posteriori.
In both cases, instantaneous parameters can be used to perform event detection; this can be then applied to define the instant of damage occurrence (see, e.g., [34]). For the sake of this research, the second case (retrospective analysis) will be considered. This is not uncommon for SHM applications, where signals are sampled periodically (in the case study of this research, every hour) and then processed. Under these conditions, the rationale is that a mechanical fault should be detected shortly after its appearance, in a near-real-time fashion (i.e., with only some hours or, in the worst case, some days of delay). Truly real-time SHM is more challenging since it strictly requires an uninterrupted, seamless stream of data, plus computationally efficient routines capable of processing the raw data in a short timeframe. This is generally an unnecessary complication, apart from extremely fragile structures or systems, prone to sudden collapse.
In this regard, a notable example comes from the Aerospace Engineering field. In the case of rotorcraft's Health and Usage Monitoring Systems (HUMSs [35]), the real-time condition monitoring of the rotating components is essential due to the rotorcrafts implicit inability to sustain non-propelled flight. This sort of application falls beyond the aims of this study; nevertheless, it will be shown how the proposed approach is suitable for such tasks, thanks to a relatively short execution time.
Another important caveat should be addressed before moving on to the proper mathematical definition of SSE and ISE. That is, it is important to recall that the entropy of the recorded output does not depend solely on the system behaviour. Indeed, it reflects the frequency content of the input as well. For this reason, entropy-based approaches are particularly well-suited for the operational modal analysis of civil structures and infrastructures since the ambient vibrations can be easily approximated to a pure white Gaussian noise [23,24]. However, the concept is still suitable for deterministic driving forces. As long as the input remains constant (or at least similar), any relevant variation in the output entropy can be directly linked to a structural change in the target system. As it will be shown later for the experimental case study, this condition is satisfied for wind turbines operating at similar wind speeds. Furthermore, this input dependence can be easily bypassed by pairing ISE values with wind speed readings and considering only threshold trespassings at a constant wind speed.

Shannon Spectral Entropy
The general term spectral entropy (SE) refers to any measure of the uniformity of a signal spectral power distribution. The definition of Shannon SE originates from the works of Powell & Percival [21], based on the measure of uncertainty proposed by Shannon in [36]. Specifically, the SSE formula can be considered as the limit form of the generalised Rényi entropy for α → 1 [37]. Specifically, for a given probability distribution in the frequency domain P( f ), the SSE can be defined as where B is the total number of discrete frequencies, i.e., the bins of the distribution. Please note that here, the base 10 logarithm was applied; however, any other base can be used, without major conceptual differences. Equation (1) can be normalised by dividing it by log 10 B; nevertheless, for the sake of this research, the standard (non-normalised) definition of Equation (1) has been preferred. For a discretised power spectrum |H( f )| 2 , where H indicates the Discrete Fourier Transform of a general signal h(t), the probability distribution can be written as This can be then extended to include the time dependency.

Instantaneous Spectral Entropy
Similar to Equation (2), it is possible to define the probability distribution at time t as where H( f , t) can be any form of time-frequency (TF) representation of the signal (this aspect will be discussed in more detail in the next section). Then, the Instantaneous (Shannon) Spectral Entropy becomes Generally speaking, several options are available for the TF analysis of a given signal. For this specific case, the Continuous Wavelet Transform was applied, using a Generalised Morse Wavelet as the mother wavelet.

The Continuous Wavelet Transform and Generalised Morse Wavelet
The theory of wavelets and wavelet-based signal processing would be too long to be recalled here; the interested reader is referred to the classic works of Rioul & Vetterli [38], and Daubechies [39] for the basics concepts, and the book of Mallat [40] for a complete discussion. In a few words, the main (but not the only one) feature of any wavelet is its compact support in time. Differently from harmonic functions, which span indefinitely from t = −∞ to +∞, these brief oscillations allow capturing time-varying phenomena [41]. For this reason, orthonormal wavelets have been extensively utilised for signal analysis via Wavelet Transform (WT), especially for SHM applications (see, by way of example, [42]; a review about this topic can be found in [43]). In this regard, several variants of WT exist, depending on the shifting and scaling of the basis function, known as the mother wavelet (these points will be further discussed later). The two main forms are Discrete and Continuous WTs; here in this study, the CWT has been applied.

Continuous Wavelet Transform
Considering again the signal h(t) defined in the time domain, its CWT can be expressed (according to its most usual definition) as i.e., a convolution of the given data sequence (here, the time series) with a resized and time-translated version of the so-called mother wavelet, given as Therefore, ψ a,b depends on the shift (b) and scale (a) parameters, as well as on the original shape of this mother wavelet, which in turn is an arbitrarily selected localised oscillatory function. Converting the wavelet scale to frequency, the final result is a TF transform of the analysed time series. Please note that the term 1 2 √ a in Equation (6) is only needed to ensure equal energy at all time scales.
Importantly, it must be recalled that there is not a unique definition for the mother wavelet ψ. Any time-limited oscillatory function with zero means and that satisfies certain regularity and admissibility conditions [44] can be arbitrarily selected as the mother wavelet. In this sense, a comparative analysis for SHM purposes can be found in [45]. Here for this study, the Generalised Morse Wavelet has been tested.

Generalised Morse Wavelet
For CWT-based signal processing, analytic wavelets are currently considered the best option for precise TF analysis. These can be seen as complex-valued time/frequency localised filters with vanishing support on negative frequencies [46]. In this regard, the complex (or analytic) Morlet wavelet is arguably the most common option adopted. It has been widely applied for the extraction of instantaneous parameters, e.g., from seismic data [47]. Recently, the Morlet wavelet power spectral entropy has been investigated as well, specifically for bearing fault localisation and severity assessment [48].
However, the complex Morlet wavelet is only approximating analytical for large centre frequency ( f c ) values. For small f c values, it may not meet the admissibility condition and thus potentially lead to a negative frequency [49]. For this reason, the generalised Morse wavelet has been preferred for this application, due to its totally analytic definition and better time resolution.
The GMW was firstly envisioned by Daubechies & Paul [50], and then further detailed and investigated by Lilly & Olhede [46,49]. Its formulation can be expressed in the frequency domain (considering the natural pulsation ω = 2π f for simplicity) as [46] where U(ω) is the Heaviside (or unit) step function, α β,γ is a normalizing constant, and β and γ are the two parameters which govern the specific shape taken by the general formulation of Equation (7). More precisely, these two parameters are known as symmetry (γ) and compactness (β). Furthermore, a third parameter, the time-bandwidth product, can be defined as P 2 = βγ and is used often (but not here) in lieu of β.
The main feature of the GWT is its adaptability, as it can be dilated or contracted in the time-frequency domain to better suit the signal processing aims.
In detail, 2 √ P 2 is directly proportional to the wavelet support along the time axis. Therefore, for constant γ, increasing β implies a longer time-bandwidth. In turn, this increases the rate of the long-time decay and broadens the central portion of the resulting mother wavelet. For time-frequency analysis, a longer time duration, T, implies more refined frequency resolution, ∆ f = 1/T, which is generally useful. This point will be better discussed in the Results (Section 5).
On the other hand, the gamma parameter controls the symmetry of the wavelet in time through the demodulate skewness [49]. Increasing γ for constant β does not affect the time-bandwidth, but broadens the GWT envelope, making it more or less directional. For instance, if γ = 1, the zeroth-order GWT corresponds to the Cauchy wavelet [51], which is strictly supported in a narrow convex cone in the time-frequency domain.
Different GWT shapes can better serve different purposes. These shapes can be grouped in a piecewise fashion as follows:
Therefore, for γ = 3, both the time decay and the wavelet symmetry increase with β, and the resulting mother wavelet narrows in frequency and enlarges in time, with more oscillations under its envelope. This derives from the demodulate skewness of the GWT being null for gamma equal to 3; this results in a global maximum of the time/frequency concentration [46]-that is to say, it maximises the product of the time-domain and frequencydomain standard deviations, known as the Heisenberg's area [40]. Again, all these aspects will be further investigated in a later Section.

The Experimental Case Study
The experimental recordings from a wind turbine have been used for the validation of the proposed entropy-based Condition Monitoring strategy. The dataset (described in detail in [52]) originates from an undisclosed onshore wind farm, located in Northern Sweden and consisting of 18 2.5 MW Nordex N100 wind turbines [53]. The continuous monitoring was performed over 46 consecutive months, acquiring 1.28 s-long time series (16,384 data points for a sampling frequency f s = 12, 800) approximately every 12 h.
Specifically, Turbine #5 was considered here, as the only one (out of six installations included in the dataset) that suffered mechanical faults during the monitored timeframe.
The vibration time series of interest were collected from an accelerometer, located and oriented as indicated by the black arrow in Figure 1 (which is based on the original schematics from [52]). This was mounted on the housing of the output shaft bearing of the turbine. The three-stage gearbox was made up of two sequential planetary gear stages, followed by a helical gear stage. The position of the bearing failure is highlighted in red in Figure 1. The damage consisted of an inner raceway failure on one of the four NSK RN2240 cylindrical roller bearings (CRBs) supporting one of the planets in the first planetary gear. From visual inspection after disassembly (refer to [52,53]), the most probable cause was identified as rolling fatigue-induced flaking (according to the NSK definition [54]), with a loss of material and the consequent rough and coarse texture extended over most of the contact surface. This caused the entire gearbox to be replaced after two years of operation.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 6 o The vibration time series of interest were collected from an accelerometer, located oriented as indicated by the black arrow in Figure 1 (which is based on the original sc matics from [52]). This was mounted on the housing of the output shaft bearing of the bine. The three-stage gearbox was made up of two sequential planetary gear stages, lowed by a helical gear stage. The position of the bearing failure is highlighted in red Figure 1. The damage consisted of an inner raceway failure on one of the four NSK RN2 cylindrical roller bearings (CRBs) supporting one of the planets in the first planetary g From visual inspection after disassembly (refer to [52,53]), the most probable cause identified as rolling fatigue-induced flaking (according to the NSK definition [54]), wi loss of material and the consequent rough and coarse texture extended over most of contact surface. This caused the entire gearbox to be replaced after two years of operatio Two signals of interest (portrayed in Figure 2) were defined by concatenating chronological order some consecutive time histories (THs), recorded from Turbine # described in Table 1 (the period column in Table 1 reports the time since the beginnin the continuous monitoring as year fractions). The concatenating procedure applied h reflects what was performed by Figueiredo et al. [55], to artificially generate a nonstat ary experimental benchmark from stationary experimental recordings, emulating ti varying structural conditions with abrupt changes. Two signals of interest (portrayed in Figure 2) were defined by concatenating in chronological order some consecutive time histories (THs), recorded from Turbine #5 as described in Table 1 (the period column in Table 1 reports the time since the beginning of the continuous monitoring as year fractions). The concatenating procedure applied here reflects what was performed by Figueiredo et al. [55], to artificially generate a nonstationary experimental benchmark from stationary experimental recordings, emulating time-varying structural conditions with abrupt changes.
For signal #1, three segments were considered. These correspond to one recording (the first one) shortly before replacement and two acquisitions (the remaining two thirds) shortly after. These latter two were chosen since they are consecutive recordings with very similar rotational speeds (i.e., very similar external input; reported as cycles per minute).
Signal #1 was intended to study the effectiveness of the algorithm, presenting a discussion on parameter setting. For Machine Learning purposes, only the first tract was used as training data for the statistical modelling of the normal operating conditions (NOCs). The second and third parts of the signal provided the test points for the damage index.  Table 1). The single segments are separ by thin vertical lines.
. In (a), the dot-dashed vertical lines represent the three concatenated segments. In (b), the thick dot-dashed vertical lines enclose the segments considered for the training set, the two validation sets, and the two test sets, in this order (see Table 1). The single segments are separated by thin vertical lines.
For signal #2, a larger set of acquisitions were included. In this latter case, by considering only the optimised algorithm parameters, the intent was to address the full capabilities of the proposed approach when trained on more than one tract. Thus, signal #2 was evaluated on data with (almost) comparable rotational speeds before and after bearing damage. Specifically, the following segments were used: 1.
seven consecutive tracts corresponding to 14 months before fault (the first five elements for training and the last two for validation), 2.
the single tract already included in signal #1 plus five other nearby tracts immediately before fault (all included for further validation), 3.
three recordings taken immediately after replacement, including the two already considered in signal #1 (all considered for testing), 4.
other five segments acquired 7 months later (considered again for further testing).
The second part of Table 1 reports more details about these THs. For both signal #1 and #2, the data, originally reported in terms of [g], were standardised (subtracting the mean and dividing by the standard deviation, for each recording separately) to remove any potential issue related to the different amplitudes. Figure 3 reports the Fast Fourier Transforms (FFT) of the five sets included in signal #2. One can notice that the multiple harmonic components of each acquisition do not allow for a simple comparison between the frequency content before and after the gearbox replacement. Thus, the common damage detection strategy based on the analysis of the frequency shift is hardly feasible in these circumstances. The same can be said for signal #1 as well since it comprises a subset of the segments of signal #2. Indeed, the limited reliability of FFTbased signal analysis for these typologies of bearing failure in wind turbine gearboxes was reported as well in [53].

Signal #1
An example of results is reported in Figure 4. The green line corresponds to the Instantaneous Spectral Entropy, defined at any timestep. The two horizontal dashed lines correspond to the upper and lower bounds of the Gaussian distribution fitted over the ISE values of the training set, that is to say, ISE( ) ≡ + 2 and ISE( ) ≡ − 2 , where and 2 correspond, in the same order, to the mean and standard deviation of the baseline tract, which is assumed to be almost stationary.
However, it was verified that these two thresholds were not optimal for anomaly detection. Indeed, as it can be seen, the ISE( ) value is quite unstable and subject to strong, rapid fluctuations. For this reason, a moving mean (calculated over a sliding window of 10,000 timesteps, i.e., 0.84 s) was preferred as a more stable indicator. This is indicated by the thick black line. The area shaded in grey corresponds to its expected values in 'normal' conditions, defined (similarly as before) as all points in between , − 2 , < ISE( ) < , + 2 , , i.e., with a 95.45% confidence of belonging to the same population as the training data points. One can see that, for the two test scenarios, the value of the moving average of ISE( ) generally deviated from the previously stationary conditions, trespassing the lower threshold. This can be used to perform automated and instantaneous fault detection.
The results portrayed in Figure 4 focus on a single value of symmetry ( = 3) and varying . The effects of these two parameters have been thoroughly investigated. The findings will be discussed in the next subsection. However, the two points (1) and (2) highlighted above were encountered for any combination of and . This proves that

Signal #1
An example of results is reported in Figure 4. The green line corresponds to the Instantaneous Spectral Entropy, defined at any timestep. The two horizontal dashed lines correspond to the upper and lower bounds of the Gaussian distribution fitted over the ISE values of the training set, that is to say, ISE(t) ≡ µ b + 2σ b and ISE(t) ≡ µ b − 2σ b , where µ b and 2σ b correspond, in the same order, to the mean and standard deviation of the baseline tract, which is assumed to be almost stationary.
However, it was verified that these two thresholds were not optimal for anomaly detection. Indeed, as it can be seen, the ISE(t) value is quite unstable and subject to strong, rapid fluctuations. For this reason, a moving mean (calculated over a sliding window of 10,000 timesteps, i.e., 0.84 s) was preferred as a more stable indicator. This is indicated by the thick black line. The area shaded in grey corresponds to its expected values in 'normal' conditions, defined (similarly as before) as all points in between µ movmean,b − 2σ movmean,b < ISE(t) < µ movmean,b + 2σ movmean,b , i.e., with a 95.45% confidence of belonging to the same population as the training data points. One can see that, for the two test scenarios, the value of the moving average of ISE(t) generally deviated from the previously stationary conditions, trespassing the lower threshold. This can be used to perform automated and instantaneous fault detection.
The results portrayed in Figure 4 focus on a single value of symmetry (γ = 3) and varying β. The effects of these two parameters have been thoroughly investigated. The findings will be discussed in the next subsection. However, the two points (1) and (2) highlighted above were encountered for any combination of γ and β. This proves that the ISE(t), especially when smoothed via a moving average with a properly sized window length, is, overall, effective and efficient as a time-dependent DSF, thus suitable for damage event detection.

Sensitivity Analysis for the GWT Parameters
Since the ISE( ) is a synthetic feature derived from the TF transform of the recorde signal, better time and frequency resolution will return more reliable results. Thus, it essential to optimise the CWT settings. In the case investigated here, as mentioned in Se tion 3, the particular shape of the Generalised Morse (mother) Wavelet depends excl sively on the values considered for the doublet of parameters ( , ). Therefore, to improv the capabilities of the proposed DSF, fine-tuning these two parameters becomes the mo critical aspect of the whole procedure. For this reason, a dedicated sensitivity analysis h been performed.

2.
varying from 2 to 40 in steps of 2.
Hence, a total of 140 combinations were analysed. The range of was defined to n exceed the suggested ≤ 40 ratio [49]. Accordingly, (as the product of the lowest va ues of both and ) ranges from a minimum of 2 to a maximum of 160. The aim of this optimisation is dual. For the training dataset, the data should behav as homogeneously as possible, to clearly define a NOCs model. This implies the sign stationarity (that is, constant mean and standard deviation ) and low variability (i. low sigma values).
For a constant gamma (in the previous example of Figure 4, = 3) increasing d creased the absolute value of both (ISE) and (ISE). This latter point resulted in a na rower interval of confidence. In turn, this increased the sensitivity to damage as th threshold was lowered. The same trend was observed for all the other values of inve tigated here as well.
Regarding the testing part (second and third tracts of the concatenated signal), it also noticeable how larger values increased the detectability of the fault conditio making a more marked transition from the "pre" to "post" damage insertion condition

Sensitivity Analysis for the GWT Parameters
Since the ISE(t) is a synthetic feature derived from the TF transform of the recorded signal, better time and frequency resolution will return more reliable results. Thus, it is essential to optimise the CWT settings. In the case investigated here, as mentioned in Section 3, the particular shape of the Generalised Morse (mother) Wavelet depends exclusively on the values considered for the doublet of parameters (β, γ). Therefore, to improve the capabilities of the proposed DSF, fine-tuning these two parameters becomes the most critical aspect of the whole procedure. For this reason, a dedicated sensitivity analysis has been performed.

2.
β varying from 2 to 40 in steps of 2. Hence, a total of 140 combinations were analysed. The range of β was defined to not exceed the suggested β γ ≤ 40 ratio [49]. Accordingly, P 2 (as the product of the lowest values of both β and γ) ranges from a minimum of 2 to a maximum of 160.
The aim of this optimisation is dual. For the training dataset, the data should behave as homogeneously as possible, to clearly define a NOCs model. This implies the signal stationarity (that is, constant mean µ and standard deviation σ) and low variability (i.e., low sigma values).
For a constant gamma (in the previous example of Figure 4, γ = 3) increasing β decreased the absolute value of both µ(ISE) and σ(ISE). This latter point resulted in a narrower interval of confidence. In turn, this increased the sensitivity to damage as the threshold was lowered. The same trend was observed for all the other values of γ investigated here as well.
Regarding the testing part (second and third tracts of the concatenated signal), it is also noticeable how larger β values increased the detectability of the fault condition, making a more marked transition from the "pre" to "post" damage insertion conditions. Again, this finding was verified for all values of γ ∈ [1, 4].
As anticipated, this derives from the detail of the TF representation. Figure 5 shows this point for the examples reported in Figure 4. Please note that to avoid any potential aliasing issue and considering the very high sampling frequency, the TF was truncated at f s /4 (3200 Hz). The instant corresponding to the damage event is marked by the red dashed line.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 11 of 19 As anticipated, this derives from the detail of the TF representation. Figure 5 shows this point for the examples reported in Figure 4. Please note that to avoid any potential aliasing issue and considering the very high sampling frequency, the TF was truncated at /4 (3200 Hz). The instant corresponding to the damage event is marked by the red dashed line. One can see that the TF representation for = 2 and = 3 is clearly unreliable. The TF representations become more and more refined over the frequency axis as both (i) the length of the time support and (ii) the number of oscillations under the GWT envelope (and therefore the instantaneous frequency resolution) increase with ( Figure 6). Since = , this latter effect can be achieved by independently increasing or . On the other hand, the effects of varying for constant can be summarised as follows. One can see that the TF representation for β = 2 and γ = 3 is clearly unreliable. The TF representations become more and more refined over the frequency axis as both (i) the length of the time support and (ii) the number of oscillations under the GWT envelope (and therefore the instantaneous frequency resolution) increase with P 2 (Figure 6). Since P 2 = βγ, this latter effect can be achieved by independently increasing β or γ.
As anticipated, this derives from the detail of the TF representation. Figure 5 show this point for the examples reported in Figure 4. Please note that to avoid any potenti aliasing issue and considering the very high sampling frequency, the TF was truncated /4 (3200 Hz). The instant corresponding to the damage event is marked by the re dashed line. One can see that the TF representation for = 2 and = 3 is clearly unreliable. Th TF representations become more and more refined over the frequency axis as both (i) th length of the time support and (ii) the number of oscillations under the GWT envelop (and therefore the instantaneous frequency resolution) increase with ( Figure 6). Sinc = , this latter effect can be achieved by independently increasing or . On the other hand, the effects of varying for constant can be summarised as follow On the other hand, the effects of varying γ for constant β can be summarised as follows. For any value of the symmetry value, lower values of β (β = 2 in the example of Figure 7) have too coarse a frequency resolution and are thus almost unusable for the intended purposes, as it can be seen from the ISE values barely changing when moving from pre-to post-damage conditions. The corresponding TF representations are reported in Figure 8.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 12 of 19 For any value of the symmetry value, lower values of ( = 2 in the example of Figure 7) have too coarse a frequency resolution and are thus almost unusable for the intended purposes, as it can be seen from the ISE values barely changing when moving from pre-to post-damage conditions. The corresponding TF representations are reported in Figure 8.  On the other hand, ≥ 4 ÷ 6 might return acceptable results, depending on the paired values. Generally speaking, ≤ 2.0 ÷ 2.5 returned distorted results, at least for the application investigated here. This is even more noticeable for ≤ 1.5, which were found to be unreliable for any value of . The results were instead less influenced by when the symmetry parameter was set as equal or larger than 2.0 ÷ 3.0. By way of example, for = 20, any gamma larger than 2 returned almost the same ISE(t) time history (see Figure 9; the corresponding TF transforms are reported in Figure 10). For any value of the symmetry value, lower values of ( = 2 in the example of Figure 7) have too coarse a frequency resolution and are thus almost unusable for the intended purposes, as it can be seen from the ISE values barely changing when moving from pre-to post-damage conditions. The corresponding TF representations are reported in Figure 8.  On the other hand, ≥ 4 ÷ 6 might return acceptable results, depending on the paired values. Generally speaking, ≤ 2.0 ÷ 2.5 returned distorted results, at least for the application investigated here. This is even more noticeable for ≤ 1.5, which were found to be unreliable for any value of . The results were instead less influenced by when the symmetry parameter was set as equal or larger than 2.0 ÷ 3.0. By way of example, for = 20, any gamma larger than 2 returned almost the same ISE(t) time history On the other hand, β ≥ 4 ÷ 6 might return acceptable results, depending on the paired γ values. Generally speaking, γ ≤ 2.0 ÷ 2.5 returned distorted results, at least for the application investigated here. This is even more noticeable for γ ≤ 1.5, which were found to be unreliable for any value of β. The results were instead less influenced by β when the symmetry parameter was set as equal or larger than 2.0 ÷ 3.0. By way of example, for β = 20, any gamma larger than 2 returned almost the same ISE(t) time history (see Figure 9; the corresponding TF transforms are reported in Figure 10).  In conclusion, the following points can be highlighted from this sensitivity analysis: 1. Values of ≥ 20 are all deemed suitable. Increasing reduces the variability of the results, at least on the specific case study investigated here. Therefore, large values of are recommended, independently from the selected .

2.
≥ 3 is suggested to avoid potential issues, even if 2.5 ≤ ≤ 3 were found to be suitable as well for the range of selected above, at least for this case study.
The suggested range is depicted in Figure 11, considering the mean and standard deviation of the ISE(t) computed over the NOCs. As mentioned previously, it is important  In conclusion, the following points can be highlighted from this sensitivity analysis: 1. Values of ≥ 20 are all deemed suitable. Increasing reduces the variability of the results, at least on the specific case study investigated here. Therefore, large values of are recommended, independently from the selected .

2.
≥ 3 is suggested to avoid potential issues, even if 2.5 ≤ ≤ 3 were found to be suitable as well for the range of selected above, at least for this case study.
The suggested range is depicted in Figure 11, considering the mean and standard deviation of the ISE(t) computed over the NOCs. As mentioned previously, it is important In conclusion, the following points can be highlighted from this sensitivity analysis:

1.
Values of β ≥ 20 are all deemed suitable. Increasing β reduces the variability of the results, at least on the specific case study investigated here. Therefore, large values of β are recommended, independently from the selected γ.

2.
γ ≥ 3 is suggested to avoid potential issues, even if 2.5 ≤ γ ≤ 3 were found to be suitable as well for the range of β selected above, at least for this case study.
The suggested range is depicted in Figure 11, considering the mean and standard deviation of the ISE(t) computed over the NOCs. As mentioned previously, it is important to have stationary and low variance over the training dataset. This condition was confirmed for β ≥ 20 and γ ≥ 3. This selected area is also intended to avoid the undesired timedomain sidelobes and frequency-domain asymmetries that arise from very small timebandwidth and large symmetry values.

Computational Efficiency
The computational effort required by the feature extraction procedure was further tested. This is essential for online SHM since the whole algorithm (feature extraction and threshold validation) should be performed in real-time, i.e., during the acquisition of an uninterrupted data stream. This is generally performed via a small moving window of recent history. Therefore, the computational time needed should be smaller than the considered time window.
Of the two main steps-TF transform and ISE( ) computation-the first one is the most demanding. Indeed, instantaneous SSE was performed (for the signal length considered here) in less than 0.6 s on average. This test, as well as the following ones, was performed on a laptop equipped with Windows 10 64-bit, Intel Core i7-7700HQ with CPU 2.80 GHz and 16.0 GB RAM, and MatLab R2020b.
The CWT was found to be slightly longer to perform. As it can be seen from Figure  12, the elapsed time is mostly equal for any pair of values in the inspected ranges of the two GWT parameters. Except for ( = 2, = 40) and ( = 2.5, = 10), which lasted, respectively, for 5.8 s and 3.5 s, the CWT ran in less than 2.0 s everywhere else. Considering all cases, the overall mean elapsed time is about 1.08 s (1.02 s excluding the two outliers), with many combinations running in < 1 s.

Computational Efficiency
The computational effort required by the feature extraction procedure was further tested. This is essential for online SHM since the whole algorithm (feature extraction and threshold validation) should be performed in real-time, i.e., during the acquisition of an uninterrupted data stream. This is generally performed via a small moving window of recent history. Therefore, the computational time needed should be smaller than the considered time window.
Of the two main steps-TF transform and ISE(t) computation-the first one is the most demanding. Indeed, instantaneous SSE was performed (for the signal length considered here) in less than 0.6 s on average. This test, as well as the following ones, was performed on a laptop equipped with Windows 10 64-bit, Intel Core i7-7700HQ with CPU 2.80 GHz and 16.0 GB RAM, and MatLab R2020b.
The CWT was found to be slightly longer to perform. As it can be seen from Figure 12, the elapsed time is mostly equal for any pair of values in the inspected ranges of the two GWT parameters. Except for (γ = 2, β = 40) and (γ = 2.5, β = 10), which lasted, respectively, for 5.8 s and 3.5 s, the CWT ran in less than 2.0 s everywhere else. Considering all cases, the overall mean elapsed time is about 1.08 s (1.02 s excluding the two outliers), with many combinations running in <1 s.  Figure 13 reports the results for the second, longer signal. = 40 and = 3 were set for this further study. One can notice that except for some tracts where the rotationa speed was slightly higher than the average, the normality model-defined over five con secutive acquisitions-was validated almost everywhere on both the two validation sets (corresponding to, respectively, 14 months and immediately before the gearbox replace ment). This indicates that the process has a certain level of robustness for relatively similar rotational speeds. Nevertheless, in some segments (more markedly in tract #1489, belong ing to the second validation set, and to a lesser extent in #1487 and #1588, from the same set), three false positives were encountered. This can be explained since in binary classifi cation, fine-tuning the classification algorithm parameters generally induces an incremen in both the true positive and the false positive rates.

Signal #2
Immediately after the gearbox replacement, for a comparable wind speed, the ISE showed a noticeable decrease (as already described in the previous subsection). This be haviour was confirmed in the second test set, i.e., after more than seven months from the gearbox replacement. One can notice how some elements of the training and validation datasets have almost the same rotational speed but different ISE-e.g., tract #1588, the las one of the second validation set, and #2021, the first one of the second test set, correspond respectively, to 785.45 and 786.29 cpm. This seems to indicate that the different output is not linked to any input variation but rather to structural changes.   Figure 13 reports the results for the second, longer signal. β = 40 and γ = 3 were set for this further study. One can notice that except for some tracts where the rotational speed was slightly higher than the average, the normality model-defined over five consecutive acquisitions-was validated almost everywhere on both the two validation sets (corresponding to, respectively, 14 months and immediately before the gearbox replacement). This indicates that the process has a certain level of robustness for relatively similar rotational speeds. Nevertheless, in some segments (more markedly in tract #1489, belonging to the second validation set, and to a lesser extent in #1487 and #1588, from the same set), three false positives were encountered. This can be explained since in binary classification, fine-tuning the classification algorithm parameters generally induces an increment in both the true positive and the false positive rates.  Figure 13 reports the results for the second, longer signal. = 40 and = 3 were set for this further study. One can notice that except for some tracts where the rotational speed was slightly higher than the average, the normality model-defined over five consecutive acquisitions-was validated almost everywhere on both the two validation sets (corresponding to, respectively, 14 months and immediately before the gearbox replacement). This indicates that the process has a certain level of robustness for relatively similar rotational speeds. Nevertheless, in some segments (more markedly in tract #1489, belonging to the second validation set, and to a lesser extent in #1487 and #1588, from the same set), three false positives were encountered. This can be explained since in binary classification, fine-tuning the classification algorithm parameters generally induces an increment in both the true positive and the false positive rates.

Signal #2
Immediately after the gearbox replacement, for a comparable wind speed, the ISE showed a noticeable decrease (as already described in the previous subsection). This behaviour was confirmed in the second test set, i.e., after more than seven months from the gearbox replacement. One can notice how some elements of the training and validation datasets have almost the same rotational speed but different ISE-e.g., tract #1588, the last one of the second validation set, and #2021, the first one of the second test set, correspond, respectively, to 785.45 and 786.29 cpm. This seems to indicate that the different output is not linked to any input variation but rather to structural changes.  Immediately after the gearbox replacement, for a comparable wind speed, the ISE showed a noticeable decrease (as already described in the previous subsection). This behaviour was confirmed in the second test set, i.e., after more than seven months from the gearbox replacement. One can notice how some elements of the training and validation datasets have almost the same rotational speed but different ISE-e.g., tract #1588, the last one of the second validation set, and #2021, the first one of the second test set, correspond, respectively, to 785.45 and 786.29 cpm. This seems to indicate that the different output is not linked to any input variation but rather to structural changes.

Discussion
The experimental results show that the Instantaneous (Shannon) Spectral Entropy can be effectively used as a time-dependent damage index for Pattern Recognition-based SHM. However, one must use care in obtaining a time-frequency distribution which is as refined as possible, especially along the frequency axis. For this reason, a dedicated study was carried out.
High γ and β values return the highest possible time duration and number of oscillations under the GWT envelope, which in turn grants the highest frequency resolution achievable at any instant. This greatly increases the damage detection capabilities of the proposed entropy-based approach. The performance reaches a plateau at a certain point, where the instantaneous power spectrum is refined enough, and further increasing ∆ f does not significantly improve the final results. In this application, as mentioned previously, this was found at β ∼ = 20, γ ∼ = 3, which therefore constitute the lower boundaries of the range of suggested settings. If one is interested in further improving the detectability of damage (by further shrinking the confidence interval around the normality model), according to these findings, increasing both β and γ might be helpful. The optimal pair of (γ, β) is, therefore, only limited by possible variations in the computational effort required. However, from the point of view of the computational effort, it was found that there is no relevant difference between different parameters.
However, due to the well-known optimal trade-off between time and frequency resolution (recalled in Section 3.2), it is strongly suggested to test γ = 3 as a first attempt, independently from the specific dataset, before increasing β. Only if, after reaching β = 40, the resolution ∆ f is still insufficient to obtain reliable ISE(t) estimates, the symmetry parameter should be further increased.

Conclusions
The Instantaneous Spectral Entropy (ISE) has been discussed as a potential timedependent damage index. The Shannon Spectral Entropy (SSE) definition was used for this aim. The goal of this research was to perform damage event detection retrospectively on acquired vibration time histories, recorded from a wind turbine gearbox. The Continuous Wavelet Transform (CWT) was applied to define the time-frequency representation needed to extract the ISE. In this regard, a sensitivity analysis has been carried out on the parameters of the Generalised Morse Wavelet (GMW), to investigate their effects on the final results. Some suggestions were made based on the experimental data analysed here. However, further studies will be needed to assess these findings on different datasets, also considering different sampling frequencies and varying the duration of the recorded measurement time series.
The pure ISE was found to be significantly affected by spikes, i.e., very fast, shorttermed transients of the instantaneous entropy. For this reason, its moving mean has been preferred as a more stable time-dependent index. The procedure successfully detected the occurrence of bearing faults on experimental data, recorded before and after a bearing failure and concatenated to emulate a time-varying structural condition. The moving mean of ISE returned some isolated false alarms as well; however, only an actual alteration of the machine condition caused a permanent deviation from the baseline.
As an entropy-based approach, the ISE is inherently limited by its dependence on the input force. That is to say, measurements corresponding to highly different wind speeds cannot be directly compared for anomaly detection. Future works will also include the automation of the selection procedure for acquisitions related to similar wind speeds.
Finally, the relatively low computational burden and rapid execution time suggest that the method can be further extended to real-time, online applications. This can be achieved using a small moving window of recent history over the data stream. These potential applications will be further investigated in future works as well.