Damage Diagnosis for Offshore Wind Turbine Foundations Based on the Fractal Dimension

Cost-competitiveness of offshore wind depends heavily on its capacity to switch from preventive maintenance to condition-based maintenance, that is, to monitor the actual condition of the wind turbine (WT) in order to decide when and which maintenance operations are needed. In particular, structural health monitoring (SHM) of the foundation (support structure) is of utmost importance in offshore fixed wind turbines. In this work, an SHM strategy is presented to monitor, online and during service, a WT offshore jacket-type foundation. Standard SHM techniques, such as guided waves with a known input excitation, cannot be used in a straightforward way in this particular application, where unknown external perturbations such as wind and waves are always present. To face this challenge, a vibration-response-only SHM strategy is proposed via machine learning methods. In this sense, the fractal dimension is proposed as a suitable feature to identify and classify different types of damage. The proposed proof-of-concept technique is validated on an experimental down-scaled laboratory jacket WT foundation undergoing different types of damage.


Introduction
The main purpose of structural health monitoring (SHM) is to diagnose in time any damage that affects the integrity of a structure and to determine whether repair or reinforcement actions are required to avoid or delay its degradation. Generally, SHM strategies consist of the following steps: (i) the strategic placement of sensors on the overall structure; (ii) data collection and communication; and (iii) analysis of the measured data.
It is important to note that, in a wide variety of applications, guided waves, a nondestructive approach, are the usual standard. This approach relies on exciting the structure with low-frequency ultrasonic waves and then sensing the reflected response waves. Thus, the method relies heavily on the fact that the input excitation is known and that other perturbations can be filtered out or neglected. On the one hand, in civil infrastructures such as bridges, it is feasible to assume that external perturbations can be neglected or filtered with respect to the induced excitation, see [1] and [2]. On the other hand, in other applications, such as aerospace, the structure can only be diagnosed with this approach when it is not in service. This strategy is used, for example, in [3], where a multiarea scanning ultrasonic system is built in a hangar to rapidly scan the overall airplane structure. Neither this type of out-of-service diagnosis (the airplane is examined during no-flight conditions, when it is in the hangar) nor the neglect of external perturbations (as in SHM for standard civil structures such as bridges or buildings) can be straightforwardly extrapolated to the main research area of the present work: wind turbines. Online and in-service SHM for wind turbines (WTs) is extremely important. WTs are extremely large structures subject to remarkable unknown external excitations such as wind and, in the offshore case, waves. Thus, SHM strategies for WTs must be able to cope with significant unknown external excitations that hinder the use of the standard exciting-and-sensing approach [4]. To face this challenge, in this work, a vibration-response-only SHM strategy is stated to monitor, online and during service, a WT offshore fixed foundation by using only the excitation caused by the external and unknown perturbations.
Offshore wind power will expand dramatically in the next two decades, multiplying by 15 by 2040 to a minimum of 345 gigawatts (GW) of installed capacity, according to the Offshore Wind Outlook 2019 report of the International Energy Agency [5]. However, this achievement will only be possible through the cost-competitiveness of offshore wind, which depends heavily on the capacity of SHM to switch from preventive to predictive maintenance [6]. Thus, SHM for offshore assets is imperative to guarantee their exploitability. Hence, in this work, an SHM methodology for offshore fixed foundations is proposed.
Nowadays, SHM systems for WTs are mostly deployed on blades [7] and towers [8], but research on SHM for offshore support structures is still scarce [9]. The state of the art in this very specific area follows three main research lines: (i) model-based approaches, using, for example, the finite element method, as in [10][11][12]; (ii) data-based approaches, using solely experimental and/or real data; and (iii) hybrid approaches that make use of real and/or experimental data together with numerical models.
Regarding the first option, the work of Stutzmann et al. [13] is noteworthy, where crack detection in monopile offshore foundations is accomplished based on numerical simulations of fatigue cracks. Regarding the second option, a comprehensive review of SHM of offshore WTs through the statistical pattern recognition paradigm is given in [14]. This review shows that the usual strategy for offshore WT damage detection is to identify changes in the modal properties. However, this strategy requires detailed attention to account for operational and environmental effects, and usually only damage detection (but not classification) is accomplished. For example, in [15], an SHM approach verified on a full-scale foundation is presented; however, dynamic variability between different operational cases only allows the final results to indicate an overall stiffening of the structure, not to conclude whether damage is present. Regarding the third approach, the work by Gomez et al. [16], based on acceleration response data and calibrated computer models, is noteworthy. However, this work relies on the usual operational modal analysis and inherits the difficulties of this type of approach, including the fact that only detection (but not classification of the damage type) is achieved. In this work, facing the challenge posed by the previous references, different damage types are taken into account and their classification is achieved on an experimental down-scaled jacket WT foundation.
It should be noted that the experimental testbed is a reduced model but well-founded for this proof-of-concept work as it is comparable to that employed in the following works: (i) [17], where damage detection is achieved via damage indicators; (ii) [18], where damage detection is obtained via statistical time series analysis; (iii) [19], based on principal component analysis and support vector machines; and (iv) [20], where a deep learning approach based on convolutional neural networks is employed.
It is well known that machine learning requires a feature extraction preprocess. It is a challenge to find suitable features, sensitive to physical characteristics, that lead to the identification of the damage or fault [21]. In this work, the fractal dimension (FD) of the data time series is employed as the main feature. The FD has traditionally been used as a feature in medical applications. For example, in [22], experiments on intensive care unit data sets show that the FD characterizes the time series better than the correlation dimension; in [23], the FD is proven to be discriminant for the detection of epileptic seizures in intracranial electroencephalogram signals; and in [24], glaucomatous eye detection is proposed based on FD estimation. However, it was not until recently that the FD was explored as a feature for structural damage detection. It is important to note the recent work by Rezaie et al. [25], where the FD is studied for crack pattern recognition, and the work by Wen et al. [26], where the FD is shown to be effective for the fault diagnosis of rolling element bearings and to cope with the effects of varying operating conditions. In this work, the FD feature is proposed for the vibration-response signals, inspired by the physical insight that the different fractal structures of these signals should be capable of discriminating different types of damage in jacket-type offshore foundations.
The paper is arranged as follows. First, the laboratory test bed and damage scenarios are briefly introduced in Section 2. Section 3 addresses the detailed statement of the developed damage diagnosis strategy that encompasses the following steps: (i) data collection and manipulation; (ii) fractal dimension feature extraction by means of the Katz's algorithm; and (iii) normalization and classification tools.
The experimental results are comprehensively stated in Section 4. Finally, conclusions are drawn in Section 5.

Experimental Test Bed
The reliability of the damage diagnosis approach presented in this paper is verified using different types of damage in an experimental test bed modeling a jacket-type WT as in [19]. For a very detailed description of the function generator, the amplifier and inertial shaker, the sensor network, the data acquisition system, how the vibration signals are acquired and how the time domain waveforms are processed, readers are referred to [19,20].
A brief characterization of the experimental setup of the small-scale wind turbine is given below. First, a function generator (model GW INSTEK AF-2005) is used to produce a white noise signal with four different amplitudes (0.5, 1, 2, and 3) that account for different wind speed regions. This signal is then amplified and used as input to a modal shaker (GW-IV47 from Data Physics) that induces vibration in the structure. The overall description of the test bench is displayed in Figure 1a. The structure is 2.7 m high and consists of three parts: (i) the top beam; (ii) the tower; and (iii) the jacket.
The top beam is 1 meter wide and 0.6 meters high and the inertial shaker is attached to one of the ends of the beam. Three tubular sections united with bolts form the tower. Finally, the jacket is a pyramidal structure composed of steel bars of different lengths as well as steel sheets.
The vibration of the structure is measured by means of the data acquisition system cDAQ-9188 (National Instruments) and through 8 triaxial accelerometers (model 356A17, PCB Piezotronic) optimally placed following the work by Zugasti (2014) [17], as can be seen in Figure 1b.
In this work, we have considered the same four structural states as in the work by Puruncajas et al. [20]. All of the structural states refer to the jacket bar illustrated in Figure 1a. These states are: (i) the healthy structure with the original healthy steel bar; (ii) the healthy structure where the original bar is replaced by a replica; (iii) the structure with a bar damaged by a 5 mm crack; and (iv) the structure with an unlocked bolt in the jacket.

Damage Diagnosis Strategy
In this section, the damage diagnosis strategy is stated. First, a detailed description of data collection and manipulation is given. On the one hand, how data are collected and reshaped is of utmost importance in machine learning in general and for this specific application in particular, see [27,28]. On the other hand, it is well known that feature selection improves classification performance, making the classifiers faster and more efficient [21]. In this regard, the fractal dimension feature is introduced for damage classification purposes, together with a physical insight into its nature for time series and a detailed explanation of Katz's algorithm used to compute it. Finally, three machine learning classifiers are reviewed and tested for damage classification.

Data Collection and Manipulation
A total of 100 experimental tests have been conducted, covering the four amplitudes that represent the different wind speed regions. For each experimental test, the acceleration has been measured through 24 sensors during 59.51636719 seconds with a sampling frequency of 275.28 Hz, which leads to 16,384 time instants and a time step of ∆ = 0.0036328125 s.
The raw data of the k-th experimental test, k = 1, ..., 100, are arranged as the matrix X^(k) ∈ M_{16384×24}(ℝ) in Equation (1), where each of the 24 columns contains the 16,384 measures of one sensor. Each column of X^(k) is then reshaped into a 64-by-256 matrix, and the 24 resulting blocks are concatenated side by side to build a new matrix Y^(k) ∈ M_{64×(256·24)}(ℝ) in Equation (2). There are two main reasons for this reshaping: (i) on the one hand, for a single experimental test, 64 rows are created, each of which is called a sample; (ii) on the other hand, each sample contains time-history measures of the whole set of sensors.
We will see in Section 4 that when we want to diagnose whether a wind turbine is healthy or not, we just need to measure these 24 sensors during 256 time instants, that is, during 256∆ ≈ 0.93 sec.
To define the matrix that contains all the data, the matrices Y^(k), k = 1, ..., 100, from each experiment are stacked vertically to define the matrix Y ∈ M_{6400×(256·24)}(ℝ) in Equation (3).
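The data manipulation described above can be sketched in a few lines. This is a minimal illustration, in which random values stand in for the measured accelerations:

```python
import numpy as np

# Hypothetical raw data of one experimental test: 16384 time instants
# x 24 sensors (random values stand in for measured accelerations).
rng = np.random.default_rng(0)
X_k = rng.standard_normal((16384, 24))

# Reshape each sensor column (16384 values) into a 64x256 block and
# concatenate the 24 blocks side by side: Y_k is 64 x (256*24).
Y_k = np.hstack([X_k[:, s].reshape(64, 256) for s in range(24)])
print(Y_k.shape)  # (64, 6144)

# Stacking the Y matrices of the 100 tests vertically yields the full
# data matrix Y of shape 6400 x 6144 (here the same test is repeated
# as a stand-in for the 100 distinct tests).
Y = np.vstack([Y_k] * 100)
print(Y.shape)  # (6400, 6144)
```

Each row of `Y_k` is one sample: 256 consecutive time instants (about 0.93 s) for all 24 sensors.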

Fractal Dimension
Fractal geometry was proposed by Benoît Mandelbrot [29]. It is a relatively young mathematical discipline that has found numerous applications in bioscience [30-32], engineering [33] and many other fields [34].
Euclidean geometry describes common geometric forms like lines, planes, spheres or rectangular volumes. Each of these geometric objects has an integer dimension D: either 1, 2, or 3. However, many natural shapes do not fit this integer-based notion of dimension.
In order to give meaning to noninteger dimensions, a more general description of dimension, proposed by P. Bourke [35], is based on "how the size of an object behaves as the linear dimension increases". More precisely, consider three objects with dimensions D = 1 (a line segment), D = 2 (a square) and D = 3 (a cube). If the line segment, the square and the cube are linearly scaled by a factor of 2, the results are 2 copies, 4 copies and 8 copies of the initial objects, respectively; in other words, the characteristic size of each object is doubled (Figure 2). The relation between the scaling factor S, the dimension D and the number of generated copies N can be generalized and expressed as

N = S^D,   (4)

which is equivalent to

D = log N / log S.   (5)

Since D is defined in terms of N and S in Equation (5), it is possible to find the dimension, for instance, of the famous Koch curve [36]. At each step of its construction, the line segment is divided into S = 3 segments of equal length and an equilateral triangle is drawn that has the middle segment as its base and points outward. Therefore, N = 4 copies are created (the two external thirds of the original line segment and the two sides of the triangle). Consequently, the fractal dimension D_Koch of the Koch curve is

D_Koch = log 4 / log 3 ≈ 1.2619.

As is well known, fractals are self-similar subsets of the Euclidean space whose fractal dimension, as defined in Equation (5), surpasses their topological dimension. Fractals have the same appearance at different scales. In this sense, many time series of different processes can be considered fractal, since parts taken from these time series, scaled by proper factors, are similar to the whole series. Considering that the fractal dimension is, in a sense, a measure of the complexity that repeats at each scale, it is natural to compute the fractal dimension of a time series.
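The relation in Equation (5) can be checked numerically for the examples above:

```python
import math

# Similarity dimension D = log(N) / log(S): N self-similar copies are
# produced when the object is linearly scaled by a factor S.
def similarity_dimension(n_copies, scale):
    return math.log(n_copies) / math.log(scale)

print(similarity_dimension(2, 2))  # line segment: D = 1.0
print(similarity_dimension(4, 2))  # square: D = 2.0
print(similarity_dimension(8, 2))  # cube: D = 3.0
print(similarity_dimension(4, 3))  # Koch curve: D ≈ 1.2619
```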
In this regard, there are several algorithms that can be applied to estimate the fractal dimension of a time series. The approach used in this paper to estimate the fractal dimension is Katz's algorithm, that is summarized in Section 3.2.1.

Katz's Algorithm
For a given sensor τ = 1, ..., 24, the time series used in this work are extracted from the rows of matrix Y in Equation (3). More precisely, for a given row i = 1, ..., 6400 and a given sensor τ = 1, ..., 24, the associated time series is composed of the sequence of ν = 256 points

s_j^{i,τ} = (j, Y[i, 256(τ − 1) + j]), j = 1, ..., ν,

where Y[α, β] represents the element in the α-th row and β-th column of matrix Y.
To estimate the fractal dimension of the time series, Katz [37] defines two magnitudes, L and d, see Figure 3. On the one hand, the total length L of the curve is defined as the sum of the distances between consecutive points. More precisely, for a given row i = 1, ..., 6400 and a given sensor τ = 1, ..., 24,

L_{i,τ} = ∑_{j=1}^{ν−1} dist(s_j^{i,τ}, s_{j+1}^{i,τ}),

where dist(·, ·) denotes the Euclidean distance. On the other hand, d is the diameter or planar extent of the time series, defined as the maximum distance between the first point of the time series and the rest of the points:

d_{i,τ} = max_{j=2,...,ν} dist(s_1^{i,τ}, s_j^{i,τ}).

The last step in Katz's algorithm is the normalization of both L_{i,τ} and d_{i,τ} by the average distance a_{i,τ} between two consecutive points,

a_{i,τ} = L_{i,τ} / (ν − 1).

Finally, for a given row i = 1, ..., 6400 and a given sensor τ = 1, ..., 24, the fractal dimension z_{i,τ} can be represented as

z_{i,τ} = log(L_{i,τ}/a_{i,τ}) / log(d_{i,τ}/a_{i,τ}) = log(ν − 1) / (log(ν − 1) + log(d_{i,τ}/L_{i,τ})).

Note that d_{i,τ} ≤ L_{i,τ}, where both d_{i,τ} and L_{i,τ} are positive real numbers; therefore, z_{i,τ} ≥ 1. The term log(d_{i,τ}/L_{i,τ}) is zero if, and only if, the points s_j^{i,τ}, j = 1, ..., ν, are all aligned; in this case, the fractal dimension is exactly 1. As the ratio d_{i,τ}/L_{i,τ} decreases, that is, as the series becomes more irregular, the fractal dimension increases. With the fractal dimensions z_{i,τ} of the time series in matrix Y in Equation (3), a new matrix Z ∈ M_{6400×24}(ℝ) is built in Equation (10), where z_{i,τ} is the fractal dimension of the time series associated with row i and sensor τ.
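Katz's estimator can be sketched as follows. This is a minimal implementation, assuming each series is treated as the sequence of points (j, y_j) described above; the function name `katz_fd` is chosen here for illustration:

```python
import numpy as np

def katz_fd(y):
    """Katz fractal dimension of a 1-D series, treating the samples
    as the points (j, y_j), j = 1, ..., nu."""
    y = np.asarray(y, dtype=float)
    nu = len(y)
    t = np.arange(nu)
    # Total curve length L: sum of distances between consecutive points.
    L = np.sum(np.hypot(np.diff(t), np.diff(y)))
    # Diameter d: maximum distance from the first point to any other.
    d = np.max(np.hypot(t[1:] - t[0], y[1:] - y[0]))
    # Average step a = L / (nu - 1), so L / a = nu - 1, and
    # FD = log(L/a) / log(d/a) = log(n) / (log(n) + log(d/L)).
    n = nu - 1
    return np.log(n) / (np.log(n) + np.log(d / L))

# A perfectly aligned series has FD 1; an irregular one exceeds 1.
line = np.linspace(0.0, 5.0, 256)
rng = np.random.default_rng(1)
noisy = rng.standard_normal(256)
print(katz_fd(line))        # ≈ 1.0
print(katz_fd(noisy) > 1)   # True
```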

Normalization and Classification Tools
Although matrix Z in Equation (10) contains elements generally between 1 and 2, the data are normalized using column-wise scaling. This way, each column, and consequently each sensor, has the same influence on the posterior analysis. Otherwise, the sensors closest to the source of the excitation and furthest from the structural damage could have a disproportionate influence and make the damage harder to detect. Column-wise scaling is performed by subtracting from the elements of each column the mean of that column and dividing them by the standard deviation of the column.
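Column-wise scaling amounts to a per-column z-score. A minimal sketch, with a stand-in matrix in place of the fractal-dimension matrix Z:

```python
import numpy as np

# Stand-in for the fractal-dimension matrix Z (values between 1 and 2).
rng = np.random.default_rng(2)
Z = 1.0 + rng.random((6400, 24))

# Subtract each column's mean and divide by its standard deviation.
Z_scaled = (Z - Z.mean(axis=0)) / Z.std(axis=0)

# Every column now has zero mean and unit standard deviation, so each
# sensor carries the same weight in the subsequent classification.
print(np.allclose(Z_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(Z_scaled.std(axis=0), 1.0))   # True
```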
In this work, different classifiers have been used for the classification: k nearest neighbors (kNN) and support vector machines (SVM) with different kernels. These methods are briefly reviewed in Sections 3.3.1 and 3.3.2. Finally, it is important to note that 5-fold cross validation has been used to evaluate the classifier models.
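A minimal sketch of the 5-fold cross-validation procedure: split the samples into five folds, train on four, score on the held-out fold and average. The `train_and_score` callback is a hypothetical stand-in for fitting and scoring any of the classifiers:

```python
import numpy as np

def five_fold_scores(X, y, train_and_score, n_folds=5):
    """Average held-out score over n_folds cross-validation splits."""
    idx = np.arange(len(X))
    folds = np.array_split(idx, n_folds)
    scores = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx],
                                      X[test_idx], y[test_idx]))
    return np.mean(scores)

# Dummy scorer: always predicts the majority training label.
def majority_scorer(X_tr, y_tr, X_te, y_te):
    labels, counts = np.unique(y_tr, return_counts=True)
    return np.mean(y_te == labels[np.argmax(counts)])

rng = np.random.default_rng(4)
X = rng.random((100, 3))
y = rng.integers(0, 2, 100)
print(five_fold_scores(X, y, majority_scorer))  # a value in [0, 1]
```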

k Nearest Neighbor
The k nearest neighbor (kNN) algorithm has been in use since the 1970s. It is a classification algorithm that predicts the class of a new observation based on the categories of its k nearest neighbors. Two elements are key to this approach: (i) the single parameter k; and (ii) the distance measure [38].
The most commonly used distance measures in machine learning are the Hamming distance, the Euclidean distance, the Manhattan distance and the Minkowski distance. In this paper, the Euclidean distance is used.
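A minimal from-scratch sketch of kNN classification with the Euclidean distance; the toy two-cluster data are made up for illustration:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=20):
    """Label x_new by majority vote among its k nearest training
    samples under the Euclidean distance."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy two-class data: class 0 near the origin, class 1 near (5, 5).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(knn_predict(X, y, np.array([4.5, 5.2]), k=5))  # 1
```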

Support Vector Machines (SVM)
SVM is a supervised machine learning algorithm used for classification purposes that has been applied to a large variety of applications [39]. SVMs are based on the simple idea of finding the hyperplane (or decision boundary) that best divides the data into two classes. Figure 4a illustrates three separating hyperplanes out of many possible ones. The goal is to choose the hyperplane with the widest margin between both classes, see Figure 4b. In this context, the margin is defined as the smallest distance between any of the samples and the hyperplane. The data points closest to the separating hyperplane are called the support vectors; these points determine how wide the margin is. Consider a two-class example: a training data set x_1, ..., x_N, N ∈ ℕ, with corresponding binary target values {t_1, ..., t_N} ⊂ {−1, 1}, where one class is labeled as red (positive target value 1) and the other one as blue (negative target value −1). Commonly, the hyperplane is expressed in the form

ω^T x + b = 0,

where ω is the weight vector and b is the bias term. Among all possible descriptions, the canonical hyperplane is used in this paper. The canonical hyperplane satisfies

ω^T x_sv_red + b = 1 and ω^T x_sv_blue + b = −1,

where x_sv_red and x_sv_blue represent the so-called support vectors (the samples closest to the hyperplane) of the red and blue classes, respectively. The distance δ from the support vectors to the hyperplane is then δ = 1/||ω||. Since the margin is twice the distance from the support vectors to the hyperplane, the margin is 2/||ω||. As has been said, the goal is to maximize the margin 2/||ω||, which is equivalent to minimizing ||ω||/2 and, in turn, to minimizing (1/2)||ω||². To find the extreme values of a function subject to multiple constraints, one possible approach is to use Lagrange multipliers.
With this approach, the previous minimization problem is re-expressed as

L(ω, b, α) = (1/2)||ω||² − ∑_{i=1}^{N} α_i [t_i(ω^T x_i + b) − 1],   (11)

where α_i, i = 1, ..., N, are the Lagrange multipliers. To find the extreme values, the partial derivatives with respect to ω and b are computed and equated to zero, which yields

ω = ∑_{i=1}^{N} α_i t_i x_i,   (12)
∑_{i=1}^{N} α_i t_i = 0.   (13)

Equation (12) shows that the weight vector ω is a linear combination of the training data. Replacing Equations (12) and (13) into Equation (11), the minimization problem is expressed solely in terms of α_i, x_i and t_i in Equation (14). After some simple manipulations, it takes the dual form

L̃(α) = ∑_{i=1}^{N} α_i − (1/2) ∑_{i=1}^{N} ∑_{j=1}^{N} α_i α_j t_i t_j (x_i^T x_j),   (15)

subject to α_i ≥ 0 and ∑_{i=1}^{N} α_i t_i = 0. As can be clearly seen, the optimization problem depends only on dot products of pairs of training data. However, frequently the data are not linearly separable, and the margin constraint cannot be satisfied for any ω and b. One possible solution is to allow some data points to violate the margin constraints (soft margin), at the price of assigning them a cost. In this case, a penalty parameter C (box constraint) controls the maximum penalty imposed on margin-violating observations, and slack variables ε_i control the width of the margin. For the case of a linear kernel, the nonlinearly separable case can be generalized as

min_{ω, b, ε} (1/2)||ω||² + C ∑_{i=1}^{N} ε_i, subject to t_i(ω^T x_i + b) ≥ 1 − ε_i, ε_i ≥ 0.   (16)

The constrained minimization problem in Equation (16) can be rewritten, using Lagrange multipliers, as the dual problem in Equation (15) with the additional box constraints 0 ≤ α_i ≤ C.   (17)

In many cases, even with a soft margin, the space is not linearly separable. In these cases, a transformation φ is used to map the original training data into another space. Since the optimization depends only on dot products, the transformation φ itself is never needed: only the dot product in the transformed space, renamed the kernel function, is required. In this work, two kernel functions are used, the quadratic kernel K_q and the Gaussian kernel K_G, defined as

K_q(x_i, x_j) = (1 + (x_i^T x_j)/γ²)²,
K_G(x_i, x_j) = exp(−||x_i − x_j||²/γ²),

where γ is the so-called kernel scale.
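The two kernel functions can be sketched as follows; note that the exact way the kernel scale γ enters (dividing the dot product and the squared distance by γ²) follows one common software convention and is an assumption here:

```python
import numpy as np

# Quadratic kernel: squared (1 + scaled dot product).
def quadratic_kernel(x, y, gamma=1.0):
    return (1.0 + np.dot(x, y) / gamma**2) ** 2

# Gaussian kernel: exponential of the negative scaled squared distance.
def gaussian_kernel(x, y, gamma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / gamma**2)

x = np.array([1.0, 2.0])
print(quadratic_kernel(x, x))           # (1 + 5)^2 = 36.0
print(gaussian_kernel(x, x))            # identical points -> 1.0
print(gaussian_kernel(x, np.zeros(2)))  # < 1.0 for distinct points
```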

Results
In this section, the results are organized as follows. First, the evaluation metrics used to assess the classification models are introduced and explained in Section 4.1. As detailed in Sections 3.3.1 and 3.3.2, the classification models used in this work are kNN, quadratic SVM and Gaussian SVM. The results of the present approach, using the fractal dimension to build the feature vector together with kNN, quadratic SVM and Gaussian SVM, are presented in Sections 4.2-4.4, respectively. Figure 5 presents a flowchart summarizing the proposed damage diagnosis strategy. In a nutshell, the fractal dimension is computed and normalized for each time series (per sensor) of the baseline data and the machine learning models are trained. When new data from a structure to be diagnosed arrive, their fractal dimension is computed and normalized, and the already trained kNN or SVM (quadratic or Gaussian) model is applied for the structural state classification.

Evaluation Metrics
Before the results are presented in terms of multiclass confusion matrices, it is important to clearly describe the evaluation metrics used to assess the performance of each model. One of the most used metrics is the overall accuracy, defined as the number of correct predictions out of the total number of predictions. However, the overall accuracy alone does not always reveal whether a model performs satisfactorily, especially if the test data comprise imbalanced classes. Even in the case of balanced classes, the overall accuracy alone does not indicate how to improve the model. The metrics used in this work are accuracy, precision, recall, F1 score and specificity. These metrics are defined in the next paragraphs for both the binary and the multiclass classification problem. Consider n ∈ ℕ observations x_1, ..., x_n that have to be assigned to predefined classes C_1, ..., C_ℓ, ℓ ∈ ℕ. In a binary classification problem, each observation x_i is classified into one, and only one, of two nonoverlapping classes (C_1 and C_2, or positive and negative). In a multiclass classification problem, the input x_i is classified into one, and only one, of the ℓ nonoverlapping classes.

Metrics for a Binary Classification Problem
A confusion matrix is a table that summarizes the prediction results of a classification problem. It is not a metric itself, but it helps to visually understand the metrics and the types of errors the model makes. Table 1 represents the confusion matrix for a binary classification problem, where two classes have been considered: positive and negative. The observations are distributed in two rows and two columns. The rows represent the actual classes, while the columns represent the predicted classes. The observations on the diagonal represent correct decisions, while the elements on the antidiagonal represent misclassifications. The five metrics for the binary classification problem are then defined in Table 2 in terms of the elements of the confusion matrix. The F1 score is the particular case of the Fβ score defined in [40] with β = 1.
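The binary metrics of Table 2 can be computed directly from the four confusion-matrix counts; the counts below are made up for illustration:

```python
# Metrics from the binary confusion-matrix counts (tp, tn, fp, fn).
def binary_metrics(tp, tn, fp, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "specificity": tn / (tn + fp),
    }

m = binary_metrics(tp=90, tn=85, fp=15, fn=10)
print(m["accuracy"])     # (90 + 85) / 200 = 0.875
print(m["specificity"])  # 85 / (85 + 15) = 0.85
```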

Metrics for a Multiclass Classification Problem
Metrics for a multiclass classification problem are based on a generalization of the metrics in Table 2 to many classes C_i, i = 1, ..., ℓ [41,42]. More precisely, with respect to the class C_i, we define: • tp_i, the true positives for C_i: the number of observations belonging to class C_i that are correctly labeled as C_i; • tn_i, the true negatives for C_i: the number of observations not belonging to class C_i that are not labeled as C_i; • fp_i, the false positives for C_i: the number of observations not belonging to class C_i that are wrongly labeled as C_i; and • fn_i, the false negatives for C_i: the number of observations belonging to class C_i that are not labeled as C_i. Table 3 presents the metrics for the evaluation of a multiclass classification problem. The quality of the overall multiclass classification is usually assessed in two ways: (i) macroaveraging, where all classes are treated equally; and (ii) microaveraging, where bigger classes are favored. Table 3 considers only the macroaveraging case. Table 3. Metrics for the evaluation of multiclass classification problems, where ℓ is the number of classes.
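The macroaveraged metrics can be sketched from a multiclass confusion matrix; the 3-class matrix below is hypothetical:

```python
import numpy as np

def macro_metrics(cm):
    """Macro-averaged precision and recall from a multiclass confusion
    matrix cm (rows: actual class, columns: predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class i, actually other
    fn = cm.sum(axis=1) - tp  # actually class i, predicted as other
    precision = np.mean(tp / (tp + fp))
    recall = np.mean(tp / (tp + fn))
    return precision, recall

# Hypothetical 3-class confusion matrix.
cm = [[50, 2, 1],
      [3, 45, 2],
      [0, 4, 46]]
p, r = macro_metrics(cm)
print(round(p, 3), round(r, 3))  # 0.922 0.921
```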

Finally, it is important to note that all the confusion matrices presented in the next subsections follow this nomenclature: the rows represent the actual class and the columns represent the predicted class. Label 0 corresponds to the healthy structure; label 1 corresponds to the structure with a replica bar; label 2 corresponds to the structure with a 5 mm cracked bar; and label 3 corresponds to the structure with an unlocked bolt in the jacket.

Results of Fractal Dimension and kNN as Classification Method
As stated in Section 3.3.1, the only parameter of the kNN classifier is k, the number of neighbors. Table 4 shows the performance of the proposed approach using kNN as the classification method, in terms of the number of neighbors k. As described in Section 4.1.2, the metrics for the evaluation of this multiclass classification problem are the average accuracy, the average precision, the average recall, the average F1 score and the average specificity. The best results for each metric are highlighted in bold. The same results, as a function of the number of neighbors, are depicted in Figure 6. The best performance corresponds to k = 20 neighbors; increasing the number of neighbors further does not improve the performance indicators and only leads to a higher computational cost. Table 5 represents the confusion matrix for the best case (k = 20). In Table 4, the performance measures are presented using macroaveraging; in the confusion matrix in Table 5, however, precision and recall can be extracted for each class separately. Table 5 also presents the false negative rate (fnr), defined as 1 − tpr, and the false discovery rate (fdr), defined as 1 − ppv. All the aforementioned metrics can be derived from this confusion matrix. In particular, it is noteworthy that an average accuracy of 96.9%, an average precision of 94.3% and an average specificity of 97.7% are obtained. Table 4. Performance measures (per-unit) for the kNN method using different numbers of nearest neighbors (k). The cases with the best performance of each measure are highlighted in bold.

Results of Fractal Dimension and Quadratic SVM as Classification Method
Table 6 summarizes the performance, using macroaveraging, of the proposed approach using quadratic SVM as the classification method, in terms of the box constraint C and the kernel scale γ hyperparameters. More precisely, the box constraint values C = 5, 10, 20, 30, 40 and 50 are combined with the kernel scale values γ = 0.1, 0.2, 0.5, 1, 2, 5, 10, 15, 20, 30 and 50. The best results for each metric are highlighted in bold. The same results, for a box constraint C = 30 and as a function of the kernel scale γ, are depicted in Figure 7. The best performance corresponds to the box constraint C = 30 and the kernel scale γ = 1. Table 7 represents the confusion matrix for this case, where it is worth remarking that an average accuracy of 98.4%, an average precision of 96.5% and an average specificity of 98.9% are obtained. Table 6. Performance measures (per-unit) corresponding to the quadratic SVM strategy for the multiclass classification problem using different box constraints (C) and different kernel scales (γ). The cases with the best performance of each measure are highlighted in bold.

Results of Fractal Dimension and Gaussian SVM as Classification Method
As in Section 4.3, Table 8 summarizes the performance of the proposed approach using Gaussian SVM as the classification method, in terms of both the box constraint C and the kernel scale γ. More precisely, we combine the box constraint for C = 5, 10, 20, 30, 40 and 50 and the kernel scale for γ = 0.1, 0.2, 0.5, 1, 2, 5, 10, 15, 20, 30 and 50. The best results for each metric have been highlighted in bold. The same results, for a box constraint C = 50 and as a function of the kernel scale γ, are depicted in Figure 8. The case with the best performance corresponds to the case where the box constraint is C = 50 and the kernel scale is γ = 1. Table 9 represents the confusion matrix for this case. From the confusion matrix, it is worth remarking that an average accuracy of 98.7%, an average precision of 97.3% and an average specificity of 99.1% are obtained. Table 8. Performance measures (per-unit) corresponding to the Gaussian SVM strategy for the multiclass classification problem using different box constraints (C) and different kernel scales (γ). The cases with the best performance of each measure are highlighted in bold.

Brief Discussion
Sections 4.2-4.4 present an optimization of the model hyperparameters for the kNN, quadratic SVM and Gaussian SVM, respectively. In each subsection, the confusion matrix for the best (optimized) model is presented. In this subsection, the best models are compared with one another; that is, a comparison among the kNN, quadratic SVM and Gaussian SVM methodologies is given. In particular, Figure 9 shows the accuracy, precision, recall, F1 score and specificity measures for the best kNN, quadratic SVM and Gaussian SVM models. It is noteworthy that the Gaussian SVM attains the highest performance for all the indicators; thus, it is the recommended approach to be employed with the proposed SHM strategy. It is also important to note that the quadratic SVM performs close to the Gaussian SVM, but the kNN falls far behind in all the indicators, most markedly in the recall and F1 score measures; therefore, its use is not advisable. As a final remark, the advantage of the Gaussian SVM over the quadratic SVM may depend on the nature of the data or even on how the data are preprocessed and which features are extracted. In this sense, the superior performance of the Gaussian SVM has been reported in the literature, for example, as a machine learning model for predicting the viscosity of nanofluids [43] or, in the field of fault diagnosis, for determining the operating status of a wind turbine [44].

Conclusions
In this work, a proof-of-concept damage diagnosis strategy that can be deployed online and during WT service has been presented. This main contribution of the paper is accomplished by using only vibration-response accelerometer signals instead of the standard approach based on guided waves. Furthermore, the methodology is based on machine learning techniques. In this regard, the second main contribution of this work is to introduce the FD as a suitable feature to detect and classify different damage scenarios, inspired by the physical insight that the different fractal structures of the accelerometer signals should be capable of discriminating different types of damage. Three supervised machine learning classifiers have been studied and optimized for the specific problem. Finally, the proposed methodology has been validated on an experimental laboratory test bed, where, for the best selected model (Gaussian SVM with box constraint C = 50 and kernel scale γ = 1), all the studied measures (average accuracy, average precision, average recall, average F1 score and average specificity) attained values higher than 97%. These results encourage future work to further develop this proof-of-concept. More tests, including changing the damage location and dealing with variable environmental and operational conditions, including waves, will be the focus of future work.
Author Contributions: All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding: This research has been partially funded by the Spanish Agencia Estatal de Investigación (AEI)-Ministerio de Economía, Industria y Competitividad (MINECO), and the Fondo Europeo de Desarrollo Regional (FEDER) through the research project DPI2017-82930-C2-1-R; and by the Generalitat de Catalunya through the research project 2017 SGR 388.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.