Research on Wind Turbine Fault Detection Based on the Fusion of ASL-CatBoost and TtRSA

The internal structure of wind turbines is intricate and precise, and challenging working conditions often give rise to various operational faults. This study aims to address the limitations of traditional machine learning algorithms in wind turbine fault detection and the imbalance of positive and negative samples in fault detection datasets. To achieve real-time detection of wind turbine group faults and to capture wind turbine fault state information, an enhanced ASL-CatBoost algorithm is proposed. Additionally, a reptile search algorithm that incorporates Tent chaotic mapping and a t-distribution mutation strategy is introduced to address the sensitivity of the ASL-CatBoost algorithm to its hyperparameters and the difficulty of setting them manually. The effectiveness of this hyperparameter optimization strategy, termed the TtRSA algorithm, is demonstrated through a comparison with traditional intelligent optimization algorithms on 11 benchmark test functions. When applied to the hyperparameter optimization of the ASL-CatBoost algorithm, the resulting TtRSA-ASL-CatBoost algorithm exhibits notable improvements in accuracy, recall, and other performance measures compared with the ASL-CatBoost algorithm and other ensemble learning algorithms. The experimental results indicate that the proposed improvement strategy effectively increases the classification recognition rate of wind turbine fault detection.


Introduction
With the rapid development of the global economy, the scale of demand for energy continues to expand. Traditional thermal power generation methods are highly prone to causing environmental pollution and they do not meet the requirements of sustainable development. Green power generation methods such as wind power generation are in line with the future development direction of the energy industry. With the development of Industry 4.0, the global installed capacity of wind turbines is expected to reach two billion kilowatts by 2030 [1]. Promoting the development of renewable energy will not only meet the energy needs of economic development, but also reduce the proportion of traditional thermal power generation methods, and accelerate the construction of a clean, low-carbon, energy-efficient system [2]. However, with the continuous expansion of wind power generation, and an increase in operational lifespan, the issue of turbine faults has become increasingly prominent, posing a series of challenges to the wind power industry. The conventional maintenance approach for wind turbine units is typically based on scheduled inspections and fault responses, which presents several issues. Firstly, scheduled inspections often fail to accurately predict the occurrence time and types of turbine faults, which can result in unnecessary maintenance and high costs. Secondly, maintenance carried out in response to faults is often conducted after the occurrence of the faults, potentially leading to the prolonged downtime of the units and a reduction in production capacity. Additionally, the remote geographical locations where wind turbine units are typically installed pose significant challenges and high costs for maintenance, limiting their reliability and maintainability. Therefore, the development and implementation of wind turbine fault detection systems are of great significance. 
By leveraging advanced sensor technologies, data analysis, and machine learning algorithms, real-time monitoring of the operational status and performance parameters of the units can be achieved, enabling the timely detection of potential faults and abnormalities [3]. Accurate fault detection can help reduce maintenance costs, improve maintenance efficiency, and optimize operations and equipment performance [4,5]. Moreover, by enhancing the accuracy and timeliness of fault detection, the safety of wind turbine units can be enhanced, reducing the risk of accidents and promoting the overall sustainable development of the wind power industry.
Artificial intelligence and machine learning have unique advantages in the field of wind turbine fault detection [6][7][8][9]. To improve the speed and accuracy of wind turbine fault detection, a novel dynamic model sensor method was proposed in [10] for SCADA data-based wind turbine fault detection. A dynamic model representing the relationship between the generator temperature, wind speed, and ambient temperature is constructed from first principles and used as the basic structure of the model sensor. When the model sensor is applied to fault detection, its parameters are updated regularly using the generator temperature, wind speed, and ambient temperature data from the SCADA system. Fault-sensitive features of the wind turbine system are then extracted from the updated model by system frequency analysis and used for turbine fault detection [10]. Aziz [11] critically compared power-based wind turbine fault detection methods using a realistic framework for SCADA data simulation. Song [12] proposed the use of an improved denoising autoencoder to detect wind turbine rolling bearing faults. Liu [13] proposed a twin neural network method that allows the algorithm to achieve wind turbine fault detection with only a small amount of training data. To solve the problem of inaccurate and untimely fault detection caused by wind turbine data features, Liu [14] proposed a new deep network called the Deep Residual Network (DRN) for the fault detection of wind turbines. The results indicate that the proposed DRN outperforms some published fault detection methods. However, none of the aforementioned studies considered the impact of imbalanced positive and negative samples in wind turbine fault detection datasets on algorithm accuracy, which leaves room for further improvement in fault detection accuracy.
Fault detection models typically contain many hyperparameters that are not learned from data, but are set manually by the user. These hyperparameters play a crucial role in determining the performance and behavior of the model. However, finding the optimal combination of hyperparameters can be a challenging and time-consuming task. Swarm intelligence optimization algorithms can search for the optimal solution within a given region of the search space, and they offer good parallelism and autonomous exploration, which is valuable for the hyperparameter optimization of fault detection models [15][16][17]. Lei [18] analyzed a fault detection model based on long short-term memory (LSTM) neural networks and a Bayesian optimization algorithm, and applied the model to the fault warning of the induced draft fan of a coal-fired power plant, achieving good results. Zhang [19] used the Whale Optimization Algorithm to search for the global optimal solution and optimize the hyperparameters of a BiLSTM network, which improved the prediction accuracy of wind power generation and saved a significant amount of debugging time. Huang [20] proposed a reptile search algorithm based on the interactive cross strategy of Levy flight, and verified its effectiveness on practical engineering problems such as welded beam design.
Based on the above research results and their shortcomings, the following work was conducted in this paper. Firstly, the accuracy and recall of the CatBoost algorithm in fault detection are not high enough, and there is an imbalance between positive and negative samples in the wind turbine fault detection dataset. The cross-entropy loss function of the CatBoost algorithm was therefore replaced by an asymmetric loss function, and the ASL-CatBoost algorithm was proposed to implement wind turbine icing fault detection and capture the state information of wind turbine faults. To verify the effectiveness of the improved method, experiments were conducted on the wind turbine icing fault dataset using ensemble learning algorithms (CatBoost, XGBoost, LightGBM, etc.), a machine learning algorithm (SVM), and a deep learning algorithm (LSTMAE) as comparison algorithms. The experimental results show the superior performance of the improved algorithm. Secondly, because appropriate hyperparameters are difficult to set in the ASL-CatBoost algorithm, we propose an improved reptile search algorithm based on the Tent chaotic map and a t-distribution mutation strategy to optimize hyperparameters of the ASL-CatBoost algorithm such as the learning rate, the number of iterations, and the tree depth. The optimized hyperparameters are then passed to the ASL-CatBoost algorithm for model training. To verify the optimization ability of the improved reptile search algorithm, the TtRSA algorithm was compared with classical swarm intelligence optimization algorithms such as PSO, WOA, and SSA on 11 benchmark functions. The experimental results show that the improved reptile search algorithm achieves better convergence speed and accuracy. Finally, the TtRSA-optimized ASL-CatBoost algorithm has a higher detection accuracy and detection efficiency than the original ASL-CatBoost algorithm.

CatBoost Algorithm
In 2017, Yandex proposed a new ensemble learning algorithm called CatBoost [21]. This algorithm is an improvement on Gradient Boosting Decision Trees (GBDT), and it outperforms other algorithms in the same GBDT [22] framework, such as XGBoost [23] and LightGBM [24], in terms of model accuracy. The main innovation of the CatBoost algorithm is the use of Ordered Boosting instead of the traditional gradient estimation method, which addresses the problems of gradient bias and prediction shift. In addition, the algorithm uses oblivious trees as base models, which improves the model's ability to classify correctly while preserving its generalization ability, effectively preventing overfitting.
The Gradient Boosting Decision Tree [25] algorithm uses one-hot encoding during the category encoding process. However, when the data dimension is high, the problem of dimension explosion may arise. To address this issue, CatBoost uses a method called Ordered Target Statistics. This method first randomly permutes all data samples S = {(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), ···, (X_n, Y_n)} to generate multiple random sequences. During the training process, the category value of a feature is replaced by the average label value of the samples that precede it in the sequence and share the same category. Assuming that σ = (σ_1, σ_2, σ_3, ..., σ_n) is the reordered sequence of the dataset, the k-th feature x_{i,k} of the i-th sample in the original dataset can be expressed in terms of σ, as shown in Equation (1) [26]. This method converts categorical features into numerical features, reduces computational complexity, and minimizes information loss.
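The encoding idea described above can be illustrated with a minimal sketch. It assumes the commonly published smoothed form of ordered target statistics, in which each sample is encoded from the labels of same-category samples that precede it in one random permutation; the names `prior` and `prior_weight` are illustrative assumptions, and the function is not CatBoost's internal implementation (which averages over several permutations).

```python
import random

def ordered_target_statistics(categories, labels, prior=0.5, prior_weight=1.0, seed=0):
    """Encode one categorical feature using a single random permutation.

    Assumed smoothed form (prior and prior_weight are illustrative):
        encoded = (sum of earlier same-category labels + prior_weight * prior)
                  / (count of earlier same-category samples + prior_weight)
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # the random sequence sigma

    label_sums, counts = {}, {}
    encoded = [0.0] * n
    for idx in order:
        cat = categories[idx]
        s = label_sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Only samples that appear earlier in sigma contribute,
        # so a sample never sees its own label.
        encoded[idx] = (s + prior_weight * prior) / (c + prior_weight)
        label_sums[cat] = s + labels[idx]
        counts[cat] = c + 1
    return encoded
```

Because each value depends only on earlier samples in the permutation, the first occurrence of any category is encoded as the prior alone.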
After converting categorical features into numerical features using the Ordered Target Statistics method, feature interactions may be affected because numerical features cannot be effectively cross-matched. CatBoost uses a greedy strategy to perform feature interactions. During the first split of the tree generation, CatBoost does not use any cross-features. In subsequent splits, CatBoost uses all of the original features and cross-features that were used to generate the tree, as well as all categorical features in the dataset, to perform feature interactions.
During model training with the XGBoost and LightGBM algorithms, we found that the models fit the F1 wind turbine data well, but performed poorly when tested on the F2 wind turbine data. CatBoost addresses this kind of problem with Ordered Boosting, which effectively reduces the error of gradient estimation and alleviates prediction shift.
The CatBoost algorithm uses a symmetric binary tree as the base model, and this structural constraint has a certain regularization effect. In the prediction process of the CatBoost algorithm, the split on each feature is independent and not sequential. Multiple samples can therefore be predicted together, which improves the prediction speed of the CatBoost algorithm.

Introduction to ASL-CatBoost Algorithm
The icing fault detection of wind turbines is a typical imbalanced data classification problem. Over the entire operating lifecycle of a wind turbine, fault data account for only a very small portion of the records, so the model is easily dominated by normal data, which makes it difficult to improve the detection accuracy for fault data. The default cross-entropy loss function of the CatBoost algorithm does not handle the imbalance between positive and negative samples well. To address this imbalance, He Kaiming and others proposed Focal Loss [27], as shown in Equation (2), where P_t is the predicted probability that the sample is positive and γ is a weight parameter. However, in practical application, the author found that the accuracy achieved with this loss function was not high enough.
Therefore, this paper proposes an improved asymmetric loss function (ASL) based on the focal loss function and applies it to the CatBoost algorithm. The main innovations of the asymmetric loss function are as follows: (1) As shown in Equation (3), the asymmetric loss function decouples the focusing parameter γ into γ+ and γ−. The loss weights of positive and negative samples are adjusted separately by this asymmetric focusing method, which reduces the impact of negative samples and easy samples on the loss function and helps the model learn meaningful features from positive samples and hard-to-detect samples.
(2) To reduce the contribution of high-confidence negative samples to the loss function as much as possible, Asymmetric Loss introduces a probability shifting mechanism that applies a hard threshold to such samples. As shown in Equation (4), m ≥ 0 is an adjustable hyperparameter, which is generally set to 0.2. When the predicted probability P_t that a sample is positive is less than m, the sample is very likely to be negative. Therefore, its probability of being positive can be set directly to 0, and the shifted probability is returned as P_m.
After applying the above two improvements, the Asymmetric Loss expression is obtained, as shown in Equation (5). In conclusion, this paper replaces the default cross-entropy loss function of the CatBoost algorithm with Asymmetric Loss and proposes the ASL-CatBoost algorithm, which makes the algorithm more sensitive to fault data. The Asymmetric Loss function has three adjustable parameters, namely, γ+, γ−, and m. When detecting the icing fault of wind turbines using the ASL-CatBoost algorithm, the author found that it is more appropriate to set γ+ to 2, γ− to 3, and m to 0.3. Model training was conducted with these hyperparameters. Since the predicted probability is a value between 0 and 1, its 3rd power is smaller than its 2nd power; therefore, the impact of negative samples on the loss function is reduced. At the same time, if the predicted result for a negative sample is 0.1, the confidence that it is a negative sample is very high. Because 0.1 is below the threshold m, the loss function treats it as a negative sample, and its contribution to the loss function is 0. Therefore, the improved ASL-CatBoost algorithm focuses on training hard-to-detect samples and positive samples to improve the detection accuracy for fault data. The feasibility of this algorithm is verified in Section 4.3 of the article.
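The behavior described above can be checked numerically with a small sketch. It assumes the commonly published per-sample form of asymmetric loss: a focal-style term with exponent γ+ for positive samples, and, for negative samples, a shifted probability P_m = max(p − m, 0) raised to γ−. The function name and the `eps` guard are illustrative, and the sketch does not show how the loss is wired into CatBoost as a custom objective.

```python
import math

def asymmetric_loss(p, y, gamma_pos=2.0, gamma_neg=3.0, m=0.3, eps=1e-12):
    """Per-sample asymmetric loss for a predicted fault probability p in [0, 1].

    Defaults follow the values reported in the text (gamma+ = 2, gamma- = 3, m = 0.3).
    The functional form is an assumption based on the standard ASL definition.
    """
    if y == 1:
        # Positive (fault) sample: easy positives (p close to 1) are down-weighted.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative (normal) sample: shift the probability by the margin m and clip at 0,
    # so confident negatives (p below m) contribute nothing.
    p_m = max(p - m, 0.0)
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))

# A negative sample predicted at 0.1 lies below m = 0.3 and has zero loss,
# while a negative sample predicted at 0.8 is still penalized.
confident_negative = asymmetric_loss(0.1, 0)
hard_negative = asymmetric_loss(0.8, 0)
```

With these defaults, a hard negative is weighted by the cube of its shifted probability, whereas a positive sample is weighted by the square of (1 − p), which is the asymmetry the text relies on.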

Introduction to the Reptile Search Algorithm
The training hyperparameters of the ASL-CatBoost algorithm proposed above have a great impact on the accuracy and efficiency of fault detection. To find the optimal parameter combination and reduce the impact of human factors on the accuracy of the algorithm, an improved Reptile Search Algorithm is proposed to optimize the hyperparameters of the ASL-CatBoost algorithm and improve its fault detection speed and accuracy.
In 2021, Laith Abualigah proposed a meta-heuristic optimizer called the Reptile Search Algorithm (RSA) [28]. The algorithm simulates the hunting behavior of crocodiles. Two main behaviors are modeled, encirclement ('rounding up') and hunting; switching between them depends on the current number of iterations t and the maximum number of iterations T. When t ≤ T/2, the encirclement strategy is executed; when t > T/2, the hunting phase is performed. The encirclement process includes two steps: high-altitude walking and belly walking. Hunting is achieved through hunting coordination or hunting cooperation. The specific process of the algorithm is as follows. (1) Initialization phase: In RSA, the optimization process starts with a set of candidate solutions, and in each iteration, the best solution obtained is considered to be close to the optimal value. X is a randomly generated set of candidate solutions, as shown in Equation (6).
In the equation, X_(i,j) represents the position of the i-th crocodile individual in the j-th dimension, N is the number of candidate solutions, n is the dimension of the given problem, rand is a random number in the interval [0, 1], and LB and UB represent the given lower and upper bounds of the problem.
(2) Encirclement stage: When t ≤ T/2, the algorithm is in the early stage of its iteration, where the crocodile population searches globally and enters the encirclement phase. When t ≤ T/4, the crocodile population adopts the high-altitude walking strategy, and when T/4 < t ≤ T/2, the crocodile population adopts the belly walking strategy. The position update equation for the crocodile population during the encirclement (exploration) phase is shown in Equation (7).
In the equation, Best_j(t) represents the position of the optimal solution at the current moment in the j-th dimension, t is the current number of iterations, T is the maximum number of iterations, and η_(i,j)(t) is the hunting operator of the i-th candidate solution in the j-th dimension; its calculation is shown in Equation (8). β is a sensitive parameter that controls the exploration accuracy of the encirclement stage during the iterative process, and it is fixed at 0.1. R_(i,j)(t) is a reduction function used to reduce the search area, and it is calculated using Equation (9). r1 is a random integer in [1, N], and x_(r1,j) is the position in the j-th dimension of the randomly selected r1-th candidate solution. N is the number of candidate solutions. The evolution factor ES(t) is a probability ratio whose value decreases randomly between 2 and −2 over the course of the iterations; it is calculated using Equation (10).
In the equation, ε is a very small positive number, r2 is a random integer in [1, N], r3 is a random integer in [−1, 1], and P_(i,j) represents the percentage difference between the optimal solution and the current solution in the j-th dimension, calculated as shown in Equation (11).
M(x_i) represents the average position of the i-th candidate solution, and its calculation is shown in Equation (12). UB_(j) and LB_(j) represent the upper and lower bounds of the j-th dimension, respectively. α is a sensitive parameter used to control the search accuracy of hunting cooperation during the iteration process (the difference between candidate solutions), and it is fixed at 0.1 in this paper.
(3) Hunting stage: When t > T/2, the population has entered the later stage of iteration, and the crocodile population enters the hunting stage. In this stage, when T/2 < t ≤ 3T/4, crocodiles perform hunting coordination, and when 3T/4 < t ≤ T, crocodiles perform hunting cooperation. The relevant equation is shown in Equation (13).
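The iteration schedule described in stages (2) and (3) can be summarized with a short sketch. The function name `rsa_phase` is illustrative, and the sketch only selects the strategy for a given iteration; the position updates of Equations (7) and (13) are not reproduced here.

```python
def rsa_phase(t, T):
    """Return the RSA strategy used at iteration t (1 <= t <= T)."""
    if not 1 <= t <= T:
        raise ValueError("t must satisfy 1 <= t <= T")
    if t <= T / 4:
        return "high-altitude walking"   # encirclement, first half
    if t <= T / 2:
        return "belly walking"           # encirclement, second half
    if t <= 3 * T / 4:
        return "hunting coordination"    # hunting, first half
    return "hunting cooperation"         # hunting, second half

# With T = 1000, each strategy covers one quarter of the run.
schedule = [rsa_phase(t, 1000) for t in (1, 250, 251, 500, 501, 750, 751, 1000)]
```

Each quarter of the iteration budget is therefore tied to exactly one of the four strategies, moving from global exploration to local exploitation.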

Improvement Strategy of TtRSA
As mentioned above, the initial positions of RSA crocodile individuals are randomly generated within the search space, and this randomness makes it difficult for the population to obtain a more uniform distribution of initial positions. An uneven distribution of the population may increase the severity of an individual's blind spot and reduce the population's diversity. In addition, team cooperation, the search range, and the hunting mechanism of the crocodile population are all updated in terms of the current optimal value, and the individual's iterative update process lacks mutation mechanisms. If the current optimal individual falls into a local optimum, the population may quickly converge within a short period, resulting in the algorithm being unable to break free from the constraints of the local extreme value. To address the shortcomings of the RSA, this paper considers introducing the Tent chaotic mapping and t-distribution mutation strategy to improve the RSA.

Tent Chaotic Mapping
To address the uneven population distribution caused by the random initialization of the RSA, this article introduces Tent chaotic mapping. The Tent chaotic map is a method of implementing chaos control using the tent function as the control function. When the Tent chaotic map is used to generate the pseudorandom numbers that initialize the RSA crocodile population, the ergodicity of these numbers allows the population to be distributed more evenly throughout the entire search space. This helps reduce the 'blind areas' of crocodile individuals, allowing individuals to find better solutions quickly, which improves the convergence speed of the algorithm. Chaotic maps have characteristics such as randomness, ergodicity, and regularity, and they can be used to increase the diversity of the population and accelerate the convergence of the algorithm in the early stages; different chaotic map operators have different optimization effects. Among them, the Tent chaotic map can produce a uniform chaotic sequence within the range of (0, 1), and thus, applying the Tent chaotic map to population initialization can increase the diversity of the population and improve the global optimization ability of the algorithm. The relevant equation is shown in Equation (14).
In the equation, α ∈ (0,1) is the chaos parameter, h_n is a number within the range of 0 to 1, and n is the index of the chaos variable. The equation for generating the RSA crocodile population using the Tent chaotic map function is shown in Equation (15). Figure 1a shows the population distribution based on random initialization, and Figure 1b shows the population distribution based on Tent chaotic mapping. It can be observed that in the two-dimensional space, although the population generated by Tent chaotic mapping does not have the same level of randomness as the population generated by the rand function, the individual positions are distributed more uniformly, and there are no overlapping points or small search blind spots; this can improve population diversity and enable the population to quickly find optimal solutions. The frequency histogram of the population distribution is shown in Figure 2.
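The initialization described above can be sketched as follows. The sketch assumes the common piecewise form of the Tent map, h_(n+1) = h_n/α for h_n < α and (1 − h_n)/(1 − α) otherwise, and maps each chaotic value into the search bounds as LB + h × (UB − LB). The function names, the value α = 0.7, and the random seed for h_0 are illustrative assumptions, not values taken from the paper.

```python
import random

def tent_sequence(length, alpha=0.7, h0=None, seed=0):
    """Generate `length` values of the Tent chaotic map in [0, 1]."""
    h = random.Random(seed).random() if h0 is None else h0
    seq = []
    for _ in range(length):
        # Piecewise-linear tent map with chaos parameter alpha in (0, 1).
        h = h / alpha if h < alpha else (1.0 - h) / (1.0 - alpha)
        # Guard against floating-point rounding pushing h just outside [0, 1].
        h = min(max(h, 0.0), 1.0)
        seq.append(h)
    return seq

def tent_init_population(num_candidates, lower, upper, alpha=0.7, seed=0):
    """Initialize an N x n population by scaling chaotic values into [lower, upper]."""
    n = len(lower)
    chaos = tent_sequence(num_candidates * n, alpha=alpha, seed=seed)
    return [
        [lower[j] + chaos[i * n + j] * (upper[j] - lower[j]) for j in range(n)]
        for i in range(num_candidates)
    ]

# Example: 30 candidate solutions in a 2-D search space bounded by [-100, 100].
population = tent_init_population(30, [-100.0, -100.0], [100.0, 100.0])
```

A value of α other than 0.5 is used here because, in binary floating-point arithmetic, the symmetric tent map tends to collapse to 0 after a few dozen iterations.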

t-Distribution Mutation Strategy
The t-distribution is a probability distribution commonly used for parameter estimation and hypothesis testing in situations with small sample sizes. It was proposed by the British statistician William Gosset in 1908. The shape of the t-distribution is determined by the degrees-of-freedom parameter n, where t(n = 1) = C(0,1) and t(n → ∞) → N(0,1); here C(0,1) is the standard Cauchy distribution and N(0,1) is the standard normal (Gaussian) distribution, which are the two boundary cases of the t-distribution.
In intelligent optimization algorithms, Gaussian and Cauchy mutations have been shown to effectively improve the algorithm's ability to search the solution space and escape local optima. In the early stages of iteration, the degrees-of-freedom parameter n is set to a small value, and the t-distribution is close to the Cauchy distribution, which effectively increases the diversity of the population and improves the algorithm's global search ability. In the later stages of iteration, the degrees-of-freedom parameter n gradually increases, and the t-distribution tends towards the Gaussian distribution, which narrows the search range of the population and effectively improves the algorithm's ability to explore the local space. In the RSA, the expression for the t-distribution mutation is shown in Equation (16).
In the equation, X_new^j is the position of the best solution in the j-th dimension after the adaptive t-distribution mutation perturbation, X_best^j is the position of the best solution in the j-th dimension before the mutation perturbation, and TD(n) represents the t-distribution with n degrees of freedom.
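The mutation described above can be sketched under two explicit assumptions: that the perturbation takes the widely used multiplicative form X_new^j = X_best^j + X_best^j × TD(n), and that the degrees of freedom n equal the current iteration number, so that n grows as the run progresses. Neither assumption is confirmed by the text, and the function names are illustrative.

```python
import math
import random

def sample_student_t(dof, rng):
    """Draw one Student-t variate as Z / sqrt(V / dof), with V ~ chi-square(dof)."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square variable with `dof` degrees of freedom is Gamma(shape=dof/2, scale=2).
    v = max(rng.gammavariate(dof / 2.0, 2.0), 1e-300)  # guard against division by zero
    return z / math.sqrt(v / dof)

def t_mutation(best, iteration, rng):
    """Perturb each dimension of the best solution (assumed multiplicative form).

    Small `iteration` gives heavy, Cauchy-like tails (large jumps);
    large `iteration` approaches Gaussian perturbations (fine local search).
    """
    return [x + x * sample_student_t(iteration, rng) for x in best]

rng = random.Random(42)
early = t_mutation([1.0, -2.0, 0.5], iteration=1, rng=rng)     # Cauchy-like jumps
late = t_mutation([1.0, -2.0, 0.5], iteration=1000, rng=rng)   # near-Gaussian steps
```

In an optimizer, the mutated position would typically replace the best solution only if it improves the fitness value; that greedy acceptance step is likewise an assumption and is omitted here.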
In summary, we propose to improve the RSA by using Tent chaotic mapping and the t-distribution mutation strategy to address two problems of the RSA: the initial population is not uniformly distributed, and the population easily falls into a local optimum during iteration. The resulting algorithm is termed TtRSA. Section 3 validates the feasibility of the TtRSA improvement strategy on 11 benchmark test functions.

Experimental Design and Test Functions
Experimental Setup: The experiments were conducted on a computer with the Windows 11 operating system, an AMD R7 5800H 3.2 GHz processor, and 16 GB of RAM. MATLAB R2022a was used to run the experiments. To evaluate the optimization performance of the improved TtRSA algorithm, 11 benchmark test functions were selected, including both unimodal and multimodal functions, so that the algorithm could be evaluated on different types of problems. Among them, functions f1-f5 are continuous unimodal functions that are often used to test the optimization accuracy of search algorithms. Functions f9-f13 are multimodal test functions that can evaluate the convergence speed and accuracy of the algorithm. Function f15 is a typical fixed-dimension multimodal function, commonly used to test the algorithm's ability to escape local optima. The relevant information concerning the benchmark test functions is shown in Table 1.

Improvement Analysis of Optimization Algorithm Performance
In this section, the particle swarm optimization algorithm (PSO) [29], whale optimization algorithm (WOA) [30], chimpanzee optimization algorithm (CHOA) [31], and RSA [32] were used as benchmark algorithms to compare with the optimization performance of the improved TtRSA. The experimental settings included a population size of N = 30, a spatial dimension of D = 30, and a maximum number of iterations of T = 1000. Each algorithm was independently run 30 times with the test functions, and the average result of the 30 runs was taken as the final result.
Based on the comparison results in Table 2, it can be observed that under the same constraints, for the unimodal test functions f1-f5, the optimization results of the TtRSA were several orders of magnitude (or even several tens of orders of magnitude) better than those of the comparison algorithms. Moreover, on f1-f4 the TtRSA converged to the theoretical optimal value of 0. For the complex multimodal test functions f9-f13, the optimization results of the TtRSA were also better than those of the comparison algorithms, and on f9 and f11 it found the optimal value of 0. For the fixed-dimension multimodal test function f15, the TtRSA generally converged to the vicinity of the theoretical optimal value. The overall optimization performance of the TtRSA was excellent on all 11 benchmark test functions, both unimodal and complex multimodal. This demonstrates the stability and robustness of the TtRSA, and it indicates that the TtRSA algorithm, which integrates multiple strategies, has strong global exploration and local exploitation capabilities.

Convergence Performance Analysis of the Improved Optimization Algorithm
To compare the convergence of the algorithms during function optimization visually, the convergence curves were analyzed, as shown in Figure 3, where the vertical axis represents the fitness value of the corresponding function, and the horizontal axis represents the number of iterations of the optimization algorithm. Figure 3a-e show the results of the five optimization algorithms on the unimodal functions. It is evident that the convergence curve of the TtRSA algorithm decreases faster than those of the other four algorithms, whereas the convergence curves of the remaining four algorithms all exhibit varying degrees of stagnation, indicating a lower optimization accuracy. This suggests that applying the Tent chaotic mapping strategy to initialize the population increases the diversity of the crocodile population; this makes the initial solution distribution more uniform, and it allows the algorithm to find the optimal solution more quickly and easily. Figure 3f-h show the convergence curves of the five optimization algorithms on the multimodal functions. It is evident that in the first stage, the convergence speed of the TtRSA algorithm is significantly faster, further demonstrating the effectiveness of the Tent chaotic mapping and t-distribution mutation strategy; this changes the search step of the crocodile population and greatly improves the optimization accuracy and speed of the RSA [33].
The above experiments show that the proposed method, based on the Tent chaotic mapping and t-distribution mutation strategy, can effectively mitigate the uneven population distribution produced by the initialization of the RSA and the difficulty of escaping local optima. The TtRSA algorithm has significant advantages over the RSA and the other comparison algorithms in terms of optimization precision, convergence speed, and ability to escape from local optima. Section 4.5 verifies the feasibility of optimizing the hyperparameters of the ASL-CatBoost algorithm with the TtRSA algorithm to achieve icing fault detection in wind turbines.

Ice Fault Detection Experiment for Wind Turbines
Prognostics and health management are crucial for the lifecycle monitoring of equipment, especially complex equipment such as wind turbines that operate in harsh environments. Improving the speed and accuracy of wind turbine fault detection can reduce maintenance costs and improve operating efficiency. This section aims to verify the effectiveness of the proposed ASL-CatBoost fault detection algorithm and the TtRSA algorithm when applied to wind turbine fault detection.

Wind Turbine Icing Fault Dataset
The dataset used in the fault detection experiment consists of SCADA system data from two three-bladed wind turbines, F1 and F2, provided by Goldwind and recorded under real operating conditions. There are three state modes in the dataset: icing fault, normal state, and invalid state (wherein it is difficult to determine the type of state). The dataset contains 27 feature dimensions in total; the time span of the F1 wind turbine is two months, and the time span of the F2 wind turbine is one month. The dataset provides the normal operation data and the data for the specific time periods in which the wind turbine failed due to blade icing. Some dataset examples are shown in Table 3. The time periods of normal operation and icing faults are provided in separate Excel files, as shown in Table 4. Therefore, the dataset must be annotated according to these state time periods. The Python 'append()' method can be used for annotation, where normal operation data are labeled as 0 and fault operation data are labeled as 1. Data analysis revealed a small number of unannotated invalid data points in the dataset, whose operating states are unknown. These unannotated data points can reduce the accuracy of model training and increase computational overhead, so they are removed during the data preprocessing stage.
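The annotation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing script: the function names, the timestamp format, and the example time periods are all hypothetical and are not taken from the Goldwind dataset.

```python
from datetime import datetime

def parse(ts):
    """Parse a timestamp string such as '2015-11-01 07:00:00' (assumed format)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

def label_records(timestamps, normal_periods, fault_periods):
    """Return (timestamp, label) pairs; records outside every period are dropped."""
    labeled = []
    for ts in timestamps:
        if any(start <= ts <= end for start, end in fault_periods):
            labeled.append((ts, 1))  # icing fault
        elif any(start <= ts <= end for start, end in normal_periods):
            labeled.append((ts, 0))  # normal operation
        # Otherwise the record is invalid (unannotated) and is removed.
    return labeled

# Hypothetical example periods and records, for illustration only.
normal_periods = [(parse("2015-11-01 00:00:00"), parse("2015-11-01 05:59:59"))]
fault_periods = [(parse("2015-11-01 06:00:00"), parse("2015-11-01 08:00:00"))]
timestamps = [
    parse("2015-11-01 01:00:00"),  # inside a normal period
    parse("2015-11-01 07:00:00"),  # inside an icing period
    parse("2015-11-01 12:00:00"),  # in neither period, so it is dropped
]

labeled = label_records(timestamps, normal_periods, fault_periods)
labels = [label for _, label in labeled]
```

Checking the fault periods first means that a record falling inside both kinds of period would be treated as a fault; whether such overlaps occur in the real files is not stated in the paper.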

Evaluation Indicators
Fault detection is a typical binary classification problem. The confusion matrix is often used to measure the accuracy of a classifier, as shown in Table 5. Icing data are labeled 1 and normal data are labeled 0. TP (true positive) denotes samples for which both the diagnosed category and the actual category are icing data. FN (false negative) denotes samples diagnosed as normal data whose actual category is icing data. FP (false positive) denotes samples diagnosed as icing data whose actual category is normal data. TN (true negative) denotes samples for which both the diagnosed category and the actual category are normal data. Three evaluation indicators are derived from the confusion matrix: Precision, Recall, and F1 score. As shown in Equation (17), Precision is the ratio of correctly predicted fault samples to all samples predicted as faults. As shown in Equation (18), Recall is the ratio of correctly predicted fault samples to all true fault samples. As shown in Equation (19), the F1 score is the harmonic mean of Precision and Recall. The higher the values of these three indicators, the better the algorithm performs.
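As a concrete illustration of these three indicators, the following Python sketch computes them from predicted and true labels using their standard definitions. The function names and the example labels are hypothetical and are not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN with icing = 1 and normal = 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

def precision_recall_f1(tp, fn, fp):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = harmonic mean of both."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 icing samples and 6 normal samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fn, fp)
```

TN does not enter any of the three indicators, which is why they remain informative when normal samples heavily outnumber icing samples.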
To show the strengths and weaknesses of the algorithms on the classification problem more intuitively, this article also uses the receiver operating characteristic (ROC) curve to evaluate the performance of the classification model, as shown in Figure 4. The ROC curve plots the true positive rate against the false positive rate of a classification model at different thresholds. The horizontal axis is the false positive rate (FPR), and the vertical axis is the true positive rate (TPR), also called sensitivity. Each point on the curve is calculated from the TPR and FPR values obtained at one threshold. A curve closer to the upper left corner is preferred, because there the TPR is high while the FPR is low, indicating that the classification model performs better. Usually, the better the performance of a classifier, the larger the area under the ROC curve (AUC). AUC can take values from 0 to 1, but for a useful classifier it lies between 0.5 and 1, where 0.5 corresponds to completely random classification and 1 corresponds to a perfect classifier. Therefore, both ROC and AUC can be used to evaluate the performance of classification models: the closer the ROC curve is to the upper left corner and the closer the AUC is to 1, the better the model performs.
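The construction of the ROC curve and the AUC can be made concrete with a short Python sketch. It sweeps the decision threshold over the distinct predicted scores and integrates the resulting curve with the trapezoidal rule; the function names and the example scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical scores: a higher score means "more likely icing".
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
auc_value = auc(points)
```

In this example eight of the nine (icing, normal) pairs are ranked correctly, so the AUC equals 8/9, which matches the interpretation of AUC as the probability that a randomly chosen icing sample receives a higher score than a randomly chosen normal sample.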

ASL-CatBoost Experiment
This section aims to verify the effectiveness of the proposed ASL-CatBoost algorithm. The F1 wind turbine icing dataset was used as the training and validation set. To prevent overfitting and improve the generalization ability of the model, 10-fold cross-validation was used during training, and the optimal model was retained after training was completed. To test the performance of the fault detection algorithm, the F2 wind turbine data were used as the test dataset, which included 10,638 fault samples and 168,930 normal samples.
The ASL-CatBoost model was compared with classic machine learning models, such as GBDT, and with a deep learning model (LSTMAE). The training and testing datasets were the same for every algorithm, and default values were used for the hyperparameters. The fault detection performance of the different algorithms is shown in Table 6. To verify the effectiveness of the two loss function improvements proposed in this paper, ablation experiments were conducted. The CatBoost 1 algorithm uses only the asymmetric focusing strategy of Equation (3), which decouples the focusing parameter γ of the focal loss function for positive and negative samples, and the CatBoost 2 algorithm uses only the hard thresholding of high-confidence negative samples in Equation (4). The experimental results show that each of the two strategies improves the evaluation indicators to some extent compared with the initial CatBoost algorithm. Overall, compared with the original algorithm, the improved ASL-CatBoost algorithm improved the recall rate by approximately 1% and the accuracy and F1 score by approximately 2%. Compared with the LSTMAE model, it improved the accuracy by 9% and the recall rate by 6%. These results validate the feasibility of the improved algorithm.
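The two loss modifications tested in the ablation can be illustrated with a short Python sketch of a per-sample asymmetric loss. It assumes that Equations (3) and (4) follow the commonly used asymmetric loss formulation, with separate focusing exponents for positive and negative samples and a probability margin that zeroes the loss of very confident negatives. The function name and the default values of gamma_pos, gamma_neg, and margin are illustrative assumptions, not the settings used in the paper.

```python
import math

def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Per-sample asymmetric loss for predicted icing probability p and label y.

    Assumed formulation (not confirmed against the paper's equations):
      positive sample: -(1 - p)^gamma_pos * log(p)
      negative sample: -(p_m)^gamma_neg * log(1 - p_m), with p_m = max(p - margin, 0)
    """
    if y == 1:
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Hard thresholding: negatives with p <= margin contribute no loss.
    p_m = max(p - margin, 0.0)
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))

# A confidently rejected normal sample is discarded entirely.
easy_negative = asymmetric_loss(0.03, 0)
# A harder normal sample still contributes, but is down-weighted by p_m^gamma_neg.
hard_negative = asymmetric_loss(0.60, 0)
# With gamma_pos = 0, an icing sample keeps the plain cross-entropy loss.
positive = asymmetric_loss(0.70, 1)
```

Setting gamma_neg larger than gamma_pos reduces the contribution of the abundant normal samples more strongly than that of the rare icing samples. Using such a loss inside a gradient boosting library additionally requires its first and second derivatives with respect to the raw model output, which this sketch does not cover.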

TtRSA Optimized ASL-CatBoost Algorithm Process
The ASL-CatBoost algorithm is strongly affected by its hyperparameters, and manually set hyperparameters may not achieve the optimal effect during training. Therefore, this section uses the improved TtRSA optimization algorithm to optimize the hyperparameters of the ASL-CatBoost algorithm. The specific steps are as follows, and the process is shown in Figure 6.
Step 1: Data preprocessing. The icing fault data of wind turbines contain missing samples and incompletely labeled samples. The dataset must be preprocessed to ensure that it meets the training requirements.
Step 2: Dataset partitioning. Divide the preprocessed dataset and determine the training, testing, and validation sets for the ASL-CatBoost algorithm.
Step 3: Set model parameters. Set the crocodile population size N and the maximum number of iterations T. In accordance with Equation (14), the Tent chaotic map is used to initialize the positions of the crocodile individuals, i = 1, 2, ..., N. Set the current iteration counter t = 1. Set the value range of the maximum number of decision-tree iterations of the CatBoost algorithm to (100, 2000), the value range of the learning rate to (0, 0.2), the value range of l2_leaf_reg to (0, 10), and the depth range of the tree to (0, 16).
Step 4: Calculate the fitness values of all crocodile individuals and save the current optimal crocodile individual position X_Best.
Step 5: Determine whether t ≤ T/2 holds. If it does, use Equation (7) to implement the encirclement mechanism: when t ≤ T/4, apply the high-level walking strategy to update the individual crocodile position; when T/4 < t ≤ T/2, apply the abdominal crawling strategy. If t > T/2, use Equation (13) to implement the hunting mechanism: when T/2 < t ≤ 3T/4, apply the hunting coordination strategy to update the individual crocodile position; when 3T/4 < t ≤ T, apply the hunting cooperation strategy.
Step 6: Use Equation (16) to apply the t-distribution mutation perturbation to some crocodile individuals, and compare the fitness value of each perturbed individual with that of the original individual. Reorder the crocodile individuals in accordance with their fitness values and retain the current optimal position X_Best and its fitness value.
Step 7: Set t = t + 1 and judge whether the termination condition is satisfied, that is, whether the maximum number of iterations T has been reached. If it has, output the currently saved optimal fitness value and the best parameters X_Best, and end the algorithm; otherwise, go to Step 4.
Step 8: After the cycle ends, the obtained global optimal result, that is, the optimal hyperparameters of the ASL-CatBoost, may be substituted into the algorithm for model training, and the optimization effect may be tested.
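Parts of Steps 3, 5, and 6 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the classical Tent map instead of the improved variant of Equation (14), a generic Student's t perturbation in place of Equation (16), and omits the RSA position updates of Equations (7) and (13). All function names and the toy fitness function are hypothetical; in the paper's setting the fitness would be a validation metric of ASL-CatBoost.

```python
import math
import random

# Search ranges from Step 3 (integer-valued parameters such as the number of
# iterations and the depth would be rounded before training).
BOUNDS = {
    "iterations": (100, 2000),
    "learning_rate": (0.0, 0.2),
    "l2_leaf_reg": (0.0, 10.0),
    "depth": (0, 16),
}

def tent_sequence(x0, length, a=0.7):
    """Classical Tent map on [0, 1]; the paper's improved variant may differ."""
    xs, x = [], x0
    for _ in range(length):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs

def init_population(n, seed=0.37):
    """Step 3: scale Tent-chaotic values into the hyperparameter ranges."""
    names = list(BOUNDS)
    chaos = tent_sequence(seed, n * len(names))
    population = []
    for i in range(n):
        individual = []
        for j, name in enumerate(names):
            low, high = BOUNDS[name]
            individual.append(low + chaos[i * len(names) + j] * (high - low))
        population.append(individual)
    return population

def phase(t, T):
    """Step 5: select the update strategy from the iteration counter t."""
    if t <= T / 4:
        return "high-level walking"
    if t <= T / 2:
        return "abdominal crawling"
    if t <= 3 * T / 4:
        return "hunting coordination"
    return "hunting cooperation"

def student_t(df, rng):
    """Draw from Student's t: standard normal over sqrt(chi-square / df)."""
    chi2 = max(rng.gammavariate(df / 2.0, 2.0), 1e-12)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)

def t_mutation(individual, t, fitness, rng):
    """Step 6: perturb with t-distributed noise, keep the better position."""
    noise = student_t(t, rng)  # assumption: degrees of freedom = iteration t
    mutated = []
    for x, name in zip(individual, BOUNDS):
        low, high = BOUNDS[name]
        mutated.append(min(max(x + x * noise, low), high))
    return mutated if fitness(mutated) < fitness(individual) else individual

def toy_fitness(individual):
    """Hypothetical stand-in for the validation error of ASL-CatBoost."""
    return abs(individual[1] - 0.05)

rng = random.Random(0)
population = init_population(5)
best = min(population, key=toy_fitness)
candidate = t_mutation(best, t=1, fitness=toy_fitness, rng=rng)
```

Using the iteration counter as the degrees of freedom is a common choice for this kind of mutation: early iterations produce heavy-tailed, exploratory perturbations, and later iterations approach Gaussian perturbations that favor local refinement. Whether Equation (16) uses exactly this schedule is not stated in this section.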

Experiment for Optimizing ASL-CatBoost with TtRSA
In this section, the ASL-CatBoost algorithm is trained using the hyperparameters optimized by TtRSA. The experimental dataset and experimental equipment are the same as those in Section 4.3. Table 7 lists the hyperparameters obtained through optimization for the ASL-CatBoost algorithm and for the comparison algorithms.
Table 7. Algorithm-related parameters after the optimization of the TtRSA.

The experimental results of the fault detection algorithms optimized by the TtRSA algorithm are shown in Table 8. The table shows that the ASL-CatBoost algorithm optimized using TtRSA achieves better detection accuracy and recall over the whole dataset than the ASL-CatBoost algorithm with manually set hyperparameters. A comparison with Table 6 shows that the accuracy and recall of the machine learning algorithms (LightGBM, SVM, etc.) and the deep learning algorithm (LSTMAE) also improve after TtRSA optimization. These results demonstrate the effectiveness of the TtRSA-based hyperparameter search for fault detection algorithms. Moreover, the training time complexity analysis of each algorithm is given in Table 4, which shows that the improved TtRSA-ASL-CatBoost algorithm requires significantly less training time than the other algorithms, because after hyperparameter optimization it reaches a higher accuracy in fewer iterations. At detection time, every algorithm detects fault data promptly, so differences in detection speed are negligible.

Enhanced Model Robustness
The robustness of the fault detection algorithm in different scenarios can be improved by adjusting the hyperparameters. To examine robustness with respect to the number of input features, the model is trained with 8, 16, and 22 features extracted from the wind turbine icing fault dataset used in this paper. The method can be used to find the optimal model parameters for a specific scenario and thereby improve the robustness and generalization of the model across scenarios. The optimal hyperparameters and accuracy rates for the three feature counts are shown in Table 9.

Conclusions
Icing faults of wind turbines can easily lead to serious economic losses. This paper proposes an improved ASL-CatBoost algorithm to address the imbalance of positive and negative samples in the wind turbine fault dataset. Because the fault detection algorithm is sensitive to hyperparameter settings, an improved reptile search algorithm (TtRSA) is also proposed to optimize the hyperparameters. The following conclusions can be drawn: (1) Replacing the cross-entropy loss function of the CatBoost algorithm with the asymmetric loss function improves the detection accuracy of the algorithm on fault data. (2) The Tent chaotic mapping and t-distribution mutation strategy alleviate the uneven population distribution during RSA initialization and the tendency to fall into local optima during iteration. However, this work has limitations: it addresses only a binary classification problem and cannot determine which fault type is present when multiple fault states exist. Future work should further optimize the algorithm to improve the accuracy and efficiency of fault detection, and should enable it to indicate clearly which category a fault belongs to under multiple fault states.