Hyperparameter Tuning of OC-SVM for Industrial Gas Turbine Anomaly Detection

Gas turbine failure diagnosis is performed in this work based on seven types of tag data comprising a total of 7976 data points: about 7000 normal points and fewer than 500 abnormal points. While normal data are easy to extract, failure data are difficult to obtain, so the training set is composed mainly of normal data and a one-class support vector machine (OC-SVM), which offers strong classification accuracy in this setting, is used. To improve the classification performance, four hyperparameter tuning methods (manual search, grid search, random search, and Bayesian optimization) are applied. To analyze the performance of each technique, four evaluation indicators (accuracy, precision, recall, and F-1 score) are used. As a result, the initial failure diagnosis accuracy of about 54.3% is improved to 64.88% through the tuning process.


Introduction
The gas turbine plays an important role in combined cycle power plants (CCPPs) and is largely composed of a compressor, a combustor, and a turbine. Compressors rotating at high speed are prone to surge and vibration problems, and combustors suffer from thermal stress and combustion vibration due to high temperature and high flow rate. Turbines have problems such as coating detachment caused by the high temperatures and pressures coming from the combustor. Therefore, gas turbines are very sensitive pieces of industrial equipment, and only a preemptive response to these issues can ensure gas turbine operability, maintainability, and performance.
A concept introduced for the overall health management of gas turbines is prognosis and health management (PHM). Because PHM is directly related to the three issues mentioned above (operability, maintainability, and continuous performance), it is a very important technology, not only for gas turbine OEMs but also for customers. OEMs can earn product trust from customers through PHM, and customers, in turn, can expect a stable power supply.
Recently, PHM and AI technology have been combined and developed in various ways. The purpose of using them together is to analyze real-time gas turbine data, detect unexpected prognoses, and act to avoid them through a preventive maintenance schedule. The first step of PHM via AI technology is failure diagnosis, which predicts anomalies based on the numerous data generated by gas turbines. The one-class support vector machine (OC-SVM) is a commonly used algorithm for determining system abnormalities.
Takahiro et al. [1] analyzed failure detection accuracy using an OC-SVM to detect defects in the gas turbine generator of a thermal power plant; the OC-SVM method demonstrated an accuracy of up to 98% for defect detection. Weizhong et al. [2] applied an extreme learning machine (ELM) technique to an OC-SVM in order to detect abnormalities in gas turbine combustors; by combining the advantages of the OC-SVM with those of the ELM, better performance was obtained than with other classification models. Ioannis et al. [3] conducted a fault diagnosis study using gas turbine vibration data. Among the hyperparameters of the OC-SVM, the kernel width γ and the optimization penalty parameter ν were optimized using a grid search method; as a result, the accuracy of the model was close to 100%, and hyperparameter optimization was emphasized as the way to improve model accuracy. Daogang et al. [4] studied the failure diagnosis of various sensor inputs to the gas turbine control system using an SVM. The characteristics of the data were extracted using empirical mode decomposition (EMD), and a fault diagnosis model was constructed with an SVM; the algorithm accuracy for randomly selected variables reached up to 85%. Weihong et al. [5] analyzed the vibration signal of a gas turbine bearing and used an OC-SVM to separate the operating state and failure type. By extracting the characteristic signal of the bearing and applying it to an SVM, they proposed an algorithm that effectively diagnoses the bearing state and soundness even with small samples.
Unsupervised OC-SVMs are used in a variety of fields besides gas turbine systems (see Figure 1). Takashi et al. [6] detected abnormal signs in hydroelectric power plants using an OC-SVM, proposing a failure prediction method that identifies abnormal bearing vibration among various sensor data. Juhamatti et al. [7] used an OC-SVM to diagnose wind turbine failures, analyzing how performance changes with the values of ν and γ when diagnosing bearing failures from vibration signals extracted from the wind turbines. So far, several studies have been conducted to improve the anomaly detection performance of the OC-SVM. Even with the same algorithm, prediction accuracy can be increased by adding a special algorithm at the variable selection stage, and by optimizing the various hyperparameters of the OC-SVM, accuracy may improve by 10% or more.

In this study, anomaly detection is performed using gas turbine data. An OC-SVM algorithm is used, and the main purpose is to increase the prediction accuracy by tuning the hyperparameters of the OC-SVM. Four methods are used for hyperparameter tuning: manual search, grid search, random search, and Bayesian optimization. Through these methods, the accuracy of the OC-SVM is improved by more than 10% compared to the initial algorithm.

Methodology of One-Class Support Vector Machine (OC-SVM)
A support vector machine (SVM) is a method for generating criteria for classifying data sets [8,9]. The criterion used is referred to as a decision boundary or hyperplane, and data types with different features are classified based on it. The data point closest to the hyperplane is called a support vector. As shown in Figure 2, the two points closest to the hyperplane are designated as support vectors. The margin is the distance between a support vector and the hyperplane; therefore, the goal of improving support vector machine performance is to find the hyperplane that maximizes the margin.

Classification problems can be divided into one-class classification and multi-class classification, the distinction being the data classes used in the training process. In this study, after training using only normal data, data points not satisfying the learned criterion are classified as abnormal, which corresponds to one-class classification. The main purpose of an OC-SVM is therefore to classify normal data and outlier data by constructing a hyperplane with the maximum margin, using a kernel function Φ(xᵢ) to map the input data into a high-dimensional space. The objective function of the OC-SVM is given by Equation (1) [10]:

min (1/2)‖W‖² + (1/(νl)) Σᵢ₌₁ˡ ξᵢ − ρ, (1)

where (1/2)‖W‖² is a regularization term that minimizes the fluctuation caused by changes in xᵢ, and ρ is the distance between the origin and the hyperplane; minimization of the objective function is aided by maximizing ρ. Further, (1/(νl)) Σᵢ₌₁ˡ ξᵢ is the sum of the penalties given to normal data, l denotes the number of normal data points, ν controls the smoothness of the boundary, and ξᵢ is a slack variable satisfying the constraint of Equation (2):

(W · Φ(xᵢ)) ≥ ρ − ξᵢ, ξᵢ ≥ 0. (2)

Once a hyperplane discriminating the data is found using the objective function, the decision function of Equation (3) classifies normal data as positive and abnormal data as negative:

f(x) = sgn((W · Φ(x)) − ρ). (3)

In this study, the gas turbine dataset is classified into normal/abnormal using the OC-SVM. In addition, the four hyperparameter tuning processes described in the next section are used to improve the fault diagnosis performance.
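The one-class classification described above can be sketched with scikit-learn's OneClassSVM. Note that the paper does not name its implementation, and the seven-tag data below are synthetic stand-ins, so this is an illustrative sketch rather than the authors' pipeline:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for the seven normalized tag signals:
# the model is trained on normal operating data only.
X_train = rng.normal(loc=0.5, scale=0.05, size=(500, 7))

# Held-out normal points plus clearly shifted abnormal points.
X_normal = rng.normal(loc=0.5, scale=0.05, size=(50, 7))
X_abnormal = rng.normal(loc=1.5, scale=0.05, size=(20, 7))

# nu bounds the fraction of training outliers; gamma sets the RBF width.
clf = OneClassSVM(kernel="rbf", nu=0.01, gamma=0.9).fit(X_train)

pred_normal = clf.predict(X_normal)      # +1 = inside the boundary (normal)
pred_abnormal = clf.predict(X_abnormal)  # -1 = outside the boundary (abnormal)
```

Because the abnormal points lie far outside the training cloud, their RBF kernel values against all support vectors are near zero, so the decision function sgn((W · Φ(x)) − ρ) returns −1 for them.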

Hyperparameter
In this paper, four hyperparameter tuning techniques (manual search, grid search, random search, and Bayesian optimization) are applied to improve the performance [11,12]. The kernel function is a Gaussian kernel (radial basis function), and the hyperparameters are set to γ and ν. A kernel function increases efficiency through a high-dimensional mapping effect for non-linear datasets that are difficult to classify; this study uses the most common choice, the Gaussian kernel. The Gaussian kernel function is given by Equation (4), where σ denotes the variance (with γ = 1/(2σ²)) [13]:

K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖² / (2σ²)). (4)
Note that γ determines how much tolerance is given in the classification process, with the allowable range set based on the support vectors. A large γ-value means a small tolerance, in which case overfitting occurs; a small γ-value means a wide acceptable range, making it difficult to classify complex data accurately and resulting in poor reliability. ν controls the smoothness of the boundary: the larger ν, the smoother the function. As the classification performance varies with the two hyperparameters (γ, ν), optimization was performed with the following four methods. The overall hyperparameter tuning process is shown in Figure 3. The purpose of this study is thus to derive the ν and γ that yield improved performance through the four tuning techniques.
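The role of ν can be seen empirically. In scikit-learn's OneClassSVM (used here as an illustrative implementation), ν is a lower bound on the fraction of support vectors and an upper bound on the fraction of training points left outside the boundary:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(loc=0.5, scale=0.05, size=(400, 7))

# A small nu permits almost no training outliers (looser boundary);
# a large nu allows many, recruiting far more support vectors.
svm_small_nu = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.01).fit(X)
svm_large_nu = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.5).fit(X)

frac_sv_small = len(svm_small_nu.support_) / len(X)
frac_sv_large = len(svm_large_nu.support_) / len(X)
```

With ν = 0.5 at least about half of the training points become support vectors, while ν = 0.01 keeps only a handful; this is the ν-property that makes the parameter interpretable.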

Hyperparameter
In this paper, four hyperparameter tuning techniques (manual search, grid search, random search, and Bayesian optimization) are applied to improve the performance [11,12].The kernel function uses a Gaussian kernel (radial basis function), and the hyperparameters were set to  and .A kernel function is a function that increases efficiency through high-dimensional mapping effect for non-linear datasets that are difficult to classify.Among them, this study uses the most generalized Gaussian kernel.The form for the Gaussian kernel function is given as Equation ( 4), and σ means variance [13].
Note that  is a variable that determines how much tolerance is given in the classification process, and the allowable range is set based on the support vector.A large -value means that the tolerance is small.In this case, overfitting occurs.On the other hand, a small number means a wide acceptable range, and it is difficult to accurately classify complex data, resulting in poor reliability. controls the smoothness of the kernel function.The larger , the smoother the function.As the classification performance varies according to the two hyperparameters (, ), optimization was performed with the following advanced method.The overall hyperparameter tuning process is shown in Figure 3. Therefore, the purpose of this study is to derive ' and ' in improved performance through four tuning techniques.

Manual Search
This method is a method of deriving the optimal performance by inputting numerical values arbitrarily (see Figure 4a).After training by substituting a numerical value, the user sees the obtained result and re-enters the value according to the user's judgment.It is an

Manual Search
This method derives the optimal performance by arbitrarily entering numerical values (see Figure 4a). After training with a substituted value, the user inspects the result and re-enters values according to their own judgment. It is an empirical tuning method without set rules, so it has the advantage of taking less time. However, the quality of the result depends heavily on individual skill, and it is difficult to explore the major hyperparameters systematically. Therefore, this method is mainly used when there are few parameters, and it stops when a target value is obtained within a given time. In this paper, after setting arbitrary values for the two hyperparameters (γ, ν), training proceeds by repeatedly substituting new values.
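The manual loop above amounts to trying hand-picked (γ, ν) pairs one at a time and keeping the best. A minimal sketch, where the candidate values and the labeled validation split are hypothetical choices for illustration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
X_train = rng.normal(0.5, 0.05, size=(300, 7))            # normal data only
X_val = np.vstack([rng.normal(0.5, 0.05, size=(40, 7)),   # normal
                   rng.normal(1.5, 0.05, size=(20, 7))])  # abnormal
y_val = np.array([1] * 40 + [-1] * 20)

# Values a user might try one at a time, by judgment.
tried = [(0.9, 0.001), (0.5, 0.01), (0.1, 0.1)]

best = None
for gamma, nu in tried:
    pred = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_train).predict(X_val)
    acc = (pred == y_val).mean()
    if best is None or acc > best[0]:
        best = (acc, gamma, nu)   # keep the best (accuracy, gamma, nu) seen
```

In practice the user would look at each accuracy and decide the next pair to try; the fixed list here just stands in for that judgment.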

Grid Search
This method derives combinations of hyperparameters using the Cartesian product (see Figure 4b). In the iterative process, the result for each combination is derived without reflecting previous results. After setting the range for each hyperparameter in advance, the method finds the optimal combination by considering all cases. The wider the range and the smaller the interval, the greater the probability of finding the optimal solution; conversely, narrowing the range and coarsening the interval reduces the probability of finding the optimal combination but shortens the time required. Typically, the latter approach is used while gradually narrowing the range, so the method is effective when the number of hyperparameters is small. In this paper, combinations were created through the Cartesian product of the two hyperparameters (γ, ν). Variables corresponding to γ were indexed by i (100 elements) and variables corresponding to ν by j (1000 elements). A total of 100 × 1000 combinations were trained, and the optimal combination was found.
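The Cartesian-product search can be sketched as follows; a 5 × 5 grid stands in for the paper's 100 × 1000 grid, and the validation data are synthetic placeholders:

```python
import itertools
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
X_train = rng.normal(0.5, 0.05, size=(300, 7))
X_val = np.vstack([rng.normal(0.5, 0.05, size=(40, 7)),
                   rng.normal(1.5, 0.05, size=(20, 7))])
y_val = np.array([1] * 40 + [-1] * 20)

gammas = np.linspace(0.01, 1.0, 5)   # index i in the paper
nus = np.linspace(0.001, 0.9, 5)     # index j in the paper

results = {}
for gamma, nu in itertools.product(gammas, nus):   # every (i, j) pair
    pred = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_train).predict(X_val)
    results[(gamma, nu)] = (pred == y_val).mean()

best_params = max(results, key=results.get)
best_acc = results[best_params]
```

Every pair is trained independently, which is why grid search parallelizes trivially but scales multiplicatively with grid size.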

Random Search
In the case of grid search, a relatively accurate optimal solution can be derived because all combinations within the range set by the Cartesian product are trained, but this takes a lot of time. To compensate, a random search method was applied. Random search trains the model with hyperparameter values drawn randomly within a range after the user specifies the minimum and maximum values of each hyperparameter (see Figure 4c). Compared to grid search, the training speed is faster, and it has the advantage of being able to probe points other than those explicitly specified by the user. In this study, random search was performed using the "RandomizedSearchCV" function. For each of the hyperparameters γ and ν, the minimum and maximum bounds were set as shown in Equation (5):

0.01 < γ < 1, 0.001 < ν < 1. (5)

The number of iterations was set to 100,000, and training was performed with a total of 500,000 fits through 5-fold cross-validation.
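The bounds of Equation (5) can be sampled directly. This sketch draws uniform candidates by hand rather than calling scikit-learn's RandomizedSearchCV (which the paper uses), since the one-class setting would need a custom scorer there; the validation data are again synthetic placeholders:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(4)
X_train = rng.normal(0.5, 0.05, size=(300, 7))
X_val = np.vstack([rng.normal(0.5, 0.05, size=(40, 7)),
                   rng.normal(1.5, 0.05, size=(20, 7))])
y_val = np.array([1] * 40 + [-1] * 20)

n_iter = 30  # the paper uses 100,000 iterations; kept small here
gammas = rng.uniform(0.01, 1.0, size=n_iter)   # 0.01 < gamma < 1
nus = rng.uniform(0.001, 1.0, size=n_iter)     # 0.001 < nu < 1

scores = []
for gamma, nu in zip(gammas, nus):
    pred = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_train).predict(X_val)
    scores.append((pred == y_val).mean())

best_i = int(np.argmax(scores))
best_gamma, best_nu = gammas[best_i], nus[best_i]
```

Unlike the grid, sampled candidates fall between grid points, which is why random search often finds good values with far fewer trials when only a few hyperparameters matter.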

Bayesian Optimization
The three techniques mentioned above are independent methods that do not reflect the results obtained at previously evaluated points, so there is a lack of confidence in the results obtained through training. To supplement this, the Bayesian optimization method, which combines Bayesian theory and a Gaussian process, was used (see Figure 4d).
Bayesian theory relates the prior and posterior probabilities of two variables. Referring to Equation (6), the posterior probability p(A|B) can be obtained when the prior probability p(A) and the likelihood p(B|A) are known:

p(A|B) = p(B|A) p(A) / p(B). (6)

A Gaussian process is a distribution over functions. Just as a multivariate normal distribution is specified by a mean vector and a covariance matrix, a Gaussian process is defined through a mean function m(x) and a covariance function k(x, x′), as in Equation (7):

f(x) ~ GP(m(x), k(x, x′)). (7)

The Gaussian process is used as the prior in Bayesian theory.
Bayesian optimization is a hyperparameter tuning process that finds the maximum or minimum of an objective function treated as a black box rather than as a clearly presented function. The process proceeds in two stages: a surrogate function and an acquisition function. First, a surrogate function estimating the objective function is trained, which suggests the direction for selecting improved hyperparameter conditions; among the various surrogate models, the most representative is the Gaussian process regression described above. The acquisition function then calculates the next hyperparameter condition to substitute from the surrogate model's result. Acquisition behavior balances two modes: exploration and exploitation. Exploration examines uncertain points by focusing on conditions with large variance; when the objective function is evaluated at a new point, the distribution over the objective function is updated. Exploitation focuses on the high mean, i.e., the best point within the known range: based on the posterior distribution, evaluation is performed at the location with the highest probability of being the global optimum. In other words, a Gaussian process for the black-box function is fitted to the training data extracted so far, and the next data point is chosen in the direction that minimizes the uncertainty of the objective function; repeating this process yields the minimum or maximum of the objective function. In this study, Bayesian optimization was performed with a total of 1,000,000 iterations for the same two hyperparameters (γ, ν) as in the previous three methods.
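The surrogate/acquisition loop can be sketched with a Gaussian-process surrogate and an expected-improvement acquisition. The closed-form EI used below is the standard formula, but the one-dimensional toy objective is only a stand-in for the OC-SVM validation loss the paper actually optimizes:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    # Toy black-box loss to minimize (stand-in for 1 - validation accuracy).
    return (x - 0.3) ** 2

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, size=(5, 1))   # initial design points
y = objective(X).ravel()
init_best = y.min()

candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

for _ in range(10):
    # Surrogate: GP posterior over the objective given points seen so far.
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-6,
                                  normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best_so_far = y.min()
    # Acquisition: expected improvement trades off exploitation (low mu)
    # against exploration (large sigma).
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (best_so_far - mu) / sigma
        ei = (best_so_far - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next[0]))

best_x = float(X[np.argmin(y), 0])
best_y = float(y.min())
```

Each iteration re-fits the surrogate with the newly evaluated point, so later evaluations do reflect earlier results, which is exactly what distinguishes this method from the three independent searches above.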

Dataset
In this study, fault diagnosis was performed based on seven types of tag data extracted from actual gas turbine data of Doosan Enerbility in Korea, consisting of a total of 7976 data points, of which 7041 training data were used to train the hyperplane. Among the remaining 935 verification data, the normal test data consist of 567 points and the abnormal test data consist of 366 points (see Table 1). The source of the anomaly data used in this study is the failure due to decoupling of the turbine blade coating in the gas turbine. Table 2 shows the details of the seven tag data: the pressure of the compressor outlet, the temperature of the compressor outlet, the fuel flow rate, and the temperature averages of the blade path, front disc cavity, middle disc cavity, and rear disc cavity. Figure 5 shows the normal data used for training and the abnormal data used for testing for tags 1 to 7. The Y-axis of each graph is normalized from 0 to 1 for data security, and the X-axis denotes the amount of data.
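The Table 1 configuration amounts to a simple split; this sketch uses random placeholder values in place of the proprietary tags, so only the shapes and tag names (paraphrased from Table 2) are meaningful:

```python
import numpy as np

# Paraphrased tag names from Table 2; the real data are not public.
TAGS = [
    "compressor_outlet_pressure", "compressor_outlet_temperature",
    "fuel_flow_rate", "blade_path_temperature_avg",
    "front_disc_cavity_temperature_avg", "middle_disc_cavity_temperature_avg",
    "rear_disc_cavity_temperature_avg",
]

rng = np.random.default_rng(6)
data = rng.random((7976, len(TAGS)))   # placeholder for the normalized tags

X_train = data[:7041]                  # normal data for training the hyperplane
X_test = data[7041:]                   # 935 verification points
```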

Performance Evaluation
To evaluate the four hyperparameter tuning models, four indicators (accuracy, precision, recall, and F-1 score) were used. Four elements make up these indicators: TP (true positive), predicting positive when the correct answer is positive; TN (true negative), predicting negative when the correct answer is negative; FP (false positive), predicting positive when the correct answer is negative; and FN (false negative), predicting negative when the actual answer is positive. Table 3 summarizes these definitions [15]. Accuracy is the proportion of correctly classified data among all data and is an index that intuitively indicates model performance; it lies between 0 and 1, and the closer to 1, the better:

Accuracy = (TP + TN) / (TP + TN + FP + FN).

Precision is the probability of being actually positive among those predicted as positive; it guards against wrongly predicting negatives as positive. As with accuracy, a value closer to 1 denotes better performance:

Precision = TP / (TP + FP).

Recall is the probability that the model predicts positive among the actual positives; it guards against wrongly predicting positive data as negative and focuses on lowering FN:

Recall = TP / (TP + FN).

Finally, the F-1 score is the harmonic mean of precision and recall. When the data labels are unbalanced, it evaluates model performance with a single number; its value is large when precision and recall are not biased toward either side, lies between 0 and 1, and the closer to 1, the better:

F-1 score = 2 × Precision × Recall / (Precision + Recall).
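The four indicators follow directly from the confusion-matrix counts; the counts below are hypothetical numbers chosen only to exercise the formulas:

```python
def evaluate(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F-1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration: 50 TP, 30 TN, 10 FP, 10 FN.
acc, prec, rec, f1 = evaluate(tp=50, tn=30, fp=10, fn=10)
```

With these counts, accuracy is 80/100 = 0.8 while precision, recall, and F-1 all equal 50/60 ≈ 0.833, showing why the F-1 score sits between precision and recall rather than tracking accuracy.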

Results
Using the gas turbine data set in Table 1, the classification performance was analyzed by applying the four hyperparameter tuning techniques (manual search, grid search, random search, and Bayesian optimization) (see Table 4). Manual search gave an accuracy of 0.5429 and an F-1 score of 0.6242, corresponding to ν = 0.001 and γ = 0.9. Grid search gave an accuracy of 0.6038 and an F-1 score of 0.6262, corresponding to ν = 0.6 and γ = 0.09. Random search gave an accuracy of 0.6381 and an F-1 score of 0.6301 with ν = 0.2185 and γ = 0.5945. Finally, Bayesian optimization showed the best classification performance, achieving an accuracy of 0.6488 and an F-1 score of 0.6371 with ν = 0.4309 and γ = 0.55521. In conclusion, Bayesian optimization, which optimizes the black-box objective function through a probabilistic approach combining Bayesian theory and a Gaussian process, had the best classification performance. Based on the results, it was concluded that manual search and grid search depend on user judgement, whereas Bayesian optimization obtained more accurate results by searching a wider space with a statistical method.

Conclusions
In this study, the anomaly detection performance for gas turbines, which play an important role in CCPPs, was improved. Actual gas turbine data composed of seven tags provided by Doosan Enerbility in Korea were used. The data consist of a total of about 8000 data points, most of which are normal, with a small amount of abnormal data. Since training was performed on one class only, anomaly detection used an OC-SVM, which has an advantage for classification on such a data set. Four types of hyperparameter tuning (manual search, grid search, random search, and Bayesian optimization) were applied to improve the performance, with the hyperparameters set to γ and ν. Performance was evaluated with four indicators: accuracy, precision, recall, and F-1 score. Manual search recorded an accuracy of 0.5429 and an F-1 score of 0.6242 (ν = 0.001, γ = 0.9). Grid search obtained an accuracy of 0.6038 and an F-1 score of 0.6262 (ν = 0.6, γ = 0.09). Random search recorded an accuracy of 0.6381 and an F-1 score of 0.6301 (ν = 0.2185, γ = 0.5945). Bayesian optimization performed best, with an accuracy of 0.6488 and an F-1 score of 0.6371; that is, ν = 0.4309 and γ = 0.55521 were the optimal hyperparameter combination. Finally, the Bayesian optimization method, combining Bayesian theory and a Gaussian process, had the highest anomaly detection performance for this dataset.

Figure 2. Schematic of the one-class SVM.

Figure 3. The whole process for hyperparameter tuning.

Table 1. Configuration of the total data set.

Table 2. Detailed explanation of the seven tag data.

Table 4. Performance results for all methods.