Real-World Data-Driven Machine-Learning-Based Optimal Sensor Selection Approach for Equipment Fault Detection in a Thermal Power Plant

Abstract: Due to growing electricity demand, developing an efficient fault-detection system in thermal power plants (TPPs) has become a pressing issue. The most probable reason for failure in TPPs is equipment (boiler and turbine) fault. Advance detection of equipment faults can help secure maintenance shutdowns and enhance the capacity utilization rates of the equipment. Recently, intelligent fault diagnosis based on multivariate algorithms has been introduced in TPPs. In TPPs, a large number of sensors are used for process maintenance. However, not all of these sensors are sensitive to fault detection. Previous studies relied solely on expert-provided data for equipment fault detection in TPPs. However, the performance of multivariate algorithms for fault detection is heavily dependent on the number of input sensors. Redundant and irrelevant sensors may reduce the performance of these algorithms, thus creating a need to determine the optimal sensor arrangement for efficient fault detection in TPPs. Therefore, this study proposes a novel machine-learning-based optimal sensor selection approach to analyze boiler and turbine faults. Finally, real-world power plant equipment fault scenarios (boiler water wall tube leakage and turbine electric motor failure) are employed to verify the performance of the proposed model. The computational results indicate that the proposed approach enhanced the computational efficiency of machine-learning models by reducing the number of sensors by up to 44% in the water wall tube leakage case scenario and 55% in the turbine motor fault case scenario. Further, the machine-learning performance is improved up to 97.6% and 92.6% in the water wall tube leakage and turbine motor fault case scenarios, respectively.


Introduction
Modern thermal power plants are highly complex and are equipped with advanced data acquisition systems [1]. A huge amount of sensor data is generated and stored in the historical database of TPPs. These historical data represent the health state of the power plant that can be used for performance monitoring, fault detection, and isolation. The early detection and diagnosis of the faults in a thermal power plant can help implement shorter shutdowns, reduced maintenance, and lower generation costs [2].
Boiler tube leakage is the most probable failure in a thermal power plant. Approximately 60% of boiler shutdowns are caused by boiler tube leakages [3]. Leakage occurs most frequently in the water wall tube section [4]. Tube leakage arises due to corrosion [5], erosion [5], and fatigue [6], which cause the tube wall thickness to decrease, leading to tube rupture and failure. Recently, an e-maintenance-based system [7] utilizing process monitoring data was introduced for intelligent fault diagnosis in TPPs. The process control data can provide sufficient information for effective tube leakage detection [8]. Jungwon et al. [9] utilized data from thermocouple sensors mounted on the final superheater outlet header of an 870 MW coal-fired power plant and proposed a principal component analysis (PCA)-based tube leakage detection approach. The proposed method could successfully detect tube leakage. Recently, Natarianto et al. [10] used process control data and introduced a data analytics-based approach by combining PCA, canonical variate, and linear discriminant analysis (LDA) for water wall tube leakage detection in a 650 MW supercritical coal-fired thermal power plant. Swiercz et al. [11] proposed a multiway PCA approach for boiler riser and downcomer tube leakage detection using expert-provided sensor data. The proposed method could successfully detect the tube leak 3 to 5 days before boiler shutdown.
Steam turbines are another vital piece of equipment used as the primary energy-generating source in a thermal power plant [12]. Steam turbines consist of multistage steam expansion that makes them complex dynamic structures. The most common faults occurring in the steam turbine are unbalancing, gear fault, looseness, and bearing fault [13]. These faults can stop the smooth operation of the steam turbine and jeopardize reliable power generation. Various studies in the past decade have investigated efficient fault detection in steam turbines using historical process data or expert knowledge about the system. The anomalies in the process data can be recognized for each type of failure. Different failures can be further classified using supervised learning. Karim et al. [14] proposed a fault detection and diagnosis approach in an industrial 440 MW steam turbine using four sensitive monitoring parameters. Under challenging noise measurements, twelve major faults were successfully classified using adaptive neuro-fuzzy inference system (ANFIS) classifiers. Arian et al. [15] used process monitoring data generated from an Indonesian government steam power plant and proposed a data-driven approach for fault detection in a steam turbine using a neural-network-based classifier.
Generally, a large number of sensors are used in power plants for process maintenance [16]. However, not all of these sensors are sensitive to fault detection. The studies mentioned above depend only on expert experience in selecting sensitive sensors to detect boiler and turbine faults. However, redundant and irrelevant sensors may degrade multivariate algorithms, whose performance is highly reliant on the number of input sensors. Thus, an accurate methodology is needed to select the relevant sensors necessary to detect boiler and turbine failures. Recently, machine-learning algorithms have gained importance for intelligent fault detection and diagnosis in thermal power plants [17]. These machine-learning algorithms are typically combined with dimensionality reduction methods, such as PCA, to eliminate unnecessary data [18,19]. However, these approaches do not help identify the cause of failure, nor do they distinguish the most relevant sensors. Feature selection approaches can overcome the challenges mentioned above by simultaneously identifying the relevant sensors and removing the redundant ones. Different feature selection techniques are available in the literature, which can be grouped into three categories: optimization-based feature selection [20], regression-based feature selection [21], and classification-based feature selection [22]. For a TPP application, the optimal sensor selection algorithm should have low complexity and computational cost. For that purpose, correlation analysis is a well-known approach that estimates the relationship between pairs of inputs by using the correlation function and removes the redundant and irrelevant features [23]. Recently, the maximum relevance minimum redundancy (mRMR) algorithm [24] has gained importance due to its ability to minimize redundancy while simultaneously controlling relevancy among the features.
Extra tree classifier [25] is another feature selection technique that has gained popularity among researchers because of its explicit meaning, simple properties, and easy conversion to "if-then rules". This technique is helpful in problems involving a vast number of numerical features. Therefore, this study utilizes the above-mentioned three approaches for the optimal sensor arrangement in TPPs.
This paper proposes a data-driven machine-learning-based optimal sensor selection approach for thermal power plant boiler and turbine faults. The study performs optimal sensor selection via different feature selection techniques (correlation, mRMR, and extra-tree classifier). Three supervised machine-learning classifiers (support vector machines, k-nearest neighbor, and naïve Bayes) are used for the fault classification. Finally, two real-world power plant equipment fault scenarios (boiler water wall tube leakage and turbine electric motor failure) are employed to verify the performance of the proposed model.

State-of-the-Art Literature Survey
This section lists the state-of-the-art techniques used for equipment (boiler and turbine) fault detection in TPPs. Due to the significant importance of the boiler and turbine in TPPs, numerous attempts have been made to detect equipment faults in TPPs by using three main approaches, namely, the model-based method [26], the knowledge-based method [27], and the statistical analysis method [28]. A model-based approach is a conventional approach that uses static and dynamic models of the processes. In most cases, it can provide an efficient solution for fault detection. However, it may fail to give correct fault detection results when an accurate mathematical model is difficult to obtain because of the complex operations of industrial systems. For a complex system with unknown models, a knowledge-based approach can be used to detect faults. This approach utilizes the rich industrial operational experience of the operators and includes the expert system method. However, this approach cannot identify the most sensitive process variables (sensors) needed to detect the faults in TPPs. Recently, statistical techniques based on multivariate algorithms such as PCA and ANNs have been used to monitor processes with a large number of variables, such as those in TPPs. However, the performance of these multivariate algorithms is highly dependent on the number of input process variables. Therefore, this study proposes an optimal sensor selection approach to identify the most sensitive sensors needed to detect equipment faults in TPPs. Table 1 covers the state-of-the-art literature survey for the three main approaches (model-based, knowledge-based, and statistical analysis) used for boiler and turbine fault detection in TPPs.
Table 1. State-of-the-art literature survey of boiler and turbine fault detection in TPPs.

| Approach | Application | Ref. | Year | Method | Limitations of the approach |
|---|---|---|---|---|---|
| Model-based | Boiler tube leakage detection | [26] | 1997 | Developed the least-square method with forgetting factor derivation for leak detection | Challenging to obtain a valid process mathematical model |
| Model-based | Boiler tube leakage detection | [29] | 2008 | Developed the input/output loss method by computing fuel chemistry, heating value, and fuel flow | |
| Model-based | Turbine fault detection | [30] | 2012 | Used the time-delay multilayer perceptron model for residual generation for fault detection in an industrial turbine | |
| Model-based | Turbine fault detection | [31] | 2011 | Used a nonlinear dynamic model with a dynamic tracking filter to detect turbine faults | |
| Knowledge-based | Boiler tube leakage detection | [27] | 1998 | Used radiation heat flux measurements for boiler tube leak detection | Sensor data provided by experts; important monitoring process variables (sensors) unknown |
| Knowledge-based | Boiler tube leakage detection | [32] | 2016 | Developed artificial neural network (ANN) models to detect tube leaks | |
| Knowledge-based | Turbine fault detection | [13] | 2017 | Developed artificial neural network (ANN) models to detect a fault in a steam turbine | |
| Statistical analysis | Boiler tube leakage detection | [11] | 2020 | Used a multiway PCA model to detect boiler tube leakage | Performance highly dependent on the number of input sensor variables; need to find the optimal sensors necessary for fault detection |
| Statistical analysis | Boiler tube leakage detection | [9] | 2017 | Applied PCA to tube temperature data to detect boiler tube leakage | |
| Statistical analysis | Turbine fault detection | [15] | 2011 | Used a generalized discriminant analysis approach for steam turbine fault detection | |
| Statistical analysis | Turbine fault detection | [23] | 2011 | Proposed a support vector machine (SVM)-based model for fault detection in a steam turbine | |

Overview of a Coal-Fired Thermal Power Plant
The current study is conducted for a coal-fired TPP. This section gives a brief introduction of a TPP and covers the significance of the boiler and the steam turbine in a TPP.
Modern thermal power plants have developed to a great extent, but the essential equipment in a TPP remains more or less the same, with added sophistication and advancement to increase efficiency [33]. Figure 1 shows the essential equipment in a coal-fired TPP. Steam is generated in the boiler and provided to the steam turbine. The steam turbine expands the steam and rotates the generator to supply electricity. The condenser condenses the turbine steam by transferring the heat to the cooling water supplied from the cooling tower.

Boiler Water Wall Tube Leakage and its Significance in a Thermal Power Plant
Bursting of a boiler water wall tube is a severe threat to the continuous and smooth operation of a TPP. In a recent survey conducted by Kokkinos et al. [34], water wall tube leakage was found to be the dominant failure mode in different TPPs, followed by the final superheater (SH III), first reheater (RH I), and first superheater (SH I), as Figure 2 shows. Boiler tube leaks represent 52% of the total outages in a TPP. TPP shutdowns, whether planned or unplanned, can cause significant financial losses, and boiler tube leakage is the most dominant failure causing power plant shutdowns. Repairing these leaks typically costs 2 to 10 million dollars [29]. Yong et al. [35] utilized a decision-tree-based method to carry out a cost analysis of economizer tube leaks in a TPP. Considering an electricity market price of 25 dollars/MWh, the expected repair costs are shown for different repair time intervals (repair immediately, delay two days, delay four days, and delay six days). Delaying the repair increased the expected repair cost significantly.
The water wall tubes are located close to the furnace, and tube leakage occurs due to the high operating temperature, flue ash erosion, and creep. Liu et al. [36] found that the water wall tube bursts because of overheating under high operating pressure. Yang et al. [37] investigated the coal quality and found that the coal used in TPPs has a high ash content that causes corrosion in water wall tubes. It was concluded that suitable coal blending could reduce the corrosion in water wall tubes. Similarly, Xue et al. [38] analyzed the boiler water and found that the presence of NaOH causes corrosion-induced perforation leakage in water wall tubes. To prevent water wall tube leakage, power plant inspectors should therefore test the water quality regularly.

Turbine Motor Failure Analysis
The reliability of the steam turbine is highly dependent on the reliable functioning of its hydraulic lubrication and control oil system [39]. An essential requirement is a reliable oil supply over the whole operating range. The oil pumps used for that purpose provide the lube oil [40]. The oil pumps are directly driven by an electric motor (AC supply). In the absence of oil supply, bearing failure of the rotating machinery in the steam turbine can occur. This usually happens when the electric motor driving the pump fails due to a power failure or malfunction of the protection system. Therefore, the reliability of the main oil pump depends to a large extent on the AC electric motor. Different studies have been carried out to analyze AC electric motors [41]. It was found that 40 percent of AC motor failures occur due to the failure of the rolling bearing [42]. Therefore, it is recommended to diagnose the bearing condition in time, before failure occurs [43]. The other most prevalent faults in AC motors are winding faults, unbalanced stator and rotor, broken rotor bar, and eccentricity [44].
Recently, data-driven condition-based monitoring has gained importance in TPPs for efficient fault detection and diagnosis [45]. There are two main steps involved in condition-based monitoring. The first step is the acquisition of data that represent the health state of the object. For process control and monitoring, many sensors are employed on the different components in the power plant. These sensor data can provide healthy and faulty state patterns that can be distinguished and classified using multivariate algorithms. In the second step, the data are preprocessed, and multivariate algorithms are applied to classify the preprocessed data. Thus, water wall tube leakage and turbine motor fault detection can be considered classification problems.

The Proposed Methodology
This section covers the proposed optimal sensor selection methodology and fault detection by using supervised machine-learning algorithms. The study is divided into three phases. In the first phase, the sensors that are essential for fault detection are distinguished by TPP experts. Those sensors are acquired and preprocessed for the optimal sensor selection process. The second phase utilizes the optimal sensor selection techniques to determine the most sensitive sensors. In the last phase, machine-learning algorithms are employed to detect the equipment (boiler and turbine) faults in TPPs and evaluate the performance of the sensor selection algorithms. The schematic of the proposed methodology is shown in Figure 3.

Data Acquisition and Preprocessing
In a TPP, it is difficult to know the exact moment of a fault occurrence, as well as details such as the tube leakage location and the severity of the tube leakage. Therefore, the fault detection algorithm must identify the appropriate sensors for fault detection. Power plant historical process control data consist of thousands of process variables (sensors). However, not all of those sensors are sensitive to specific faults. Therefore, the essential monitoring parameters should be carefully chosen for efficient fault detection. Power plant experts with years of experience usually carry out this process.
Power plant data tend to be inconsistent and noisy; therefore, data preprocessing is required [46]. In the literature, different noise removal techniques are used. The traditional methods include Fourier transform analysis [47] and power spectral density analysis [48]. However, these techniques are sensitive to hidden oscillations and cannot recover hidden frequencies. On the other hand, wavelet denoising has recently gained popularity in data denoising because of its capability to simultaneously analyze both the time and frequency domains [49,50]. The wavelet works by decomposing the signal in the time and frequency domains. The selection of an optimal threshold is required to optimize the noise removal. Equation (1) shows the wavelet transform of the continuous signal:

$$W(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \quad (1)$$

where x(t) is the continuous signal, ψ(t) is the analyzing wavelet (ψ* denotes its complex conjugate), a is the scale parameter, and b is the position parameter.
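The decompose, threshold, and reconstruct idea behind wavelet denoising can be illustrated with a minimal sketch. The study itself used the MATLAB wavelet analyzer toolbox with soft thresholding and five decomposition levels; the Haar wavelet, the universal threshold with a median-based noise estimate, and all function names below are assumptions made only for illustration and are not taken from the paper.

```python
import math
import statistics

def haar_forward(signal, levels):
    """Multi-level Haar decomposition: returns (approximation, details per level)."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest level."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(approx, detail):
            rebuilt.extend([(s + d) / math.sqrt(2), (s - d) / math.sqrt(2)])
        approx = rebuilt
    return approx

def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold` (soft thresholding)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def denoise(signal, levels=5):
    """Soft-threshold the detail coefficients and reconstruct the signal."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = haar_forward(signal, levels)
    # Assumption: noise level estimated from the finest details (median rule),
    # combined with the universal threshold sigma * sqrt(2 ln n).
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2 * math.log(len(signal)))
    shrunk = [[soft_threshold(d, threshold) for d in level] for level in details]
    return haar_inverse(approx, shrunk)
```

With a zero threshold the reconstruction returns the original signal, which is a convenient sanity check before tuning the threshold on real sensor data.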

Optimal Sensor Selection
In a TPP, piping and instrumentation diagrams (P&IDs) document all the sensors and equipment. Figure 4 shows the P&ID of the low-pressure (LP) turbine section. Six thermocouple sensors with unique sensor IDs are attached to the furnace wall at separate locations. Similarly, in the P&ID of the LP turbine, six thermocouple sensors are connected to the LP turbine casing. These locally attached sensors may contain redundant information, thus influencing the performance of the multivariate algorithms. Therefore, it is important to reduce the number of input sensors and determine the appropriate sensor arrangement for equipment fault detection in a TPP. This study used three different optimal sensor selection approaches (correlation analysis, mRMR algorithm, and extra-tree classifier). The details of the approaches are as follows.

Correlation Analysis
Correlation analysis is a well-known technique and is usually preferred because of its ease of implementation, lower complexity, and lower computational cost [51]. This analysis evaluates the strength and direction of the relationship between two sensors [52]. Pearson's coefficient values range from −1 to 1. A value of 1 represents a perfect positive correlation, while −1 represents a perfect negative correlation between the two sensors. Equation (2) shows how Pearson's correlation coefficient (r) is calculated:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} \quad (2)$$

where n is the sensor data size, and x and y are the two input sensor variables. Sensors with high correlation represent the same data trend, and removing a highly correlated sensor may not influence the functioning of the multivariate algorithms. Therefore, in this study, highly correlated sensors are discarded, while keeping one sensor from each highly correlated group. The step-by-step implementation of the correlation analysis for the selection of optimal sensors is shown below:
1st step: Calculation of Pearson's correlation coefficient between all the sensor signals by using Equation (2).
2nd step: Construction of the correlation matrix representing the correlation between all the sensors.
3rd step: The sensors with a correlation value equal to or greater than 0.95 are considered highly correlated.
4th step: Highly correlated sensors are discarded while keeping one of the highly correlated sensors.
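The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the rule of keeping the first sensor encountered in each correlated group, the use of the signed coefficient, and the names `pearson` and `select_uncorrelated` are assumptions, and the three short signals are toy data.

```python
import math

def pearson(x, y):
    """Pearson's r in the computational form of Equation (2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    var_x = n * sum(a * a for a in x) - sx ** 2
    var_y = n * sum(b * b for b in y) - sy ** 2
    if var_x <= 0 or var_y <= 0:
        return 0.0  # a constant signal has no defined correlation
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y)

def select_uncorrelated(sensors, threshold=0.95):
    """Keep a sensor only if its r with every kept sensor is below the threshold.

    sensors: dict mapping a sensor name to its list of readings.
    Assumption: the first sensor seen in a correlated group is the one kept.
    """
    kept = []
    for name, values in sensors.items():
        if all(pearson(values, sensors[k]) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data (hypothetical): s2 follows s1 almost exactly, s3 does not.
toy = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2.1, 4.0, 6.2, 8.1, 9.9],
    "s3": [5, 1, 4, 2, 3],
}
print(select_uncorrelated(toy))
```

In this toy case `s2` is dropped because its correlation with `s1` exceeds 0.95, while `s3` is retained.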

mRMR Algorithm
mRMR is an approach recently proposed by Peng et al. [53] and has gained considerable importance in mechanical fault diagnosis and structural health monitoring. mRMR selects the best features in the workspace by minimizing redundancy and maximizing relevancy. It exhibits fast calculation and strong robustness qualities [54]. Hence, our study adopted this method to find the optimal sensors needed for effective fault detection in a TPP. The theoretical background of the mRMR algorithm is summarized as follows.
The algorithm first calculates the mutual information between the attributes X and Y to quantify the relevance and redundancy. Mutual information is defined as follows:

$$I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \quad (3)$$

where p(x,y) is the joint probabilistic density, and p(x) and p(y) are marginal probabilistic densities. Let $S$ denote the sensor dataset, while $S_m$ represents the already selected sensor dataset that contains $m$ sensors, and $S_n$ denotes the to-be-selected sensors, with the dataset consisting of $n$ sensors. The relevance $D$ of the sensor $f$ in $S_n$ with the target $c$ can be calculated as:

$$D = I(f; c) \quad (4)$$

The redundancy $R$ of the sensor $f$ in $S_n$ with all the sensors in $S_m$ can be calculated as:

$$R = \frac{1}{m}\sum_{f_i \in S_m} I(f; f_i) \quad (5)$$

To obtain the sensor $f_j$ in $S_n$ with maximum relevancy and minimum redundancy, Equations (4) and (5) are combined with the mRMR function:

$$\max_{f_j \in S_n}\left[I(f_j; c) - \frac{1}{m}\sum_{f_i \in S_m} I(f_j; f_i)\right] \quad (6)$$

For the sensor dataset with $N$ sensors, the sensor evaluation will continue N rounds. After these evaluations, the optimal sensor set by mRMR is obtained:

$$S = \{f'_1, f'_2, \ldots, f'_h, \ldots, f'_N\} \quad (7)$$

The sensor index h represents the importance of the sensor. The more important the sensor, the smaller its index h.
The overall steps involved in the computation of optimal sensor selection by using the mRMR algorithm are described below:
1st step: Mutual information is computed between the sensors by using Equation (3).
2nd step: The relevancy and redundancy of the sensor are computed by Equations (4) and (5).
3rd step: Equation (6) is used to obtain the sensor with maximum relevancy and minimum redundancy.
4th step: A score is computed for each sensor to be evaluated, and the sensors with a high score are chosen as optimal sensors.
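The greedy selection described in these steps can be sketched as follows. The sketch assumes the sensor readings have already been discretized into bins so that mutual information can be estimated from counts; the binning, the function names, and the toy columns are illustrative assumptions and not details given in the paper.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Discrete estimate of I(X;Y) in nats from two equally long label lists."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr_rank(features, target):
    """Rank features by relevance minus mean redundancy, one per round.

    features: dict mapping a name to a list of discrete (binned) values.
    Returns the names ordered from most to least important.
    """
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]  # first round: relevance only
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): "b" duplicates "a"; "c" is weaker but not redundant.
toy_features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
toy_target = [0, 0, 0, 1, 1, 1, 1, 1]
print(mrmr_rank(toy_features, toy_target))
```

In the toy example the duplicate column `b` is ranked last even though it is as relevant as `a`, because its redundancy with the already selected `a` outweighs its relevance.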

Extra-Tree Classifier (ETC)
The extra-tree classifier is an ensemble learning technique that accumulates the results of multiple decorrelated decision trees. Each decision tree selects the optimal feature by splitting the data based on the entropy value. The entropy estimates the quality of the split, as shown in Equation (8). A set of samples that all belong to the same class has zero entropy. Thus, the extra-tree classifier works by recursively selecting node splits with the lowest entropy value:

$$\text{Entropy} = -\sum_{i=1}^{C} p_i \log_2 p_i \quad (8)$$

where $C$ is the number of class labels, and $p_i$ is the portion of the samples that belong to class $i$.
This study uses the extra-tree classifier because of its simple properties, easy conversion to "if-then" rules, and randomizing property for numerical input [25]. Such advantages make an extra-tree classifier useful for many input sensors and, in such situations, may increase accuracy. The step-by-step implementation of the extra-tree classifier is described as follows:
1st step: Computation of the entropy of the data by using Equation (8).
2nd step: Calculation of the total score for each sensor.
3rd step: Selection of the sensors with a high predictor importance score.
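As a small illustration of the entropy criterion in Equation (8), the sketch below computes the entropy of a label set and the weighted entropy left after splitting one sensor at a given threshold. It is not a full extra-tree implementation; the function names and the toy readings are assumptions for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels, following Equation (8)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(values, labels, threshold):
    """Weighted entropy of the two child nodes after splitting at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return sum(len(part) / n * entropy(part) for part in (left, right) if part)

# Toy readings (hypothetical): healthy (H) samples are low, leakage (L) samples are high.
readings = [1.0, 2.0, 8.0, 9.0]
state = ["H", "H", "L", "L"]
print(entropy(state))                      # mixed node
print(split_entropy(readings, state, 5.0)) # split that separates the classes
print(split_entropy(readings, state, 1.5)) # poorer split
```

A split that separates the two classes completely leaves zero entropy, so a sensor that often allows such splits accumulates a high importance score across the trees.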

Machine-Learning Classifiers
Recently, supervised machine learning has gained importance in intelligent fault detection and condition monitoring [55]. Due to labeled data, the results generated from supervised machine learning are more accurate than those from other machine-learning types, such as unsupervised machine learning and reinforcement learning. In this study, three well-known supervised machine-learning classifiers (support-vector machine (SVM), k-nearest neighbors (k-NN), and the naïve Bayes algorithm (NB)) are used for the fault classification.
Due to its tendency to avoid overfitting and its ability to solve complex problems, SVM is commonly used for fault detection applications [56]. SVM forms a hyperplane between the two classes and adjusts the boundary by maximizing the distance between the two classes [57]. SVM uses kernel functions [58] for nonlinear and inseparable datasets. This study utilizes the RBF kernel function due to its higher robustness and infinite smoothness. k-NN is the second supervised machine-learning algorithm used in this study; it classifies a sample according to the classes of its nearest neighbors in the feature space. k-NN is chosen in this study because it is easy to implement and requires no additional parameters to tune. The third algorithm used in this study is naïve Bayes, which is based on Bayes' theorem [59] and is commonly used for large datasets. Naïve Bayes is chosen in this study because of its higher classification speed and ease of implementation.
Before executing the machine-learning model in real-world applications, its performance must be estimated to verify its extrapolation ability and generalization. Different validation techniques are available in the literature, among which k-fold cross-validation is the most popular [60]. In this study, fivefold cross-validation is used to evaluate the training accuracies of the machine-learning models used.
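To make the k-NN decision rule concrete, a minimal sketch is given below. The Euclidean distance, the value k = 3, the function name, and the toy feature vectors are assumptions chosen for illustration; the paper does not state these settings.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy feature vectors (hypothetical): two well-separated groups.
toy_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
toy_y = ["healthy", "healthy", "healthy", "leak", "leak", "leak"]
print(knn_predict(toy_x, toy_y, (0.5, 0.5)))
print(knn_predict(toy_x, toy_y, (5.5, 5.5)))
```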
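The fivefold procedure can be sketched as below. The interleaved assignment of samples to folds and the helper names are assumptions made for illustration (the paper does not say how the folds were formed), and `predict` stands for any classifier function with the signature `predict(train_x, train_y, query)`.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k disjoint, interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def cross_val_accuracy(xs, ys, predict, k=5):
    """Mean test accuracy over k folds; assumes at least k samples."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        hits = sum(predict(train_x, train_y, xs[i]) == ys[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k
```

Each sample appears in exactly one test fold, so every observation is used once for testing and k − 1 times for training.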

Real-World Power Plant Scenarios-Computational Results
In this section, two real power plant fault case scenarios (boiler water wall tube leakage and turbine motor fault) are employed to validate the performance of the proposed approach. The fault case scenarios analyzed in this study are shown in Table 2. The data obtained from the TPP consist of the time-domain signals of the expert-selected process variables (sensors). The detailed description of the acquired data is shown in Figures A1 and A2. The process variables are stored in the historical database of a TPP with a sampling period of 1 s. Ten days of data at the normal working condition of the power plant and 10 days of data from the fault state were acquired from the TPP. Because this study focuses on early-stage fault detection in TPPs, the data would ideally be provided for different fault severity levels so that a fault could be detected at a low severity level. However, the data were not acquired in controlled lab conditions. Therefore, it was not possible to create faults with different severity levels and obtain the data accordingly. This is the main limitation of the data acquired from the TPP. Figure 5 shows a schematic of the proposed model for TPP boiler water wall tube leakage detection.

Case Scenario 1-Boiler Water Wall Tube Leakage
This section implements the proposed approach to the real-world power plant boiler water wall tube leakage scenario. The details of the computational results are as follows.

Acquisition of the Sensitive Sensors Data and Data Preprocessing
Thirty-eight sensitive sensors selected by experts from 103 MW coal-fired thermal power plants are utilized in this study. The acquired sensors consist of a generator active power sensor and the thermocouple sensors employed in the different components of the boiler that measure inlet and outlet header temperatures, superheater (SHI, SHII, SHIII) metal temperatures, and reheater (RHI, RHII) metal temperatures. Figure A1 of Appendix A shows the details of the sensors with the power plant sensor ID, and the notations are assigned to each sensor for ease of the optimal sensor selection process. Figure 6 shows the healthy (normal state) and leakage data plots for the SHI inlet header temperature, SHII metal temperature, and the RH II metal temperature with the corresponding generator active power. The data consist of the ten day (10 d) healthy and 10 d water wall tube leakage data acquired before the power plant shutdown. The red line represents the healthy data, whereas the blue color represents the leakage data. Large fluctuations are observed during the water wall tube leakage state of the boiler, as compared to the normal state. After the data acquisition, data preprocessing was carried out. In the data preprocessing phase, the wavelet analyzer toolbox of MATLAB is used to denoise the sensor signals. Soft thresholding with five levels of decomposition was chosen for optimum noise removal. Figure 7 shows the effectiveness of the noise removal by wavelet denoising. The red color shows the denoised signal, while the black color line represents the noisy generator active power sensor signal.

Optimal Sensor Selection Algorithms
Three different algorithms are used in this study for optimal sensor selection. The results of the algorithms are shown in each subsection.

Correlation analysis
The optimal sensor selection process is first carried out by correlation analysis. Pearson's correlation coefficient is determined for all the sensor data. Sensors showing a high correlation represent the same data trend, and the performance of the multivariate algorithm may not be influenced by keeping one of the highly correlated sensors and removing the rest. Two sensors are assumed to be highly correlated if the correlation coefficient value is > 0.95. Table 3 shows that X6 (steam temperature after SH I) is highly correlated with X7, X8, X9, X10, and X11 (SH I metal temperature), with correlation coefficient values of >0.95. Therefore, X6 is selected, and the rest of these redundant sensors are removed. The same process is carried out for the remaining sensors, and 21 optimal sensors are selected out of 38 sensors. Figure 8 shows the correlation matrix with the 21 optimal sensors. The red color region shows the highly correlated sensors.

mRMR algorithm
The minimum redundancy and maximum relevance (mRMR) algorithm selects the optimal tags by selecting the relevant features, while controlling the redundancy within the selected features. Figure 9a represents the sensor rank with the predictor importance score. X25 with tag id P1HAH77CT005XQ01 representing the SH III metal temperature is on the 1st rank with the predictor importance score of 0.22, followed by X1 (generator active power). Figure 9b shows the 21 optimal sensors that are selected.

Extra-tree classifier
The extra-tree algorithm is a type of ensemble learning technique that aggregates multiple decorrelated decision trees to select the optimal tags. Figure 10a shows that the top 21 tags with high predictor importance are selected as optimal tags. X1 with tag id representing the active generator power is on the 1st rank with the predictor importance score of 0.185, followed by X25 (SH III metal temperature). Figure 10b shows the optimal sensors selected by the extra-tree classifier.

Machine-Learning Classification
This section presents the machine-learning performance of the proposed methodology. The sensor data (raw data) obtained from the power plant consists of 38 time-domain signals with 10 days of healthy and 10 days of leakage data. Twenty-one sensor signals (optimal sensors) are selected by each optimal sensor selection scheme (correlation analysis, mRMR algorithm, and extra-tree classifier). The direct application of the time-domain signals in the machine-learning classifiers cannot provide satisfactory results. Therefore, the common practice is to estimate the time-domain statistical features and use these features in the machine-learning classifiers. In this study, four time-domain statistical features (root mean square, variance, skewness, and kurtosis) are computed for the raw and optimal sensors data. Table 4 shows that four data cases are analyzed, and the machine-learning performance is computed and compared. Fivefold cross-validation is performed to avoid overfitting. The data are partitioned into five disjointed folds. The fourfold data were used as the training samples, and the onefold data as a testing sample for each of the five iterations. This methodology provides a reasonable estimation of the predictive accuracy of the final model trained with all the data. Figure 11 summarizes the results of the machine-learning classification for all four case scenarios. Without implementing the optimal sensor selection, the k-NN-based classifier provides the highest machine-learning accuracy of 94.7%. It can be observed that after eliminating the irrelevant sensors, the performance of the machine-learning classifiers increased slightly in the optimal sensor data case scenarios. The k-NN-based mRMR algorithm provides the highest machine-learning accuracy of 97.6%. Figure 11. Machine-learning performance comparison of the four data case scenarios. 
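The four time-domain statistical features named in this section (root mean square, variance, skewness, and kurtosis) can be computed as in the sketch below. The population (divide-by-n) normalization and the non-excess kurtosis are assumptions, since the paper does not state which conventions were used; the function name is likewise illustrative.

```python
import math

def time_domain_features(signal):
    """Return RMS, variance, skewness, and kurtosis of one signal window."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((v - mean) ** 2 for v in signal) / n  # population variance (assumed)
    std = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    # Standardized third and fourth moments; a constant window gets 0.0.
    skewness = sum((v - mean) ** 3 for v in signal) / (n * std ** 3) if std else 0.0
    kurtosis = sum((v - mean) ** 4 for v in signal) / (n * std ** 4) if std else 0.0
    return {"rms": rms, "variance": variance, "skewness": skewness, "kurtosis": kurtosis}

print(time_domain_features([1, 2, 3, 4, 5]))
```

Applying this function to each sensor window turns a block of time-domain samples into a short feature vector that the classifiers can consume.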
Figure 12 plots the confusion matrices for the k-NN-based raw data case scenario and the k-NN-based mRMR algorithm to assess the performance of the classifier in the raw and optimal sensor data case scenarios. The confusion matrix indicates the performance of the classifier in each class. Each row shows the true class, while each column shows the predicted class. The accuracy in the confusion matrix is calculated as follows:
$$\text{Accuracy} = \frac{TP}{TP + FN} \tag{9}$$
where TP represents the true positives and FN represents the false negatives.
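As a small worked example of the row/column convention described above (rows are true classes, columns are predicted classes), the sketch below builds a confusion matrix and computes a per-class accuracy from TP and FN only, assuming the ratio TP / (TP + FN). The labels are made up for illustration and every class is assumed to have at least one true sample.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns index the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """TP / (TP + FN) for each class (each row of the matrix)."""
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    return tp / (tp + fn)

# Hypothetical labels: 0 = healthy, 1 = leakage.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=2)
acc = per_class_accuracy(cm)   # healthy: 3 of 4 correct, leakage: 4 of 4 correct
```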
In the raw data case scenario, 7.9% and 2.6% of the samples are misclassified in the healthy (H) and water wall leakage (WWL) classes, respectively. In the optimal sensor data case scenario, the misclassification is reduced to 4.8% in the healthy class, and k-NN classifies the water wall tube leakage class with no misclassification error. In addition to fivefold cross-validation, tenfold cross-validation is performed to validate the robustness of the machine-learning models, and the results are compared with the fivefold cross-validation results in Table 5. Tenfold cross-validation yields slightly higher accuracies for both the raw and optimal sensor datasets.
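The fivefold and tenfold procedures can be sketched with a small NumPy-only k-NN, shown here as an illustration rather than the study's implementation. The synthetic two-class data, the choice k = 3, and the function names are assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Binary k-NN (labels 0/1) with Euclidean distance; k should be odd."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)   # majority vote

def kfold_accuracy(X, y, n_folds, k=3, seed=0):
    """Mean test accuracy over n_folds disjoint folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for i in range(n_folds):
        test_idx = folds[i]                          # one fold for testing
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])  # the rest for training
        pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))

# Synthetic, well-separated classes (assumption, not plant data).
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 4)) + 3.0 * y[:, None]

acc5 = kfold_accuracy(X, y, n_folds=5)
acc10 = kfold_accuracy(X, y, n_folds=10)
```

With ten folds each model trains on 90% of the data instead of 80%, which is one plausible reason for slightly higher cross-validation accuracies.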

Case Scenario-2: Steam Turbine Motor Failure
In the second case scenario, this study analyzes a steam turbine motor failure in the 500 MW thermal power plant that resulted in an unscheduled maintenance shutdown. The proposed data-driven machine-learning-based optimal sensor selection approach is employed to diagnose the steam turbine motor fault.

Acquisition of the Sensitive Sensors Data and Data Preprocessing
Experts of the power plant provided data from the 136 sensors that are most sensitive to the steam turbine motor fault. Figure A2 of Appendix A shows the details of the sensor data. ID represents the number given to each sensor in the power plant. Notations are assigned to each sensor for the optimal sensor selection process.
The data consist of 10 d of healthy and 10 d of faulty state data, as shown in Figure 13. Different sensors (main turbine speed, vibration-X bearing#1, and HP exhaust steam temperature) are plotted corresponding to the active generator power. The red color represents the healthy data, whereas the blue color shows the faulty state of the turbine. It can be observed that during the faulty state of the steam turbine, the fluctuations in the sensor data increased. As in case scenario 1, the wavelet analyzer toolbox is utilized to denoise the sensor signals. Figure 14 shows the effectiveness of the wavelet denoising. The black color represents the noisy signal, while the red color shows the denoised signal after wavelet denoising.
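The idea behind wavelet denoising can be sketched with a single-level Haar transform and soft thresholding of the detail coefficients. This is a simplified stand-in and not the wavelet analyzer toolbox used in the study: the wavelet family, the decomposition level, the threshold rule, and the synthetic signal are all assumptions.

```python
import numpy as np

def haar_denoise(signal, threshold):
    """Single-level Haar transform, soft-threshold details, then invert."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)          # low-pass (trend) coefficients
    detail = (even - odd) / np.sqrt(2)          # high-pass (fluctuation) coefficients
    # Soft thresholding: shrink details toward zero, zeroing the small ones.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)  # inverse Haar transform
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

# Synthetic example: a slow sine wave plus Gaussian noise of known sigma.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 512)
clean = np.sin(t)
sigma = 0.3
noisy = clean + sigma * rng.normal(size=t.size)
threshold = sigma * np.sqrt(2.0 * np.log(t.size))   # universal threshold
denoised = haar_denoise(noisy, threshold)
```

A threshold of zero reproduces the input exactly, while a very large threshold replaces each pair of samples by its mean.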

Optimal Sensor Selection
This section shows the computational results of the correlation analysis, mRMR algorithm, and the extra-tree classifier.

Correlation analysis
Pearson's correlation coefficient is computed for all pairs of sensor signals. Within each group of highly correlated sensors, one sensor is kept and the others are removed. This procedure is followed throughout the sensor selection process, and 61 optimal sensors are selected. Figure 15 shows the correlation matrix, in which the red color represents a high correlation between the sensors. Figure 16 shows that the optimal sensors selected by correlation analysis consist of the actual load (generator active power), HP exhaust steam temperature, main turbine speed, bearing vibrations, bearing metal temperatures, and oil drain temperatures.
Figure 16. List of the optimal sensors selected by correlation analysis.
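The keep-one-per-correlated-group procedure described in this subsection can be sketched as a greedy filter. The correlation threshold of 0.9, the greedy column order, and the synthetic data are assumptions, since the text does not state the cut-off that was used.

```python
import numpy as np

def drop_highly_correlated(X, threshold=0.9):
    """Greedily keep a column only if |Pearson r| with every kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # (n_sensors, n_sensors)
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, i] < threshold for i in keep):
            keep.append(j)
    return keep

# Synthetic example: column 1 is almost a scaled copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=500), b])

kept = drop_highly_correlated(X, threshold=0.9)
```

Because the filter is greedy, which member of a correlated group survives depends on the column order.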

mRMR algorithm
The mRMR algorithm is applied to the sensor data to minimize the redundancy while keeping the relevance. Figure 17 shows the sensor rank with the predictor importance score. For comparison, in case scenario 1, X25 (tag id P1HAH77CT005XQ01, SH III metal temperature) ranked first with a predictor importance score of 0.22, followed by X1 (generator active power), and 21 optimal sensors were selected (Figure 9b). In the present case scenario, X36 (vibration-2X in bearing #2) is the most sensitive sensor with a predictor importance score of 0.698, followed by turbine bearing metal temperature #1. Figure 18 lists the 61 optimal sensors selected by the mRMR algorithm.
Figure 18. List of optimal sensors selected by mRMR algorithm.
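The trade-off between relevance and redundancy can be made concrete with a small greedy mRMR sketch. It uses a histogram-based mutual-information estimate and the difference criterion (relevance minus mean redundancy); the bin count, the criterion, and the synthetic data are assumptions and may differ from the implementation used in the study.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Mutual information (nats) of two 1-D arrays from a 2-D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mrmr_select(X, y, n_select, bins=8):
    """Greedy mRMR: maximize relevance to y minus mean redundancy with chosen features."""
    n_features = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y, bins)
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, s], bins)
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Synthetic data: x1 nearly duplicates x0, x2 is independently informative, x3 is noise.
rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)
x0 = 2.0 * y + rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)
x2 = 1.2 * y + rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])

selected = mrmr_select(X, y, n_select=3)
```

In this toy setting the near-duplicate sensor is penalized for redundancy, so the second pick is the weaker but independent sensor rather than the copy of the first.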

Extra-tree classifier
The raw sensors (136 sensors) are given as the input to the extra-tree classifier. The top 61 are selected as the optimal sensors necessary to predict the turbine motor fault. Figure 19 presents the predictor importance score of the selected sensors. X37 (bearing#2 vibration-2Y) is selected as the most sensitive sensor variable with a predictor importance score of 0.062.
Figure 20. List of optimal sensors selected by extra-tree algorithm.

Machine-Learning Classification
This section computes the machine-learning performance to quantify the benefit of the proposed machine-learning-based optimal sensor selection approach. The raw data obtained from the power plant consist of 136 sensors with 10 d of data for each of the healthy and faulty states. The data consist of time-domain signals; therefore, the four statistical features (root mean square, variance, skewness, and kurtosis) are calculated for the raw and optimal sensor data and used in the machine-learning classifiers to attain satisfactory results. Table 6 lists the four data cases for which the machine-learning performance is computed and compared. Three supervised machine-learning classifiers (SVM, k-NN, and naïve Bayes) are chosen in this study to classify the normal and faulty states. Fivefold cross-validation is performed to guard against overfitting. Figure 21 summarizes the results of the machine-learning classification for all four case scenarios. Without optimal sensor selection, the naïve-Bayes-based machine-learning classifier provides the highest machine-learning accuracy of 87.5%. After removing the irrelevant sensors, the performance of the machine-learning classifiers increased slightly in the optimal sensor data case scenarios. The naïve-Bayes-based extra-tree classifier provides the highest machine-learning accuracy of 92.6%, an increase of 5.1 percentage points for the naïve Bayes classifier compared to the raw sensor dataset case. Therefore, the proposed machine-learning-based optimal sensor selection approach enhanced the classification performance and reduced the number of input sensors by 55.1%.
Figure 21. Machine-learning performance comparison of the four data case scenarios.
Figure 22 plots the confusion matrices for the naïve-Bayes-based raw data case scenario and the naïve-Bayes-based extra-tree classifier to assess the classifier performance in the raw and optimal sensor data case scenarios.
The confusion matrices indicate that in the raw data case scenario, the false-negative rate is 19.1% in the fault class (f) and 5.9% in the healthy class (h). With the sensors selected by the extra-tree classifier, the naïve Bayes algorithm attains false-negative rates of 6% and 4.3% in the healthy and fault classes, respectively, and the machine-learning accuracy rises to 92.6%. As in the boiler water wall tube leakage case scenario, the robustness of the model is validated by performing tenfold cross-validation, and the results are compared with the fivefold cross-validation results. Tenfold cross-validation yields slightly higher accuracies for both the raw and optimal sensor dataset cases, as shown in Table 7.
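The headline figures for the two case scenarios follow from simple arithmetic on the sensor counts and accuracies reported in the text, as the short check below illustrates. The helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

# Sensor counts reported in the text.
boiler_reduction = percent_reduction(38, 21)     # water wall tube leakage case
turbine_reduction = percent_reduction(136, 61)   # turbine motor fault case

# Accuracy gain of the naive Bayes classifier, in percentage points.
accuracy_gain = 92.6 - 87.5
```

Rounded to one decimal place, these evaluate to 44.7%, 55.1%, and 5.1 percentage points, respectively.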

Conclusions
A vast amount of sensor data is collected in the historical databases of power plants. It is essential to identify the informative sensors necessary to detect a fault in the presence of irrelevant and redundant sensors. Multivariate algorithms are highly dependent on the number of input sensors, and redundant and irrelevant sensors may reduce the performance of these classifiers. Therefore, this study proposed a machine-learning-based optimal sensor selection approach for equipment (boiler and turbine) fault detection in thermal power plants. Three optimal sensor selection approaches (correlation analysis, mRMR algorithm, and extra-tree classifier) are employed in this study. Three supervised machine-learning classifiers (SVM, k-NN, and naïve Bayes) are used to classify the normal and faulty states. The proposed approach is implemented on two real-world case scenarios (boiler water wall tube leakage and turbine motor fault). The computational results indicate that the optimal sensor selection approaches not only reduced the number of sensors by up to 44% in the water wall tube leakage case scenario (from 38 to 21 sensors) and by 55% in the turbine motor fault case scenario (from 136 to 61 sensors), but also enhanced the machine-learning accuracy. The k-NN-based mRMR algorithm provides the highest accuracy of 97.6% in the boiler water wall tube leakage case scenario. In the second case scenario (turbine motor failure), the naïve-Bayes-based extra-tree classifier provides the highest accuracy of 92.6% compared with the other models. This study offers efficient and straightforward optimal sensor selection approaches that can be implemented in thermal power plants and that may provide guidelines for efficient fault detection in TPPs in future research work.