Evaluation of One-Class Classifiers for Fault Detection: Mahalanobis Classifiers and the Mahalanobis–Taguchi System

Abstract: Today, real-time fault detection and predictive maintenance based on sensor data are being actively introduced in various areas such as manufacturing, aircraft, and power system monitoring. Many faults in motors or rotating machinery such as industrial robots, aircraft engines, and wind turbines can be diagnosed by analyzing signal data such as vibration and noise. In this study, to detect failures based on vibration data, preprocessing was performed using signal processing techniques such as the Hamming window and the cepstrum transform. After that, 10 statistical condition indicators were extracted to train the machine learning models. Specifically, two types of Mahalanobis distance (MD)-based one-class classification methods, the MD classifier and the Mahalanobis–Taguchi system, were evaluated for detecting faults in rotating machinery. Their fault-detection performance was evaluated on data with different imbalance ratios by comparison with binary classification models, which included classical and imbalanced-classification versions of the support vector machine and random forest algorithms. The experimental results showed that the MD-based classifiers were more effective than binary classifiers in cases in which there were far fewer defect data than normal data, which is common in real-world industrial fields.


Introduction
Recently, in the manufacturing industry, there has been much interest in smart manufacturing to improve productivity and competitiveness. Smart manufacturing is realized using advanced technologies such as the Internet of Things (IoT), artificial intelligence, and big data analysis [1]. Increasingly complex facilities in manufacturing systems need to be monitored and maintained in more sophisticated ways. To this end, prognostics and health management (PHM) technology is capable of diagnosing or predicting faults by detecting and analyzing the condition of facilities using the IoT, machine learning, and big data analytics.
In particular, rotating machinery such as industrial motors, aircraft engines, and wind turbines plays a crucial role in the automation of manufacturing systems, so the fault detection of rotating machines has a decisive influence on system productivity. Many problems in rotary machines come mainly from defects in bearings or gearboxes, or from shaft deviation. The failure of a rotating machine that transmits power to various facilities results in great economic loss due to the performance degradation or shutdown of the system.
Rotating parts such as bearings often generate abnormal signal data when they have problems, so it is possible to diagnose abnormal conditions by investigating the signal data. These signal data require appropriate preprocessing based on various signal processing techniques, which turn the raw signals into meaningful information that can be analyzed accurately and easily.
In this study, vibration data generated from rotating machines were preprocessed by applying appropriate signal processing techniques, and a fault-detection method was developed that can diagnose abnormalities of equipment parts in real time. Vibration data from normal and fault conditions were collected, and data standardization was then performed so that they could be compared on the same distribution. Thereafter, the Hamming window technique was applied to segment the vibration signal, and a cepstrum technique was adopted to enhance the inherent signal characteristics by eliminating existing noise. After preprocessing the data, 10 statistical condition indicators (SCIs), such as root mean square (RMS) and peak-to-peak, were extracted and used for training the machine learning models. The extracted data were finally used to detect abnormal states by using Mahalanobis distance (MD)-based one-class classification methods.
The MD-based one-class classification methods construct the Mahalanobis space (MS), represented by the MD of only the normal signal data, and then determine whether a new signal sample belongs to the MS or not. On the other hand, typical binary classification methods such as support vector machines (SVM) and random forest (RF) need both normal and abnormal data to train models for detecting abnormal conditions of a system [2][3][4][5]. Unfortunately, in practical industrial systems, the amount of fault data that can be collected is extremely small. For this reason, it is often difficult to apply typical two-class (i.e., binary) classification techniques to construct fault-detection models in real-life industrial systems. Therefore, in this paper, we aimed to analyze the advantages and disadvantages of one-class classification techniques that consider the data distribution. In particular, two MD-based classification methods were evaluated. First, the Mahalanobis distance classifier (MDC) uses the Mahalanobis space based on the MD to detect outliers; in addition, the Mahalanobis–Taguchi system (MTS) adopts Taguchi techniques to choose and use only the key factors among all the variables.
The performances of the two MD-based classifiers were compared with those of binary classification methods and their imbalanced-classification versions. The experimental performance comparison was conducted on the same test data set after training the models with different levels of imbalance ratios (IRs) between normal and abnormal data in the training data set.
The remainder of this paper is structured as follows. In Section 2, we introduce related studies on MD-based classification. In Section 3, we present fault detection based on vibration data along with the framework of the research; the signal processing methods, data preprocessing, and fault-diagnosis classification models are also described. In Section 4, we compare the performance of the one-class classifiers and binary classifiers according to different IRs of the same training data set. Finally, we conclude this paper with future work in Section 5.

Related Work
The MDC defines a normal group and constructs the MS using data from the normal group [6]. A new sample is classified according to how far away it is from the pretrained MS. Meanwhile, Taguchi proposed the MTS method by combining the MD-based classification method with the Taguchi method [7]. The Taguchi method is used to extract only the effective variables with a large influence on MD estimation. The MTS method has been applied effectively in many fields, such as diagnosis, pattern recognition, speech recognition, and optimization [8,9].
The MTS technique is generally used for multivariate analysis, and various studies have compared the performance of MTS with that of other multivariate analysis techniques. For large samples, the performance of the techniques is similar, and one study found the MTS technique to be superior for small samples [10]. Moreover, the MTS still has the limitation of choosing optimal factors among all the variables [8,11], so some studies have integrated feature selection methods into MTS, such as the genetic algorithm (GA) [12], particle swarm optimization [13], and ant colony optimization [14]. In particular, to improve the MTS process, Chen et al. developed the two-stage Mahalanobis classification system (MCS) [15] and the integrated MCS (IMCS) [16]. In this paper, we focused on the traditional MDC and MTS methods as one-class classifiers to compare their performance with binary classifiers under varying imbalance ratios in detecting faults of rotating machines based on preprocessed vibration data.
Meanwhile, in actual industrial fields, there are few well-designed data sets with adequate quantities of positive and negative samples. Therefore, many researchers have studied how to solve the imbalanced data set problem. According to [17], the number of published papers studying imbalanced learning has been increasing since 2006: 118 papers were published in 2016, about 17 times the number published in 2006.
There are analytical studies on diagnosing faults of rotating machines using MD-based classification techniques. Nader [18] used kernel whitening normalization and kernel principal component analysis (KPCA) to obtain the MD and showed that these techniques can be good choices when the training samples are few or the class is unique. Wei [19] suggested a novel kernel Mahalanobis ellipsoidal learning method for one-class classification. Bartkowiak [20] used three methods, Parzen kernel density, a mixture of Gaussians, and Support Vector Data Description (SVDD), after calculating the MD to find outliers for the diagnosis of gearboxes.

Framework
The procedure for developing a fault-detection model that can classify normal and abnormal data is shown in Figure 1. First, the vibration data of normal and abnormal states are collected for analysis. The collected vibration data are subjected to the windowing process, in which a long continuous signal is divided into blocks by the Hamming window function, and signal values are attenuated toward 0 near the boundary of the window frame. The original signal is then separated from the noise by the cepstrum transform, which denoises the signal. In this research, 10 SCIs, such as mean, peak-to-peak, and RMS, were used to extract features for the classification models. These indicators are often used to represent features of time series data in bearing fault-detection problems [21][22][23][24].
The preprocessed data were split into training and test sets to evaluate the MD-based classification methods. Using the training sets of preprocessed data, two MD-based classification models, MDC and MTS, were constructed as one-class classifiers. They were evaluated by comparing their accuracy with two representative binary classification methods, SVM and RF, and their imbalanced-classification versions, cost-sensitive SVM and cost-sensitive RF. Finally, the performances of the developed models were compared in terms of several classification performance measures on the same test sets.

Data Description
In this study, we used the vibration signal data of ball bearings provided by the Bearing Data Center of Case Western Reserve University [25]. The vibration data were collected using accelerometers attached to the rotating machine. The data set contained 12,000 digital signal values per second under the condition of 1750 RPM. Each data sample consisted of 12,000 continuous vibration values, and the classes comprised a normal state and three fault states: 'Ball', 'Inner race', and 'Outer race'.
In this experiment, we prepared four training data sets with different imbalance ratios (IRs) to compare the performance of one-class classifiers and binary classifiers by mimicking real-life industrial fields, where fault data are extremely rare. The IR was used to evaluate the imbalance rate of the binary data and was calculated as in Equation (1). The composition of the training data set according to IR is shown in Table 1. The test set consisted of 25 data samples: 10 normal and 15 abnormal (five for each of the three failure types).
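Equation (1) itself did not survive extraction. Consistent with the values in Table 1, the conventional definition from the imbalanced-learning literature is the ratio of majority (normal) to minority (abnormal) samples:

```latex
\mathrm{IR} = \frac{N_{\mathrm{normal}}}{N_{\mathrm{abnormal}}}
```

For example, 20 normal and 3 abnormal training samples give IR = 20/3 ≈ 6.667.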

Signal Processing and Data Preprocessing
In this subsection, we describe the signal processing techniques applied. Signal processing means processing digitized signals with algorithms to modify or improve the signal for a specific purpose. In this research, signal processing steps such as standardization, Hamming windowing, cepstrum transformation, and statistical indicator extraction were performed to prepare the inputs for training the fault-detection models.

Standardization
First, to compare the collected vibration data on the same distribution, standardization was performed using Equation (2). Here, x_i is the vibration value at time i in a signal sample, and z_i is the standardized value of x_i. The values x̄ and s denote the mean and standard deviation of the vibration values x_1, ..., x_N, respectively, and N is the number of vibration values.
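The formula was lost in extraction; from the definitions above, Equation (2) is the standard z-score transformation (whether the paper uses N or N − 1 in the denominator of s is not recoverable from the text):

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```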

Hamming Window and Cepstrum
The vibration data used in this study were arbitrarily segmented from a continuous vibration signal. This can introduce discontinuities, that is, leakage error, caused by the arbitrary cutting of time series data. To remove the leakage, the Hamming window function was applied before the Fast Fourier Transform (FFT). The window function attenuates signal values toward 0 near the boundary of the window frame. By applying the Hamming window function, signal periodicity can be ensured and a more accurate spectrum can be obtained from the FFT. The window function is multiplied with the original signal, as in Equation (3), where the windowed signal g_i is the product of the window function h(i) and the input signal x_i. Figure 2 shows the signal before and after applying the Hamming window function.
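Equation (3) is missing from the extracted text. The windowing described is elementwise multiplication; the standard Hamming window form is assumed here:

```latex
g_i = h(i)\, x_i, \qquad h(i) = 0.54 - 0.46\cos\!\left(\frac{2\pi i}{N - 1}\right), \quad i = 0, 1, \ldots, N-1
```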
The cepstrum transform has the effect of enhancing the characteristics of the original signal because it can extract the original signal, that is, the formant, from the noise, as depicted in Figure 3. The spectrum X(f), represented in the frequency domain, is obtained by applying the FFT to the time-domain signal x(t); squaring it and taking the logarithm yields Log|X(f)|^2. Finally, the inverse FFT is applied to obtain the cepstrum.
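The windowing and cepstrum steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the segment length and the small epsilon guarding the logarithm are assumptions.

```python
import numpy as np

def preprocess_segment(x):
    """Standardize a signal segment, window it, and compute its real cepstrum.

    A minimal sketch of the preprocessing described above; the exact
    parameters used in the paper are not given in the text.
    """
    n = len(x)
    # Standardize the segment (Equation (2)).
    z = (x - x.mean()) / x.std(ddof=1)
    # Apply the Hamming window to suppress leakage at the frame boundary.
    g = z * np.hamming(n)
    # Real cepstrum: inverse FFT of the log power spectrum.
    spectrum = np.fft.fft(g)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # epsilon avoids log(0)
    return np.fft.ifft(log_power).real

# Example: one 12,000-sample frame of synthetic vibration data.
rng = np.random.default_rng(0)
t = np.arange(12000) / 12000
frame = np.sin(2 * np.pi * 29.17 * t) + 0.1 * rng.standard_normal(12000)
c = preprocess_segment(frame)
```

The resulting cepstrum frames would then feed the statistical indicator extraction described next.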

Extraction of Statistical Condition Indicators
SCIs are often used to effectively reflect the characteristics of vibration data that have undergone signal processing [21][22][23][24]. Ten SCIs (mean, peak-to-peak, RMS, standard deviation, skewness, kurtosis, crest factor, shape factor, margin factor, and impulse factor) were extracted from the processed vibration data. Table 2 shows the formula for each indicator. Although the occurrence of faults could be detected simply by observing changes in these statistical index values, we used them as features for fault-detection modeling: the 10 SCIs were used as variables in the subsequent classification modeling.
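As an illustration, the 10 SCIs can be computed as follows. The formulas are the ones commonly used in the bearing-fault literature; Table 2 of the paper should be treated as authoritative where they differ.

```python
import numpy as np

def condition_indicators(x):
    """Compute the 10 statistical condition indicators (SCIs) used as features."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    std = x.std(ddof=1)
    abs_mean = np.mean(np.abs(x))
    sqrt_abs_mean = np.mean(np.sqrt(np.abs(x))) ** 2
    peak = np.max(np.abs(x))
    return {
        "mean": mean,
        "peak_to_peak": x.max() - x.min(),
        "rms": rms,
        "std": std,
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "crest_factor": peak / rms,        # peak relative to RMS
        "shape_factor": rms / abs_mean,    # RMS relative to mean |x|
        "margin_factor": peak / sqrt_abs_mean,
        "impulse_factor": peak / abs_mean,
    }
```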


Fault Detection Using Mahalanobis Distance
To detect faults using the preprocessed data, we first used the MDC, an MD-based classification technique. The technique uses the MD as a comprehensive measure and constructs the MS using the MDs of the normal signal group. The MD value of a signal is then used to distinguish the normal and abnormal groups. In addition, the MTS method uses the Taguchi method to select only the important variables that have a major effect on the MD value and proceeds with the same procedure as the MDC using only these important variables. The MTS method consists of four steps: Steps 1 and 2 form the classification procedure of the MDC, and Steps 3 and 4, which include the Taguchi method, are the additional procedure for the MTS.

Step 1: Constructing the MS with Normal Data

First, the normal and abnormal groups are distinguished from each other. The MS is constructed using the normal data of the data set, which are denoted as shown in Table 3. The MS is a multi-dimensional unit space characterized by the MDs of the normal group. The MD is calculated through the three steps below.

Standardization of normal data
The mean of the pth feature, x̄_p, and its standard deviation, s_p, are first calculated from the feature data of the normal group, X_p = (x_pj) for j = 1, ..., n. The pth feature value of the jth sample, x_pj, is standardized to z_pj as follows:
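The equation is missing from the extracted text; from the definitions above it is:

```latex
z_{pj} = \frac{x_{pj} - \bar{x}_p}{s_p}
```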

Calculation of the correlation matrix
The correlation matrix R for the standardized data of the normal group is then obtained. The correlation coefficient between two variables, r_pq, in the correlation matrix R is calculated as follows.
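The formula did not survive extraction; for standardized data the Pearson correlation coefficient takes the form:

```latex
r_{pq} = \frac{\sum_{j=1}^{n} z_{pj}\, z_{qj}}{\sqrt{\sum_{j=1}^{n} z_{pj}^{2}}\;\sqrt{\sum_{j=1}^{n} z_{qj}^{2}}}
```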

Calculation of the MD of normal data
The MD of the jth normal sample, MD_j, is calculated as in Equation (7). MD_j is often called the scaled Mahalanobis distance since it is divided by the number of variables, k.
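Equation (7) is missing from the extracted text; the scaled MD standard in the MTS literature, consistent with the surrounding definitions, is:

```latex
MD_j = \frac{1}{k}\, Z_j^{T} R^{-1} Z_j
```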
where Z_j = (z_1j, ..., z_kj)^T is the standardized feature vector of the jth sample and R^-1 is the inverse of the correlation matrix. If the normal data are collected well, their MD values will be close to 1, since the mean of MD_j is statistically 1. The MS constructed from the MD values in this way is called a unit space.
In this study, we prepared four training data sets with different IRs, as presented in Table 1; each, therefore, constructed a different MS from its normal data. Table A1 in Appendix A shows the SCI values of the 20 normal data samples in the training data that were used to construct the MS, and Table A2 in Appendix A shows the standardized SCI values. From the standardized SCI values, the correlation matrix can be calculated as shown in Table 4. Finally, the MD values of the normal data were calculated, as presented in Table 5a. The values were distributed well around 1. Note that the transformation to MD makes the resulting distribution have a mean value of 1.
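Steps 1(i)-(iii) can be sketched with NumPy as follows; this is an illustrative reimplementation under the stated definitions, not the authors' code.

```python
import numpy as np

def build_ms(normal):
    """Construct the Mahalanobis space from normal data only.

    normal: (n_samples, k) array of SCI feature vectors.
    Returns the statistics needed to score new samples in Step 2."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    z = (normal - mean) / std                  # per-feature standardization
    corr = np.corrcoef(z, rowvar=False)        # correlation matrix R
    return mean, std, corr

def scaled_md(samples, mean, std, corr):
    """Scaled Mahalanobis distance (Equation (7)): MD = Z^T R^-1 Z / k."""
    z = (np.atleast_2d(samples) - mean) / std
    k = z.shape[1]
    r_inv = np.linalg.inv(corr)
    # Quadratic form per row, divided by the number of variables k.
    return np.einsum("ij,jk,ik->i", z, r_inv, z) / k

# Sanity check: MDs of the training (normal) data should average near 1.
rng = np.random.default_rng(1)
normal = rng.standard_normal((20, 10))
mean, std, corr = build_ms(normal)
md = scaled_md(normal, mean, std, corr)
```

By construction, the mean of the scaled MDs of the training data is (n − 1)/n, i.e., close to 1 for n = 20.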

Step 2: MD Calculation of Abnormal Data and Validation of MS
To check the validity of the MS derived in Step 1, the MD values of the abnormal data in the training data set were tested. The mean x̄_p, standard deviation s_p, and correlation matrix R obtained from the normal data in Step 1 were used again to calculate the MDs of the abnormal data. If the MS is properly constructed from the normal data, the MD values of the abnormal data will be much larger than the mean value of the normal group (i.e., 1).
The MD-based classifier for abnormality detection decides that a new sample is abnormal if its MD is greater than a predefined threshold. The threshold can be set by comparing the MD values of normal and abnormal data when enough abnormal data are available. However, this is not appropriate in one-class classification problems, which assume a small number of abnormal data. In this case, we set the MD threshold based on the chi-square value at a specific confidence interval (e.g., CI = 99%), because the MD is known to follow the chi-square distribution with degrees of freedom (df) equal to the number of variables [26].
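The thresholds quoted later in the paper can be reproduced from the chi-square quantile function; a sketch using SciPy (the library choice is an assumption, not the authors'):

```python
from scipy.stats import chi2

def md_threshold(num_features, confidence=0.99):
    """Chi-square quantile used as the MD decision threshold.

    Follows the paper's usage, which compares the MD against the
    chi-square quantile at the given confidence level with
    df = number of features."""
    return chi2.ppf(confidence, df=num_features)

print(round(md_threshold(10, 0.99), 1))  # 23.2 (MDC, 10 features)
print(round(md_threshold(7, 0.99), 1))   # 18.5 (MTS, 7 features)
```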

Standardization of abnormal data
Abnormal data were prepared and denoted as shown in Table 6. An abnormal value y_pj is standardized to w_pj by using the mean x̄_p and standard deviation s_p of the normal data.
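The equation is missing from the extraction; reusing the normal-group statistics, it is:

```latex
w_{pj} = \frac{y_{pj} - \bar{x}_p}{s_p}
```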

Calculation of the MD of abnormal data
The MD values of the abnormal data were calculated by using the correlation matrix R of the normal data, where W_j = (w_1j, ..., w_kj)^T is the standardized vector of the jth abnormal sample.

Validation of the MS
The MD values of the abnormal data are shown in Table 5b. The minimum MD value was 2269.5, and all the distance values were very far from the normal group. Therefore, it can be said that the MS of the normal data was constructed successfully. Now, the MS validated in Step 2 can be used as the MDC classifier. Using the validated MS, the MDC classifies a new sample by comparing its MD with a specific threshold. Suppose that the threshold is set to the chi-square value χ2(10, 0.01) = 23.2 for CI = 99% and df = 10, since the number of variables is 10. If the MD value of a new sample is greater than the threshold, the sample is classified as abnormal; otherwise, it is classified as normal.

Step 3: Important Variable Selection (Taguchi Method of MTS)
In Steps 3 and 4, the MTS extracts the key variables through the Taguchi method and carries out the classification procedure by calculating the MD values in the same manner as Steps 1 and 2.
In addition to the classification procedure of the MDC, the MTS removes the variables that have little or no effect on the MD values and chooses the key variables. By constructing the MS using only the key variables, the system can be interpreted easily, and the classification performance can also improve. In the MTS, the Taguchi method is adopted for selecting the key variables [27]. The Taguchi method uses the signal-to-noise ratio (SN ratio) as a criterion for determining the degree of influence on the MD values. In quality engineering, the SN ratio is a measure for evaluating system robustness; in the MTS, however, it is used as a measure to select important parameters for pattern recognition. The formula of the SN ratio for the larger-the-better characteristic is as follows.
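The formula itself did not survive extraction. The generic larger-the-better SN ratio, with y_j denoting the response for the jth of the n abnormal samples in a run, is:

```latex
SN = -10 \log_{10}\!\left(\frac{1}{n}\sum_{j=1}^{n} \frac{1}{y_j^{2}}\right)
```

Note that some MTS texts substitute MD_j for y_j^2 directly, since the MD is already a squared distance; which convention Equation (10) used is not recoverable from the extracted text.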
To calculate the SN ratio, the experiment was planned with an appropriate two-level orthogonal array. One should choose an orthogonal array that has more columns than the number of variables used in the experiment. Since the number of variables in this study was 10, we conducted the experiment with L12(2^11) as the minimum two-level orthogonal array. Specifically, we used the Plackett-Burman L12(2^11) design, as presented in Table 7, so that the interaction effects among features could be uniformly diluted across all the columns of the array, as suggested by Dr. Taguchi [27]. Level 1 of the orthogonal array means that the corresponding variable was used, and level 2 means that it was not used. As shown in Table 7, the MD values of the normal and abnormal groups were calculated under the 12 experimental conditions, one per row. The SN ratio was then calculated using the SN ratio formula in Equation (10).
Next, we calculated the gain as the difference between the average SN ratio over the runs in which a variable was used and the average over the runs in which it was not used. The gain of the SN ratio was calculated as follows.
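The gain formula is missing from the extraction; from the description it is, for the pth variable:

```latex
Gain_p = \overline{SN}_{p,\ \mathrm{level\ 1}} - \overline{SN}_{p,\ \mathrm{level\ 2}}
```

where the averages are taken over the orthogonal-array runs in which variable p was used (level 1) and not used (level 2), respectively.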
Table 8 shows the results of calculating the gain of the SN ratio for each feature. If a feature has a negative SN-ratio gain, it is excluded from the feature set of the MTS since its significance is low. If the gain of a feature is positive, the feature is selected as a key variable that has a significant effect on the MD value calculation. As shown in Table 8, seven features were selected as the key variables of the MTS, excluding the peak-to-peak, root mean square, and crest factor features. A new MS was then constructed by using the seven features determined in Step 3, and the classification procedure of Steps 1 and 2 was conducted again. The threshold for determining the class was adjusted to the chi-square value χ2(7, 0.01) = 18.5 for CI = 99% and df = 7, because the number of selected important features was seven.
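The screening logic can be sketched as follows. For readability, this illustration uses a small full two-level factorial instead of the Plackett-Burman L12 array, and `md_of_abnormal` is a hypothetical hook that recomputes the abnormal-group MDs using only the chosen feature subset (Steps 1 and 2).

```python
import itertools
import numpy as np

def sn_larger_the_better(md_values):
    """Larger-the-better SN ratio over the abnormal-group MD values."""
    md_values = np.asarray(md_values, dtype=float)
    return -10 * np.log10(np.mean(1.0 / md_values ** 2))

def taguchi_gains(features, md_of_abnormal):
    """Gain per feature: mean SN when used minus mean SN when unused.

    features: list of feature names.
    md_of_abnormal: hypothetical callable mapping a feature subset to
        the abnormal-sample MD values computed with only those features.
    Uses a full two-level factorial for readability; the paper uses the
    Plackett-Burman L12(2^11) array instead."""
    runs = []
    for levels in itertools.product([1, 2], repeat=len(features)):
        used = [f for f, lv in zip(features, levels) if lv == 1]
        if not used:  # skip the degenerate run with no features at all
            continue
        runs.append((levels, sn_larger_the_better(md_of_abnormal(used))))
    return {
        f: np.mean([sn for lv, sn in runs if lv[i] == 1])
           - np.mean([sn for lv, sn in runs if lv[i] == 2])
        for i, f in enumerate(features)
    }
```

Features with positive gain would be kept; the rest would be dropped before reconstructing the MS.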

Fault Detection Based on Machine Learning Methods
In this subsection, we briefly describe how normal and abnormal states are classified by binary machine learning methods. The models were developed using both normal and abnormal data and were then used to determine whether a new data sample was in a normal or abnormal state.
The four data sets shown in Table 1 were used to train two machine learning algorithms, SVM and RF, which are known to show convincing classification performance. In this study, in addition to conventional SVM and RF, imbalanced-classification versions of SVM and RF were also tested, because three of the four training data sets contained only a small number of abnormal (fault) data, similar to real-life industrial field conditions. Specifically, cost-sensitive SVM (CS_SVM) and cost-sensitive RF (CS_RF) were used as the imbalanced-classification algorithms; they adjust the class weights to improve training on imbalanced data. Parameter tuning of the four machine learning methods was performed by grid search under 3-fold cross-validation.
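With scikit-learn, the cost-sensitive variants reduce to setting `class_weight`; a minimal sketch on synthetic data (the library, hyperparameters, and data here are illustrative assumptions, not the authors' setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the SCI features: 20 normal vs 3 abnormal
# samples (IR = 6.667, the most imbalanced setting in Table 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 10)), rng.normal(3, 1, (3, 10))])
y = np.array([0] * 20 + [1] * 3)

# class_weight="balanced" reweights misclassification costs inversely
# to class frequency, yielding the CS_SVM / CS_RF variants.
cs_svm = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
cs_rf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                               random_state=0).fit(X, y)
```

In the paper's setup, the weights and other hyperparameters were tuned by grid search under 3-fold cross-validation.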

Classifiers and Datasets
To evaluate the proposed method, the classification performances of the two MD-based one-class classification methods, MDC and MTS, were compared with those of four binary classification machine learning methods, which included the classical versions of SVM and RF and their imbalanced-classification versions, CS_SVM and CS_RF.
As shown in Table 1, the training data were constructed with different imbalance ratios to investigate how the binary classification methods change. Note that MDC and MTS use only the normal data for training, since they are one-class classification methods. Twenty-five test data samples (10 normal and 15 abnormal) were used to compare the performance of all the classification models.

Experimental Results
As described in Section 3, the one-class classifiers, MDC and MTS, classify a new sample based on a predefined threshold. We assumed that the one-class classifiers had no knowledge of abnormal data, so the threshold was set according to a confidence interval. In this research, the MDC used 10 features, so the threshold was set to χ2(10, 0.1) = 16.0 for CI = 90%, χ2(10, 0.05) = 18.3 for CI = 95%, or χ2(10, 0.01) = 23.2 for CI = 99%. The MTS used only the seven important features and, therefore, the threshold was χ2(7, 0.1) = 12.0 for CI = 90%, χ2(7, 0.05) = 14.1 for CI = 95%, or χ2(7, 0.01) = 18.5 for CI = 99%. Table 9 shows the MD values of the normal and abnormal data in the test data set calculated by MDC and MTS. All normal data except sample #7 were classified as normal by MDC and MTS because their MDs were less than the threshold. However, MDC misclassified sample #7 as abnormal because MD_7 = 26.843 was slightly greater than χ2(10, 0.01) = 23.2. The classification performances of MDC and MTS were compared with those of the balanced and imbalanced binary classifiers, SVM and RF, in terms of four measures: accuracy, balanced accuracy, F-score, and G-mean. The last three measures are often used for imbalanced classification.
As shown in Table 10 and Figure 4, MTS had perfect accuracy, while MDC had an F-score of 0.968 and a G-mean of 0.949 because normal sample #7 was misclassified. Note that MTS and MDC always had the same performance regardless of which of the four training sets was used, since they trained on only the 20 normal data.

Conclusions and Future Work
In this study, we evaluated two MD-based one-class classification methods, MDC and MTS, for the fault detection of rotating machines using vibration data. To prepare the vibration data for analysis, they were preprocessed by applying signal processing techniques such as the Hamming window and the cepstrum transformation. Moreover, 10 SCIs, such as the mean, standard deviation, peak-to-peak, and RMS, were extracted and used as input variables for model training. To obtain meaningful results for the real-life industrial field, where there are very few fault (abnormal) data compared with normal data, MDC and MTS were compared with binary classification methods trained on data sets with different IRs.
We focused on one-class classification methods using MD because they do not need any abnormal data for training. The two MD-based classifiers were compared with balanced and imbalanced binary classification algorithms based on SVM and RF. In the experiments, the classification performance of the binary classification models degraded sharply as the number of abnormal data in the training set decreased. As a result, MDC and MTS showed much better performance than the binary classifiers when only small amounts of abnormal training data were available.
These experiments are significant in that most industrial systems operating in real fields rarely have fault data, because the systems are often stopped before a fault occurs. Although collecting fault data is possible, it requires a long time or a high cost. This means that one-class classifiers are generally more useful in terms of cost, time, and effort, provided they work with acceptable performance.
In addition, among the MD-based classifiers, the MTS, which selects only key variables through the Taguchi method, can be useful in an actual operating environment, since a small number of features is easily interpretable as well as fast and convenient to apply. In our experiments, MTS was robust enough to outperform MDC. In the case of IR = 1.0, all four machine learning-based classifiers also showed perfect performance, since there were enough abnormal data in the training set. However, as the number of abnormal data in the training sets became smaller, i.e., as IR became higher, the overall classification performance dropped. For IR = 2.222 and IR = 3.333, CS_SVM showed performance similar to MDC and lower than MTS, while SVM, RF, and CS_RF performed worse. When IR reached 6.667, all the binary classification methods performed much worse than MDC and MTS.
Comparing the two MD-based classifiers, MTS outperformed MDC. Moreover, MTS can be considered robust, since it could be applied with a smaller set of significant features in our experiments. Thus, by using only the important SCIs, the model can be easily interpreted in real-life industrial systems.


Figure 1. Procedure for fault-detection evaluation in this research.


Figure 2. The vibration data (a) before and (b) after applying the Hamming window function.


Table 1. Data set configuration according to the imbalance ratio (IR). MDC and MTS use only normal data for training, while binary classification methods use both normal and abnormal data.

Table 3. Data schema of normal data.

Table 4. Correlation matrix between standardized SCIs for the normal data in the training set.

Table 5. Mahalanobis distances of training data.

Table 6. Data schema of abnormal group data.

Table 8. Gain of SN ratio for each feature.

Table 9. Mahalanobis distances of test data.

Table 10. Performance for the test set of classifiers trained with different training sets.

Table A1. SCI values extracted from the normal data in the training set.

Table A2. Standardized SCI values of the normal data in the training set.