Sub-Health Identification of Reciprocating Machinery Based on Sound Feature and OOD Detection

It is inevitable that machine parts wear during production, which can cause further mechanical failures. As wear progresses, the accuracy and efficiency of machinery gradually decline. The state between healthy and impaired is defined as sub-health. Recognizing the sub-health state of machinery helps to maintain accuracy and efficiency and to prevent mechanical failure. Compared with simple fault detection, identifying the sub-health state is of greater practical significance. For this reason, we combine the sound characteristics of large-scale reciprocating machinery with the concept of OOD (out-of-distribution) detection and propose a model for detecting the sub-health state of machinery. A planer sound dataset was collected and collated, and the mechanical sub-health state was recognized by a model that combines a VGG network with the threshold-setting scheme of OOD detection. Finally, an auxiliary decision-making module was added, and the Mahalanobis distance was used to represent spatial relationships among samples, further improving recognition performance.


Introduction
Machinery has been an indispensable part of society since the industrial revolution. The mechanical sub-health state is a transition state between the normal state and the damaged state. Mechanical sub-health manifests as decreasing accuracy and efficiency, or an increasing defect rate, without meeting the criteria for mechanical failure. The boundary of the sub-health state is not fixed and can be adjusted according to actual needs; identifying it helps to ensure that a given machine works in its best state. Because a fault sound differs obviously from a normal sound, identifying the sub-health state of a machine first requires determining whether the machine is faulty; only then can its sub-health state be established. In recent years, deep-learning-based intelligent fault diagnosis has become increasingly mature, and diagnostic efficiency has continuously improved. Mainstream methods use vibration sensors to collect one-dimensional vibration signals, convert them into two-dimensional representations, and feed these into a neural network to diagnose the state of the machine [1]. However, heavy machinery is highly stable, and its local vibrations are weak. In addition, vibration sensors are inconvenient to install on some of its main working parts, so vibration-based recognition is imprecise.
Most recent fault diagnosis methods rely on vibration signal characteristics. For example, W. Zhang et al. [2] proposed deep convolutional neural networks with wide first-layer kernels (WDCNN) for fault diagnosis. Q. Hang et al. [3] used high-dimensional imbalanced data to diagnose rolling bearings. W. Sun et al. [4] proposed a motor fault detection method based on a sparse autoencoder. Sound detection and sound source localization were first used in military applications such as sonar exploration, detection, and localization. As the technology developed, sound detection was gradually applied in many other fields. For example, Yiallourides and Naylor [5] used time-frequency analysis of knee joint sounds for the noninvasive detection of osteoarthritis. Das et al. [6] used acoustic features and unsupervised learning for heart sound event detection. Bayram [7] used sequence autoencoders to detect anomalies in industrial processes, and Liu [8] used abnormal sounds to assess the health of broilers. Tran and Lundgren [9] proposed a drill fault diagnosis method based on the scalogram and Mel spectrogram of acoustic signals. Wang et al. [10] used sound signals to detect and locate pipeline leaks. Volkmann et al. [11] and Hou et al. [12] used sound signals to detect foot injuries in cows and longitudinal tears in conveyor belts, respectively. Ramteke et al. [13] described a method for diagnosing cylinder liner wear in diesel engines based on vibration and acoustic emission analysis. Many other studies have addressed equipment health monitoring and fault detection with acoustic signals. Mechanical fault diagnosis research focuses mainly on the intelligent diagnosis of rotating machinery, such as bearings and gearboxes, and most of these datasets were collected on simulation equipment.
Because such equipment is well lubricated and its sound features are not pronounced, vibration signals give better results than sound signals. In large-scale, heavy-duty reciprocating equipment, however, the key working parts are designed for high stability, so their vibration is weak while their sound is relatively loud. Under these conditions, sound characteristics give a more complete picture of the machine's state.
Training a convolutional neural network to identify a mechanical state normally requires a large number of normal and abnormal sound samples. In practice, normal sounds are usually smooth and regular, and an obvious abnormal sound accompanies a breakdown; however, abnormal data are difficult to collect without deliberately damaging the machinery. In 2018, Dan Hendrycks et al. [14] proposed a deep-learning baseline for OOD detection built on anomaly detection. The model is trained only on positive samples, which serve as in-distribution data, and it can then clearly distinguish in-distribution data from out-of-distribution data; this approach suits the practical situation of mechanical fault detection. Liang et al. [15] improved the baseline and proposed ODIN (Out-of-distribution detector for Neural networks), while DeVries et al. [16], Shalev et al. [17], Denouden et al. [18], Abdelzad et al. [19], and others improved it in different directions, further increasing detection efficacy.
Based on the above, we propose using deep convolutional neural networks to extract sound features and OOD detection to recognize the mechanical sub-health state, with normal sample data as input. The rest of this paper is organized into four sections. Section 2 introduces the dataset we collected and organized. Section 3 introduces our experimental method. Section 4 presents three interrelated experiments showing that sound features combined with OOD detection can effectively identify the mechanical sub-health state. Finally, Section 5 draws conclusions.
The contributions of this paper can be summarized as follows: (1) We define a mechanical running state called the mechanical sub-health state, whose identification helps to maintain machining accuracy and prevent mechanical damage in practice; (2) We show that the working sound of heavy machinery can be used to identify the state of a machine, and that OOD detection improves the adaptability and recognition accuracy of our model. Since no similar public data exist, we collected and collated a planer sound dataset; (3) We propose a baseline model for identifying the mechanical sub-health state, and we then improve its performance by adding an auxiliary decision module and using the Mahalanobis distance to represent distances between samples.

Dataset
The dataset we collected consists of the sounds of a traditional mechanical shaping machine at work, including the intact and sub-healthy sounds produced while machining four materials in six gears. Each recording is a 10 s single-channel audio clip that includes the sounds of the machine, its related equipment, and environmental noise. The four materials are cold-rolled carbon steel, smooth cast iron, rough cast iron, and cast aluminum.

Recording Process
We use a square microphone array composed of four microphones to collect sound; the layout of the array is shown in Figure 1. A microphone array allows both single-channel and multichannel methods to be evaluated. To simplify the task, only the first channel of the multichannel recording is used here; multichannel recordings will be used in future research. The microphone array was placed 40 cm from the machine and recorded 10 s sound clips, and each machine sound was recorded in a separate session. While the machine was running, its sound was recorded as a 16-bit audio signal sampled at 16 kHz in a reverberant environment.
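The recording setup keeps only the first channel of each four-channel, 16-bit, 16 kHz clip. A minimal sketch of that step, assuming each clip has already been decoded into an `np.int16` array of shape `(n_samples, n_channels)`; the decoding step is not shown, and the helper name `first_channel_float` is hypothetical:

```python
import numpy as np

SAMPLE_RATE = 16_000   # 16 kHz sampling rate stated in the text
CLIP_SECONDS = 10      # each clip lasts 10 s
N_CHANNELS = 4         # square array of four microphones

def first_channel_float(pcm: np.ndarray) -> np.ndarray:
    """Keep channel 0 of a 16-bit recording and scale it to the range [-1, 1)."""
    if pcm.dtype != np.int16:
        raise TypeError("expected 16-bit PCM samples (np.int16)")
    # (n_samples, n_channels) -> (n_samples,); a 1-D input is already mono
    mono = pcm if pcm.ndim == 1 else pcm[:, 0]
    return mono.astype(np.float32) / 32768.0

# Example: a four-channel clip with a constant value on channel 0 only.
pcm = np.zeros((SAMPLE_RATE * CLIP_SECONDS, N_CHANNELS), dtype=np.int16)
pcm[:, 0] = 16384
mono = first_channel_float(pcm)
```

At 16 kHz, a 10 s clip gives 160 000 samples per channel; dividing by 32768 maps the 16-bit integer range onto [-1, 1).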

Introduction to Datasets
As shown in Figure 2, the dataset contains the working sounds of four materials in six gears, and each part has complete training and test sets. The training data include intact and sub-health data, while the test data include normal and abnormal data. For each part, the dataset provides: (i) intact-state data recorded with a new cutter (50 normal sound clips from the source domain, used for training), as shown in Figure 3; (ii) sub-health data recorded with cutters that had been operating continuously for half a year (50 sub-health sound clips, used for training); (iii) abnormal data (30 clips), composed of normal sounds combined with Gaussian-distributed and uniformly distributed random noise; and (iv) normal test data (15 intact and 15 sub-health clips from the target domain), as shown in Figure 4. In total, each part contains 160 sound samples, each material has 960 samples, and the original dataset contains 3840 samples.
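The per-part counts above add up to the stated totals, and item (iii) can be read as normal clips corrupted with Gaussian and uniformly distributed noise. The sketch below checks the arithmetic and shows one hypothetical way such abnormal clips might be synthesized; `make_abnormal`, its noise levels, and the 440 Hz test tone are assumptions, not the authors' procedure:

```python
import numpy as np

# Per-part composition as described in the text (one part = one material in one gear).
PART_COUNTS = {
    "train_intact": 50,       # (i) new cutter, source domain
    "train_sub_health": 50,   # (ii) cutters used continuously for half a year
    "test_abnormal": 30,      # (iii) normal sounds combined with random noise
    "test_intact": 15,        # (iv) target domain
    "test_sub_health": 15,    # (iv) target domain
}
N_GEARS, N_MATERIALS = 6, 4

per_part = sum(PART_COUNTS.values())
per_material = per_part * N_GEARS
total = per_material * N_MATERIALS

def make_abnormal(clean, rng, gauss_std=0.05, uniform_amp=0.05):
    """Hypothetical recipe for a synthetic abnormal clip.

    Adds zero-mean Gaussian noise and uniform noise to a normal clip scaled to
    [-1, 1]. The noise levels are placeholders; the text does not specify them.
    """
    clean = np.asarray(clean, dtype=np.float64)
    gaussian = rng.normal(0.0, gauss_std, size=clean.shape)
    uniform = rng.uniform(-uniform_amp, uniform_amp, size=clean.shape)
    return np.clip(clean + gaussian + uniform, -1.0, 1.0)

rng = np.random.default_rng(0)
normal_clip = 0.1 * np.sin(2 * np.pi * 440 * np.arange(16_000) / 16_000)
abnormal_clip = make_abnormal(normal_clip, rng)
```

With 160 clips per part, six gears give 960 clips per material, and four materials give 3840 clips.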
Machines 2021, 9, x FOR PEER REVIEW

Pre-Process and Feature Extraction
Pre-processing was divided into four steps. The first step was to pre-emphasize the input digital audio signal, boosting its high-frequency components and removing influences such as "lip radiation". Pre-emphasis is generally implemented with a first-order FIR high-pass digital filter whose transfer function is

$$H(z) = 1 - \alpha z^{-1}$$

where α is the pre-emphasis coefficient, with 0.9 < α < 1.0. If x(n) is the sample value at time n, the pre-emphasized output is

$$y(n) = x(n) - \alpha x(n-1)$$

and in this work α = 0.98.
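The pre-emphasis filter is a one-line difference equation. A minimal NumPy sketch, assuming x(-1) = 0 for the first sample (the text does not say how the first sample is handled):

```python
import numpy as np

def pre_emphasis(x, alpha=0.98):
    """First-order FIR high-pass filter: y(n) = x(n) - alpha * x(n - 1).

    The first output sample assumes x(-1) = 0, so y(0) = x(0).
    """
    if not 0.9 < alpha < 1.0:
        raise ValueError("alpha is expected in (0.9, 1.0)")
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

y = pre_emphasis(np.ones(5))
```

For a constant input, every output after the first sample is 1 - 0.98 = 0.02 times the input, which illustrates the high-pass behavior: low-frequency (here, DC) content is strongly attenuated.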
The second step was framing. To keep the transition between frames smooth and preserve continuity, we used overlapping segmentation. The third step was windowing: each frame was multiplied by a Hamming window to improve continuity at its left and right ends. Suppose the framed signal is S(n), n = 0, 1, ..., N − 1, where N is the frame length; the windowed frame is S′(n) = S(n) × W(n), where

$$W(n, a) = (1 - a) - a \cos\left(\frac{2\pi n}{N - 1}\right), \quad 0 \le n \le N - 1$$

Different values of a produce different windows; we use a = 0.46. The fourth step was the fast Fourier transform (FFT). The time-domain waveform of the data is shown above, and it is difficult to discern the characteristics of the signal from it. The usual approach is to apply an FFT to each frame to obtain its energy distribution over the spectrum, because different distributions represent different characteristics.
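Steps two to four (overlapping framing, Hamming windowing with a = 0.46, and a per-frame FFT) can be sketched in NumPy as follows. The frame length of 1024 samples and hop of 512 samples (50% overlap) are assumed values; the text does not state them:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split x into overlapping frames (frame_len and hop are assumed values)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices hop*i .. hop*i + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]                                    # shape (n_frames, frame_len)

def hamming(N, a=0.46):
    """W(n, a) = (1 - a) - a * cos(2*pi*n / (N - 1)), n = 0..N-1."""
    n = np.arange(N)
    return (1 - a) - a * np.cos(2 * np.pi * n / (N - 1))

def power_spectrum(x, frame_len=1024, hop=512):
    """Frame, window, and FFT each frame; return |FFT|^2 per frame."""
    frames = frame_signal(np.asarray(x, dtype=np.float64), frame_len, hop)
    windowed = frames * hamming(frame_len)           # S'(n) = S(n) * W(n)
    spectrum = np.fft.rfft(windowed, axis=1)         # one-sided FFT of each frame
    return np.abs(spectrum) ** 2                     # shape (n_frames, frame_len // 2 + 1)
```

With a = 0.46, `hamming` reproduces `np.hamming`, which uses 0.54 - 0.46 cos(2πn/(N - 1)). Under these assumed settings, a 16 000-sample input yields 30 frames of 513 one-sided frequency bins.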
Finally, features were extracted. We use log-Mel features, currently among the most commonly used features in audio processing. The log-Mel representation imitates the human ear by filtering sound through a bank of Mel-scale filters. For two sounds of different loudness, the higher-pitched sound can be masked by the lower-pitched one. The Fourier transform is one step of the whole pipeline, and its purpose is to obtain the energy distribution over the frequency spectrum. Signal energy serves as the basic feature, and the processed (Mel-filtered, log-compressed) energy serves as the output feature. This feature does not depend on the nature of the signal, so it can be obtained for both high- and low-pitched sounds, and it retains good recognition performance when the signal-to-noise ratio is low. Figure 5 shows the log-Mel spectrogram of each material.
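The log-Mel step applies triangular Mel-scale filters to the per-frame power spectrum and compresses the result logarithmically. The sketch below builds the filterbank directly in NumPy using the common 2595·log10(1 + f/700) Mel formula; the number of Mel bands (64) and the FFT size (1024) are assumptions, and libraries such as librosa use slightly different filter normalizations:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr=16_000, n_fft=1024, n_mels=64):
    """Triangular filters with centers spaced evenly on the Mel scale."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2, n_bins)          # center frequency of each FFT bin
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)      # 0 at left edge, 1 at center
        falling = (right - fft_freqs) / (right - center)   # 1 at center, 0 at right edge
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(power_spec, sr=16_000, n_fft=1024, n_mels=64, eps=1e-10):
    """power_spec: (n_frames, n_fft // 2 + 1) -> log-Mel energies in dB, (n_frames, n_mels)."""
    mel_energy = power_spec @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel_energy + eps)
```

The input can be any per-frame power spectrum with `n_fft // 2 + 1` bins; the small `eps` avoids taking the logarithm of zero.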

The Proposed Model
A simple sub-health recognition model was built from two convolutional neural networks (the flowchart is shown in Figure 6).
In the model, CNN-1 and CNN-2 are two identical VGG16 networks (the network structure is shown in Figure 7). VGG16 [20] contains 13 convolutional layers, 3 fully connected layers, and 5 pooling layers. The convolutional and fully connected layers have weight coefficients, so they are called weighted layers; there are 16 of them in total. All convolutional layers use the same kernel parameters: the kernel size is 3, i.e., a small 3 × 3 kernel. Combined with the other parameters (stride = 1, padding = same), this kernel size keeps the width and height of each convolutional layer's output tensor the same as those of the previous layer.

Figure 6. Illustration of the framework. The test output has two cases, intact and sub-healthy, and each input is assigned to one of these two categories.

During training, intact data are used to train CNN-1, and they are also used, together with sub-health data, to train CNN-2. At test time, input data are first fed into CNN-2. Data judged to be normal are then fed into CNN-1, which determines whether they are sub-health data.
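The claim that 3 × 3 kernels with stride 1 and "same" padding preserve width and height follows from the standard output-size formula, floor((W + 2P − K)/S) + 1, with K = 3, S = 1, and P = 1. The sketch below counts the layers of the standard VGG16 configuration and traces the spatial size through it; the 224-pixel input is the usual ImageNet size and is only an assumption here, since the text does not give the spectrogram input size:

```python
# VGG16 (configuration "D"): integers are 3x3 conv layers (output channels),
# "M" is 2x2 max pooling with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
N_FC = 3  # fully connected layers after the convolutional stack

def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer: floor((W + 2P - K) / S) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_sizes(size):
    """Width (or height) after every layer of the VGG16 feature extractor."""
    sizes = []
    for layer in VGG16_CFG:
        if layer == "M":
            size = out_size(size, kernel=2, stride=2, padding=0)   # pooling halves the size
        else:
            size = out_size(size, kernel=3, stride=1, padding=1)   # 'same' padding keeps it
        sizes.append(size)
    return sizes

n_conv = sum(1 for layer in VGG16_CFG if layer != "M")
n_pool = VGG16_CFG.count("M")
n_weighted = n_conv + N_FC
```

Only the five pooling layers shrink the feature map, so a 224-pixel side becomes 224 / 2^5 = 7.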

OOD Detection Principle and Fusion
Hawkins found that, in actual classification tasks, many high-confidence predictions are wrong. This is a problem when the classifier cannot indicate that such an error has occurred. Two observations are relevant: (1) a model can assign high softmax probabilities to OOD samples and to some misclassified samples, so the softmax probability cannot directly represent confidence; (2) correctly classified samples nevertheless tend to have higher softmax probabilities than misclassified samples and OOD samples. The distributions of softmax prediction probabilities therefore differ between correctly classified samples and OOD samples, and by selecting an appropriate threshold, ID and OOD samples can be distinguished effectively. We incorporate this idea into our model [21][22][23][24][25]. Following the OOD detection method, we obtained the values of AUPR Succ and AUPR Err (Table 1). The large difference between AUPR Succ and AUPR Err for both networks indicates that a threshold on the prediction score can be set to detect whether a sample lies in the correct range; a Wilcoxon rank-sum test was used to verify this conclusion.

Table 1. AUPR is the area under the precision-recall curve, reflecting the relationship between precision and recall. When correctly classified examples are treated as the positive class, the result is denoted Succ; when misclassified examples are treated as the positive class, it is denoted Err. The "Base" value is obtained using a random detector.

Finally, let the output vector of the neural network be (z_1, z_2, ..., z_K). The softmax layer then computes

$$S_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, 2, \ldots, K$$
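AUPR Succ and AUPR Err can be computed from the maximum softmax probability, as in the OOD baseline: correctly classified samples are the positive class for Succ (scored by confidence), and misclassified samples are the positive class for Err (scored by negative confidence). A minimal NumPy sketch; it uses average precision as the estimate of the area under the precision-recall curve, which is one common choice rather than necessarily the one used for Table 1:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="mergesort")
    hits = labels[order]                             # positives ranked by descending score
    if not hits.any():
        return float("nan")                          # undefined without positive samples
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].sum() / hits.sum())

def aupr_succ_err(logits, y_true):
    """AUPR Succ / AUPR Err using the maximum softmax probability as the score."""
    probs = softmax(logits)
    confidence = probs.max(axis=1)
    correct = probs.argmax(axis=1) == np.asarray(y_true)
    succ = average_precision(correct, confidence)    # correct predictions = positive class
    err = average_precision(~correct, -confidence)   # errors = positive class
    return succ, err
```

For the Wilcoxon rank-sum check, the confidences of correct and incorrect predictions could be passed as two samples to `scipy.stats.ranksums`.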

If the threshold is set to Q (0.5 < Q < 1) and P is the softmax probability of the predicted class obtained through the VGG network, the final classification result is

$$\text{result} = \begin{cases} 1, & P \ge Q \\ 0, & P < Q \end{cases}$$

where 1 represents in-distribution data and 0 represents OOD data.
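The threshold rule can be sketched as follows. `Q = 0.8` is a placeholder that satisfies 0.5 < Q < 1, and `cascade` is a hypothetical reading of the test flow described for the framework (CNN-2 first, then CNN-1 for samples judged normal), with CNN-1's out-of-distribution outputs interpreted as sub-health; in practice the logits would come from the trained VGG16 networks:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def in_distribution(logits, q=0.8):
    """Return 1 (ID) if the predicted-class probability P >= Q, else 0 (OOD).

    q = 0.8 is a placeholder; the text only requires 0.5 < Q < 1.
    """
    if not 0.5 < q < 1.0:
        raise ValueError("Q must satisfy 0.5 < Q < 1")
    p = softmax(logits).max()
    return 1 if p >= q else 0

def cascade(logits_cnn2, logits_cnn1, q2=0.8, q1=0.8):
    """Hypothetical two-stage decision.

    CNN-2 (trained on intact + sub-health data) screens out abnormal inputs;
    CNN-1 (trained on intact data only) treats sub-health inputs as OOD.
    """
    if in_distribution(logits_cnn2, q2) == 0:
        return "abnormal"
    return "intact" if in_distribution(logits_cnn1, q1) == 1 else "sub-healthy"
```

Raising Q rejects more samples as OOD, trading missed detections for false alarms, which is why the threshold is usually tuned on validation data.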

Further Improvements to the Model
To achieve better results, we made two further improvements to the base model, which yielded our final model:
Improvement 1: A variational autoencoder auxiliary module is added to the network structure (as shown in Figure 8).
Improvement 2: The Mahalanobis distance is used to measure the distance between a sample and the training data in a manifold space.
An autoencoder is an unsupervised learning algorithm mainly used for data dimensionality reduction or feature extraction (the structure is shown in Figure 9). The encoder creates one or more hidden layers containing low-dimensional vectors of the input features, and the decoder reconstructs the input data from the low-dimensional vector of the hidden layer [26]. Because normal and abnormal samples are easier to distinguish in a high-dimensional space, the decoder's output provides useful additional information to the decision module. The variational autoencoder assumes that the hidden layer produced by the encoder follows a standard Gaussian distribution; it samples a feature from this distribution and decodes it, with the goal that the decoded result matches the original input. Its loss is similar to that of an autoencoder, with an added regularization term: the KL divergence between the inferred coding distribution and the standard Gaussian distribution. The variational autoencoder generates a latent probability distribution p(z|x) for each input x and samples randomly from it, yielding a continuous and complete latent space; this solves the problem that an ordinary autoencoder cannot be used for generation [27][28][29][30][31][32].
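The VAE objective described above, a reconstruction error plus the KL divergence from a standard Gaussian, has a closed form for a diagonal Gaussian encoder. A minimal NumPy sketch of the loss and of the sampling step; the squared-error reconstruction term is an assumed choice, and the encoder and decoder networks themselves are omitted:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

def vae_loss(x, x_recon, mu, log_var):
    """Squared reconstruction error (an assumed choice) plus the KL regularizer."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)
```

The KL term is zero exactly when the encoder outputs mu = 0 and log_var = 0, i.e., the standard Gaussian; any deviation is penalized.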
In general, the variational autoencoder adds constraints to the encoder, forcing it to produce latent variables that follow a unit Gaussian distribution. One of its advantages is that the reconstructed data can be compared directly with the original data, and this comparison serves as an auxiliary decision signal for our convolutional neural network.

The Mahalanobis distance is an effective way to calculate the similarity of two unknown samples because it measures the distance between a point and a distribution [33]. We therefore use it to measure the distance between a sample x and the ID training data in the manifold space [34][35][36]:

$$M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)} \tag{4}$$

where μ and Σ are the mean vector and covariance matrix of the multivariate Gaussian distribution. The Mahalanobis distance is scale-invariant and also accounts for the correlations between different dimensions. Finally, the reconstruction error and the Mahalanobis distance are used together to detect OOD samples.
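The Mahalanobis distance can be evaluated with NumPy once μ and Σ are estimated from ID training features. A minimal sketch, assuming the features are fixed-length vectors (for example, taken from a hidden layer of the network, which the text does not specify) and adding a small ridge to Σ so that it stays invertible; the rule that combines this distance with the reconstruction error is not given in the text, so it is not shown:

```python
import numpy as np

def fit_gaussian(features, ridge=1e-6):
    """Estimate mu and Sigma^{-1} from ID training features (rows = samples).

    The ridge added to the diagonal is an assumed safeguard against a
    singular covariance matrix; the text does not specify one.
    """
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mu, np.linalg.inv(sigma)

def mahalanobis(x, mu, sigma_inv):
    """M(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    d = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(d @ sigma_inv @ d))
```

With an identity covariance the distance reduces to the Euclidean distance; with a diagonal covariance each dimension is divided by its standard deviation, which is the scale invariance mentioned above.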

Parameter Introduction
Several standard indicators are used for OOD detection. The true positive rate (TPR) is

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

where TP and FN denote true positives and false negatives, respectively. The false positive rate (FPR) is calculated with Equation (7):

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{7}$$

where FP and TN denote false positives and true negatives, respectively.
Throughout the experiments, we use two core indicators to measure the performance of our model: the AUC and the pAUC. The area under the curve (AUC) represents the model's ability to distinguish between positive and negative samples; its value lies between 0 and 1, and a larger AUC indicates better performance. The pAUC is calculated from the part of the ROC curve within a predetermined range; in our measurements, it is the AUC over the low-FPR range [0, 0.1]. The pAUC matters because a system that frequently raises false alarms stops being trusted, like "the boy who cried wolf", so we include it in the evaluation.
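The evaluation metrics can be sketched in NumPy: TPR and FPR at a single threshold, the ROC curve, the AUC, and the pAUC over FPR in [0, 0.1]. Here the pAUC is normalized by the FPR limit so that a perfect detector scores 1, which is an assumption about how the reported values are scaled; scikit-learn's `roc_auc_score(..., max_fpr=0.1)` instead returns the McClish-standardized partial AUC, so the two are not directly comparable:

```python
import numpy as np

def tpr_fpr(labels, scores, threshold):
    """TPR = TP / (TP + FN) and FPR = FP / (FP + TN) at one decision threshold."""
    labels = np.asarray(labels, dtype=bool)
    pred = np.asarray(scores, dtype=float) >= threshold
    tp, fn = np.sum(pred & labels), np.sum(~pred & labels)
    fp, tn = np.sum(pred & ~labels), np.sum(~pred & ~labels)
    return tp / (tp + fn), fp / (fp + tn)

def roc_points(labels, scores):
    """ROC curve with one point per distinct score threshold, starting at (0, 0)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    s, y = scores[order], labels[order]
    cut = np.r_[np.nonzero(np.diff(s))[0], y.size - 1]   # last index of each tie group
    tp = np.cumsum(y)[cut]
    fp = np.cumsum(~y)[cut]
    return np.r_[0.0, fp / (~y).sum()], np.r_[0.0, tp / y.sum()]

def trapezoid(y, x):
    """Trapezoidal integration of y over x."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def auc(labels, scores):
    fpr, tpr = roc_points(labels, scores)
    return trapezoid(tpr, fpr)

def pauc(labels, scores, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr."""
    fpr, tpr = roc_points(labels, scores)
    stop = np.searchsorted(fpr, max_fpr, side="right")   # points with fpr <= max_fpr
    if stop < fpr.size:                                   # interpolate the ROC at max_fpr
        x0, x1, y0, y1 = fpr[stop - 1], fpr[stop], tpr[stop - 1], tpr[stop]
        y_end = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
    else:
        y_end = tpr[-1]
    x = np.r_[fpr[:stop], max_fpr]
    y = np.r_[tpr[:stop], y_end]
    return trapezoid(y, x) / max_fpr
```

A detector that ranks every positive above every negative gets AUC = pAUC = 1; tied scores are grouped into a single ROC point, so they count as half-correct.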

Results and Discussion
To show the effect of OOD detection, we first obtained a set of experimental results without OOD detection, as shown in Table 2. The average AUCs of the two neural networks in the baseline model were 74.11% and 70.32%, providing a preliminary identification of the mechanical fault state and the sub-health state, respectively. These results show that it is feasible to judge the state of a machine from the sounds it makes while working.
The experimental results with OOD detection added are shown in Table 3. OOD detection only requires training a network on in-distribution data to achieve suitable classification performance, so it also suits mechanical fault diagnosis, where normal data are plentiful and abnormal data are scarce. The results show that the improved model is clearly better than the baseline system: the average AUCs are 80.95% and 75.64%, respectively, which are 6.84 and 5.32 percentage points higher than those of the baseline system.
The experimental results of the final model are shown in Table 4. The average AUCs of our final model reached 84.22% and 79.20%, increases of 3.27 and 3.56 percentage points, respectively, over the previous improved model. The TPR and FPR values of the three experiments are summarized in Table 5. Figure 10 shows that the proposed model can effectively identify the different states in the dataset from mechanical sound features, and thus recognize the mechanical sub-health state, and that each improvement raises the recognition performance of the model. It also shows obvious differences in performance among the materials, mainly because the materials differ in hardness and smoothness, which produce sound characteristics of different strengths.
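The reported gains are percentage-point differences between average AUCs. The short calculation below reproduces them from the values in the text and also shows the corresponding relative gains; pairing each value with the fault-diagnosis and sub-health networks follows the baseline description:

```python
# Average AUCs (%) reported in the text: (fault-diagnosis network, sub-health network).
baseline = (74.11, 70.32)   # Table 2: without OOD detection
with_ood = (80.95, 75.64)   # Table 3: with OOD detection
final = (84.22, 79.20)      # Table 4: final model

def gains(before, after):
    """Return (percentage-point gain, relative gain in %) for each network."""
    return [(round(a - b, 2), round(100 * (a - b) / b, 2)) for b, a in zip(before, after)]

ood_gain = gains(baseline, with_ood)
final_gain = gains(with_ood, final)
```

The relative gains (about 9.2% and 7.6% for the OOD step) are larger than the percentage-point gains because they are measured against the starting AUC rather than against 100%.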
Figure 10. Histogram of effect of three models (SCI stands for smooth cast iron; RCI stands for rough cast iron; CA stands for cast aluminum; CR stands for cold-rolled carbon steel).

Comparison of Effects under Different Conditions
The only public sound-based dataset suitable for comparison is the one provided by DCASE 2020 Task 2. It contains only the sounds of machines working normally and the sounds after damage, so only mechanical fault diagnosis performance can be compared, as shown in Table 6. To further demonstrate the performance of our proposed method, we also compared it with two classic neural network models, AlexNet [37] and ResNet [38]; the comparison results are shown in Table 7.

Conclusions
In this paper, a method for identifying the mechanical sub-health state based on sound characteristics was proposed. By extracting the sound characteristics of the working parts of heavy machinery during operation, we addressed the poor recognition performance caused by the inconspicuous vibration signals of heavy machinery. Data collected from a bullhead planer were used in sub-health recognition experiments. A simple neural network achieved good recognition performance; however, because only positive samples were available, performance could not be improved further even when the parameters were repeatedly tuned. A fusion experiment with OOD detection showed that OOD detection is an effective way to handle training data consisting only of positive samples, and the auxiliary decision module and the change in distance representation within its structure further improved recognition performance. Identifying the mechanical sub-health state can ensure the safe operation of equipment, reduce maintenance costs, and prevent major accidents, which makes sub-health detection more practical than fault detection. In future work, we will use more efficient and precise neural network models and a more reasonable framework to improve accuracy and efficiency, and we will draw on further ideas to continue improving the recognition performance of our model.

Conflicts of Interest:
The authors declare no conflict of interest.