Applications of Machine Learning to Reciprocating Compressor Fault Diagnosis: A Review

Abstract: Operating condition detection and fault diagnosis are very important for the reliable operation of reciprocating compressors. Machine learning is one of the most powerful tools in this field. However, very few comprehensive reviews summarize current research on machine learning for monitoring reciprocating compressor operating conditions and diagnosing faults. In this paper, recent applications of machine learning techniques in reciprocating compressor fault diagnosis are reviewed. The advantages and challenges in the detection process, based on three main monitoring parameters in practical applications, are discussed. Future research directions and developments are proposed.


Introduction
The reciprocating compressor (RC) is a key piece of equipment in the petroleum and chemical industries. If an RC does not operate at its rated efficiency, the operator can suffer great economic loss. RCs are sometimes used to compress flammable and explosive gases, such as hydrogen, ethylene, and natural gas, at high pressures and temperatures, so a malfunction can threaten human life [1]. Furthermore, because of the intricate structure of the compressor, the large number of wearing parts, and the complicated interactions between its moving parts, it is essential to monitor the compressor operating condition and to detect RC failures accurately and in a timely manner [2].
Traditionally, the fault diagnosis of RCs was carried out by engineers with extensive experience and knowledge. For example, an experienced engineer can detect RC faults from abnormal noise in sound signals [3,4]. However, users want to predict faults more accurately and before they occur; using an automatic condition classification system for this purpose reduces operation and maintenance (O&M) costs [5]. Recently, with the rapid development of artificial intelligence (AI), machine learning (ML) has made a great difference in machine fault diagnosis and prediction [6]. Zhang et al. [6] discussed the computational intelligence techniques used in machinery condition monitoring and fault diagnosis. Worden et al. [7] gave a comprehensive overview of some common natural computing methods and their applications in mechanical systems. Other researchers [8] reviewed bearing fault diagnosis based on deep learning (DL). However, very few reviews have addressed intelligent fault diagnosis of reciprocating compressors.
This paper reviews recent research and development of fault diagnosis technologies for reciprocating compressors using machine learning, in terms of both theoretical studies and applications. To compile this review, search terms such as artificial intelligence, machine learning, reciprocating compressor, fault diagnosis, fault detection, and monitoring system were used to search the literature. The filtering mechanism is mainly based on multiple combinations of these search terms. Most of the literature surveyed is from journals; the English literature was searched using Web of Science and the Chinese literature was searched using China National Knowledge Infrastructure (CNKI). The theories of RCs and ML methods were mostly taken from books.
After the search, the literature was classified into several categories based on the ML methods applied and the main monitoring parameters, and the review sections below are organized according to these categories. Following the review, the advantages and limitations of the ML technologies, a comparison of the different monitoring parameters, and an outlook on future research are discussed.
Figure 1 shows a single-stage reciprocating compressor, which mainly consists of two valves, a piston, a cylinder, a piston rod, a crosshead, a connecting rod, and a crankshaft. The crankshaft is driven by a motor and reciprocates the piston through the slide-crank mechanism, so that the piston compresses the gas in the cylinder to a designated high pressure [9]. RCs are used in chemical, refining, and petrochemical plants, and they can compress almost any gas mixture from vacuum to over 3000 atm.

Overview of Reciprocating Compressors
Compressor faults are caused by failures of different components. Kostyukov [10] surveyed the causes of reciprocating compressor faults among consumers and manufacturers of RCs. The results showed that valves are one of the main causes of compressor failure, accounting for 36% of faults. Piston-cylinder units also account for over 30% of all faults, where ring failures account for 25%. Failures in the slide-crank mechanism and cranking mechanism are also significant [10]. To monitor compressor conditions, many kinds of sensors are used in fault detection systems, including vibration, temperature, pressure, displacement, and acoustic emission sensors.

Overview of the Four Major Machine Learning Methods
Machine learning is a field that studies learning algorithms by which a machine can learn from data nearly as well as people do [11]. Many machine learning methods have been applied to RC fault diagnosis. In this section, the four most prevalent algorithms are reviewed: the artificial neural network, support vector machine, and Bayesian network are three common traditional machine learning methods, and deep learning is one of the latest.

Artificial Neural Network (ANN)
Processes 2021, 9, 909
The artificial neural network (ANN) is a mathematical model, inspired by biological neural networks, that consists of a set of interconnected basic processing elements called artificial neurons. Artificial neurons are connected to each other by links with different weights. Figure 2 shows an ANN with four layers: an input layer, an output layer, and two hidden layers (layers 1 and 2). Each hidden layer includes several neurons, and each neuron is connected to every element of the output vector of the previous layer through the weight matrix $W^i$ (the weight matrix of the $i$th hidden layer). In addition, each neuron has a bias $b^i_j$ (the bias of the $j$th neuron in the $i$th hidden layer), a summer, a transfer function $f^i_j$, and an output $a^i_j$, indexed in the same way. The calculation performed by each neuron is given by Equation (1):
$$a^i_j = f^i_j\left(W^i_j\, a^{i-1} + b^i_j\right), \tag{1}$$
where $W^i_j$ denotes the $j$th row of $W^i$ and $a^{i-1}$ is the output vector of the $(i-1)$th hidden layer (note that when $i = 1$, $a^{i-1}$ is the input vector of the input layer of the whole network).
Typically, in an ANN model, the transfer functions are selected by the designer, while the weights and biases are adjustable parameters that are tuned by a learning rule such as the error back-propagation algorithm, so that the input-output relationship of the network meets a specific goal [12,13]. The ANN model can thus be used to infer a function from observations, which is helpful in solving complex problems. Hence, it can be broadly applied in fault diagnosis, which is essentially a classification problem.
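As a minimal sketch of the neuron calculation in Equation (1), the following pure-Python snippet evaluates one fully connected layer and chains two of them. The layer sizes, weights, biases, and the logistic sigmoid transfer function are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import math

def sigmoid(z):
    """Logistic transfer function (an assumed choice of f)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, b, a_prev, f=sigmoid):
    """Return the layer output: a_j = f(sum_k W[j][k] * a_prev[k] + b[j])."""
    return [f(sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) + b_j)
            for row, b_j in zip(W, b)]

# Hypothetical network: 3 inputs -> 2 neurons -> 1 neuron.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.0]

x = [1.0, 2.0, 3.0]             # input vector a^0
a1 = layer_forward(W1, b1, x)   # output of the first layer
a2 = layer_forward(W2, b2, a1)  # output of the second layer
```

In a trained network the weights and biases would come from a learning rule such as back propagation rather than being written by hand.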

Bayesian Network (BN)
The Bayesian network (also called a belief network) [14] is a directed acyclic graph (as shown in Figure 3) in which the nodes, such as $\{z_1, z_2, \ldots, z_7\}$, represent propositional variables. An arrow between two nodes means that the two nodes are directly related, and the strength of the relation is quantified by a conditional probability. The two essential properties of these networks are consistency and completeness, and the chain-rule representation of the joint distribution is employed to guarantee both [15,16]:
$$P(z_1, z_2, \ldots, z_n) = P(z_n \mid z_{n-1}, \ldots, z_1) \cdots P(z_2 \mid z_1)\, P(z_1). \tag{2}$$
It can be seen that in the chained formula on the right-hand side, each variable appears exactly once on the left side of the conditioning bar, which facilitates the dependence quantification of the network. For instance, the chain-rule representation can be written out for the network shown in Figure 3 (Equation (3)).
The Bayesian network is a methodology that integrates probability theory and graph theory. Not only can it visually exhibit the structure of real tasks as a graph, but it can also exploit that structure based on the principles of probability theory, which reduces the complexity of reasoning. Therefore, the Bayesian network is applied in many domains. The Bayesian network also provides a framework for new models; among them, a naive Bayes model is normally selected for classification and prediction of multidimensional discrete time series [17,18].
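To illustrate how the chain-rule factorization is used, the sketch below builds the joint distribution of a small hypothetical network with edges z1 -> z2 and z1 -> z3, then computes a posterior by enumeration. The graph and all probability values are made up for illustration; they do not correspond to Figure 3 or to any reviewed study.

```python
from itertools import product

# Hypothetical binary network z1 -> z2, z1 -> z3 (illustration values only).
P_z1 = {1: 0.1, 0: 0.9}
P_z2_given_z1 = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # [z1][z2]
P_z3_given_z1 = {1: {1: 0.7, 0: 0.3}, 0: {1: 0.2, 0: 0.8}}  # [z1][z3]

def joint(z1, z2, z3):
    """Chain rule with the graph's independences:
    P(z1, z2, z3) = P(z3 | z1) * P(z2 | z1) * P(z1)."""
    return P_z3_given_z1[z1][z3] * P_z2_given_z1[z1][z2] * P_z1[z1]

# A valid joint distribution sums to one over all assignments.
total = sum(joint(*z) for z in product((0, 1), repeat=3))

def posterior_z1(z2, z3):
    """P(z1 = 1 | z2, z3), obtained by enumerating both values of z1."""
    num = joint(1, z2, z3)
    return num / (num + joint(0, z2, z3))
```

Because z3 depends only on z1 in this assumed graph, the general factor P(z3 | z2, z1) reduces to P(z3 | z1), which is the simplification that the graph structure provides.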

Support Vector Machine (SVM)
The support vector machine (SVM) is a supervised learning technique based on statistical learning theory. It aims to find a hyperplane (see Figure 4) that separates n-dimensional inputs into two parts associated with two distinct classes. The hyperplane can be written as [19]:
$$w^{T} x + b = 0, \tag{4}$$
where $w$ is the normal vector of the hyperplane and $b$ is the bias. To ensure the generalization ability of the SVM, the simplest maximal margin bound is adopted, which implies:
$$\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_i\left(w^{T} x_i + b\right) \ge 1 \ \text{for all } i, \tag{5}$$
where $(x_i, y_i)$ is the $i$th sample of the training set, and $y_i \in \{-1, 1\}$. Formula (5) is a convex quadratic programming problem and hence has no local minima [20,21]. By converting the problem with the Kuhn-Tucker conditions into the equivalent Lagrangian dual quadratic optimization problem, the parameters of the SVM, namely $w$ and $b$, can be obtained [19,22]. Besides the maximal margin bound, other generalization bounds are available, such as margin percentile bounds and soft margin bounds.
The introduction above assumes a linearly separable problem; however, most real tasks are not linearly separable. Hence, the lower-dimensional features should be mapped to a higher-dimensional feature space using kernel functions, so that the inputs can be linearly separated in the feature space. For this reason, the kernel function must be carefully selected for an efficient SVM classifier [21]. The SVM was originally designed for binary classification, so strategies have to be established to accomplish multiclass classification. Three major multiclass SVMs based on distinctive structures are the one-against-one SVM, one-against-all SVM, and directed acyclic graph (DAG) SVM [23,24].
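The margin constraint of the linearly separable case can be checked numerically. The sketch below tests whether a candidate hyperplane satisfies y_i (w . x_i + b) >= 1 for every training sample and reports the resulting margin width 2 / ||w||. The data set and the hand-picked (w, b) are hypothetical, and no optimization is performed, so the candidate is feasible but not necessarily the maximal-margin solution.

```python
import math

def decision(w, b, x):
    """Signed value of w . x + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def satisfies_margin(w, b, samples):
    """True if y_i * (w . x_i + b) >= 1 for every (x_i, y_i)."""
    return all(y * decision(w, b, x) >= 1.0 for x, y in samples)

def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical, linearly separable 2-D training set with labels in {-1, 1}.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1),
           ((0.0, 0.0), -1), ((1.0, 0.0), -1)]

w, b = (1.0, 1.0), -3.0              # hand-picked candidate, not trained
feasible = satisfies_margin(w, b, samples)
width = margin_width(w)
```

Minimizing ||w|| subject to these constraints is what makes the margin as wide as possible; a quadratic programming solver would be needed to find that optimum.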

Deep Learning (DL)
The artificial intelligence methods introduced above are all conventional machine learning algorithms. What they have in common is that classification performance depends on feature vectors extracted manually from the raw data, whereas the fault diagnosis process should ideally be fully automatic. Deep learning (DL) offers the possibility of approaching this task [25].
A deep learning model is composed of multiple processing modules, each of which transforms the representation from the previous layer into a higher, more abstract level in the current layer. With enough suitable modules combined, extremely intricate relationships can be learned. The internal parameters of the deep learning model are obtained with a backpropagation algorithm applied to a large set of data. The convolutional neural network (CNN), deep belief network (DBN), and auto-encoder are the three main deep learning methods. The CNN is designed to process data in the form of multiple arrays, such as time series and image data [26]. The DBN is formed by stacking several restricted Boltzmann machines, each of which is an undirected bipartite graphical model. A Boltzmann machine (BM) is an energy-based model, and its modeling capacity can be improved by increasing the number of hidden variables [27].
An auto-encoder is a purely unsupervised representation learning algorithm that consists of an encoder and a decoder. The encoder transforms the input into a different representation, and the decoder converts the new representation back into the original form. The auto-encoder can be used to reduce the dimensionality of the dataset and to learn more abstract features [27,28].
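As a rough illustration of how a CNN processes a time series, the sketch below applies a single one-dimensional filter followed by a ReLU nonlinearity. The signal and the filter weights are hypothetical; in an actual CNN the filter weights are learned by backpropagation and many filters and layers are stacked.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding (cross-correlation,
    which is the operation CNN layers conventionally compute)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

# Hypothetical signal with one rising and one falling edge.
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
edge_kernel = [-1.0, 1.0]   # assumed filter that responds to rising edges

feature_map = relu(conv1d_valid(signal, edge_kernel))
```

The feature map is non-zero only where the rising edge occurs, which shows how a filter turns a raw signal into a localized feature without manual feature design.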

Applications of Machine Learning in Fault Diagnosis of the Reciprocating Compressor
The performance of most machine learning methods depends mainly on the feature extractor used before classification, and the selection of the feature extractor in turn depends on the characteristics of the raw signals. Therefore, this section is divided into four parts according to the nature of the signals, and within each part the different machine learning methods are discussed in separate paragraphs.

Fault Diagnosis Based on Process Parameters
The parametric method diagnoses reciprocating compressors from process parameters such as the compressor pressure, temperature, and flow rate. The cylinder pressure, one of the most typical process parameters, can be monitored via the p-V diagram. The p-V diagram is a two-dimensional cycle diagram that shows how the dynamic pressure in the compressor chamber varies with the working volume over a working cycle. Faults of the compressor valves, piston rings, support rings, and other components such as the shaft, lubrication oil, and bearings can change the pressure in the cylinder and hence the shape of the p-V diagram. The p-V diagram (cylinder pressure) is therefore a very useful parameter for fault diagnosis in reciprocating compressors.
The support vector machine (SVM) has been widely applied in fault diagnosis based on the p-V diagram. Feng et al. [29] proposed a fault recognition approach based on the p-V diagram using the discrete 2D-curvelet transform, nonlinear principal component analysis (PCA), and SVM methods. PCA-based dimension reduction and a multi-class SVM classifier were used to classify five valve faults in reciprocating compressors. Pichler et al. [30,31] detected broken reciprocating compressor valves from the p-V diagram. The gradient of the expansion phase of the p-V diagram, extracted in logarithmic coordinates, and the pressure difference between suction and discharge were used as features to train SVM classifiers that discriminate between faultless and faulty cases for each of six kinds of valves. The method was validated using real-world data, and the results showed a high classification accuracy. Wang et al. [32] introduced an automated evaluation of the p-V diagram. They determined seven invariant moments of the p-V diagram and classified them using the SVM method. Jiang et al. [33] studied p-V diagram fault recognition for RCs using the SVM method. The fault features were extracted from the indicator diagram by a feature-point extraction method, and a fault recognition model was constructed from the feature vectors based on a multi-class SVM and a decision tree.
The artificial neural network (ANN) has also been used in fault diagnosis based on the p-V diagram. Namdeo et al. [34] used an ANN method to detect valve leakage in RCs. The healthy expansion process of the RC was predicted by a functional link network, and a back-propagation algorithm was applied to predict the percentage of leakage based on the pressure deviation at a particular instant of time. In another study [35], features were extracted from the raw pressure signal with wavelet packet decomposition. The extracted features, along with temperature data, were used to train a logistic regression model for classifying valve faults. The features were also used to train a recurrent neural network (RNN) to predict the future performance of the system, namely the wavelet energy features of the pressure signal, which could also indicate valve failures. Tang et al. [36] used an ANN method to diagnose faults of RC gas valves based on geometrical properties of the p-V diagram. The features were used to train a back-propagation (BP) neural network, resulting in a network with a 100% recognition rate. In [37], the p-V diagrams were normalized before a BP neural network was applied to recognize the failure conditions of RCs.
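One of the features mentioned above, the gradient of the expansion phase in logarithmic coordinates, can be sketched as follows. For an ideal polytropic expansion p V^n = const, the slope of log p against log V equals -n, so a deviation of the fitted slope may hint at a fault. The synthetic data, the exponent 1.3, and the use of an ordinary least-squares fit are assumptions for illustration and are not the procedure of the cited studies.

```python
import math

def loglog_slope(volumes, pressures):
    """Least-squares slope of log(p) against log(V)."""
    xs = [math.log(v) for v in volumes]
    ys = [math.log(p) for p in pressures]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic expansion phase obeying p * V**1.3 = const (illustrative values).
volumes = [0.10 + 0.01 * k for k in range(20)]
pressures = [5.0e5 * (volumes[0] / v) ** 1.3 for v in volumes]

slope = loglog_slope(volumes, pressures)   # close to -1.3 for this data
```

A feature vector for a classifier could then combine this slope with other quantities, such as the suction-discharge pressure difference.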
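Wavelet packet energy features of the kind mentioned above can be sketched with a full Haar wavelet packet tree. The Haar wavelet, the decomposition depth, and the synthetic signal are assumptions chosen to keep the example short; the cited study may have used a different wavelet and depth.

```python
import math

def haar_step(x):
    """One orthonormal Haar split into approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def wavelet_packet_energies(x, level):
    """Energy of every node at the given depth of a full Haar packet tree.
    Nodes are returned in decomposition order, not sorted by frequency."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_step(node)]
    return [sum(c * c for c in node) for node in nodes]

# Synthetic record; its length must be divisible by 2**level.
signal = [math.sin(2 * math.pi * 3 * k / 64)
          + 0.5 * math.sin(2 * math.pi * 20 * k / 64) for k in range(64)]

energies = wavelet_packet_energies(signal, level=3)
features = [e / sum(energies) for e in energies]   # relative band energies
```

Because the Haar transform used here is orthonormal, the node energies add up to the energy of the original signal, so the normalized values form a distribution over sub-bands.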
Guerra [38] processed the dynamic pressure signal with a binned fast Fourier transform (FFT) and PCA, and detected valve faults through Bayesian classification at 50% and 100% load.
Tran et al. [39] applied a wavelet-transform-based noise removal method to the pressure and current signals, and adopted a Teager-Kaiser energy operator to estimate the amplitude envelope (AM signal) of the transient vibration signal. A DBN was then applied to classify the RC valve faults.
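The discrete Teager-Kaiser energy operator is commonly defined as psi[n] = x[n]^2 - x[n-1] x[n+1]. The sketch below applies this standard definition to a synthetic sinusoid, for which the operator returns the constant A^2 sin^2(omega). The amplitude and frequency values are arbitrary illustration choices, and the snippet does not reproduce the full processing chain of the cited study.

```python
import math

def teager_kaiser(x):
    """psi[n] = x[n]**2 - x[n-1] * x[n+1] for the interior samples."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

A, omega = 2.0, 0.3                      # hypothetical amplitude and frequency
x = [A * math.cos(omega * n) for n in range(50)]

psi = teager_kaiser(x)
expected = (A * math.sin(omega)) ** 2    # constant value for a pure tone
```

Since the output tracks both amplitude and frequency from only three neighbouring samples, the operator responds quickly to transient changes in a signal.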
This section reviewed the applications of the three main traditional ML methods and deep learning to RC fault diagnosis based on the p-V diagram; SVM and ANN are clearly the most widely used in this field.

Fault Diagnosis Based on Pressures Measured in Other Volumes
Besides p-V diagrams, pressures measured in other volumes can also be used to recognize faults.
Tiwari and Yadav [40] applied an ANN method in condition monitoring of a defective RC. The corresponding values of the pressure pulsations in the discharge pipe were simulated to train the ANN for predicting the percent leakage of discharge valves.
Guerra and Kolodziej [41] proposed a data-driven approach for condition monitoring of RC valves. An FFT was applied to the pressure wave measured in the environment around the discharge valve, and then the FFT values were grouped into several frequency bins. Afterwards, PCA was used to reduce the dimension of the vectors. Finally, the results were used to train the Bayes classifier, which successfully classified various levels of the valve degradation with high accuracy.
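The first stage of such a pipeline, grouping spectral values into frequency bins, can be sketched as below. A direct discrete Fourier transform is used in place of an FFT library so that the snippet is self-contained; the bin count, the choice of summing magnitudes within each bin, and the synthetic signal are assumptions. The PCA and classifier stages are omitted.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the one-sided DFT (direct O(N^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def bin_spectrum(mags, n_bins):
    """Sum neighbouring spectral lines into n_bins equal-width bins."""
    width = len(mags) // n_bins
    return [sum(mags[i * width:(i + 1) * width]) for i in range(n_bins)]

# Synthetic pressure wave with a single component at spectral line 10.
signal = [math.sin(2 * math.pi * 10 * t / 64) for t in range(64)]

binned = bin_spectrum(dft_magnitudes(signal), n_bins=4)
dominant_bin = binned.index(max(binned))   # lines 8 to 15 fall in bin 1
```

Binning reduces the number of features before dimension reduction and makes them less sensitive to small shifts in frequency.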
This section reviewed the applications of ML methods to RC fault diagnosis based on pressures measured in volumes other than the cylinder. Research on RC fault detection based on such pressures is scarce, and the ANN and Bayes classifier have been employed.

Fault Diagnosis Based on Vibration Signals
Vibration analysis is a typical monitoring method for RCs. Many faults in RCs lead to abnormal vibration, which can be diagnosed from vibration signals that carry a large amount of machinery information.
Qin et al. [42] presented a novel SVM scheme composed of three steps: denoising via basis pursuit, feature extraction via wave matching, and classification via support vector machine. The basis pursuit was applied to suppress the background noise and enhance the major component in the vibration signal. Then, the feature extraction was carried out by matching the denoised signal with parameterized waveform, which was optimized by a differential evolution algorithm. In the end, the SVM was carried out in the valve fault classification with 100% accuracy. Ren et al. [27] used SVMs in the automated diagnosis of valve operating conditions. The input features were extracted from the vibration signals using the local wave and higher-order statistical methods. Chen et al. [43] extracted wavelet packet entropy of vibration signals as working condition eigenvectors, and the signals were trained with an SVM classifier. Cui et al. [44] proposed an SVM classifier trained with information entropy extracted from vibration signals. Potocnik et al. [45] developed a semi-supervised approach based on vibration signals which included statistical evaluation extracted from the signals and principal component analysis as preprocess, and then a comparative analysis of classification methods including discriminant analysis (DA), neural networks (NN), SVM, and extreme learning machines (ELM) was conducted. The results showed that the nonlinear classifier performed better. Pichler [46,47] focused particularly on valve fault detection under variable operation conditions. The features of the vibration signals were extracted from the spectrogram difference with two-dimensional correlation. The classification performance was validated using SVMs and logistic regression. Pichler [47] proposed an independent method for detecting the valve faults based on the vibration measurements using several different valves. 
The classifiers, such as the logistic rule (in a two-class setup) and SVMs (in two-class as well as one-class setup) were compared with each other. The results showed the three classifiers performed equally good for plastic valve faults. However, the two-class SVMs were better for the steel valve faults. Na Lei et al. [48] proposed an integration approach based on the local mean decomposition (LMD) method and autoregressive-generalized autoregressive conditional heteroscedasticity (AR-GARCH) model to extract the features of the vibration signal. Then, the back propagation (BP) neural networks were applied to diagnose the faults of RC valves. Lin et al. [49][50][51] conducted research on the automated valve condition classification. They processed the raw vibration signals using time-frequency analysis such as short time Fourier transform (STFT), smoothed pseudo-Wigner-Ville distribution (SPWVD), and the reassigned smoothed pseudo-Wigner-Ville distribution (RSPWVD). Then, a data reduction algorithm was used to extract fault features which was fed to a probabilistic neural network (PNN) for fault classification. Three modification indices were proposed to extract fault features. The results showed that the modified indices were better than the original indices in the literature [51]. The genetic algorithm was applied to automate the classification process to improve the prediction accuracy [50]. The authors [49] further revealed that the applicability of the resigned smooth pseudo-Wigner-Ville distribution (RSPWV) was better than Wigner-Ville distribution (WVD) and the spectrogram (SP) in the probability neural network classification system. Meanwhile, Ahmed et al. [52,53] also conducted studies about fault classification on RCs. They found that the classification performance of features from the frequency domain were better than those from the time domain which were extracted from vibration signals with a probabilistic neural network (PNN). 
They further proposed a PNN optimized by a genetic algorithm (GA), whose classification accuracy was higher than that of the original PNN. The authors [54] also developed a one-against-one scheme based on the relevance vector machine (RVM) and a multiclass multi-kernel RVM (mRVM). Both methods were optimized by the GA, and their classification accuracies were up to 97%. Diego Cabrera et al. [55] developed a long short-term memory (LSTM)-based classifier for valve faults, trained with preprocessed vibration time series, with hyperparameters optimized by a Bayesian method. Li et al. [56] proposed an improved wavelet neural network (WNN) whose initial parameters were obtained by a GA. Yang et al. [57] proposed an online network, the adaptive resonance theory-Kohonen network (ART-KNN), which was more suitable for use on a production line than the self-organizing feature map and learning vector quantization. In another study [58], the Wigner-Ville distributions (WVD) of the vibration acceleration signals were calculated and displayed as grey images, and after normalization the PNN was used directly to classify the new time-frequency images.
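To make the PNN classifier mentioned above more concrete, the following is a minimal sketch of a probabilistic neural network in its textbook form (a Gaussian Parzen-window estimate per class). It is not the implementation used in any of the reviewed studies. The feature vectors, the class labels, and the smoothing parameter `sigma` are hypothetical; `sigma` is the kind of quantity an optimizer such as a GA could tune, although which parameters the reviewed studies tuned is not stated here.

```python
import math

def pnn_predict(train_x, train_y, x, sigma=0.5):
    """Classify x with a basic PNN; returns (label, per-class scores)."""
    sums, counts = {}, {}
    # Pattern layer: one Gaussian kernel centred on each training sample.
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        k = math.exp(-d2 / (2.0 * sigma ** 2))
        sums[yi] = sums.get(yi, 0.0) + k
        counts[yi] = counts.get(yi, 0) + 1
    # Summation layer: average kernel response per class.
    scores = {c: sums[c] / counts[c] for c in sums}
    # Output layer: the class with the largest estimated density wins.
    return max(scores, key=scores.get), scores

# Hypothetical two-dimensional feature vectors, not data from the literature.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 1.0), (1.0, 0.8)]
train_y = ["healthy", "healthy", "valve_leak", "valve_leak"]

label, scores = pnn_predict(train_x, train_y, (0.85, 0.9))
print(label, scores)
```

Because the network stores every training sample, training is trivial and the only free parameter in this sketch is `sigma`, which controls how quickly a sample's influence decays with distance.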
Kolodziej et al. [59] trained a Bayesian classifier for early detection of spring fatigue and valve seat wear in RCs, and validated it using experimental data. The vibration data were processed using the Wigner-Ville spectrum and quantified using image-based statistical features. Principal component analysis (PCA) was used to reduce the feature space.
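As an illustration of how a Bayesian classifier assigns a fault class from a small feature vector, the sketch below uses Gaussian naive Bayes. This is an assumption: the exact form of the classifier in [59] is not described in this section, and the feature values and class names are hypothetical. The PCA step is omitted, so the inputs stand in for already reduced features.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for c, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        # Population variance with a small floor to avoid division by zero.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(X), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log-posterior for x."""
    def log_posterior(c):
        prior, means, variances = model[c]
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return lp
    return max(model, key=log_posterior)

# Hypothetical reduced features (e.g. two principal-component scores).
X = [(1.0, 0.20), (1.2, 0.30), (0.9, 0.25),
     (2.0, 0.80), (2.2, 0.90), (1.9, 0.70)]
y = ["healthy"] * 3 + ["spring_fatigue"] * 3

model = fit_gnb(X, y)
print(predict_gnb(model, (2.1, 0.85)))
```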
Tran et al. [60] proposed a hybrid deep belief network (HDBN) which integrated the DBN for pretraining and simplified fuzzy ARTMAP (SFAM) for fault classification. The results showed a great improvement in comparison with the original DBN in classification accuracy.
The applications of ML methods in RC fault diagnosis based on vibration signals were reviewed in this section. A considerable number of studies focus on fault detection techniques based on vibration signals. As with the p-V diagram, many different SVM and ANN models were employed as classifiers in these cases, whereas Bayes classifiers and deep learning were barely used.

Fault Diagnosis Based on Acoustic Emission (AE)
Acoustic emission refers to the generation of transient elastic waves produced by a rapid release of energy from a localized source within the surface of material, according to the American Society for Testing and Materials (ASTM) [61,62]. By detecting AE signals generated in the reciprocating motion, acoustic emission can be used to discriminate the different types of damage occurring in an RC.
Ali et al. [63,64] investigated fault detection technologies based on artificial intelligence (AI) and AE signals. They proposed two AI models, one using an SVM and one using an ANN, to detect the valve condition in a reciprocating compressor based on several AE signals. In [65], ANN and SVM models were trained and evaluated for the detection of valve faults in an RC. The results showed that the accuracies of the ANN and SVM detection methods were similar, but the SVM was better able to handle a large number of input features with small sample sets. Zhang et al. [66] extracted the root mean square (RMS) and average signal level (ASL) in the time domain and the peak value in the frequency domain as the eigenvectors for an SVM model, with which leakage of the pipeline valve could be recognized. Sim et al. [67] carried out time-frequency analysis of the AE signal through the discrete wavelet transform (DWT) and assessed the characteristics of four acoustic emission parameters. The results revealed that the AE RMS performed the best. The k-nearest neighbor (KNN) and SVM classification methods were then applied to detect valve faults using the AE RMS, before the valve flow rate was estimated through a regression model [20].
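The three features named above (RMS, ASL, and the frequency-domain peak) can be sketched as follows. This is an illustrative computation under stated assumptions, not the procedure of [66]: ASL is taken here as the mean of the rectified signal in linear units (AE instruments often report it in dB), the spectrum is obtained with a naive discrete Fourier transform, and the test signal is a synthetic sine wave rather than a measured AE waveform.

```python
import cmath
import math

def rms(x):
    """Root mean square of the samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def asl(x):
    """Average signal level, assumed here to be the mean rectified amplitude."""
    return sum(abs(v) for v in x) / len(x)

def spectral_peak(x, fs):
    """Return (frequency in Hz, amplitude) of the largest non-DC DFT bin."""
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    # Scale so that a unit-amplitude sine gives an amplitude of 1.
    return best_k * fs / n, 2.0 * best_mag / n

# Synthetic signal: 5 Hz unit sine sampled at 64 Hz for one second.
fs = 64.0
signal = [math.sin(2 * math.pi * 5.0 * t / fs) for t in range(64)]

peak_freq, peak_amp = spectral_peak(signal, fs)
feature_vector = (rms(signal), asl(signal), peak_amp)
print(feature_vector, peak_freq)
```

A vector such as `feature_vector` is what would be passed to a classifier; for real AE data a fast Fourier transform would replace the naive loop.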
The applications of ML methods in RC fault diagnosis based on AE signals were reviewed in this section. The number of studies in this field is smaller than for the p-V diagram and vibration signals, and the classifiers applied were mainly ANNs and SVMs.

Fault Diagnosis Based on Multi-Source Signals
The faults in RCs are intricate, and it is difficult to recognize all of them by a single signal or parameter. Therefore, it is important to conduct studies on fault detection based on multi-source signals.
Yang et al. [68] studied the condition classification of a small reciprocating compressor for refrigerators using ANNs and SVMs. The noise and vibration signals were wavelet-transformed into frequency sub-bands, and the fault features were extracted using statistical methods. The classification performance of the SVM, the self-organizing feature map (SOFM), the SOFM combined with learning vector quantization (LVQ), and LVQ alone were compared with each other. The results showed that the SVM and LVQ methods performed better than the other methods. Zhang et al. [69] proposed an RC fault diagnosis method based on an SVM and sensitive parameters extracted by the scatter matrix method. The sensitive parameters were assessed by a distance evaluation method, and the accuracy of the new method was reported to be superior to that of traditional methods. A fault detection system integrating data analysis and machine learning was proposed by Qi et al. [70]. The raw data were first denoised by robust principal component analysis (RPCA), and the core information of the compressor signal was then extracted by a sparse coding algorithm with an online dictionary. Based on the learned dictionary, the potential faults were finally recognized and classified by the SVM using the one-against-one strategy.
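The one-against-one strategy mentioned above extends a binary classifier to several fault classes by training one model per pair of classes and letting the models vote. The sketch below shows only this voting scheme. As a loudly stated assumption, each pairwise model is a nearest-centroid rule standing in for the binary SVM, so that the example stays self-contained; the class names and feature values are hypothetical.

```python
from collections import Counter
from itertools import combinations

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def train_one_against_one(X, y):
    """Build one pairwise model per unordered class pair: k*(k-1)/2 models."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    # Stand-in binary learner: store the two class centroids for each pair.
    return {(a, b): (centroid(by_class[a]), centroid(by_class[b]))
            for a, b in combinations(sorted(by_class), 2)}

def predict_one_against_one(models, x):
    """Each pairwise model casts one vote; the most voted class is returned."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        da = sum((u - v) ** 2 for u, v in zip(x, ca))
        db = sum((u - v) ** 2 for u, v in zip(x, cb))
        votes[a if da <= db else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical features for three conditions.
X = [(0.0, 0.0), (0.1, 0.1), (1.0, 0.0), (1.1, 0.1), (0.0, 1.0), (0.1, 1.1)]
y = ["healthy", "healthy", "valve_leak", "valve_leak",
     "ring_wear", "ring_wear"]

models = train_one_against_one(X, y)
print(predict_one_against_one(models, (1.0, 0.05)))
```

With three classes this yields three pairwise models; in a real system each stored centroid pair would be replaced by a trained binary SVM.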
Li et al. [71] proposed an ART-artificial immune network for RC failure detection, integrating adaptive resonance theory (ART) and the artificial immune network (AIN). The network was trained with the suction and discharge pressures and the suction and discharge temperatures from a multilevel RC. Wang et al. [72] established an RC intelligent diagnosis system based on multi-agent technology. The system involved a monitoring agent, management agent, diagnosis agent, diagnosis method agent, fusion agent, human-computer interaction agent, and other modules. The monitoring agent integrated four signal types: vibration, temperature, displacement, and pressure. In addition, the diagnosis method agent included an expert system agent, a fuzzy logic agent, a neural network agent, and so on.
Zhang et al. [73] proposed an improved K-means algorithm (K-means being one of the clustering algorithms) for RC fault diagnosis. The new method removes the algorithm's dependence on the initial clustering centers.
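Standard K-means starts from randomly chosen centers, so different runs can give different clusters. The specific improvement in [73] is not described here. As a generic illustration only, the sketch below replaces random initialization with deterministic farthest-point seeding, which is one common way to reduce sensitivity to the starting centers. All data points are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def farthest_point_seeds(points, k):
    """Pick k seeds deterministically instead of at random."""
    mean = [sum(col) / len(col) for col in zip(*points)]
    # First seed: the point closest to the overall mean.
    seeds = [min(points, key=lambda p: dist2(p, mean))]
    # Remaining seeds: repeatedly take the point farthest from all seeds.
    while len(seeds) < k:
        seeds.append(max(points,
                         key=lambda p: min(dist2(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=50):
    """Lloyd iterations started from the deterministic seeds."""
    centers = [list(s) for s in farthest_point_seeds(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(col) for col in zip(*members)]
    return labels, centers

# Hypothetical features forming two groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(points, 2)
print(labels)
```

Because no random numbers are involved, repeated runs on the same data return the same partition.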
The applications of ML methods in RC fault diagnosis based on multi-source signals were reviewed in this section, and ANN and Bayes classifiers were mainly employed. In addition, a clustering algorithm (K-means) [73] was applied to RC fault diagnosis, which may serve as a guide for future research on RC fault diagnosis.

Discussion
As summarized in Table 1, SVMs and ANNs are the two most widely used traditional ML algorithms. They are employed as classifiers in many of the reciprocating compressor fault diagnosis cases based on the three main monitoring parameters. Figure 5 shows the trend of each ML method over the years. Although the amounts of literature on ANNs and SVMs are similar, the application of SVMs is rising while that of ANNs is declining year by year. Meanwhile, although deep learning has been less prominent, it has begun to develop in recent years. However, Bayesian networks and clustering algorithms have still barely been considered in this domain. Overall, the research trend in intelligent fault detection techniques for RCs largely follows the development of AI techniques.
The reviewed cases show some differences among the three main traditional ML methods with respect to their application scenarios. ANNs were good at regression, SVMs were more focused on classification, and Bayesian networks were applied more to probabilistic prediction. Although they were applied in different scenarios, all of them performed well in classification. Moreover, the main difference between traditional ML methods and deep learning, in terms of RC fault diagnosis, is that traditional ML methods require feature extractors to be selected for the monitoring parameters, whereas deep learning performs representation learning in its lower processing modules. Hence, the classification results of traditional ML methods depend heavily on the preprocessing techniques and the quality of the feature extractors. Furthermore, the application of clustering algorithms may guide the further development of intelligent RC fault detection techniques.
Further findings emerge from the reviewed literature with respect to the monitoring parameters. The preprocessing methods, feature extraction methods, and shares of the literature for the three main monitoring parameters are summarized in Table 2. Both vibration and AE signals required more complex preprocessing than pressure signals. This is because AE and vibration signals exhibit nonstationary behavior due to the RC motion, so preprocessing was applied to obtain better resolution of the useful components of the raw signals. Table 2 also shows that more of the literature focused on vibration signals than on the other two parameters. This is because pressure measurement places strict demands on the mounting environment, and AE theory emerged later than vibration analysis. Moreover, in RC fault diagnosis it is impossible to monitor all potential faults clearly with a single type of signal. Only a few studies, such as [8], studied a condition monitoring (CM) system for overall fault diagnosis. Furthermore, a considerable part of the existing literature studies fault detection techniques for the valves in RCs. This is because valves account for 36% of the cases where the compressor needs to be shut down and 50% of the total maintenance cost, and because valve faults are much easier to observe in RC fault simulation experiments than other structural failures. Besides, most of the reviewed research did not focus on real-time recognition of the RC condition, but rather used data from a prepared test set to decide whether the compressor was healthy or not.

Challenges and Prospects
As reviewed above, many studies have been conducted for RC fault diagnosis using machine learning. The challenges and prospects for the RC fault diagnosis based on machine learning are summarized below:

• In terms of artificial intelligence, the process of RC fault diagnosis can be treated as a pattern recognition problem or a clustering problem. However, most current studies treat fault diagnosis only as a pattern recognition problem. Researchers are therefore encouraged to devote more effort to exploring suitable clustering algorithms.

• With the rapid development of monitoring systems, deep learning can readily take advantage of the large amount of monitoring data, which will make the fault diagnosis process more automatic and accurate.

• To make sure that the CM system is applicable to practical engineering applications, more attention should be paid to overall fault diagnosis. Furthermore, for the reasons discussed in Section 4, real-time fault diagnosis is also indispensable.

• Besides the three monitoring parameters mentioned above (pressure, vibration, and AE), there are other parameters, such as piston-rod axis orbit, flow rate, and wear displacement, which are more sensitive to the relevant failures. Future investigations can focus on these parameters as well.

• Although valve problems account for a large share of the cases where the compressor needs to be shut down and of the total maintenance cost, research into detection techniques for other failures in RCs is also necessary.

Conclusions
In this paper, the applications of machine learning for RC fault diagnosis were reviewed. The application status and scenarios of the four ML methods were discussed. SVM and ANN models were the two most widely used ML methods. Because it does not require manual feature extraction, deep learning is expected to make the fault diagnosis process more automatic and accurate.
The advantages and disadvantages of the fault detection process for the three main monitoring parameters (pressure, vibration, and AE signals) were evaluated and discussed. Vibration and AE signals need more complex preprocessing to obtain better resolution of the useful components of the raw signals.
Finally, the challenges and prospects of machine learning used in RC fault diagnosis were discussed. This can provide valuable guidelines for future research in this domain.

Conflicts of Interest:
The authors declare no conflict of interest.