Real-Time and Robust Hydraulic System Fault Detection via Edge Computing

We consider fault detection in a hydraulic system that generates multivariate time-series sensor data. Such a real-world industrial environment can suffer from noisy data caused by inaccurate hardware sensing or external interference. We therefore propose a real-time and robust fault detection method for hydraulic systems that leverages cooperation between cloud and edge servers. The cloud server employs a new genetic algorithm (GA)-based feature selection that identifies feature-to-label correlations and feature-to-feature redundancies. A GA can efficiently explore large search spaces, such as the combinatorial optimization problem of identifying the optimal feature subset. Because fewer, more important features need to be transmitted and processed, this approach reduces detection time and improves model performance. For robust fault detection, we propose a long short-term memory autoencoder that leverages the temporal information in time-series sensor data and effectively handles noisy data. This detection model is then deployed on edge servers, which provide computing resources near the data source to reduce latency. Our experimental results suggest that this method outperforms prior approaches, with lower detection times, higher accuracy, and greater robustness to noisy data. With 63% fewer features, our model achieves an accuracy of approximately 98% and remains robust to noisy data with a signal-to-noise ratio near 0 dB. Our method also achieves an average detection time of only 9.42 ms and reduces the average packet size to 179.98 KB from a maximum of 343.78 KB.


Introduction
Hydraulic systems are used in many industrial applications, including manufacturing, automobiles, and heavy machinery [1][2][3][4][5]. Monitoring the condition of hydraulic equipment is essential because it helps maintain high productivity and reduces process costs [6]. Fault detection in a hydraulic system faces three challenges. First, automated, real-time fault detection without human intervention is difficult because proper system functionality imposes time-sensitive requirements [7]. Second, industrial sensor data are increasingly abundant in both sample quantity and dimensionality [8]. A hydraulic system produces multivariate time-series data from multiple sensors, so identifying faults is challenging due to the complex nonlinear relationships among the data. Third, real-world industrial sensors are often located in noisy environments, so the data they collect tend to be noisy and unreliable. Noise typically arises from inaccuracies in sensor configurations or from external interference [9]. Because operational decisions are based on these sensor readings, the data must be reliable. Therefore, a solution to fault detection in hydraulic systems must establish a real-time detection process, learn from multivariate time-series sensor data, and handle noisy data.
The concept of Industry 4.0 [10] represents the current state of the art in integrating IT and manufacturing, offering improved product quality, real-time decision-making, and integrated systems. Advanced technologies, such as the cloud, the internet of things, and artificial intelligence, are integrated into modern manufacturing [11,12]. However, the cloud suffers from high response times due to the long-distance transmission of massive data volumes, whereas fault detection systems require real-time responses. Edge computing is designed to overcome this challenge by offering computational resources near the data source, which decreases latency and transmission costs [13]. Real-time fault detection in hydraulic systems therefore requires cooperation between cloud and edge servers [14]. For example, the cloud server can perform offline learning, such as selecting the important data to transfer and training the model, while the real-time fault detection service is executed on the edge server.
We propose a genetic algorithm (GA)-based feature selection method that considers feature correlations and redundancies. Sensor feature selection improves the performance of the fault detection model while using less data, because it defines which sensor data should be sent and thereby reduces packet transmission. A GA offers an efficient search that requires no prior domain knowledge. For the fault detection model, we employ a stacked long short-term memory autoencoder (LSTM-AE) that automatically extracts latent features from the sequence data and performs well on both clean and noisy data. Finally, the pre-trained encoder is combined with a dense layer and a softmax classifier and re-trained to perform fault detection.
The main contributions of this study are as follows:
• We propose cooperation between cloud and edge servers to support a real-time fault detection system. The cloud performs feature selection and offline learning, and the edge performs online detection near the data source; together, they reduce latency and transmission costs.
• We propose a GA-based feature selection method that considers correlation and redundancy to ensure that the selected features are the most important and non-redundant for learning.
• We propose an LSTM-AE as the fault detection model to learn the temporal relations in the time-series data and extract latent features from noisy data.
In the remainder of this paper, we explain the hydraulic system fault analysis in Section 2. Next, we review previous work on hydraulic system fault detection in Section 3. The proposed architecture and algorithms are elaborated in Sections 4-7. We present extensive experiments measuring the effectiveness of the proposed method in Section 8. Finally, we discuss our conclusions and future research directions.

Hydraulic System Fault Analysis
We use the hydraulic system condition monitoring dataset available from the UC Irvine Machine Learning Repository [15], which covers the primary and secondary circuits shown in Figure 1. This dataset consists of 17 sensor measurements, whose specifications are listed in Table 1. These comprise 14 physical sensors and three virtual sensors. The physical sensors are six pressure sensors (PS1-PS6), four temperature sensors (TS1-TS4), two volume flow sensors (FS1-FS2), a motor power sensor (EPS1), and a vibration sensor (VS1). The virtual sensors hold computed values: an efficiency factor (SE), a cooling efficiency (CE), and a cooling power (CP). In addition, the hydraulic circuit contains other sensors, such as those for oil parameter monitoring (COPS) and oil particle contamination (CS and MCS). Each sensor records measurements during a 60 s load cycle at sampling rates ranging from 1 Hz to 100 Hz. The dataset consists of 2205 load cycles (samples), each with a component state label corresponding to the fault condition of the components. Table 2 describes the detailed taxonomy of the component fault states, which covers four fault targets: Cooler, indicating a cooling power fault; Valve, indicating a switching fault; Pump, indicating internal pump leakage; and Accumulator, indicating a gas leak. Each fault type includes several classes that represent different degrees of component degradation.

Related Work
Edge computing is a paradigm that brings computational resources to network devices closer to the data source, and it is used for time-sensitive applications such as industrial condition monitoring. Park et al. [16] proposed edge-based fault detection for an industrial robot manipulator using an LSTM model on vibration, temperature, and pressure sensor data, with one edge device attached to the machine. Syafrudin et al. [17] proposed edge-based fault detection for an automobile parts factory using density-based spatial clustering and a random forest algorithm, with an edge model deployed on each workstation of the assembly line. Li et al. [18] proposed edge-based visual defect detection for a tile production factory using a convolutional neural network (CNN) model, deploying multiple cameras to capture visual information about the products [19]; these data were then sent to an edge node to inspect potential defects. Nevertheless, few studies have addressed fault detection in the context of edge computing.
Several works have used the same dataset as this study. Helwig et al. [15,20] converted the time-domain data into the frequency domain using the fast Fourier transform and generated statistical features, such as the slope of the linear fit, median, variance, skewness, position of the maximum value, and kurtosis [21]. They then calculated the correlation of each feature with the fault label and selected the top n features by ranking these correlations (CS). Finally, this approach applied linear discriminant analysis [22] for fault classification. Prakash et al. [23] also used statistical features of frequency-domain data, such as the mean, skewness, and kurtosis. They applied XGBoost [24] to determine feature importance (XFI), selected the half of the features with the highest importance, and used a deep neural network as the classification model. In [25], the authors applied dimensionality reduction with principal component analysis (PCA) [26] to transform the raw features into fewer principal-component features and then classified the faults using XGBoost. Konig et al. [27] and Yuan et al. [28] proposed a CNN [29] as the classification model and fed it the raw data directly, because CNNs can extract features automatically.
These previous approaches have several drawbacks. First, none considered edge computing for hydraulic system fault detection, although it supports real-time detection and reduces transmission costs. Second, statistical feature extraction, PCA, and similar feature engineering methods are not well suited to real-time detection, because they must be applied to each new incoming sample, consuming more time and computational power. Thus, using the raw data directly with less feature engineering is preferable. Third, existing feature selection methods such as CS and XFI may select redundant features, whereas omitting redundant features would improve model learning. Finally, noisy data are inevitable in real-world industrial environments such as hydraulic systems, because sensors are affected by external factors (e.g., distorted sensing) and internal factors (e.g., sensor malfunction), and no previous work has addressed this issue.
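The redundancy concern can be made concrete with a small sketch. It mimics correlation-based selection as described above: rank features by the absolute Pearson correlation with the label, then keep the top n. The synthetic data, the continuous stand-in label, and the helper name `correlation_selection` are hypothetical illustrations, not the prior authors' code. Because a near-duplicate of a strong feature is just as correlated with the label, a pure ranking keeps both copies.

```python
import numpy as np

def correlation_selection(X, y, n):
    """Hypothetical CS baseline: keep the n features with the largest |Pearson r| to the label."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:n]

rng = np.random.default_rng(0)
y = rng.normal(size=500)                     # stand-in for an encoded fault label
f0 = y + 0.1 * rng.normal(size=500)          # strongly label-correlated feature
f1 = f0 + 0.01 * rng.normal(size=500)        # near-duplicate of f0 (redundant)
f2 = y + 0.8 * rng.normal(size=500)          # weaker but non-redundant feature
X = np.column_stack([f0, f1, f2])

selected = correlation_selection(X, y, n=2)
# Correlation between the two kept features: close to 1 means they carry the same information.
redundancy = abs(np.corrcoef(X[:, selected[0]], X[:, selected[1]])[0, 1])
print(sorted(selected.tolist()), round(redundancy, 4))
```

The two selected columns are almost perfectly correlated with each other, so the second one adds transmission cost without adding information.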
To address these open issues, we propose cooperation between the edge and the cloud to ensure real-time detection and low transmission costs. We then select and process raw data with less extensive processing by using a new correlation and redundancy-aware feature selection (CRFS) approach, which supports faster processing and better model learning. We handle multivariate time-series and noisy data with an LSTM-AE fault detection model. Finally, Table 3 compares the prior works with our proposed method.

Real-Time Fault Detection in Edge Computing
We propose a holistic framework for real-time and robust hydraulic system fault detection, as illustrated in Figure 2. The real-time detection process is described in Algorithm 1. In lines 1-3, we conduct offline learning at the cloud server using historical data. This consists of selecting the feature subsets, training the fault detection model or classifier (CLF), and deploying the model to the edge server. We use our proposed correlation and redundancy-aware feature selection, described in detail in Section 6, to select the most relevant and least redundant features. Feature selection determines which features are essential during the fault detection process, and these smaller feature subsets are directed to the edge and used for real-time detection. We pre-train an LSTM-AE and then fine-tune its encoder together with a classifier layer to obtain the classifier model; these learning models are described further in Section 7. Deploying the classifier model with the selected features on the edge server supports time-sensitive applications such as fault detection. All feature data can still be sent to the cloud to be stored as historical data for further analysis. Hence, the cooperation between the cloud and edge servers in this framework provides efficient real-time fault detection for a hydraulic system.

Figure 2. Framework for real-time and robust fault detection (labels: Historical data; Offline learning; Real-time detection; Real-time & robust fault detection; Faster processing by using raw data, instead of engineered data).
Algorithm 1 (excerpt):
8: if number of elements in Q == w then
9: Predict Q using CLF model
10: Pop top element in Q

In a real-world scenario, sensor data arrive sequentially as a stream, so the incoming data are queued and detection is performed on the queued elements. The window queue buffers the data stream before it is fed into the LSTM model, which learns the temporal relationships within the sequence data. The incoming data contain only the features selected by the CRFS algorithm. This design therefore suits general industrial systems that perform condition monitoring in time cycles. These processes are shown in lines 4-10.
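The queue logic of lines 8-10 can be sketched in a few lines of Python. This is a minimal sketch under two assumptions. First, "top element" is read as the oldest element, which gives a sliding window with stride 1. Second, each incoming reading is already restricted to the selected features. `stream_detect` and `dummy_clf` are hypothetical names, and `dummy_clf` merely stands in for the trained CLF model.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence

def stream_detect(stream: Iterable[Sequence[float]], w: int,
                  clf: Callable[[List[Sequence[float]]], int]) -> List[int]:
    """Queue incoming readings; once w are queued, classify the window and drop the oldest."""
    q = deque()
    predictions = []
    for x in stream:
        q.append(x)                           # newest reading (selected features only)
        if len(q) == w:                       # Algorithm 1, line 8
            predictions.append(clf(list(q)))  # line 9: predict on the whole window
            q.popleft()                       # line 10: remove the oldest element
    return predictions

# Hypothetical stand-in for the trained CLF: flag a fault if the window mean exceeds 0.5.
def dummy_clf(window):
    return int(sum(r[0] for r in window) / len(window) > 0.5)

readings = [[0.1], [0.2], [0.9], [0.95], [0.3]]
print(stream_detect(readings, w=3, clf=dummy_clf))   # [0, 1, 1]
```

A `deque` makes both the append and the removal of the oldest element O(1), which matters when the loop runs for every incoming reading on the edge server.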

Data Balancing
The dataset used in this study has an imbalanced class distribution for several fault types, namely the Valve, Pump, and Accumulator components, with an imbalance of at least 1:2 between the normal and fault classes. An imbalanced dataset can cause the model to neglect the minority classes, resulting in biased performance [30]. We therefore balance the data with random over-sampling, which randomly duplicates samples from the minority classes. After this pre-processing, the number of samples per component (samples per class × number of classes) becomes 741 × 3 for the Cooler, 1125 × 4 for the Valve, 1221 × 3 for the Pump, and 808 × 4 for the Accumulator.
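Random over-sampling can be sketched with NumPy alone. Every class is topped up, by sampling its own rows with replacement, until it matches the largest class. This matches the "samples per class × number of classes" counts above. The helper name and the toy class counts are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)  # sampled with replacement
        keep.append(np.concatenate([members, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Toy example (hypothetical counts): 6, 2, and 3 samples in classes 0, 1, and 2.
y = np.array([0] * 6 + [1] * 2 + [2] * 3)
X = np.arange(len(y) * 2, dtype=float).reshape(len(y), 2)
Xb, yb = random_oversample(X, y)
print(np.unique(yb, return_counts=True))
```

Applying the function only to the training split keeps duplicated rows out of the evaluation data.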

Mean and Normalization
Additional pre-processing steps are performed after the data are obtained at the edge server. First, the mean of each sensor's readings is computed to obtain a single value per sensor regardless of its sampling rate. Second, the incoming data are normalized to the range 0-1. Normalizing all inputs to a common scale allows the network to learn the optimal parameters for each input node more rapidly.
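Both steps can be sketched as follows. The sketch rests on three assumptions not stated in the text. It averages each sensor's readings over one load cycle. It fits the min-max bounds on historical (training) data and reuses them at the edge. It uses illustrative sampling rates of 100 Hz and 1 Hz. `cycle_means` and `MinMax01` are hypothetical names, and scikit-learn's `MinMaxScaler` offers equivalent `fit`/`transform` behavior.

```python
import numpy as np

def cycle_means(raw):
    """Average each sensor's readings within a cycle: {name: (cycles, samples)} -> (cycles, sensors)."""
    names = sorted(raw)
    return names, np.column_stack([raw[name].mean(axis=1) for name in names])

class MinMax01:
    """Scale each column to [0, 1] with the min/max of the (historical) training data."""
    def fit(self, X):
        self.lo = X.min(axis=0)
        span = X.max(axis=0) - self.lo
        self.span = np.where(span > 0, span, 1.0)   # guard against constant columns
        return self

    def transform(self, X):
        return (X - self.lo) / self.span

rng = np.random.default_rng(0)
raw = {                                              # hypothetical readings for 4 cycles of 60 s
    "PS1": rng.normal(150.0, 5.0, size=(4, 6000)),   # 100 Hz -> 6000 samples per cycle
    "TS1": rng.normal(40.0, 1.0, size=(4, 60)),      # 1 Hz -> 60 samples per cycle
}
names, X = cycle_means(raw)
X01 = MinMax01().fit(X).transform(X)
```

Incoming data whose values fall outside the training range map slightly outside [0, 1]. Clipping them is an option if the model must never see such values.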

Data Windowing
Before sending the input into the fault detection model, we perform data windowing to divide and group the multivariate time-series sensor data X into overlapping sequences of window length w, as expressed in Equation (1). The value of w determines how many time steps are processed at once, and a series of length t yields n = t − w + 1 sequences, as shown in Equation (2):

$$X = [x_1, x_2, \dots, x_t] \;\rightarrow\; \{[x_1, \dots, x_w],\, [x_2, \dots, x_{w+1}],\, \dots,\, [x_n, \dots, x_t]\} \qquad (1)$$

$$n = t - w + 1 \qquad (2)$$
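The windowing can be sketched with NumPy. The sketch assumes a stride of 1, which is consistent with n = t − w + 1, and stores the series as a (t, m) array with m selected features. `make_windows` is a hypothetical helper name.

```python
import numpy as np

def make_windows(X, w):
    """Split a (t, m) series into n = t - w + 1 overlapping windows of shape (w, m), stride 1."""
    t = X.shape[0]
    if not 1 <= w <= t:
        raise ValueError("window length must satisfy 1 <= w <= t")
    return np.stack([X[i:i + w] for i in range(t - w + 1)])

X = np.arange(20, dtype=float).reshape(10, 2)   # t = 10 time steps, m = 2 features
windows = make_windows(X, w=4)
print(windows.shape)                            # (7, 4, 2): n = 10 - 4 + 1 = 7
```

For long series, `numpy.lib.stride_tricks.sliding_window_view(X, w, axis=0)` avoids the copy, but it places the window axis last, giving shape (n, m, w).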

Adding Noise
We also compare the robustness of our proposed method against noise with that of the prior works described earlier. We add noise at different levels to the raw sensor data, as illustrated for the temperature sensor TS1 in Figure 3. Each panel shows the sensor data at a different noise level, expressed as the signal-to-noise ratio (SNR) defined in Equation (3), where P_noise represents the power of the noise and P_signal represents the power of the signal:

$$\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right) \qquad (3)$$

Figure 3. Plots of noisy data collected at the TS1 sensor with different signal-to-noise ratios.
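A minimal sketch of noise injection at a target SNR follows. It assumes zero-mean white Gaussian noise and takes P_signal as the mean squared value of the raw signal. The text names neither the noise distribution nor the power definition, and using the variance instead is an equally plausible reading. `add_noise_snr` and the temperature-like trace are hypothetical.

```python
import numpy as np

def add_noise_snr(signal, snr_db, seed=0):
    """Add zero-mean white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(signal ** 2)                  # mean power of the raw signal
    p_noise = p_signal / 10 ** (snr_db / 10)         # Equation (3) solved for P_noise
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

t = np.linspace(0, 60, 100_000)
ts1_like = 40 + 2 * np.sin(2 * np.pi * t / 60)       # hypothetical temperature-like trace
noisy = add_noise_snr(ts1_like, snr_db=0)
# Check: the empirical SNR of the injected noise should be close to the 0 dB target.
measured = 10 * np.log10(np.mean(ts1_like ** 2) / np.mean((noisy - ts1_like) ** 2))
```

At 0 dB the noise power equals the signal power, which is why readings at that level are hard to interpret without a robust model.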

Correlation and Redundancy-Aware Feature Selection Using a Genetic Algorithm
A GA is designed to find optimal solutions within a large search space by mimicking the process of natural selection. Because a GA evolves increasingly fit solutions over successive generations, it can identify the best solution among them [31]. It does not require domain knowledge to guide the search. Three standard operators regulate the population: selection, crossover, and mutation. The selection operator chooses parents from the population, the crossover operator defines how the parents are recombined, and the mutation operator introduces genetic diversity among the offspring. The GA iterations terminate when a predefined stopping criterion is met, typically a maximum number of generations. For feature selection, the search space consists of all possible subsets of the given feature set (2^n subsets for n features) [32]. Each individual is represented as an n-bit binary string that encodes a feature subset, where each bit indicates whether the corresponding feature is excluded (0) or included (1).
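The three operators on n-bit feature masks can be sketched as a minimal single-objective GA. CRFS itself uses the multiobjective NSGA, so this only illustrates the selection, crossover, and mutation operators on the bit-string encoding. Binary tournament, one-point crossover, bit-flip mutation at rate 1/n, and the toy fitness are assumed choices; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def tournament(pop, fit, k=2):
    """Selection: draw k individuals at random and return the fittest (lowest fitness)."""
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmin(fit[idx])]]

def one_point_crossover(a, b):
    """Crossover: head of parent a joined to the tail of parent b at a random cut."""
    cut = rng.integers(1, len(a))
    return np.concatenate([a[:cut], b[cut:]])

def bit_flip(child, p):
    """Mutation: flip each bit independently with probability p."""
    return np.where(rng.random(len(child)) < p, 1 - child, child)

def run_ga(fitness_fn, n, pop_size=30, generations=60):
    """Minimize fitness_fn over n-bit masks (1 = keep feature); return the best mask seen."""
    pop = rng.integers(0, 2, size=(pop_size, n))
    best, best_fit = None, np.inf
    for _ in range(generations):              # stopping criterion: maximum generation
        fit = np.array([fitness_fn(x) for x in pop])
        if fit.min() < best_fit:
            best, best_fit = pop[np.argmin(fit)].copy(), fit.min()
        pop = np.array([bit_flip(one_point_crossover(tournament(pop, fit), tournament(pop, fit)), 1.0 / n)
                        for _ in range(pop_size)])
    return best, best_fit

# Toy fitness (hypothetical): fewer selected features is better.
mask, score = run_ga(lambda x: int(x.sum()), n=17)
```

Tracking the best mask seen so far acts as a simple archive, so a good solution cannot be lost to later mutation.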
The CRFS process is presented in Algorithm 2 and aims to select features based on their correlations. CRFS adopts the non-dominated sorting genetic algorithm (NSGA), which handles multiobjective problems [33]. CRFS minimizes three objectives formulated in Equation (5). First, $g_{cor}^{o}(x)$ favors features with a high correlation to the output label, which is important for the learning process. Second, $g_{cor}^{f}(x)$ favors selected features with low correlations to each other, because high feature-to-feature correlations indicate redundant features. Avoiding redundant features yields smaller feature subsets, reduces model overfitting, and helps the model better identify interactions and important interrelated information [34]. Third, smaller subsets are favored through k, the number of features included in the solution, out of the n available features; in our case, n = 17, the number of all sensor features. Together, these objectives ensure that the selected features have a high correlation to the label and low redundancy, and that they form a small subset. Because the subset size is part of the objectives, CRFS determines the number of selected features automatically. At the end of the iterations, we obtain the non-dominated individuals at the Pareto front. From these, we select a single solution by first normalizing the objective values and then weighting each objective according to its importance relative to the others.
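A sketch of how the three objectives could be scored for a candidate mask follows. Equation (5) is not reproduced here, so the concrete forms are assumptions: one minus the mean absolute label correlation, the mean absolute off-diagonal feature-to-feature correlation, and the fraction k/n. They only follow the intent described above. The helper names are hypothetical, and constant (zero-variance) features are assumed absent, since their Pearson correlation is undefined.

```python
import numpy as np

def abs_correlations(X, y):
    """|Pearson r| of each feature with the label, and between every pair of features."""
    C = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    return C[:-1, -1], C[:-1, :-1]

def crfs_objectives(mask, r_label, r_feat):
    """Assumed forms of the three minimized objectives for an n-bit mask."""
    sel = np.flatnonzero(mask)
    n, k = len(mask), len(sel)
    if k == 0:
        return np.ones(3)                        # empty subset: worst on every objective
    g_label = 1.0 - r_label[sel].mean()          # small when features track the label
    if k > 1:
        block = r_feat[np.ix_(sel, sel)]
        g_redund = (block.sum() - k) / (k * (k - 1))   # mean off-diagonal |r|
    else:
        g_redund = 0.0
    g_size = k / n                               # share of the n features that are kept
    return np.array([g_label, g_redund, g_size])

rng = np.random.default_rng(0)
y = rng.normal(size=300)
f0 = y + 0.2 * rng.normal(size=300)
f1 = f0 + 0.01 * rng.normal(size=300)            # redundant copy of f0
f2 = y + 0.5 * rng.normal(size=300)              # informative, less redundant
X = np.column_stack([f0, f1, f2])
r_label, r_feat = abs_correlations(X, y)
with_copy = crfs_objectives(np.array([1, 1, 0]), r_label, r_feat)
without_copy = crfs_objectives(np.array([1, 0, 1]), r_label, r_feat)
```

Both masks keep two of three features, but the mask with the duplicate scores worse on the redundancy objective.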
In lines 2-3 of Algorithm 2, CRFS calculates the Pearson correlation coefficient [35] using Equation (4). It measures the linear relationship between two variables and ranges from −1 to 1:

$$\rho(a, b) = \frac{\operatorname{cov}(a, b)}{\sigma_a \sigma_b} \qquad (4)$$

Values of −1 and 1 indicate a perfect negative or positive linear relationship, respectively, while 0 indicates no linear relationship between the variables. We take the absolute value because −1 and 1 are equally informative, and then we run the NSGA in lines 4 through 13. In line 14, we obtain the non-dominated individuals at the Pareto front and select the fittest among them by scaling the objectives using Equation (6). The scaled objectives are then weighted using Equation (7), where each weight represents the priority of its objective. The objective weights sum to 1, so the weighted and scaled objective score ranges between 0 and 1, as presented in Equation (8). Because this is a minimization problem, we select the solution with the lowest score, as performed in lines 15-17.

Algorithm 2 (excerpt):
2: Calculate all feature-to-output correlations ρ(f_i, o)
3: Calculate correlations between features ρ(f_i, f_j)
4: Initialize empty population P(i) and archive A(i) at generation i, where initially i = 1
5: Generate random individuals with length k to P(i)
6: Calculate objective values of each individual in P(i)
7: repeat
8: Assign Pareto rank to each individual in P(i)
9: Select the highest-ranked individuals from P(i) into archive A(i), breaking ties by preferring larger crowding distances
10: Form a mating pool from A(i) using binary tournament selection
11: Generate P(i + 1) by applying crossover and mutation operators to the mating pool
12: until maximum generation t is reached
13: Get the non-dominated individuals from A(t)
14: Calculate the weighted and scaled objective of each non-dominated individual, g_w(x)
15: Get the individual with the lowest g_w(x)
16: Get the set of selected features f_s
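The final selection step (non-dominated filtering, per-objective scaling, weighting, and taking the minimum) can be sketched as below. Equations (6)-(8) are not reproduced here, so min-max scaling over the front and a normalized weighted sum are assumptions. The equal weights and the objective values are hypothetical.

```python
import numpy as np

def non_dominated(F):
    """Row indices of F (one row per individual, objectives minimized) not dominated by any other row."""
    keep = []
    for i in range(len(F)):
        dominated = np.any(np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep)

def pick_weighted(F, weights):
    """Min-max scale the front's objectives to [0, 1], weight them, and return the lowest-score index."""
    front = non_dominated(F)
    P = F[front]
    lo, hi = P.min(axis=0), P.max(axis=0)
    scaled = (P - lo) / np.where(hi > lo, hi - lo, 1.0)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # weights sum to 1, so scores lie in [0, 1]
    scores = scaled @ w
    return front[np.argmin(scores)], scores

F = np.array([[0.10, 0.5, 0.3],    # objectives: label term, redundancy, subset size
              [0.20, 0.4, 0.3],
              [0.30, 0.6, 0.4],    # dominated by the first row
              [0.05, 0.9, 0.6]])
best, scores = pick_weighted(F, weights=[1, 1, 1])   # hypothetical equal weights
```

Scaling before weighting keeps an objective with a naturally larger range from dominating the score. This is why the weights can be read directly as priorities.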

Learning Temporal Information
Deep learning models have advanced continuously over many years. For time-series data, the recurrent neural network (RNN) introduced the idea of a time sequence into the neural network structure, enabling it to handle time-series analysis [36]. Long short-term memory (LSTM) cells subsequently extended this capability to learning long-term dependencies through gates that control the learning process [37]. Because it can add or remove information from its cell state, the LSTM mitigates the vanishing-gradient problem that occurs in the standard RNN [38]. At time step t, the LSTM cell updates its cell state c_t and computes its hidden state, or output vector, h_t from the previous states (c_{t−1}, h_{t−1}) and the input x_t. The input gate i_t, forget gate v_t, and output gate o_t are the key components that let the LSTM learn long-term relations by selecting which information to keep. These gates are represented in Equation (9) and illustrated in Figure 4. The initial states c_0 and h_0 are set to 0, and the operator • denotes the element-wise product. The term c̃_t is the cell input activation vector, and σ and tanh are the sigmoid and hyperbolic tangent functions, respectively. The training objective is to optimize the parameters W, R, and b, where W and R are the input and recurrent weights, respectively, and b is the bias vector:

$$\begin{aligned}
i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i)\\
v_t &= \sigma(W_v x_t + R_v h_{t-1} + b_v)\\
o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o)\\
\tilde{c}_t &= \tanh(W_c x_t + R_c h_{t-1} + b_c)\\
c_t &= v_t \bullet c_{t-1} + i_t \bullet \tilde{c}_t\\
h_t &= o_t \bullet \tanh(c_t)
\end{aligned} \qquad (9)$$
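One step of the standard LSTM cell can be written directly in NumPy, using the gate names above: input i, forget v, output o, and cell input c. The sketch runs a short sequence from zero initial states. Weights are random and untrained, the sizes are illustrative, and storing W, R, and b as dictionaries keyed by gate is a presentational choice, not how TensorFlow or Keras lays them out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One time step of Equation (9); W, R, b are dicts keyed by 'i', 'v' (forget), 'o', 'c'."""
    pre = {g: W[g] @ x_t + R[g] @ h_prev + b[g] for g in "ivoc"}
    i_t = sigmoid(pre["i"])              # input gate
    v_t = sigmoid(pre["v"])              # forget gate
    o_t = sigmoid(pre["o"])              # output gate
    c_tilde = np.tanh(pre["c"])          # cell input activation
    c_t = v_t * c_prev + i_t * c_tilde   # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
m, d, w = 3, 4, 5                        # inputs, hidden units, time steps (illustrative)
W = {g: rng.normal(scale=0.5, size=(d, m)) for g in "ivoc"}
R = {g: rng.normal(scale=0.5, size=(d, d)) for g in "ivoc"}
b = {g: np.zeros(d) for g in "ivoc"}
h, c = np.zeros(d), np.zeros(d)          # h_0 = c_0 = 0
for x_t in rng.normal(size=(w, m)):
    h, c = lstm_step(x_t, h, c, W, R, b)
```

Because h_t is an output gate value in (0, 1) times tanh(c_t), every hidden unit stays strictly between −1 and 1.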

Denoising and Latent Feature Extraction
The schema of the proposed fault detection method, an LSTM-based autoencoder (LSTM-AE), is shown in Figure 5. Autoencoders (AEs) are used in representation learning with the objective of reconstructing the input data x from an encoded representation. An AE consists of an encoder e_φ and a decoder d_θ, as expressed in Equation (10). The pre-training process is described in lines 4-9 of Algorithm 3. The encoder learns the prominent characteristics of the input and extracts the encoded features h [39], and the decoder reconstructs the input x̂ from these encoded features. Because the encoder returns a vector, a repeat-vector function converts it into sequence data so that it can be fed into the decoder. The encoder and decoder have a symmetric architecture. The reconstruction error is measured with the mean squared error, as expressed in Equation (11), and the model parameters are optimized on this error via backpropagation. An AE can handle noisy data because it can extract latent features whether the data are clean or noisy [40]. Our LSTM-AE is essentially a sequence-to-sequence model that reconstructs time-series sequences [41]. During pre-training, the LSTM-AE acts as a feature extractor that provides the initial weights of the model [42], so the classifier starts training from optimized initial weights:

$$h = e_\phi(x), \qquad \hat{x} = d_\theta(h) \qquad (10)$$

$$\mathcal{L}_{AE}(\theta, \phi) = \frac{1}{N}\sum_{j=1}^{N}\left(x_j - \hat{x}_j\right)^2 \qquad (11)$$

where N is the number of elements in x.
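The data flow of the autoencoder (encoder, last hidden state, repeat vector, decoder, reconstruction error) can be traced with an untrained, single-layer NumPy forward pass. This is an illustration under stated assumptions, not the authors' stacked Keras model. The weights are random, backpropagation is omitted, and a per-step linear read-out maps decoder states back to the m input features. In Keras, the corresponding building blocks would be the `LSTM`, `RepeatVector`, and `TimeDistributed(Dense(...))` layers.

```python
import numpy as np

rng = np.random.default_rng(1)

def init_lstm(n_in, n_hidden):
    """Random (untrained) W, R, b for the gates i, v (forget), o and cell input c."""
    return {g: (rng.normal(scale=0.3, size=(n_hidden, n_in)),
                rng.normal(scale=0.3, size=(n_hidden, n_hidden)),
                np.zeros(n_hidden)) for g in "ivoc"}

def run_lstm(seq, params):
    """Run one LSTM layer over seq (T, n_in) from h_0 = c_0 = 0; return all h_t as (T, n_hidden)."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    n_hidden = params["i"][2].shape[0]
    h, c, outputs = np.zeros(n_hidden), np.zeros(n_hidden), []
    for x_t in seq:
        pre = {g: W @ x_t + R @ h + b for g, (W, R, b) in params.items()}
        c = sig(pre["v"]) * c + sig(pre["i"]) * np.tanh(pre["c"])
        h = sig(pre["o"]) * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)

def lstm_ae_forward(x, enc, dec, W_out, b_out):
    """Encoder -> last hidden state h -> repeat vector -> decoder -> per-step linear read-out."""
    h = run_lstm(x, enc)[-1]                              # latent features h = e_phi(x)
    repeated = np.repeat(h[None, :], len(x), axis=0)      # (latent,) -> (w, latent)
    x_hat = run_lstm(repeated, dec) @ W_out.T + b_out     # x_hat = d_theta(h), shape (w, m)
    return h, x_hat

w, m, latent = 10, 6, 4                 # window length, selected features, latent size (illustrative)
enc, dec = init_lstm(m, latent), init_lstm(latent, latent)
W_out, b_out = rng.normal(scale=0.3, size=(m, latent)), np.zeros(m)
x = rng.random((w, m))                  # one normalized window
h, x_hat = lstm_ae_forward(x, enc, dec, W_out, b_out)
loss = float(np.mean((x - x_hat) ** 2))   # reconstruction MSE, as in Equation (11)
```

Pre-training would minimize `loss` over many windows; only the encoder part (`enc`) is then reused by the classifier.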

Algorithm 3 (excerpt):
7: Calculate reconstructed output x̂
8: Calculate reconstruction cost L_AE(θ, φ)
9: Update parameters θ, φ via backpropagation
Training Classifier
10: Build CLF model from encoder and classifier layer
11: for all epochs do
12: Calculate dense layer output vector z
13: Calculate class probability s(z)_i

Fault Classification
After the LSTM-AE is pre-trained, the encoder is combined with a dropout layer, a fully connected (dense) layer [43], and a softmax classifier layer, and the resulting model is trained. Dropout is a regularization method in which randomly selected neurons are ignored during training, which improves generalization and prevents overfitting [44]. The softmax classifier layer [45] maps the dense-layer output z to a probability distribution over the predicted classes. The training process is described in lines 10-15 of Algorithm 3. This classifier model is trained in a supervised manner on labeled data, and a categorical cross-entropy loss [46] optimizes the dense weights and encoder parameters via backpropagation, as expressed in Equation (14):

$$\mathcal{L}_{CLF} = -\sum_{i} y_i \log s(z)_i \qquad (14)$$

where y_i is the one-hot ground-truth label for class i and s(z)_i is the predicted probability of class i.
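The classifier head's forward pass and loss (inverted dropout, dense layer, softmax, categorical cross-entropy) can be sketched in NumPy. It assumes the dense layer maps the encoder output directly to the class logits z, as lines 12-13 of Algorithm 3 suggest. The dropout rate of 0.2, the batch, the labels, and the weights are hypothetical, and training by backpropagation is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

def dropout(h, rate, training):
    """Inverted dropout: zero each unit with probability `rate` in training, rescale survivors."""
    if not training or rate == 0.0:
        return h
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # shift by the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Batch mean of -sum_i y_i * log s(z)_i."""
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=-1)))

def classify(h, W_d, b_d, training=False, rate=0.2):
    """Dropout -> dense layer output z -> softmax class probabilities s(z)."""
    z = dropout(h, rate, training) @ W_d.T + b_d
    return softmax(z)

latents = rng.normal(size=(5, 4))                   # encoder outputs for 5 windows (illustrative)
W_d, b_d = rng.normal(scale=0.5, size=(3, 4)), np.zeros(3)   # 3 classes, as for the Cooler
probs = classify(latents, W_d, b_d, training=True)
y_true = np.eye(3)[[0, 1, 2, 0, 1]]                 # hypothetical one-hot labels
loss = categorical_cross_entropy(y_true, probs)
```

At inference on the edge server, `training=False` disables dropout, so the prediction for a given window is deterministic.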

Experimental Setup
We performed experiments using three devices acting as a sensor pool, an edge server, and a cloud server. The sensor pool sends data to the target server, and either the edge or the cloud server collects the data and performs fault detection with the trained model. We implemented the CRFS algorithm using jMetalPy [47], and the fault detection model was implemented in TensorFlow [48] with Keras [49] as the high-level API. The experiments were evaluated using three metrics: accuracy (0-1); detection time (ms), defined as the sum of the data transmission time and the model prediction time; and packet size (KB). Table 4 lists the parameters used in these experiments. We evaluated the feature selection method, and the selected features for each fault type are presented in Table 5. Figure 6 shows the non-dominated individuals spread across the Pareto front. From these, we selected the best according to the lowest weighted and scaled objective value. The selected features fulfill the CRFS objectives: they have a high correlation to the label and low redundancy with the other features. CRFS reduces the number of features by 63%, from 17 to 5-7. The Cooler and Accumulator have the smallest numbers of selected features. Although we did not incorporate any domain knowledge of the hydraulic system and its fault correlations, CRFS finds the sensor features related to faults in specific components. The Valve and Pump components share nearly identical selected features, which suggests a strong correlation between their faults. These selected features are used in the subsequent experiments.

Performance of Server Types
We evaluated CRFS on different architectures, with the results listed in Table 6 and presented in Figure 7. The detection time on the edge server is below 30 ms, even when all features are used, while the detection time on the cloud server exceeds 200 ms. The packet size is the sum of the packet sizes of the selected features. The Cooler component has the smallest packet size because its selected features have low sampling rates. Detection times on the cloud vary widely (i.e., are unstable), by at least 10 ms, whereas on the edge they vary by only about 1-2 ms. The cloud's high detection time stems from the long-distance connection over which a significant amount of data is sent. These results suggest that edge computing is a suitable architecture for fault detection systems because it provides computing resources near the data sources and maintains low latency. Hence, an edge server is used in the next experiment.

Performance of Feature Selection
We evaluated the effect of feature selection on fault detection model performance, with the results listed in Table 7. Although CRFS selects fewer features, it achieves accuracy similar to that of the model that uses all features. It also achieves higher accuracy than CS, even though CS uses more features. CS selects many features with a high correlation to the label, as seen in Figure 8, but it also selects redundant features. Figure 9 shows this redundancy as additional green blocks, which indicate highly correlated feature pairs. CRFS maintains a high feature-to-label correlation while including fewer redundant features, which should enable better feature learning by the model. In contrast, CS incurs a larger packet size and a longer detection time because it selects more features than CRFS.
Figure 9. Feature-to-feature correlation on Accumulator fault label. Lower value represents less redundancy.

Experiment with Fault Detection Methods
We measured the performance of our method and previous approaches across various noise levels, with the results listed in Table 8 and presented in Figure 10. XFI, CS, and CRFS rely on a one-time analysis during offline learning to define the selected features. CRFS yields smaller packet sizes than the other methods. Those methods require all features to be transmitted before processing and therefore have the maximum packet size of 343.78 KB. Another advantage of CRFS over XFI and CS is that the number of selected features need not be specified, because it is set as an objective. CS+LDA and XFI+DNN have a further drawback: they must generate statistical features for every sample, requiring more time and computational resources. CRFS uses a raw feature subset rather than statistical features, so the sensors can be configured to send only the selected features. XFI and CS, in contrast, must still retrieve all of the data and convert it into statistical features before filtering. Owing to the simplicity of LDA as a traditional machine learning model, CS+LDA achieves the lowest detection time. PCA+XGBoost also uses a lightweight model but must transform all features into principal components, which is time-consuming. Table 9 compares the complexity of the deep learning-based fault detection models. It shows that the classifier part of our LSTM-AE model has the fewest parameters and the smallest model size. Because the LSTM-AE classifier is less complex than the DNN and CNN architectures used in previous works, it also supports real-time prediction at the edge server. CRFS+LSTM-AE obtains the second-lowest detection time. Because the detection times of all algorithms are below 50 ms, edge computing enables real-time detection for each of them. In addition, CRFS+LSTM-AE has the smallest packet size in these experiments.
CRFS+LSTM-AE also achieves higher accuracy than the other methods, even on data with signal-to-noise ratios of 0 dB and above. The CNN also performs well on noisy data, followed by the DNN, whereas XGBoost and LDA perform poorly.
Table 9. Complexity of the deep learning models.

Conclusions
The challenges of fault detection in hydraulic systems include achieving real-time fault detection, learning from multivariate time-series sensor data, and handling noisy data. To address these issues, we propose a real-time and robust fault detection approach based on cooperation between cloud and edge servers. The cloud conducts feature selection and offline learning, while the edge performs online detection near the data source to reduce latency and transmission costs. We introduce a new GA-based feature selection method that identifies feature-to-label correlations and feature-to-feature redundancies. Because fewer, more important features are transmitted to the edge and processed, detection time is reduced and model performance is improved. Finally, we describe an LSTM-AE as a robust fault detection model that fully uses the temporal information in time-series sensor data and is capable of handling noisy data. The experimental results suggest that our method outperforms previous approaches, with lower detection times, higher accuracies, and more robust performance in the presence of noisy data. With a 63% reduction in features, from 17 to 5-7, our model still attains a high accuracy above 98% and remains robust to noisy data at a signal-to-noise ratio of approximately 0 dB. Our method also maintains low detection times, averaging 9.42 ms, and reduces the packet size to 179.98 KB from a maximum of 343.78 KB.
For future work, we will first consider other deep learning models, such as CNN combined with LSTM, to offer a better learning model for spatio-temporal data. Then, we will apply this proposed method to other real-world datasets or systems. Finally, we will consider incomplete sensor data issues that could arise from hardware malfunctions or during device replacement.