Fault Detection of Flow Control Valves Using Online LightGBM and STL Decomposition

Abstract: In process industrial systems, flow control valves are vital components that ensure the system’s safe operation. Hence, detecting faults in control valves is of significant importance. However, the stable operating conditions of flow control valves are prone to change, which reduces the effectiveness of conventional fault detection methods. In this paper, an online fault detection approach that accounts for the variable operating conditions of flow control valves is proposed. The approach is based on residual analysis and combines an online LightGBM model with Seasonal and Trend decomposition using Loess (STL). LightGBM is a tree-based machine learning algorithm. In the proposed method, an online LightGBM is employed to establish and continuously update a flow prediction model for control valves, ensuring model accuracy as operating conditions change. Subsequently, STL decomposition is applied to the model’s residuals to capture the trend of residual changes, which is then transformed into a Health Index (HI) for evaluating the health level of the flow control valves. Finally, fault occurrences are detected based on the magnitude of the HI. We validate this approach using both simulated and real factory data. The experimental results demonstrate that the proposed method can promptly reflect the occurrence of faults through the HI.


Introduction
In process industrial systems, control valves are frequently employed as essential actuators. The flow control valve is the most common type of control valve. Flow control valves precisely regulate the flow passing through them by adjusting the valve stem displacement and the pressure difference across the valve. Nevertheless, flow control valves are often required to operate in a variety of environmental conditions, which can include extreme factors such as high temperatures, high pressures, corrosive media, and hazardous explosive zones [1]. In these challenging environments, control valves face issues such as leakage and viscosity effects, which can lead to unforeseeable production failures and safety hazards [2][3][4].
The industrial process is influenced by various factors, including changes in fluid properties, fluctuations in operating conditions, and equipment aging. These factors pose challenges for flow control valve fault detection. When a flow control valve malfunctions, the performance of the control loop is compromised, making it difficult to regulate both the displacement of the valve stem and the magnitude of fluid flow within the pipeline. Many studies employ physical modeling [5,6], statistical analysis [7,8], and machine learning [9][10][11] approaches to detect faults in control valves. Zhang et al. [12] developed a graphical model capable of simultaneously detecting multiple faults while reducing dependence on statistical methods. Shi et al. [13] proposed a method based on Intrinsic Mode Functions (IMF) and one-dimensional WDenseNet for diagnosing internal leakage faults in directional control valves. Conti et al. [14] selected current, acoustic emission, and vibration signals as the most promising monitoring techniques. They optimized the feature extraction and data fusion processes to detect early leakage faults in control valves.
Although there have been many research projects on control valves, the accuracy of fault detection using physics-based methods is affected by the uncertainty in industrial processes [15]. For statistical and machine learning methods, on the other hand, labeled fault data are scarce compared to the vast amount of data from normal operation in industrial processes. The resulting data imbalance leads to low fault detection accuracy [16]. Methods based on data modeling and residual analysis address this issue. Residual-based stepwise attribute assessment methods have consistently held a pivotal role in the field of fault detection. Their most prominent advantage is that they require neither a substantial volume of fault data nor data specific to particular fault events [17].
The main emphasis in using residual analysis methods is on building models and assessing residuals. Therefore, it is crucial to address two key issues in this approach: how to quickly and accurately model the system, and how to analyze residuals for effective fault detection. Heydarzadeh et al. [18] proposed a two-stage monitoring architecture for diagnosing actuator abnormalities. Initially, a model was established for fault-free processes using LS-SVM, followed by DWT analysis of the prediction model's residuals to diagnose faults. Simani et al. [19] introduced a model-based dynamic system input-output control sensor fault detection and isolation method that leveraged analytical redundancy. This approach began with the construction of an industrial process model using standard identification techniques for variable error models. Subsequently, statistical tests were applied to the residuals for fault detection and isolation. Hu et al. [20] presented a current sensor fault diagnosis method that combines PSO-optimized residual generation with statistical residual assessment. It involved the development of a current sensor model based on charging principles, followed by statistical analysis of estimated residuals through Monte Carlo simulations to generate empirical residual thresholds, ensuring precise fault diagnosis for current sensors.
Although the residual analysis method has been widely applied in fault detection for various equipment, existing approaches for detecting faults in control valves still need improvement. Firstly, factories often install a large number of control valves, each of which may operate under different conditions, and the operating conditions of individual control valves can change over time. Consequently, models trained offline often exhibit suboptimal performance when used online due to the diversity and variability of operating conditions. Secondly, when control valves experience gradual faults, the changes in residuals are often not prominent, making it challenging to accurately detect faults based solely on the magnitude of residuals. Therefore, it is essential to develop more precise and applicable methods for detecting faults in flow control valves that address these issues.
To address these challenges, it is necessary to establish online flow models for control valves that can adapt to varying operating conditions, thereby ensuring model accuracy. Given the need for high speed in online modeling, this research proposes a LightGBM-based approach for the online construction of control valve flow prediction models. This method ensures model accuracy while also offering high modeling speed. Subsequently, we apply STL decomposition to the model's flow residuals to capture their trends, which are then transformed into a health index (HI). Through the HI, we can not only detect the occurrence of faults but also assess the extent of gradual faults.
The contributions of this paper are as follows:
1. An online LightGBM modeling method is proposed for constructing flow control valve models, and the residuals generated by this model are employed for control valve fault detection. This method is specifically tailored for large-scale and dynamically changing control valve systems and demonstrates higher modeling accuracy compared to traditional offline modeling methods.
2. A residual analysis method based on STL decomposition is introduced. Through the decomposition of residual data from flow models, trend components are extracted and used to construct the HI metric for fault detection purposes.
The rest of this paper is organized as follows: Section 2 explains the fault detection framework using model residuals and the adoption of LightGBM modeling. Section 3 covers the dataset and presents the experimental results obtained using the proposed methods for flow control valve fault detection. Finally, in Section 4, we summarize and discuss the research findings.

Residual Analysis-Based Fault Detection
Equipment fault detection focuses on determining the presence of equipment faults that could impact the system's operation. When equipment malfunctions, it exhibits significant differences from its normal state. It is possible to detect the occurrence of faults by identifying these differences. To enable fault detection, it is necessary to establish a model that represents the healthy operational state of the equipment. The residuals of the model, which signify the differences between predicted and observed values, encompass information about these variations. Therefore, effective fault detection is achievable through the analysis of residuals. This underscores the importance of establishing an accurate model, as the foundation of residual analysis relies on the precision of the model. The modeling phase involves learning the normal operational patterns of the system to establish a model that represents the system's normal state [21].
Residuals are computed by feeding the operational data into both the model and the actual system and then calculating the difference between the model's output and the actual system's output [22]. Given the model output $y_{est}$ and the actual system output $y$, the residual $Res$ is expressed as follows:

$$Res = y - y_{est}$$

Through further analysis of residuals, changes in the residuals can be detected, indicating variations in the health status of the equipment. Considering the sequential characteristics of residual data, we employ the STL (Seasonal and Trend decomposition using Loess) decomposition method to analyze the residual data, aiming to extract potential fault or degradation trends. Compared to signal decomposition methods such as the wavelet transform, STL decomposition can adapt to data with different periodicity and trends without relying on the selection of basis functions, and is therefore more flexible. For the original signal denoted as $Y_v$, the decomposition is expressed as follows [23]:

$$Y_v = T_v + S_v + R_v$$

where $T_v$ represents the trend component, $S_v$ represents the seasonal component, and $R_v$ denotes the remainder component. The STL approach involves a series of Loess (Locally Estimated Scatterplot Smoothing) smoothers used as an iterative non-parametric regression process. STL decomposition consists of two computational steps: an inner loop and an outer loop. The objective of the inner loop is to obtain the signal's trend component, and its calculation steps are illustrated in Algorithm 1 [24], where $T_v^{(k)}$ and $S_v^{(k)}$ are the trend component and the seasonal component obtained from the $k$th decomposition, respectively.

Algorithm 1
...
5: Cycle-subseries smoothing: smooth the seasonal subseries using LOESS.
6: Low-pass filtering of smoothed cycle-subseries: apply low-pass filtering to the smoothed cycle-subseries.
7: Deseasonalization: subtract the low-pass filtered seasonal subsequence to obtain the seasonal component $S_v^{(k+1)}$.
8: Trend smoothing: apply LOESS regression to the sequence from Step 7 to obtain the trend component $T_v^{(k+1)}$.
9: until the trend component and the seasonal component converge.

Following the identification of residual trends, we transform this component into an indicator denoted as HI, confined within the interval of 0 to 1. As the equipment transitions from a normal state to a faulty state, the residual trend gradually increases. Consequently, a mapping from $[0, +\infty)$ to $(0, 1]$ is established, translating the residual trend into the HI metric. Under typical operating conditions, HI tends to stabilize around 1, but in the presence of a fault, it experiences a discernible decrease. Applying a square root operation to healthy residuals mitigates fluctuations, thereby ensuring a more consistent HI during routine equipment operation. The HI is defined in terms of the trend component $T_v$ and a regulation factor $c$.
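The residual-to-HI pipeline described in this section can be sketched in a few lines. The sketch below is illustrative only and rests on two loudly stated assumptions. First, it replaces the LOESS-based STL procedure with a much simpler additive decomposition (centered moving-average trend plus per-phase seasonal means); a full STL implementation is available, for example, as `STL` in statsmodels. Second, the HI formula itself is not reproduced here, so the mapping $HI = \exp(-\sqrt{|T_v|}/c)$ is a hypothetical choice that merely satisfies the stated properties: it maps $[0, +\infty)$ to $(0, 1]$, uses a square root, and becomes more sensitive as $c$ decreases. The function names are likewise hypothetical.

```python
import math


def moving_average_trend(series, window):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def decompose(series, period):
    """Additive split Y = T + S + R (a simplified stand-in for STL, not LOESS-based)."""
    if len(series) < period:
        raise ValueError("series must contain at least one full period")
    trend = moving_average_trend(series, period)
    detrended = [y - t for y, t in zip(series, trend)]
    # Seasonal component: mean of the detrended values at each phase of the period.
    phase_mean = []
    for p in range(period):
        values = detrended[p::period]
        phase_mean.append(sum(values) / len(values))
    offset = sum(phase_mean) / period  # center the seasonal pattern around zero
    seasonal = [phase_mean[i % period] - offset for i in range(len(series))]
    remainder = [y - t - s for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder


def health_index(trend, c):
    """ASSUMED mapping HI = exp(-sqrt(|T|) / c); not the formula from the paper."""
    return [math.exp(-math.sqrt(abs(t)) / c) for t in trend]
```

With a residual series that is flat at zero and then ramps upward, the HI produced by this sketch starts at 1 and decreases as the trend grows, which is the qualitative behavior the text describes.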

LightGBM Algorithm
LightGBM [25] is an improved version of the Gradient Boosting Decision Tree (GBDT) model, incorporating techniques such as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). In this study, we employ LightGBM to construct a flow prediction model for control valves. LightGBM works by iteratively adding and training new trees to fit the residuals from the previous iteration. Ultimately, LightGBM assigns a predictive value to each instance by summing the scores of the leaf nodes it falls into across all trees.
Consider a training dataset $D$ containing $n$ samples and $m$ features, where $D = \{(x_i, y_i)\}_{i=1}^{n}$ $(x_i \in \mathbb{R}^m,\ y_i \in \mathbb{R})$, with $x_i$ representing the $i$th sample and $y_i$ representing the label value of the $i$th sample.
The prediction model expression for LightGBM is as follows:

$$\hat{y}_i = \sum_{t=1}^{K} f_t(x_i)$$

where $\hat{y}_i$ represents the final prediction result for the input $x_i$, $K$ represents the number of trees, and $f_t(x_i)$ denotes the output of the $t$th tree for input $x_i$.
During each iteration, LightGBM constructs a decision tree, and for each training process, its objective function is as follows:

$$Obj^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

$$\Omega(f_t) = \gamma N + \frac{1}{2} \lambda \sum_{j=1}^{N} \omega_j^2$$

where $L$ is the loss function and $\Omega$ is the regularization term. $N$ represents the total number of leaf nodes, and $\omega_j$ is the score of the $j$th leaf node. $\gamma$ and $\lambda$ are controlling factors to avoid overfitting.
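To simplify the loss function, consider a second-order Taylor expansion around $\hat{y}_i^{(t-1)}$:

$$L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) \approx L\left(y_i, \hat{y}_i^{(t-1)}\right) + \frac{\partial L\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}} f_t(x_i) + \frac{1}{2} \frac{\partial^2 L\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} f_t^2(x_i)$$

Make the following substitution:

$$g_i = \frac{\partial L\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 L\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$

The objective function can then be rewritten as follows:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

As $L(y_i, \hat{y}_i^{(t-1)})$ is a constant, it does not influence the optimization process and can be removed from the objective function, thus allowing the objective function to be rewritten as follows:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$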
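To make the role of the first- and second-order terms concrete, the following sketch evaluates them for one leaf. It assumes a squared-error loss $L = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$, and it uses the standard GBDT closed form for the optimal leaf score, $\omega^* = -\sum g_i / (\sum h_i + \lambda)$. That closed form is a well-known consequence of minimizing the simplified objective but is not stated in the text above, so it should be read as supplementary; the function names are hypothetical.

```python
def grad_hess_squared_error(y_true, y_pred):
    """g_i and h_i for the assumed loss L = 0.5 * (y - y_hat)^2."""
    g = [p - t for t, p in zip(y_true, y_pred)]
    h = [1.0] * len(y_true)
    return g, h


def optimal_leaf_weight(g, h, lam):
    """Leaf score that minimizes sum(g*w + 0.5*h*w^2) + 0.5*lam*w^2."""
    return -sum(g) / (sum(h) + lam)


def leaf_objective(g, h, lam, gamma):
    """Objective contribution of one leaf evaluated at its optimal score."""
    G, H = sum(g), sum(h)
    return -0.5 * G * G / (H + lam) + gamma


# Two samples in one leaf; the previous model under-predicts both.
g, h = grad_hess_squared_error(y_true=[3, 5], y_pred=[2, 2])
weight_no_reg = optimal_leaf_weight(g, h, lam=0.0)   # equals the mean residual
weight_with_reg = optimal_leaf_weight(g, h, lam=2.0)  # shrunk toward zero
```

Without regularization the leaf score is simply the mean of the residuals in the leaf; a positive $\lambda$ shrinks it toward zero, which is how the regularization term limits overfitting.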

Online Learning Method
Online learning methods enable the model to adapt dynamically to changes in data by continuously updating it with real-time data generated during system runtime, so that the model better reflects the actual state of the system. Online learning is particularly valuable for handling real-time data and meeting the demands of dynamic changes in the system [26]. The process of online learning is illustrated in Figure 1. Online learning algorithms provide enhanced efficiency and scalability, especially for large-scale machine learning tasks in practical data analysis applications. They are frequently applied in two primary scenarios. First, they enhance the efficiency and scalability of existing batch machine learning methods. Second, they are directly employed for the analysis of online streaming data [27]. In real-world scenarios, factories often have hundreds or even thousands of control valves, making traditional batch learning methods inefficient in terms of time and space costs. Furthermore, when the operating conditions of control valves change, a batch model needs to be retrained, which reduces scalability in large-scale applications. Online learning methods, in contrast, update the existing model, minimizing the resource consumption associated with model retraining.
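The update-instead-of-retrain idea can be illustrated with a small buffering wrapper. This is a generic sketch, not the implementation used in this work: the class name `OnlineUpdater`, the `update_fn` callback signature, and the running-mean "model" used in the demonstration are all hypothetical and chosen only to keep the example self-contained.

```python
class OnlineUpdater:
    """Buffers streaming samples and updates the existing model once a batch is full."""

    def __init__(self, model, update_fn, batch_size):
        self.model = model
        self.update_fn = update_fn  # callable: (model, X, y) -> updated model
        self.batch_size = batch_size
        self.buffer = []
        self.updates = 0

    def observe(self, features, target):
        """Add one sample; return True if this sample triggered a model update."""
        self.buffer.append((features, target))
        if len(self.buffer) < self.batch_size:
            return False
        X = [f for f, _ in self.buffer]
        y = [t for _, t in self.buffer]
        # Only the newly buffered data is used; earlier data is not revisited.
        self.model = self.update_fn(self.model, X, y)
        self.buffer.clear()
        self.updates += 1
        return True


def update_running_mean(model, X, y):
    """Toy stand-in for a learner: the model is (mean of targets, sample count)."""
    mean, count = model
    total = mean * count + sum(y)
    count += len(y)
    return (total / count, count)


updater = OnlineUpdater(model=(0.0, 0), update_fn=update_running_mean, batch_size=10)
for i in range(25):
    updater.observe([float(i)], float(i))
```

After 25 samples with a batch size of 10, the wrapper has performed two updates and holds five samples waiting for the next one. Any learner that supports continued training can be plugged in through `update_fn`.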

LightGBM-Based Residual Analysis for Online Fault Detection
Precise flow control is one of the key functions of flow control valves [28], and many flow control valve faults can affect the effectiveness of flow control [29]. Therefore, it is possible to establish a flow model for control valves. When a fault occurs in the system, it can be detected by monitoring changes in the flow model, which are reflected in the flow residuals. By monitoring flow residuals, potential variations indicative of system faults can be identified, thereby enabling fault detection. The control valve flow equation [30] expresses the flow $Q$ in terms of the upstream pressure $P_1$, the downstream pressure $P_2$, and the discharge coefficient $C_d$, which is related to fluid temperature and valve position. Based on this flow equation, the flow can be predicted using the valve's upstream pressure, downstream pressure, valve position, and fluid temperature.
The proposed method combines online learning with LightGBM and residual decomposition using STL to conduct fault detection. The online LightGBM can promptly update the model to adapt to changes in the operating conditions of the flow control valve, thereby reducing model prediction errors and improving the accuracy of fault detection. At the same time, the trend of residual changes obtained through STL decomposition provides an intuitive reflection of the health status changes in the flow control valve, enabling the detection of faults based on this information. The flowchart of the method is shown in Figure 2.
To illustrate this method, an example is provided. First, historical operational data from the control valve are gathered, encompassing parameters such as flow rate, pressure, fluid temperature, and valve position control signals. Because these physical quantities differ in type and scale, data normalization is necessary. After normalization, the numerical range of the data is adjusted to be between 0 and 1. This facilitates the processing of different types of physical quantities on a unified scale, making them easier to compare and train on. Subsequently, input features such as upstream pressure, downstream pressure, fluid temperature, and valve position control signals are chosen, with flow rate serving as the target label. An initial flow rate prediction model is constructed through offline training using LightGBM. Assuming data are collected at a rate of one sample per second, and 600 training samples are needed to update the model, online updates are performed every 10 min using data from the previous 10 min of operation. When the model is employed to predict flow rate, the difference between the model's predictions and the actual measurements yields the flow residuals. These residuals then undergo STL decomposition, and the trend components derived from this decomposition are converted into an HI. Fault occurrences are detected by comparing the magnitude of the HI with a predetermined fault threshold. Note that the construction of the HI is influenced by the tuning factor c. A smaller value of the tuning factor makes the HI more sensitive to changes in residuals. Therefore, the selection of an appropriate fault threshold depends on the magnitude of c. If a fault threshold of 0.9 is set, any HI value below this threshold indicates the presence of a fault.
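The workflow in this example (normalize, train offline, update every 600 samples, compute residuals) might be sketched with the LightGBM Python package as follows. This is a minimal sketch under stated assumptions rather than the code used in this work: the helper names, the parameter dictionary `PARAMS`, and the round counts are hypothetical, and continued training is done here by passing the existing booster as `init_model` to `lightgbm.train`, which is one way the library supports updating a model with new data. The LightGBM and NumPy imports are placed inside the functions so that the normalization helper can be used on its own.

```python
# Hypothetical hyperparameters; the values used in the paper are not given here.
PARAMS = {"objective": "regression", "learning_rate": 0.1, "verbose": -1}


def min_max_normalize(column, lo=None, hi=None):
    """Scale values to [0, 1]; pass lo/hi from the training data to reuse its range."""
    lo = min(column) if lo is None else lo
    hi = max(column) if hi is None else hi
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def train_offline(features, flow, params=PARAMS, rounds=100):
    """Fit the initial flow model on historical data (rows: P1, P2, T, valve signal)."""
    import numpy as np
    import lightgbm as lgb
    data = lgb.Dataset(np.asarray(features, dtype=float),
                       label=np.asarray(flow, dtype=float))
    return lgb.train(params, data, num_boost_round=rounds)


def update_online(booster, features, flow, params=PARAMS, rounds=10):
    """Continue boosting from the existing model using the latest batch (e.g. 600 samples)."""
    import numpy as np
    import lightgbm as lgb
    data = lgb.Dataset(np.asarray(features, dtype=float),
                       label=np.asarray(flow, dtype=float))
    # init_model makes the new trees start from the existing booster's predictions.
    return lgb.train(params, data, num_boost_round=rounds, init_model=booster)


def flow_residuals(booster, features, flow):
    """Residual = measured flow minus predicted flow, one value per sample."""
    import numpy as np
    predicted = booster.predict(np.asarray(features, dtype=float))
    return [float(y - p) for y, p in zip(flow, predicted)]
```

In a streaming setting the normalization range would typically be fixed from the offline data and reused for new samples via the `lo` and `hi` arguments, so that features keep the same scale across updates.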

Data Acquisition
The experiments were conducted using the DAMADICS (Development and Application of Methods for Actuator Diagnosis in Industrial Control Systems) [31] platform for simulation to obtain operational data of the control valve actuator. DAMADICS is a well-known benchmark for fault detection and isolation. It establishes simulation models based on the valves used in the Polish Lublin Sugar Plant production process and provides a control valve actuator model library developed in MATLAB-SIMULINK. It effectively simulates typical fault modes of control valves. This platform can simulate 19 types of faults, and the simulated faults in control valves can be categorized into four types: 1. control valve body faults, 2. pneumatic servo motor faults, 3. positioner faults, and 4. external faults. Faults can also be classified as abrupt or gradual based on their temporal characteristics. During normal operation, the fault type is set to "f0", indicating no fault. When simulating fault occurrences, the fault type is adjusted to correspond to the model fault. In this experiment, we simulate the operation of the control valve by providing periodic control signals and simulate the occurrence of valve faults by periodically varying the fault types. The DAMADICS model is depicted in Figure 3.

Online Learning Experiment
For methods based on model residuals, accuracy is crucial. If there is significant error in the modeling process, the method may mistakenly diagnose a normally functioning system as having a fault. Within a factory setting, different control valves serve various purposes, leading to variations in their operating conditions. In such cases, a model may perform better on data from varying conditions if it distinguishes among the different operating conditions. Even when models are separately trained for each distinct operating condition, the effectiveness of the model may still be compromised, given that control valve conditions can change rapidly and the model needs to adapt promptly.
From empirical observations, it is evident that if the data used during model training align closely with the operational characteristics of the target control valve, the predictive performance of the model on that specific control valve tends to be superior. Therefore, updating the model with new data in a timely manner, especially when the operating conditions of the control valve change, can significantly enhance model performance. To achieve this objective, we employ an online learning approach to ensure that the model for each control valve receives timely updates.
Through simulation experiments conducted on the DAMADICS platform, we generated operational data for control valves V1, V2, and V3 under three distinct operating conditions. Initially, the data from these three operating modes were combined to form the offline training dataset. A backpropagation (BP) neural network is a type of multilayer feedforward neural network trained using the backpropagation algorithm. By adjusting the weights within the network, BP neural networks aim to minimize the error between the actual output and the desired output. We employed a three-layer BP neural network to train our foundational flow prediction model using this dataset. Following this, we utilized data from the individual operating conditions to update the foundational flow prediction model, simulating the online learning process. The mean and variance of the flow data used for both offline and online training are presented in Table 1. We compared the performance of the offline model and the online model using the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²) as evaluation metrics. The comparison showed that, for data from control valves operating under three different conditions, the online model yielded better results than the offline model, as shown in Figure 4 and Table 2. The offline model has difficulty dealing with diverse data, which can constrain its generalization capability. This limitation becomes particularly evident when faced with the various possible operational modes of control valves in real industrial scenarios. In contrast, the online model exhibits greater flexibility and adaptability, enabling timely model updates based on distinct data characteristics and ultimately resulting in superior predictive performance.
In order to maintain optimal performance for the flow model, we employ an online learning approach to update the model. However, compared to the offline model, the online model entails a significant increase in time consumption due to the need for continuous model updates. Therefore, fast online modeling techniques are required. Currently, commonly used models for online learning include neural networks and tree models. Deep neural networks exhibit significant advantages in terms of precision. However, they come with the drawback of lengthy training times, which is a disadvantage for online modeling. Like neural networks, tree models are scalable and can be updated and trained starting from an existing model. Through ensemble learning, multiple decision tree weak learners are combined into a classification and regression tree ensemble, preserving the fast modeling speed of tree models while achieving excellent modeling precision. Therefore, we have chosen LightGBM, XGBoost, and BP neural networks as training models and compare them in terms of model training speed and predictive performance. We continue to use the valve operation data from Section 3.2, applying the aforementioned modeling methods to valves operating under three different conditions.
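For reference, the three evaluation metrics named above (MSE, MAE, and R²) can be computed as in the following sketch. The definitions are the standard ones; the function names and the sample values are illustrative and are not taken from the experiments.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
mse = mean_squared_error(measured, predicted)
mae = mean_absolute_error(measured, predicted)
r2 = r2_score(measured, predicted)
```

Lower MSE and MAE and an R² closer to 1 indicate a better fit. Note that R² is undefined when all measured values are identical, since the total sum of squares is then zero.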
Table 3 lists the prediction performance and required training time for BP, XGBoost, and LightGBM. The results show that LightGBM trains faster than both XGBoost and the BP neural network. In terms of prediction performance, LightGBM matches XGBoost and outperforms the BP neural network. For factories with many control valves, the shorter training time means lower resource consumption.
As previously mentioned, control valves are subject to variations in operating conditions while in use. In such cases, online modeling methods are crucial for timely model updates that preserve model performance. Figure 5 compares the prediction results of offline LightGBM and online LightGBM. To verify the effectiveness of the online modeling approach under varying operating conditions, we compared the prediction performance of offline and online models using data from changing operational scenarios. Figure 6 illustrates the variation in control valve flow before and after a change in operating conditions. We used data from before the change to establish the offline model and then updated the model online using data collected after the operational shift. Table 4 presents the prediction performance metrics for both offline and online models, including BP, XGBoost, and LightGBM, on the data collected after the change in conditions. Figure 7 visualizes the prediction results of the offline and online LightGBM models for the data reflecting these operational changes.

Fault Detection Using Simulation Data
To validate the effectiveness of the proposed fault detection method, we conducted a series of simulation experiments using the DAMADICS platform to simulate five different types of control valve body faults. The fault labels and their descriptions are listed in Table 5. In these experiments, data were collected for each fault type, with each dataset comprising continuous data spanning 3000 s. The first 900 s of each dataset represent normal operating conditions, after which the faults were induced. As shown in Figure 8, under normal conditions, the residual is minimal. However, when a fault occurs, a change in the residual is observed. It is worth noting that clogging and flashing exhibit the most significant changes in residuals, while sedimentation and internal leakage show relatively less pronounced changes in the early stages of the fault.
As shown in Figure 9, the residual trend components obtained through STL decomposition exhibit no significant variations under normal conditions. However, when abrupt faults occur, these trend components experience abrupt increases or decreases. In the case of gradual faults, the trend components change slowly. By observing variations in the trend component, we can make preliminary assessments of the health status of a system or device. However, because different types of faults affect the trend component differently, its numerical value is, in theory, unbounded. This makes it difficult to describe the trend term numerically and to judge intuitively whether its changes have exceeded the normal range. Consequently, a viable approach is to transform the trend term into an HI. By constraining the numerical value of the trend term within a range of 0 to 1, we can more conveniently use the HI value to assess changes in the health status and detect the occurrence of faults.
To accurately detect faults using the HI, it is essential to establish a precise criterion, known as the fault threshold, for determining malfunctions. In the case of abrupt failures, where the transition from normal operation to a faulty state occurs instantaneously, selecting an appropriate threshold is relatively straightforward. However, for faults that deteriorate gradually, a clear boundary is essential to determine the point at which equipment performance declines to an unacceptable level. Consequently, we have chosen to monitor changes in flow rate as our benchmark. Specifically, if the deviation in flow rate exceeds 0.5% compared to the normal flow rate under identical operating conditions, the equipment is deemed to have suffered a failure. The HI derived from the residual trend term is shown in Figure 10, where the tuning factor c is set to 0.2. The determination of the fault threshold relies on the variation of the HI during a fault occurrence. For this experiment, we set 0.85 as the critical threshold, and any HI value falling below 0.85 indicates a system failure. At the 900 s mark, the abrupt faults f1 and f5 triggered a sharp decline in the HI, pushing it below the critical threshold of 0.85. The gradual fault f3 evolved comparatively quickly, reaching system failure at 1773 s, when the HI dropped below 0.85. In contrast, the gradual faults f2 and f4 progressed slowly and had not reached a failure state by the end of the 3000 s run, so their HIs remained above the 0.85 threshold. We compared the fault detection effectiveness of three online training models combined with STL decomposition to generate the HI, as shown in Table 6.
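The two decision rules described above, the HI threshold and the 0.5% flow-deviation criterion, reduce to simple comparisons. The sketch below shows one way to express them; the function names, the one-sample-per-second default, and the sample HI values are hypothetical and serve only to illustrate the logic.

```python
def first_fault_time(hi_series, threshold=0.85, sample_period=1.0):
    """Return the time (in seconds) of the first HI value below the threshold, or None."""
    for index, hi in enumerate(hi_series):
        if hi < threshold:
            return index * sample_period
    return None


def exceeds_flow_deviation(flow, normal_flow, limit=0.005):
    """True if the flow deviates from the normal flow by more than the relative limit (0.5%)."""
    return abs(flow - normal_flow) > limit * abs(normal_flow)


# Illustrative HI trajectory: healthy at first, then degrading.
hi_values = [1.0, 0.95, 0.90, 0.84, 0.70]
detected_at = first_fault_time(hi_values)  # index 3 at one sample per second
```

A series that never drops below the threshold yields `None`, which corresponds to the case of slowly developing faults that have not reached a failure state within the observation window.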

Fault Detection Using True Factory Data
To validate the effectiveness of the proposed method in real industrial settings, we conducted experiments using the dataset from the Lublin Sugar Factory. This dataset encompasses operational data recorded from 29 October 2001 to 22 November 2001, with faults occurring on 30 October, 9 November, and 17 November. In this experiment, we selected 5000 data points starting at 0:00 on 29 October for offline model training. Afterwards, we updated the model using data collected before the occurrences of faults on 30 October, 9 November, and 17 November, respectively. On each fault day, the model was updated five times before the fault occurred, using 600 data points collected within a 10 min period for each update. Figure 12 illustrates the changes in HI during the occurrence of faults on the three fault days. When a fault occurs, there is a notable decrease in HI. Here, the tuning factor c for the HI is set to 0.1, and the fault detection threshold remains at 0.85. The fault detection effectiveness is shown in Table 7.

Conclusions
This paper proposed a fault detection method for control valves based on an online LightGBM model and STL decomposition of the residuals. First, a control valve flow model is constructed using online LightGBM, which adapts better to changes in control valve conditions and achieves higher accuracy than traditional offline models. Furthermore, LightGBM models faster than XGBoost and BP neural networks, making it more suitable for online learning requirements. Subsequently, the STL method decomposes the residuals of the flow model to extract trend components, which are then transformed into an HI for fault detection. The method is validated using DAMADICS simulation data and Lublin factory data, and it exhibits good performance in detecting both abrupt and gradual faults. The proposed method is influenced to some extent by the construction of the HI and the choice of fault thresholds, leading to fluctuations in detection effectiveness when dealing with real factory data. Therefore, future work should consider alternative threshold determination methods to enhance the robustness of the detection approach.

Figure 1. The online learning process.

Figure 2. The flow chart of the proposed fault detection method.

Figure 3. General scheme of the DAMADICS model.

Figure 4. The prediction performance of the backpropagation (BP) neural network: (a) offline model; (b) online model.

Figure 10. HI in normal and 5 fault states.
Figure 11 displays the upstream pressure, downstream pressure, temperature, and valve control signals used as inputs for the model, while Figure 2 shows the corresponding model output of flow rate.

Table 1. The mean and variance of the dataset for offline BP and online BP.

Table 2. The performance comparison between offline BP and online BP.

Table 3. The performance comparison of BP, XGBoost, and LightGBM.

Table 4. Prediction performance of offline and online models under changing operating conditions.

Table 5. Fault labels and descriptions.

Table 6. Comparison of online model fault detection effectiveness.

Table 7. Fault detection effectiveness on three fault days.