1. Introduction
Valve stiction (derived from the words “stick” and “friction”) is a prevalent issue in industrial process control systems, especially in chemical plants where thousands of control loops regulate fluid flow through valves. Stiction occurs when static friction in a control valve exceeds dynamic friction, preventing the valve from responding smoothly to control signals. Although mechanical in origin, this phenomenon can result in persistent oscillations in closed-loop systems, leading to decreased process efficiency, increased energy consumption, material waste, and safety concerns. Studies estimate that approximately 80% of industrial control loops exhibit oscillatory behavior, with valve stiction responsible for approximately 30% of these disturbances [1, 2].
Accurate identification of stiction is essential for effective process optimization. Traditional detection methods, such as the Bicoherence and Ellipse Fitting with Bayesian Information Criterion (BIC) technique introduced by Choudhury [1], analyze the relationship between the frequency-filtered controller output (OP) and process variable (PV) to infer stiction. Although such methods are interpretable and grounded in engineering theory, they often require long periods of clean, noise-free data and prior system knowledge. These limitations reduce their robustness and practical implementation in real-world settings.
Recent advances in machine learning (ML) and artificial intelligence (AI) have brought new tools to the process industries, offering the ability to learn automatically from data and adaptively improve system monitoring, control, and maintenance. ML techniques are widely applied to tasks such as inferential measurement, fault diagnosis, autonomous Proportional–Integral–Derivative (PID) tuning, and predictive maintenance. In the context of valve stiction, various ML-based methods have emerged, such as a proposed moving window approach combined with K-means clustering to detect both stiction and abrupt valve closures [
2]. Similarly, researchers have introduced regression and sigmoid-based methods for quantifying stiction by analyzing the derivatives of PV and OP signals [
3].
More recent approaches leverage multi-layer perceptron (MLP) networks and support vector machines (SVMs) for fault detection [
4]. While effective, these methods often struggle with noise sensitivity and reliance on handcrafted features. In contrast, Deep Neural Networks (DNNs), particularly Convolutional Neural Networks (CNNs), provide enhanced capabilities by learning complex nonlinear relationships directly from high-dimensional input data. CNNs have proven successful in signal processing and fault diagnosis, including stiction detection [
5]. However, their performance depends heavily on large volumes of labeled data, which are scarce in industrial contexts. To address this, researchers have resorted to simulation-based data augmentation. The Choudhury valve stiction model was used to generate training data for a retrained ResNet50 CNN, which was then validated on real-world datasets from the International Stiction Database (ISDB) [
1]. Other contributions, such as the butterfly shape-based detection method, demonstrated the growing integration of CNNs in model-free stiction detection [
6].
Despite these advances, several challenges remain. Deep learning models often function as “black boxes,” limiting interpretability. Furthermore, their sensitivity to simulation parameters can hinder generalizability.
Recent studies have addressed the interpretability limitations of deep learning models through three principal methodological categories. First, post hoc explanation techniques, such as saliency maps and Gradient-weighted Class Activation Mapping (Grad-CAM), provide visual attribution of model decisions; however, these approaches are often sensitive to noise and may lack consistency across similar inputs. Second, intrinsically interpretable architectures aim to embed transparency within the model structure, though this typically constrains representational capacity and reduces classification performance in complex nonlinear systems. Third, hybrid and physics-informed learning frameworks integrate domain knowledge with data-driven models, offering a balance between interpretability and predictive accuracy. While post hoc methods enhance model transparency, they do not guarantee causal interpretability, whereas hybrid approaches demonstrate improved robustness and physical consistency, particularly in industrial fault diagnosis contexts. Consequently, the integration of wavelet-based representations with a CNN architecture in this study aligns with emerging trends toward interpretable, physics-guided deep learning models [7].
In response to these challenges, this study proposes a hybrid approach that integrates domain knowledge with deep learning. We build on the ellipse-fitting concept of traditional methods to preprocess both simulated and real industrial data, eliminating the need for Gaussianity and linearity tests. To enhance signal quality, wavelet-based reconstruction is applied to retain relevant spatial and frequency components and improve pattern visibility in OP/PV plots.
The proposed method is evaluated on real industrial datasets from the ISDB [
8], demonstrating its robustness, accuracy, and applicability. By addressing current gaps in generalizability and interpretability, the research contributes to safer, more efficient, and automated control loop monitoring in the process industry.
While time–frequency representations combined with deep learning have been widely reported in fault diagnosis, their direct application to valve stiction in closed-loop control systems is non-trivial. Stiction introduces nonlinear stick–slip behavior that generates limit-cycle oscillations and asymmetric transients governed by controller–process interaction, rather than purely signal-driven anomalies. These characteristics are often insufficiently captured in conventional time- or frequency-domain analyses and are not explicitly considered in many existing time–frequency CNN-based approaches.
In this work, the use of continuous wavelet transform (CWT) scalograms is formulated to retain localized energy patterns associated with stiction-induced dynamics, enabling the representation of intermittent sticking, release events, and sustained oscillations in a structured time–frequency domain. Unlike generic image-based implementations, the scalogram construction and selection are aligned with the underlying control behavior of the system. In addition, the transfer learning strategy based on ResNet50 is adapted to extract features that are consistent with process signal characteristics, rather than natural image features, thereby improving the relevance of the learned representations.
Accordingly, the contribution of this study lies in the development of a control-informed time–frequency learning framework that integrates wavelet-based feature encoding with deep residual networks, specifically tailored to capture nonlinear valve stiction signatures in closed-loop industrial systems.
1.1. Motivation and Contribution
Stiction resulting from the difference between static and dynamic friction can introduce persistent oscillations in closed-loop systems, leading to excessive energy consumption, increased material waste, and compromised safety. Despite its impact, accurately detecting valve stiction remains challenging due to the complexity of industrial environments and the limitations of both traditional and modern techniques [
3]. Traditional feature engineering-based methods, such as the BIC approach, provide interpretability but are constrained by their reliance on clean, long-duration datasets and specific system knowledge. Conversely, ML and AI techniques have shown potential in fault detection applications by learning complex, nonlinear patterns in process data. However, these methods are often hindered by the scarcity of labeled industrial datasets and the black-box nature of their operation [
9].
This study addresses these challenges by developing a hybrid valve stiction detection framework that integrates wavelet-based signal processing with pattern recognition using CNNs, specifically ResNet50. The use of CWT enables effective signal denoising and enhancement, while CNNs provide robust pattern recognition capabilities for distinguishing between stiction and non-stiction behaviors.
The primary contributions of this research are as follows:
The development of a hybrid detection framework that merges traditional engineering principles with advanced deep learning to improve accuracy, interpretability, and applicability.
The implementation of wavelet decomposition for signal preprocessing, which enhances fault-relevant features while minimizing noise and irrelevant fluctuations.
The application of transfer learning with CNNs (ResNet50) to enable effective classification using limited labeled data.
The alignment of simulated and real-world data through data transformation techniques that minimize discrepancies between simulated and industrial datasets.
Validation on benchmark data from the International Stiction Database (ISDB), demonstrating the robustness and reliability of the proposed method.
1.2. Control Valve Stiction
Stiction occurs when high static friction prevents valve movement until the actuator force exceeds the resistive force, resulting in a sudden “jump” followed by a “slip”. This nonlinear phenomenon has been widely studied, with key models including the two-parameter (deadband + slip–jump) and four-parameter (deadband, stickband, slip–jump, and moving friction) models. When the valve motion stops or reverses direction at point A in Figure 1a, adhesion between the valve plug and seat causes it to stick. Deadband is the interval in which the valve does not respond to a given input (controller output) range. When a valve that is affected by stiction is in its fully open or fully closed position, it will stick (deadband plus stickband) in that position due to the static friction. When the controller output overcomes the valve static friction, the valve will abruptly jump (slip–jump) into a new position and then move freely (moving phase) [1].
Figure 1b illustrates the mechanical regions where stiction develops within the control valve assembly, emphasizing the interaction between static and kinetic friction. Various detection methods have been proposed, including shape-based techniques (e.g., ellipse fitting) and model-based approaches, where recent advancements employ data-driven strategies, such as ML, to enhance stiction identification. However, mitigation remains challenging, often necessitating valve maintenance, lubrication, or advanced control tuning. A thorough understanding of stiction mechanisms, supported by the accompanying figures, is essential for improving control loop performance and minimizing energy losses in industrial applications [
10].
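The stick, slip-jump, and moving phases described in this subsection can be illustrated with a simplified two-parameter stick-slip rule. The sketch below is an assumption made for illustration only and is not the Choudhury model used in this paper; `f_s` (static friction band) and `f_d` (kinetic friction band) are hypothetical parameter names expressed in the same units as the controller output.

```python
def stick_slip_valve(op, f_s=0.2, f_d=0.1):
    """Map a controller-output sequence to a valve-position sequence.

    Simplified two-parameter rule (illustrative assumption): the valve
    stays stuck until the accumulated input change exceeds the static
    band f_s, then jumps and leaves a residual of size f_d.
    """
    position = op[0]   # assume the valve starts aligned with the OP
    residual = 0.0     # input change not yet converted into movement
    valve = [position]
    for prev, cur in zip(op, op[1:]):
        cum = residual + (cur - prev)
        if abs(cum) > f_s:
            sign = 1.0 if cum > 0 else -1.0
            position += cum - sign * f_d   # slip-jump to a new position
            residual = sign * f_d
        else:
            residual = cum                 # valve remains stuck
        valve.append(position)
    return valve


# A slow ramp in OP produces a staircase in valve position.
ramp = [0.01 * k for k in range(101)]
valve = stick_slip_valve(ramp)
```

Under this rule the gap between OP and valve position never exceeds `f_s`, which gives the flat-then-jump pattern that the PV/OP plot exposes as a hysteresis loop.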
1.3. Physical Model Formulation of Valve Stiction
This section describes the underlying mechanics of valve friction, which replicates the behavior observed in actual industrial process data. In the case of a pneumatic sliding-stem valve, the force balance can be formulated using Newton’s second law as follows:

$$M\frac{d^{2}x}{dt^{2}} = \sum \text{Forces} = F_a + F_r + F_f + F_p + F_i \qquad (1)$$

Here, $M$ denotes the mass of the valve’s moving components, and $x$ represents the relative position of the valve stem. The actuator force is defined as $F_a = Au$, where $A$ is the diaphragm area and $u$ is the actuator’s air pressure or input signal to the valve. The spring force is given as $F_r = -kx$, with $k$ representing the spring constant. The force exerted by the fluid pressure drop is expressed as $F_p = -\alpha\,\Delta P$, where $\alpha$ represents the plug’s unbalance area and $\Delta P$ is the pressure differential across the valve. Additional forces include $F_i$, the seating force needed to press the valve into the seat, and $F_f$, which is the frictional resistance. $F_i$ and $F_p$ are omitted from the model due to their negligible effects [11].

The friction force is modeled piecewise in terms of the stem velocity $v = dx/dt$:

$$F_f = \begin{cases} -F_c\,\mathrm{sgn}(v) - vF_v, & v \neq 0 \\ -(F_a + F_r), & v = 0 \text{ and } |F_a + F_r| \le F_s \\ -F_s\,\mathrm{sgn}(F_a + F_r), & v = 0 \text{ and } |F_a + F_r| > F_s \end{cases} \qquad (2)$$

This model distinguishes between static and dynamic (kinetic) friction. The dynamic friction component, described in the first part of Equation (2), includes a constant term $F_c$, known as Coulomb friction, and a viscous component $vF_v$, which scales linearly with velocity. Both components act in opposition to the direction of movement, as indicated by the negative signs.
The second part of Equation (2) indicates a stuck valve. In this case, the valve’s velocity and acceleration are both zero, leading to a net force of zero on the right-hand side of Newton’s equation. As a result, the frictional force becomes $F_f = -(F_a + F_r)$. The third part of Equation (2) depicts the instant of breakaway as motion begins. At this point, the net force is expressed as $(F_a + F_r) - F_s\,\mathrm{sgn}(F_a + F_r)$, which becomes non-zero if the magnitude of $F_a + F_r$ exceeds the static friction threshold $F_s$. This imbalance initiates acceleration and causes the valve to start moving.
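The three friction regimes of Equation (2) (moving, stuck, and breakaway) can be written as a small function. This is a minimal sketch under stated assumptions: the argument names are illustrative, with `F_c`, `F_v`, and `F_s` standing for the Coulomb, viscous, and static friction parameters, and `f_applied` standing for the sum of the actuator and spring forces.

```python
def friction_force(v, f_applied, F_c, F_v, F_s):
    """Piecewise friction force for a sliding-stem valve (illustrative).

    v         : stem velocity
    f_applied : actuator force plus spring force
    F_c, F_v  : Coulomb and viscous friction parameters
    F_s       : static friction threshold
    """
    def sign(z):
        return (z > 0) - (z < 0)

    if v != 0:
        # Moving valve: Coulomb plus viscous friction oppose the motion.
        return -F_c * sign(v) - F_v * v
    if abs(f_applied) <= F_s:
        # Stuck valve: friction exactly cancels the applied force.
        return -f_applied
    # Breakaway: friction saturates at the static threshold.
    return -F_s * sign(f_applied)
```

For example, with `F_s = 2.0` an applied force of 1.0 on a stationary stem is fully cancelled, whereas an applied force of 3.0 leaves a non-zero net force and the stem starts to accelerate.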
2. Related Studies
2.1. Data-Driven Valve Stiction Detection Methods
Valve stiction detection has evolved from classical signal analysis to advanced ML and deep learning techniques that exploit routinely available process data. Classical geometric methods such as the PV/OP plot remain foundational. These visual diagnostics infer stiction from characteristic hysteresis loops without requiring direct measurement of the manipulated variable. Choudhury et al. pioneered the use of ellipse fitting and BIC on filtered PV/OP data to identify stiction, achieving high empirical discrimination between stiction and tuning issues on long, quasi-stationary datasets [12]. However, this method’s reliance on long undisturbed intervals and sensitivity to noise limit its utility in dynamic industrial settings. Recent deep learning approaches leverage automated feature extraction to improve interpretability and generalization. Hybrid deep architectures such as Toeplitz matrix encoding with CNN-LSTM have been proposed to preserve both spatial and temporal patterns in OP and PV signals, achieving approximately 90.47% classification accuracy on public datasets [13]. While promising, these models are complex, require substantial training data, and incur higher computational cost, factors that hinder real-time adoption in large-scale industrial settings.
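To make the ellipse-fitting idea behind PV/OP shape analysis concrete, the sketch below fits a general conic to a PV/OP trajectory by linear least squares and checks whether the result is an ellipse. It is a generic algebraic fit offered as an assumption for illustration, not necessarily the fitting procedure of the cited method; the function names and the synthetic loop are hypothetical, and the normalization used breaks down if the conic passes through the origin.

```python
import numpy as np


def fit_conic(x, y):
    """Least-squares fit of A x^2 + B xy + C y^2 + D x + E y = 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x**2, x * y, y**2, x, y])
    coeffs, *_ = np.linalg.lstsq(design, np.ones_like(x), rcond=None)
    return coeffs


def is_ellipse(coeffs):
    """A conic is an ellipse when its discriminant B^2 - 4AC is negative."""
    A, B, C = coeffs[:3]
    return B**2 - 4 * A * C < 0


# Synthetic PV/OP loop: two sinusoids with a phase shift trace an ellipse.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
op = 1.0 + 2.0 * np.cos(theta)
pv = -0.5 + np.sin(theta + 0.6)
coeffs = fit_conic(op, pv)
```

On real loops the residual of such a fit, rather than the discriminant alone, would indicate how closely the PV/OP pattern resembles an ellipse.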
An alternative image encoding strategy is the Poincaré plot-based CNN, which transforms a single time series into a two-dimensional representation and feeds it to a CNN classifier [
14]. This method eliminates the need for multichannel data but, to date, has only been evaluated qualitatively, with “satisfactory” classification performance reported; explicit metrics such as F1-score or recall are not consistently disclosed in the available literature, making quantitative comparison challenging.
Another learning-based hybrid method is the CNN-PCA framework, which integrates convolutional features with principal component analysis for both stiction detection and severity identification [
15]. On benchmark industrial loops, this method demonstrates balanced false positive and false negative rates, but published results indicate relatively modest classification metrics (e.g., F1 ≈ 0.70 using CNN-PCA on the International Stiction Database) when compared with best-performing deep learning methods [
16]. Severity quantification through $T^{2}$ and $Q$ statistics adds practical interpretability, but performance is sensitive to the choice of PCA window size and thresholding criteria. Beyond convolutional classifiers, Markov Transition Field (MTF) + CNN encodings have been explored to convert OP/PV series into probabilistic state transition images for CNN training [
11]. Initial studies report correct classification for the majority of loops drawn from diverse industrial sectors; however, these works have yet to publish standard performance metrics such as precision, recall, or F1-score, limiting rigorous comparison to other methods.
2.2. Limitations of Existing Methods and Research Gaps
Across classical and modern approaches, several limitations persist:
Lack of Standardized Metrics: Many proposed methods, particularly Poincaré-CNN and MTF-CNN, do not report complete F1, precision, and recall results, impeding objective comparison and benchmarking. This contrasts with more established evaluations for CWT-CNN and CNN-LSTM models [
13,
17].
Generalization to Real-World Data: Techniques trained primarily on simulated datasets often struggle when applied to noisy, time-varying industrial signals. Hybrid methods such as Toeplitz CNN-LSTM aim to address temporal complexity, but require large, labeled datasets and considerable training effort [
13]. Similarly, integrated CNN-PCA models report balanced detection but show lower F1 relative to deep image-based classifiers [
16].
Interpretability vs. Complexity: Interpretable approaches such as classical ellipse fitting and CNN-PCA provide severity indices, but at the expense of reduced sensitivity to time–frequency dynamics and limited robustness under non-stationary conditions. Complex deep learning architectures (e.g., CNN-LSTM) offer richer feature learning at the cost of greater computational and data requirements.
Real-Time Applicability: Most advanced methods are evaluated offline, while real-time implementation remains underexplored, especially for hybrid CNN-LSTM and MTF approaches that require substantial preprocessing.
Severity Quantification: Although methods like CNN-PCA offer severity classification (weak vs. strong), few studies provide consistent numerical severity grading alongside detection. Such capability is crucial for predictive maintenance and prioritized interventions [
18].
These limitations motivate the development of HW-CNN scalogram analysis, which encodes valve dynamics into a time–frequency representation and leverages CNN feature extraction that captures transient and nonstationary behavior without excessive model complexity. By combining CWT with deep CNNs, HW-CNN offers robust classification performance while retaining sensitivity to both transient slip events and long-range oscillatory patterns, addressing key gaps in generalization, interpretability, and real-time applicability [
17,
19].
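Since the comparison above hinges on precision, recall, and F1-score, a short sketch of how these are computed from loop-level confusion counts may be useful. The counts in the example are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts (stiction treated as the positive class)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 sticky loops detected, 2 false alarms, 4 missed.
precision, recall, f1 = classification_metrics(tp=8, fp=2, fn=4)
```

Reporting all three values, rather than accuracy alone, makes results comparable across benchmark sets with different proportions of sticky loops.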
3. Materials and Method
The proposed HW-CNN integrates continuous wavelet transform (CWT)-based preprocessing with deep CNN classification. The purpose of the wavelet stage is to transform non-stationary time-series signals (OP and PV) into two-dimensional time–frequency representations that preserve transient and oscillatory behaviors induced by valve stiction. By decomposing the signals into localized frequency bands, the wavelet stage enhances discriminative spectral–temporal patterns and minimizes noise effects before CNN feature extraction [19]. The CNN then operates on these scalogram images to learn spatial correlations corresponding to stick–slip transitions, forming an end-to-end HW-CNN pipeline optimized for stiction identification [20].
In control valve dynamics, stiction generates characteristic patterns within the time-series signals of the OP and PV. During the stick phase, the valve remains stationary or changes only slightly, producing flat or slowly varying OP segments dominated by low-frequency content. Once the control signal exceeds the combined deadband and static friction threshold, the valve transitions into the slip phase, marked by sudden jumps in OP and PV that introduce brief, high-frequency bursts. These behaviors are distinctly represented in the corresponding wavelet scalograms, where stick regions appear as low-to-medium energy concentrations in the lower frequency bands (typically shown as green or blue patches), while slip events form localized, high-energy clusters at higher frequencies (visible as red or yellow islands). Under cyclic stiction conditions, these alternating regions occur periodically over time, reproducing the stick–slip rhythm observed in the time-domain response [
21]. The proposed HW-CNN framework for valve stiction detection is shown in Figure 2.
In the proposed HW-CNN framework, these scalogram features serve as the visual signatures from which the CNN automatically learns spatial–temporal correlations. The network identifies and differentiates stick, slip, and transition patterns across scales, enabling robust discrimination between stiction and non-stiction valve behavior without manual feature engineering.
3.1. Framework Integration into the CNN Pipeline
The standardized scalogram images, with reproducible color magnitude mapping, served as consistent inputs for CNN training and classification. This visual justification ensures that the CNN is trained not only on abstract image features but on interpretable representations of physical valve dynamics. The converted scalogram images were then organized into stiction and non-stiction categories and used to train a CNN classifier. This preprocessing pipeline preserves intricate dynamics such as oscillations, delays, and nonlinear valve characteristics, thus enhancing classification accuracy by embedding frequency-specific behaviors directly into the image domain [
22].
3.2. Continuous Wavelet Transform (CWT): Scalograms
Scalograms, derived from the CWT, have emerged as a valuable diagnostic tool for analyzing non-stationary signals such as those encountered in valve stiction phenomena within industrial process control loops. Unlike Fourier-based methods, which offer only global frequency resolution, CWT enables localized time–frequency analysis, thereby enhancing the interpretability of transient events and nonlinear behaviors [
23,
24]. The CWT of a time-domain signal $x(t)$ (OP or PV) is mathematically represented as follows:

$$W_x(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right) dt$$

where $\psi^{*}$ denotes the complex conjugate of the mother wavelet.
Energy density (per time-scale) is as follows:

$$E(a,b) = \left|W_x(a,b)\right|^{2}$$

where $a$ and $b$ denote the scale and translation parameters, respectively, and $\psi(t)$ is the mother wavelet function. In this study, the analytic Morse wavelet was adopted, in line with recent studies advocating its superior resolution and tunability for dynamic systems [25]. Scalograms, defined as the squared magnitude $|W_x(a,b)|^{2}$, serve as 2D representations of energy distribution over time and scale. These images are particularly suitable for input into CNNs, which require structured spatial features for classification. Recent studies have demonstrated that scalogram-based inputs significantly outperform raw time-series signals in detecting subtle control loop anomalies using deep learning models [26].
Additionally, filtering techniques such as the low-pass Butterworth filter are frequently employed before CWT to attenuate noise and enhance relevant signal components [
27]. Normalization of power in the scalogram improves contrast, making defect signatures more distinguishable. This approach has shown notable success in recent work focused on fault diagnosis in nonlinear control systems. Thus, the integration of CWT-derived scalograms and CNNs presents a robust methodology for detecting valve stiction, enabling automated, image-based fault classification with high accuracy and generalization.
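The CWT and scalogram definitions in this subsection can be sketched numerically by evaluating the transform integral as a Riemann sum. The example below is a minimal illustration under stated assumptions: it uses a complex Morlet wavelet as a stand-in for the analytic Morse wavelet adopted in this study, the function names are hypothetical, and the mapping from frequency to scale uses the approximate Morlet centre frequency.

```python
import numpy as np


def morlet(eta, w0=6.0):
    """Complex Morlet wavelet (stand-in for the analytic Morse wavelet)."""
    return np.pi**-0.25 * np.exp(1j * w0 * eta) * np.exp(-0.5 * eta**2)


def cwt_energy(x, dt, freqs, w0=6.0):
    """Scalogram |W(a, b)|^2 evaluated directly from the CWT definition.

    Returns an array of shape (len(freqs), len(x)); row i corresponds to
    the scale whose approximate centre frequency is freqs[i].
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) * dt
    lag = t[None, :] - t[:, None]          # lag[k, m] = t_m - b_k
    energy = np.empty((len(freqs), x.size))
    for i, f in enumerate(freqs):
        a = w0 / (2.0 * np.pi * f)         # scale for centre frequency f
        kernel = np.conj(morlet(lag / a, w0))
        coeffs = (kernel @ x) * dt / np.sqrt(a)   # Riemann sum of the integral
        energy[i] = np.abs(coeffs) ** 2
    return energy


# Synthetic oscillating OP signal at 2 cycles per time unit.
dt = 0.02
t = np.arange(500) * dt
op = np.sin(2.0 * np.pi * 2.0 * t)
freqs = [0.5, 1.0, 2.0, 4.0, 8.0]
energy = cwt_energy(op, dt, freqs)
```

The direct evaluation costs memory quadratic in the signal length, so it suits short windows; an FFT-based implementation or a wavelet library would be the usual choice for long records.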
3.3. Scalogram-Based Preprocessing for Feature Extraction
The raw OP and PV signals were initially preprocessed using CWT for time–frequency analysis and interpretability. However, only the OP-derived scalograms are used as inputs to the CNN classifier. The PV signal is not involved in the model training or classification stages; instead, it may be utilized for comparative analysis and physical interpretation of stiction behavior, particularly to establish time–frequency correlations and validate the presence of stick–slip dynamics. This distinction ensures that the learning process remains focused on actuator-level nonlinearities while retaining PV-based insight for interpretability. In contrast to the discrete wavelet transform (DWT), which offers limited resolution at low frequencies due to dyadic scaling [22], CWT provides high-resolution, redundant time–frequency representations, enabling robust detection of non-stationary and transient features in control signals. The CWT analyzes signals by convolving them with a family of wavelets localized in both time and frequency, making it particularly effective for capturing signal irregularities indicative of valve stiction [22, 28].
The resulting CWT coefficients were transformed into 2D and 3D scalogram images, representing the normalized wavelet power spectrum. These scalograms were color-mapped using the jet colormap and resized to a standard resolution for CNN input. The use of scalograms has been shown to improve deep learning performance in time-series classification tasks, especially in industrial condition monitoring and fault diagnosis [
29].
3.3.1. Standardized Color–Magnitude Mapping and Visual Justification Framework
To establish an interpretable link between scalogram features and stiction phenomena, the following framework was implemented:
Signal Acquisition and Transformation: Closed-loop time-series data of PV and OP were generated under stiction and non-stiction scenarios. CWT was applied to these signals to produce scalograms, encoding signal energy distribution in the time–frequency domain.
Standardized Color–Magnitude Mapping: A fixed color scheme (red, green, blue and yellow) was adopted, where warmer tones such as red and yellow correspond to higher wavelet coefficient magnitudes and cooler or darker tones such as green, blue, and black indicate lower magnitudes. This mapping was held constant across all scalograms to ensure comparability between stiction and non-stiction datasets. The red, green, and blue channels of the colormap (with yellow and black arising as mixed colors) correspond directly to the relative energy intensities in Figure 3 and Figure 4.
Time–Frequency Color Association: High-energy oscillatory content in the scalograms was systematically correlated with stick–slip transitions observed in the time series. Regions of low-energy steady-state operation were consistently represented by cooler tones, visually reinforcing the absence of stiction.
Visual Justification Procedure: Time-series plots were first annotated to mark intervals of suspected stiction such as sustained oscillations or asymmetric valve response. Corresponding regions in the scalogram were highlighted, where warm color clusters confirmed the presence of high energy nonlinear dynamics. In non-stiction cases, the scalograms displayed predominantly cooler tones with minimal warm color transitions, confirming steady valve behavior.
For valve stiction detection, the analysis focuses on the energy distribution of the CWT coefficients, computed as follows:

$$E(a,b) = \left|W_x(a,b)\right|^{2}$$

where $W_x(a,b)$ denotes the CWT coefficient of the signal at scale $a$ and translation $b$.
This representation corresponds to the wavelet power spectrum, which captures how signal energy is distributed across time and frequency. In practice, the squared magnitude $|W_x(a,b)|^{2}$ is employed rather than the raw magnitude $|W_x(a,b)|$, since stiction typically induces pronounced low-frequency oscillatory components whose energetic contrast is more distinctly emphasized in the power domain. The resulting time–frequency energy map is then normalized prior to RGB mapping, ensuring consistent dynamic range and comparability between stiction and non-stiction cases. When encoded on a color scale, the energy intensity at each time–frequency coordinate forms the scalogram image, enabling the convolutional neural network (CNN) to learn discriminative stiction-related patterns from multiscale temporal–spectral features.
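The normalization and color-encoding step described above can be sketched as follows. This is an assumption-laden illustration: min-max scaling is one possible normalization (the exact scheme is not specified here), the colormap is a piecewise-linear approximation of a jet-style map rather than a library colormap, and `energy_to_rgb` is a hypothetical helper name.

```python
import numpy as np


def energy_to_rgb(energy):
    """Min-max normalize an energy map and encode it as an 8-bit RGB image.

    Low energy maps to blue tones and high energy to red tones, using a
    piecewise-linear approximation of a jet-style colormap.
    """
    e = np.asarray(energy, dtype=float)
    span = e.max() - e.min()
    v = (e - e.min()) / span if span > 0 else np.zeros_like(e)
    r = np.clip(1.5 - np.abs(4.0 * v - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * v - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * v - 1.0), 0.0, 1.0)
    return (np.stack([r, g, b], axis=-1) * 255).round().astype(np.uint8)


# Tiny hypothetical energy map: lowest value top-left, highest bottom-right.
demo = np.array([[0.0, 1.0], [2.0, 4.0]])
rgb = energy_to_rgb(demo)
```

Because the same function is applied to every energy map, the color-to-magnitude relationship stays fixed within each image; using a global rather than per-image minimum and maximum would additionally keep absolute energy levels comparable across images.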
Figure 3 represents the PV and OP signals as subplots above the scalogram to provide a time domain to frequency domain correlation. This layout visually links time-domain flat spots in OP with frequency-domain anomalies in PV, making the justification for stiction detection clear in this study.
Figure 4b represents the scalogram top view of the OP signal under valve stiction conditions and reveals distinctive diagnostic features. Notably, the presence of prominent red and yellow regions at lower frequencies is associated with recurrent stick-and-slip behavior. These localized, high-energy bands reflect sudden control signal changes, which punctuate intervals of stable operation. In contrast to smooth operation, the stiction case exhibits abrupt color transitions and discrete energy clusters at higher frequencies. These “energy islands” are indicative of sporadic valve movements and the irregular release of friction, producing bursts of frequency content. The overall texture of the scalogram demonstrates increased variance and less uniform gradients, signifying the erratic and nonlinear dynamics introduced by mechanical ‘sticking’. This distribution of color and intensity signals greater variability in both the timing and strength of control actions, thereby providing qualitative evidence for the presence of valve stiction.
In contrast, the scalogram for the non-stiction case in Figure 4 reveals the following:
A uniform distribution of energy concentrated in lower-frequency bands, corresponding to slow process dynamics.
The absence of abrupt transients or localized high-frequency bursts, confirming the controller’s smooth engagement with the process.
A lack of horizontal ridges or vertical edge patterns, which are commonly associated with discontinuous jumps or valve sticking events.
Such structured and noise-suppressed scalograms provide an ideal reference class for training CNNs, which rely on stable background patterns and distinguishable anomalies in fault classification tasks.
3.3.2. CNN Architecture and Training
The valve stiction classification stage is implemented using a two-dimensional convolutional neural network based on the ResNet50 architecture, selected for its proven ability to learn deep hierarchical representations while mitigating vanishing gradient problems through identity skip connections [
30]. The network processes wavelet-based scalogram images generated exclusively from the OP signal, which directly reflects actuator stick–slip behavior and deadband effects associated with valve stiction [
10,
31].
A pretrained ResNet50 backbone initialized with ImageNet weights is employed as the feature extractor. The original classification head is removed, retaining the convolutional stem and four residual stages composed of bottleneck blocks with identity mappings. These residual connections enable stable gradient propagation and facilitate the extraction of multiscale stiction-related patterns embedded in the time–frequency domain [
32,
33].
The high-level feature maps produced by the final residual block are passed through a Global Average Pooling (GAP) layer, reducing spatial dimensionality while preserving salient activations and improving generalization [
33]. A fully connected layer with 128 neurons and ReLU activation follows, enabling nonlinear feature interaction. To suppress overfitting during fine-tuning, a dropout layer with a rate of 0.5 is applied. Final classification is performed using a single-neuron dense layer with sigmoid activation, yielding the posterior probability of valve stiction.
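The classification head described above (global average pooling, a 128-unit ReLU layer, dropout, and a sigmoid output) can be traced with a small NumPy forward pass. This is a sketch only: the weights are random placeholders rather than trained values, and the 7 × 7 × 2048 feature-map shape assumes a 224 × 224 input to ResNet50, which is an assumption not stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)


def classification_head(feature_map, w1, b1, w2, b2):
    """Inference-time forward pass of the classification head."""
    pooled = feature_map.mean(axis=(0, 1))        # GAP: (H, W, C) -> (C,)
    hidden = np.maximum(0.0, pooled @ w1 + b1)    # dense layer, 128 units, ReLU
    # Dropout (rate 0.5) is active only during training; at inference
    # it acts as the identity, so it is omitted here.
    logit = hidden @ w2 + b2                      # single output neuron
    return 1.0 / (1.0 + np.exp(-logit))           # sigmoid probability


# Placeholder backbone output and randomly initialized head weights.
features = rng.standard_normal((7, 7, 2048))
w1 = rng.standard_normal((2048, 128)) * 0.01
b1 = np.zeros(128)
w2 = rng.standard_normal(128) * 0.01
b2 = 0.0
p_stiction = classification_head(features, w1, b1, w2, b2)
```

In a deep learning framework the same head would be attached to the pretrained backbone and trained end to end; the NumPy version only makes the tensor shapes and the order of operations explicit.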
Network training is conducted using the Adam optimizer with a learning rate of $1 \times 10^{-4}$ and binary cross-entropy loss. Mini-batches of 64 scalogram images are processed over a maximum of 100 epochs. Early stopping based on validation loss with a patience of 10 epochs is employed to ensure convergence while preventing overfitting. Image augmentation is deliberately avoided to preserve the physical fidelity of time–frequency structures critical for reliable stiction identification [34].
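The loss and stopping rule in this training setup can be illustrated without a deep learning framework. The sketch below shows binary cross-entropy for a single prediction and a patience-based early-stopping rule applied to a recorded validation-loss history; the function names and the loss values are hypothetical.

```python
import math


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Binary cross-entropy for one label y_true in {0, 1} and probability p."""
    p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def early_stop(val_losses, patience=10, max_epochs=100):
    """Return (stop_epoch, best_epoch), both zero-based.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs, or when max_epochs is reached.
    """
    best, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs) - 1, best_epoch


# Hypothetical history: improvement for three epochs, then a plateau.
history = [1.0, 0.8, 0.6] + [0.7] * 20
stop_epoch, best_epoch = early_stop(history)
```

In practice the weights from `best_epoch` would be restored after stopping, so the plateau epochs do not affect the final model.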
This transfer-learning-based ResNet50 framework, shown in Figure 5, enables robust valve stiction detection under limited industrial data availability and is consistent with recent CNN-based fault diagnosis approaches reported in control and condition monitoring research [33].
Figure 6 presents the proposed structure of the 2D model, with the time-series (PV and OP) graph being converted to a 2D scalogram with only the OP as the input into the CNN. The OP scalogram is selected as the primary input for CNN based classification rather than the PV. The rationale is that valve stiction manifests directly in the actuator movement, which is reflected in the OP signal through characteristic stick–slip patterns, deadband effects, and abrupt discontinuities [
1]. In contrast, the PV response is influenced by process dynamics, transport delays, and external disturbances, which can attenuate or mask the nonlinear features introduced by stiction. Consequently, OP provides a more sensitive and direct indicator of valve nonlinearity, whereas PV embeds secondary process effects that reduce the separability of stiction from normal behavior in the feature space. Prior studies have similarly emphasized the diagnostic superiority of OP-based analysis, noting that stiction symptoms are more distinct and less process-dependent in the actuator domain than in the controlled output [
12]. Thus, employing OP scalograms enhances the discriminative capacity of CNNs for stiction detection while mitigating confounding influences from plant dynamics.
4. Experimental Setup
In this section, we describe the experimental setup used to evaluate the proposed methodology. The datasets, evaluation metrics and implementation details are provided.
4.1. Data Generation
To overcome the scarcity of labeled industrial data, synthetic datasets were produced using a simulated closed-loop system. To generate high-fidelity time-series data that reflect realistic valve stiction conditions, a Single-Input–Single-Output (SISO) feedback control loop was implemented using a First-Order Plus Dead Time (FOPDT) process model. This is representative of a wide range of industrial process control systems where transport or actuation delays and slow dynamics dominate system response. The process transfer function is given by the following:
$$G_p(s) = \frac{K e^{-\theta s}}{\tau s + 1} \tag{6}$$
where $K$ is the process gain, $\tau$ is the process time constant (in min), and $\theta$ is the dead time. The process receives its input from a PI controller implemented as follows:
$$G_c(s) = K_c\left(1 + \frac{1}{T_i s}\right) \tag{7}$$
where $K_c$ and $T_i$ are the proportional gain and integral time constant, respectively. Controller tuning was varied to simulate different levels of closed-loop responses. The closed-loop system receives a unit step change in setpoint and runs for 10 min with a sampling interval of 0.01 min, yielding 1000 data points per simulation.
To ensure consistency between the continuous-time formulation and the simulation framework, the process and controller models are discretized as described in Section 4.2.
Data Partitioning and Leakage Prevention
To ensure the reliability of the reported results, a structured data partitioning strategy was implemented prior to model training. The time-series data were first segmented into non-overlapping samples, after which the dataset was divided into training (70%), validation (15%), and testing (15%) subsets. To prevent information leakage, segments originating from the same simulation run or industrial operating instance were assigned exclusively to a single subset. This group-wise separation ensures that temporally or dynamically correlated samples are not shared across training and testing phases.
In addition, the partitioning process was performed based on operating conditions and dataset origin (simulated versus industrial), thereby preserving the independence of evaluation data and avoiding bias due to repeated patterns. This approach is particularly important for control systems data, where strong temporal correlation can otherwise lead to overestimated classification performance.
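The group-wise separation described above can be sketched as follows. This is a minimal illustration under stated assumptions: each segment carries a group ID (for example, the index of the simulation run it came from), the function name `groupwise_split` and the fixed seed are hypothetical, and the 70/15/15 proportions are applied to groups, so the resulting segment proportions are only approximate when groups differ in size.

```python
import random
from collections import defaultdict


def groupwise_split(segment_groups, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole groups (e.g., simulation runs) to train/val/test.

    segment_groups: list where entry i is the group ID of segment i.
    Returns three lists of segment indices (train, validation, test).
    """
    by_group = defaultdict(list)
    for idx, gid in enumerate(segment_groups):
        by_group[gid].append(idx)

    group_ids = sorted(by_group)              # deterministic order before shuffling
    random.Random(seed).shuffle(group_ids)

    n = len(group_ids)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    parts = (
        group_ids[:n_train],
        group_ids[n_train:n_train + n_val],
        group_ids[n_train + n_val:],          # remaining groups form the test set
    )
    # Expand each set of groups back into segment indices.
    return tuple([i for g in part for i in by_group[g]] for part in parts)
```

Because every segment of a given run lands in exactly one subset, no run contributes correlated samples to both training and testing.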
4.2. Continuous-to-Discrete Model Transformation
To ensure consistency between the continuous-time formulation and the numerical simulation framework, the process and controller models defined in Equations (6) and (7) are transformed into their discrete-time equivalents. This transformation is necessary for digital implementation, where signals are sampled at finite intervals and processed iteratively.
4.2.1. Discretization of the Process Model
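The process dynamics are represented by a First-Order Plus Dead Time (FOPDT) model in Equation (8):
$$G_p(s) = \frac{Y(s)}{U(s)} = \frac{K e^{-\theta s}}{\tau s + 1} \tag{8}$$
where $K$, $\tau$, and $\theta$ are the process gain, time constant, and dead time, and $u$ is the process input.
Rewritten in state-space form:
$$\dot{y}(t) = -\frac{1}{\tau}\,y(t) + \frac{K}{\tau}\,u(t-\theta)$$
To obtain a discrete-time representation, the forward Euler integration method is applied. The time derivative is approximated as follows:
$$\dot{y}(t) \approx \frac{y(k+1) - y(k)}{\Delta t}$$
Substituting into the continuous-time model yields the following:
$$\frac{y(k+1) - y(k)}{\Delta t} = -\frac{1}{\tau}\,y(k) + \frac{K}{\tau}\,u(k-d)$$
where $\Delta t$ is the sampling interval and $d = \theta/\Delta t$ represents the discrete-time delay in samples.
Rearranging, the discrete-time process model is expressed as follows:
$$y(k+1) = \left(1 - \frac{\Delta t}{\tau}\right) y(k) + \frac{K\,\Delta t}{\tau}\,u(k-d)$$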
This formulation describes the evolution of the process state as a function of its current value and the delayed control input. The dead time is implemented using a finite-length delay buffer.
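A forward-Euler FOPDT update with a finite-length delay buffer might look like the sketch below. It assumes the standard FOPDT form (gain `K`, time constant `tau`, dead time `theta`) and a zero initial state; the function name `simulate_fopdt` and any numerical values passed to it are illustrative and are not taken from the study.

```python
from collections import deque


def simulate_fopdt(inputs, K, tau, theta, dt):
    """Forward-Euler FOPDT response y(k) to an input sequence u(k)."""
    d = int(round(theta / dt))                 # dead time expressed in samples
    # Buffer pre-filled with zeros so that u(k-d) = 0 before the input arrives.
    buffer = deque([0.0] * d, maxlen=d + 1)
    y = 0.0
    outputs = []
    for u in inputs:
        buffer.append(u)                       # newest sample enters, oldest drops out
        u_delayed = buffer[0]                  # this is u(k-d)
        outputs.append(y)                      # record y(k) before updating
        y = y + (dt / tau) * (-y + K * u_delayed)
    return outputs
```

The explicit Euler step is only a reasonable approximation when `dt` is small relative to `tau`; for `dt / tau` approaching 2 the recursion becomes numerically unstable.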
4.2.2. Discretization of the PI Controller
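The continuous-time PI controller is defined as follows:
$$\mathrm{OP}(t) = K_c\left[e(t) + \frac{1}{T_i}\int_0^t e(\eta)\,d\eta\right]$$
where $K_c$ and $T_i$ denote the proportional gain and integral time constant, respectively.
The integral term is approximated using forward rectangular integration:
$$\int_0^{k\Delta t} e(\eta)\,d\eta \approx \Delta t \sum_{i=0}^{k-1} e(i)$$
Thus, the discrete-time control law becomes as follows:
$$\mathrm{OP}(k) = K_c\,e(k) + \frac{K_c\,\Delta t}{T_i}\sum_{i=0}^{k-1} e(i)$$
The discrete integral gain is directly derived from the continuous-time formulation as $K_c\,\Delta t/T_i$, which results in the scaled term in the discrete-time controller implementation.
For efficient implementation, the controller is expressed in incremental form as follows:
$$\mathrm{OP}(k) = \mathrm{OP}(k-1) + K_c\left[e(k) - e(k-1)\right] + \frac{K_c\,\Delta t}{T_i}\,e(k-1)$$
where $e(k)$ is the control error.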
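An incremental PI update can be sketched as below. The sketch assumes that the integral is accumulated with the previous error sample (the forward rectangular rule) and that the error before the first sample is zero; the class name `IncrementalPI` and the gains used in the usage note are hypothetical.

```python
class IncrementalPI:
    """Incremental (velocity-form) PI controller with forward rectangular integration."""

    def __init__(self, Kc, Ti, dt, op0=0.0):
        self.Kc, self.Ti, self.dt = Kc, Ti, dt
        self.op = op0          # last controller output OP(k-1)
        self.e_prev = 0.0      # last error e(k-1), assumed zero before the first sample

    def update(self, setpoint, pv):
        e = setpoint - pv
        # Proportional action on the error change, integral action on e(k-1).
        d_op = self.Kc * (e - self.e_prev) + (self.Kc * self.dt / self.Ti) * self.e_prev
        self.op += d_op
        self.e_prev = e
        return self.op
```

With `Kc = 2`, `Ti = 1`, `dt = 0.1`, and a constant unit error, the output is 2.0 on the first call and then grows by 0.2 per sample, which matches the positional form evaluated at the same instants.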
4.2.3. Integrated Discrete-Time Simulation Framework
The complete discrete-time control loop is implemented sequentially at each sampling instant as follows:
4.2.4. Valve Nonlinearity for Stiction Model
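The control signal $\mathrm{OP}(k)$ is processed through the nonlinear valve model to produce the effective valve output $u(k)$.
Process update with delay:
$$y(k+1) = \left(1 - \frac{\Delta t}{\tau}\right) y(k) + \frac{K\,\Delta t}{\tau}\,u(k-d)$$
Measured output:
$$\mathrm{PV}(k) = y(k) + \varepsilon(k)$$
where $\varepsilon(k)$ represents additive Gaussian noise.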
The presented discretization establishes a direct analytical relationship between the continuous-time model and its discrete-time implementation. The use of forward Euler integration ensures computational efficiency while preserving the essential dynamics of the system for sufficiently small sampling intervals. The inclusion of a delay buffer accurately represents transport lag, which is critical in industrial process systems.
This formulation ensures that the simulation framework remains consistent, reproducible, and aligned with the theoretical model, thereby enabling reliable generation of time-series data for subsequent wavelet-based feature extraction and deep learning classification. The derived discrete-time model forms the basis for the simulation framework used to generate both stiction and non-stiction datasets, ensuring consistency between the theoretical formulation and the data used for training the proposed HW-CNN model.
4.3. Choudhury Valve Stiction Model
Following the discrete-time formulation presented in
Section 4.2, the nonlinear behavior of the control valve was modeled using the Choudhury stiction model [
1]. This model incorporates two physical parameters:
Static friction ($f_s$), which is the minimum magnitude of input change that is required to unstick the valve.
Dynamic deadband ($f_d$), which represents the minimum change required for continued valve movement after ‘un-sticking’.
The model operates with hysteresis-like logic: when the controller’s output increment is below $f_s$, the valve remains ‘stuck’ at its last output. Once $f_s$ is exceeded, the valve may begin to move. If the subsequent increment remains below $f_d$, the valve re-sticks. This behavior introduces discrete nonlinearity and cyclic jumps, observable in the process output. The simulation was implemented in MATLAB™ (2024b), where the process dynamics were discretized using forward Euler integration. A delay buffer was used to simulate the 10 min process dead time. At each time step ($k$), the following operations were executed:
The controller computes the error
$$e(k) = \mathrm{SP}(k) - \mathrm{PV}(k)$$
and outputs the control signal $\mathrm{OP}(k)$ using the discrete PI law of Section 4.2.2.
The valve receives $\mathrm{OP}(k)$ and employs the Choudhury model, which determines whether to move or remain stuck based on the change from the previous output ($\Delta u$) compared to $f_s$ and $f_d$.
The delayed valve signal is then fed into the process model:
$$y(k+1) = \left(1 - \frac{\Delta t}{\tau}\right) y(k) + \frac{K\,\Delta t}{\tau}\,u(k-d), \qquad \mathrm{PV}(k) = y(k) + \varepsilon(k)$$
where $K$, $\tau$, and $\Delta t$ are the process gain, time constant, and sampling interval, $d$ is the dead time in samples, and $\varepsilon(k)$ represents the Gaussian noise.
At each time step, the controller output OP, valve signal $u$, and process variable PV are recorded.
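The per-step sequence above can be sketched in Python as follows. This is a simplified illustration, not the MATLAB implementation used in the study: `stick_slip_valve` implements only the two-threshold stick and re-stick logic described in the text rather than the full Choudhury model, the PI integral is accumulated with the previous error sample, the setpoint is a unit step applied at the first sample, and every numerical default (`Kc`, `Ti`, `K`, `tau`, `theta`, and the thresholds `f_s`, `f_d` in the usage note) is a hypothetical placeholder.

```python
import random
from collections import deque


def stick_slip_valve(op, u_prev, moving, f_s, f_d):
    """Simplified stick-slip logic; returns (valve output, moving flag)."""
    delta = op - u_prev
    threshold = f_d if moving else f_s     # a moving valve needs less input to keep moving
    if abs(delta) > threshold:
        return op, True                    # valve follows the controller output
    return u_prev, False                   # valve stays at (or re-sticks to) its last position


def simulate_loop(f_s, f_d, Kc=0.5, Ti=1.0, K=1.0, tau=1.0, theta=0.1,
                  dt=0.01, n_steps=1000, noise_std=0.0, seed=0):
    """Closed-loop simulation; returns logged OP, valve signal u, and PV."""
    rng = random.Random(seed)
    d = int(round(theta / dt))
    delay = deque([0.0] * d, maxlen=d + 1)  # dead-time buffer for the valve signal
    y = op = u = e_prev = 0.0
    moving = False
    log = {"OP": [], "u": [], "PV": []}
    for _ in range(n_steps):
        pv = y + rng.gauss(0.0, noise_std)               # measured output with noise
        e = 1.0 - pv                                     # error against a unit setpoint
        op += Kc * (e - e_prev) + (Kc * dt / Ti) * e_prev  # incremental PI law
        e_prev = e
        u, moving = stick_slip_valve(op, u, moving, f_s, f_d)
        delay.append(u)
        y += (dt / tau) * (-y + K * delay[0])            # process driven by u(k-d)
        log["OP"].append(op)
        log["u"].append(u)
        log["PV"].append(pv)
    return log
```

Calling `simulate_loop(0.0, 0.0)` reproduces an ideal valve (the valve signal equals OP at every step), whereas a call such as `simulate_loop(0.3, 0.1)` holds the valve at its last position while OP keeps integrating, which is the source of the stick–slip pattern.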
Table 1 presents the comprehensive set of parameters employed in the simulation framework. These parameters were selected to span a wide operational space by varying control loop aggressiveness, stiction severity, and measurement noise levels. Specifically, the controller tuning parameters (proportional gain and integral gain) were varied across a defined range to mimic both conservatively and aggressively tuned control systems. This variation affects the responsiveness and stability of the control loop and hence how the system responds to nonlinear valve behavior [35,36]. Concurrently, the stiction model parameters (static friction and dynamic friction) were systematically modified to simulate various fault severities, covering both pure stiction (S-type) and stick–slip (J-type) behavior. The parameter values were drawn from a fixed range, ensuring a diverse representation of fault dynamics [1]. Moreover, white Gaussian noise was superimposed on the process variable to simulate real-world sensor disturbances.
This stochastic component introduces realism to the simulated data and is essential for testing the robustness of fault detection models under noisy conditions [
34,
35]. The factorial combination of these parameter sets produced a comprehensive dataset encompassing nominal operation, mild to severe stiction scenarios, and varying signal-to-noise ratios, all sampled over a fixed time horizon of 10 min at 0.01 min intervals (1000 data points per simulation).
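A factorial combination of parameter sets can be enumerated with `itertools.product`, as in the sketch below. The function name `build_grid`, the dictionary keys, and the values in the usage note are hypothetical placeholders; the actual levels are those reported in Table 1.

```python
from itertools import product


def build_grid(kc_values, ki_values, fs_values, fd_values, noise_vars):
    """Return one parameter dictionary per combination of the supplied levels."""
    keys = ("Kc", "Ki", "f_s", "f_d", "noise_var")
    combos = product(kc_values, ki_values, fs_values, fd_values, noise_vars)
    return [dict(zip(keys, combo)) for combo in combos]
```

For example, `build_grid([0.5, 1.0], [0.1, 0.2], [0.0, 0.2, 0.4], [0.0, 0.1], [0.0, 0.01])` yields 2 × 2 × 3 × 2 × 2 = 48 simulation configurations, each of which would be run once to produce one labeled time series.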
For the non-stiction dataset, all simulation parameters were held consistent with those used in the faulted (stiction) simulations, except that the valve was modeled as an ideal linear actuator with no deadband or friction [
1,
15]. This configuration enables a direct comparison between fault-free and faulted behavior under identical process and controller conditions [
1,
35]. The controller output was applied directly to the process through a delay buffer without modification, and white Gaussian noise was superimposed on the process variable to simulate realistic sensor disturbances [
35,
36].
4.4. Evaluation Procedure
To quantitatively evaluate the performance of a classification model, several statistical measures are employed to summarize its predictive capability. The fundamental evaluation parameters include True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) outcomes [
37]. A true positive occurs when the model correctly identifies a positive class, while a false positive represents an incorrect positive prediction. Conversely, a true negative indicates a correctly predicted negative class, and a false negative arises when the model fails to detect a positive instance. These parameters form the basis for computing key performance indicators such as accuracy, precision, recall, and F1-score, which collectively describe the classifier’s reliability and discriminative strength [
38].
To improve the robustness of the performance assessment, all experiments were repeated multiple times with different random initializations and data shuffling conditions. The reported results correspond to the mean performance across these runs, accompanied by standard deviation to quantify variability. This provides a more reliable estimate of model generalization compared to single-run evaluations.
Furthermore, comparative analysis with baseline methods was supported through statistical significance testing, where appropriate, to ensure that observed performance differences are not due to random variation. This strengthens the validity of the comparative conclusions drawn in this study.
Precision measures how reliably the model predicts the positive class, representing the ratio of correctly identified positive instances to all cases predicted as positive, as expressed in Equation (24). In essence, it reflects the model’s ability to minimize false positives and ensure that predicted positive outcomes are genuinely accurate [39].
$$\text{Precision} = \frac{TP}{TP + FP} \tag{24}$$
Recall, also known as sensitivity [39], measures the model’s ability to correctly detect all actual positive instances. It quantifies the proportion of true positives relative to the total number of actual positives, combining both correctly identified positives and those misclassified as negatives, as expressed in Equation (25).
$$\text{Recall} = \frac{TP}{TP + FN} \tag{25}$$
F1-score serves as a balanced performance indicator that combines both precision and recall into a single metric. Defined as the harmonic mean of these two measures, it effectively captures the trade-off between correctly identifying positive cases and avoiding false positives. As shown in Equation (26), the F1-score is particularly valuable when assessing models applied to datasets with unequal class distributions, offering a more representative measure of predictive reliability.
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{26}$$
Accuracy represents the overall correctness of a classification model by measuring the ratio of correctly predicted instances (both positive and negative) to the total number of observations. It provides a general indication of model performance, as shown in Equation (27), though it may be less reliable when the dataset is imbalanced or dominated by one class.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{27}$$
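The four metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The function name `classification_metrics` and the counts in the usage note are illustrative only and are not results from the study; returning 0.0 when a denominator is zero is one common convention, assumed here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1-score, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

With hypothetical counts `tp=45, fp=5, tn=40, fn=10`, precision is 0.90, recall is about 0.818, the F1-score is about 0.857, and accuracy is 0.85, which illustrates how accuracy alone can hide a recall shortfall.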
Model validation was conducted using 3 simulated loops, 2 industrial loops from a smelter and 21 real-world cases from the International Stiction Database (ISDB) [
7], covering a broad spectrum of industrial control loops and operational conditions.
The results were further supported by confusion matrices and ROC curves to assess classification thresholds and generalization capability. This rigorous evaluation mirrors contemporary practices in industrial fault diagnosis, where ResNet50-based classifiers have consistently outperformed traditional ML methods when trained on scalogram inputs [
39].
To further evaluate the applicability of the proposed Hybrid Wavelet–CNN framework in practical scenarios, a comparative analysis was conducted between simulated control loop datasets and real-world industrial datasets. Simulated data exhibit well-defined stiction signatures with low noise, whereas real-world datasets contain higher variability and transient disturbances, resulting in shifts in the amplitude and frequency patterns of wavelet-based features. To address these differences, transfer learning was employed: the CNN backbone was first trained on simulated data and subsequently fine-tuned on a subset of real-world data. This approach improved cross-domain classification performance, demonstrating that the framework can effectively adapt to feature shifts between idealized and real operating conditions, thereby enhancing robustness and accuracy in sim-to-real deployment.
The proposed framework was evaluated on both simulated control loop data and real industrial datasets to assess its generalization capability. While the simulated data provide controlled conditions for isolating stiction effects, the industrial data introduce practical complexities such as measurement noise, process disturbances, and unmodeled dynamics.
A comparative analysis indicates that, although the overall classification performance is slightly reduced on industrial data, the model maintains consistent detection capability. This suggests that the learned feature representations capture underlying stiction characteristics that are transferable across different operating environments. However, the results also highlight the increased variability inherent in real-world conditions, so conclusions regarding industrial applicability are drawn with corresponding caution.
5. Results
The simulated process represents a generic industrial pressure/flow control loop modeled as a First-Order Plus Dead Time (FOPDT) system, which is widely used to approximate real chemical process dynamics. The proposed HW-CNN framework was evaluated on both simulated control loops and real-world industrial datasets to assess diagnostic accuracy under idealized and practical operating conditions. This distinction is critical, as simulated datasets generated using the Choudhury stiction model exhibit well-separated fault signatures, whereas real-world loops introduce process disturbances, measurement noise, and actuator degradation effects that significantly complicate classification. To ensure statistical reliability, all reported performance metrics correspond to averaged results obtained from multiple training runs, with variability accounted for through standard deviation.
Figures 7–13 illustrate seven control loops from across the ISDB and industry through complementary representations that collectively expose nonlinear friction-induced dynamics. The CWT scalogram reveals persistent low-frequency energy bands with intermittent vertical bursts, consistent with stick–slip cycling. These quasi-periodic structures do not represent broadband disturbances but deterministic oscillations arising from actuator friction.
The OP–PV phase portrait shows asymmetric clustering and saturation-side compression, while the fitted ellipse (BIC) indicates deviation from Gaussian dispersion. The concentration of points near constant OP levels reflects deadband behavior, a classical stiction signature.
The Poincaré map further confirms structured oscillatory dynamics, with pronounced anisotropy, indicating dominant long-term deterministic variability over short-term randomness. PCA reinforces this interpretation: the first principal component explains a high percentage of the variance and reveals a nonlinear switching manifold, corresponding to alternating stick and slip regimes.
While each representation independently suggests friction-induced oscillations, their discriminative power becomes substantially amplified within the proposed HW-CNN framework. Unlike standalone CWT-CNN or MTF-CNN approaches that rely on single-domain texture encoding, HW-CNN integrates structured wavelet energy localization with transition-aware spatial encoding. This hybridization enables the network to simultaneously capture frequency persistence, state-transition regularity, and nonlinear geometric asymmetry [
40,
41]. HW-CNN confidently identifies the deterministic stick–slip signature despite geometric skewness and clustered OP saturation effects. The consistent multi-domain evidence aligns with the network’s learned hierarchical features, demonstrating superior sensitivity to friction-induced limit cycles compared with frequency-only or transition-only models. These results highlight that the proposed HW-CNN does not merely detect oscillations; it differentiates friction-driven deterministic cycling from stochastic or tuning-related oscillations, thereby providing enhanced diagnostic specificity for stiction detection.
Simulated loop results, summarized in
Table 2, indicate that classical shape-based techniques such as ellipse fitting combined with BIC achieve reasonable detection accuracy but are limited by their dependence on stationary oscillations and long data windows [
1]. Image-based learning methods, including PCA-image CNN and Poincaré plot CNN, improve performance by exploiting geometric and phase-space representations, yet remain sensitive to tuning and window selection. Among deep learning approaches, MTF-CNN and conventional CWT-CNN demonstrate strong performance due to their ability to encode temporal transitions and time–frequency energy distributions. The proposed HW-CNN achieves the highest accuracy (96.1%) and F1-score (0.96), reflecting improved sensitivity to localized stick–slip events while maintaining robustness across multiple simulated pressure loops.
In contrast, real-world industrial loop performance reveals a pronounced degradation across all benchmark methods as shown in
Table 3. Ellipse fitting suffers the largest decline, confirming its limited applicability in nonstationary industrial environments. CNN-based image encodings (PCA, Poincaré, and MTF) show moderate resilience, though their performance is affected by process-dependent PV dynamics and noise contamination. Recurrent architectures such as CNN–LSTM marginally improve recall but incur increased model complexity and training instability.
The proposed HW-CNN consistently outperforms all comparison methods on real-world industrial loops, achieving an accuracy of 90.4% and F1-score of 0.90 on the ISDB and smelter data. This improvement is attributed to the exclusive use of OP-based wavelet scalograms, which directly capture actuator-level nonlinearities while suppressing confounding process dynamics present in PV signals. These results demonstrate that HW-CNN offers superior generalization and industrial relevance, addressing a key limitation of existing CWT-CNN and image-based stiction detection frameworks.
Figure 14 presents the F1-score comparison for simulated and real-world control loops. Under simulated conditions, all image-based methods perform strongly due to structured and stationary dynamics, with deep learning approaches outperforming geometric descriptors. The proposed HW-CNN achieves the highest F1-score, indicating enhanced separability of synthetic stick–slip patterns. In real-world industrial loops, performance dispersion increases due to non-stationarity and noise. While conventional methods degrade, CNN-based models remain robust. The proposed HW-CNN maintains the highest F1-score, demonstrating superior generalization and improved sensitivity to nonlinear stiction characteristics under practical operating conditions.
Table 4 consolidates the diagnostic performance of conventional geometric techniques, image-encoding deep models, and the proposed hybrid framework across controlled simulations and real industrial loops. Under simulated conditions, all deep learning strategies demonstrate strong separability owing to structured stick–slip signatures. However, the HW-CNN achieves the highest F1-score (0.96) and near-perfect AUC (0.995), indicating superior discrimination of transient friction-induced dynamics.
Performance divergence becomes more evident in industrial datasets. Classical ellipse-based analysis displays a substantial drop in F1-score (0.75), reflecting sensitivity to noise and non-stationary disturbances. While advanced image-based CNN models retain moderate robustness, their generalization remains constrained by process-dependent variability. The HW-CNN maintains the highest industrial F1-score (0.90) and accuracy (90.4%), confirming that OP-based wavelet scalograms enhance resilience to real-world disturbances and actuator nonlinearities. The summarized comparison demonstrates that the HW-CNN framework not only improves classification accuracy under idealized conditions but also preserves diagnostic reliability in practical operating environments, where robustness and generalization are critical.
Ablation Study and Component Contribution Analysis
To rigorously evaluate the contribution of each component in the proposed HW-CNN framework, an ablation study was conducted by systematically modifying key stages of the pipeline. In complex fault diagnosis systems, performance improvements are often attributed to the interaction between preprocessing, feature representation, and network architecture rather than a single dominant factor. Therefore, a controlled experimental design was adopted in which individual components were selectively removed or simplified while maintaining identical training conditions. This approach is consistent with established practices in deep learning-based condition monitoring, where ablation analysis is used to quantify the contribution of multiscale representations and hierarchical feature extraction mechanisms [
8,
40].
Four configurations were considered: (1) direct time-series input without wavelet transformation, (2) CWT-based scalograms without normalization, (3) a shallow CNN architecture replacing the residual network, and (4) the proposed complete HW-CNN framework. All models were trained and evaluated under identical conditions to ensure comparability. Performance was assessed using accuracy, precision, recall, and F1-score, with results averaged over multiple runs to reduce stochastic bias.
The ablation results, presented in
Table 5, demonstrate that the removal of the CWT stage leads to a substantial degradation in classification performance. This confirms that time–frequency localization is essential for capturing nonlinear stick–slip dynamics, which are not readily separable in the raw time domain. Similarly, the absence of normalization introduces variability in the scalogram energy distribution, resulting in reduced classification stability. Replacing the ResNet50 backbone with a shallow CNN further reduces performance, highlighting the importance of deep hierarchical feature extraction for distinguishing subtle nonlinear patterns.
Overall, the results indicate that the proposed performance gains arise from the synergistic integration of wavelet-based multiscale feature encoding and deep residual learning, rather than from any single component in isolation. This observation aligns with recent studies demonstrating that hybrid time–frequency deep learning frameworks achieve superior robustness by combining physically meaningful signal representations with data-driven feature learning [
40,
41].
6. Discussion
The results confirm that diagnostic reliability in valve stiction detection depends primarily on the physical relevance of the signal representation rather than classifier complexity alone. While classical geometric methods such as ellipse fitting with BIC provide interpretable hysteresis descriptors, their performance degrades significantly in industrial datasets due to sensitivity to non-stationarity, measurement noise, and process-dependent distortions. The reduction in F1-score from simulation to real loops highlights their limited robustness under realistic operating variability.
Image-based deep learning approaches (PCA-, Poincaré-, and MTF-based CNNs) improve discrimination by embedding temporal structure into 2D representations. However, these encodings remain indirectly related to actuator friction dynamics. PV-based mappings are influenced by plant dynamics and disturbances, which reduces separability when stiction signatures are subtle or masked by process variability.
The proposed HW-CNN addresses this limitation by focusing exclusively on OP signals and applying continuous wavelet decomposition prior to CNN classification. From a physical standpoint, stiction manifests as multiscale nonlinear behavior: low-frequency stick persistence combined with high-frequency slip-induced bursts. The wavelet transform preserves the temporal localization of these energy concentrations, enabling the CNN to learn hierarchical spatial–spectral features aligned with friction-induced dynamics. This integration results in superior class separability and improved generalization, as evidenced by the highest F1-scores and AUC values across both simulated and industrial datasets.
Importantly, the reduced performance gap between simulation and real data indicates enhanced robustness to noise and operating variability. The scalogram representation also maintains interpretability, allowing for visual correlation between wavelet energy ridges and stick–slip transitions, which is advantageous for industrial acceptance.
To further substantiate the interpretability of the proposed framework, the Gradient-weighted Class Activation Mapping (Grad-CAM) in
Figure 15 was employed to visualize the regions of the scalogram contributing to the CNN’s classification decisions. This approach enables the localization of discriminative features within the time–frequency domain and has been widely adopted for interpreting deep learning models in industrial fault diagnosis applications [
42].
The resulting activation maps indicate that the CNN consistently focuses on high-energy regions corresponding to transient slip events and low-frequency bands associated with sustained stick phases. This alignment between model attention and known physical characteristics of valve stiction indicates that the learned representations are not arbitrary but are linked to underlying actuator dynamics. Such behavior supports the notion that wavelet-based representations enhance interpretability by embedding physically meaningful structures in the input representation, thereby enabling more transparent decision-making in deep learning models [
43].
These activation patterns are illustrated in
Figure 15, which provides a visual interpretation of the model’s focus regions in both stiction and non-stiction conditions.
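For reference, the standard Grad-CAM formulation is reproduced below. It is assumed here that the maps are computed from the feature maps $A^k$ of the final convolutional stage and that $y^c$ is the pre-sigmoid stiction score; the layer and score actually used are not stated in the text.

```latex
% Channel weights: spatial average of the gradient of the class score y^c
% with respect to feature map A^k (Z = number of spatial locations).
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^{k}}

% Class activation map: weighted sum of feature maps, keeping only
% regions that contribute positively to the class score.
L_{\text{Grad-CAM}}^{c} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^{k} \right)
```

The resulting map has the spatial resolution of the chosen feature maps and is upsampled to the scalogram size before being overlaid, which is why the highlighted regions appear as smooth blobs rather than sharp time–frequency ridges.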
Overall, the findings demonstrate that combining physics-informed multiscale decomposition with deep hierarchical learning yields a diagnostically robust and industrially viable framework for valve stiction detection.
Comparative Analysis: Traditional Methods vs. Hybrid Wavelet–CNN (HW-CNN)
The performance gap between traditional techniques and the proposed HW-CNN arises from differences in representational depth and physical alignment with stiction dynamics.
Classical geometric methods such as ellipse fitting with BIC rely on static hysteresis descriptors derived from OP–PV phase portraits. While computationally efficient and interpretable, their diagnostic reliability depends on quasi-stationary oscillations and low noise levels. Industrial variability, drift, and asymmetric actuator behavior reduce their separability, leading to significant performance degradation outside controlled simulations [
41].
Image-encoding CNN approaches such as PCA-, Poincaré-, and MTF-based models improve discrimination by mapping temporal signals into structured 2D representations. However, these encodings are indirectly related to friction physics and often embed PV-driven plant dynamics, which introduces confounding variability. Although more robust than geometric descriptors, their generalization remains moderate under real operating conditions [
44].
Conventional CWT-CNN enhances sensitivity by incorporating time–frequency decomposition prior to classification. The wavelet transform captures oscillatory energy localization, improving detection of limit-cycle behavior. Nevertheless, when preprocessing is not strictly actuator-focused, frequency-based features can still reflect plant disturbances rather than friction-specific signatures.
The proposed HW-CNN framework integrates OP-exclusive signal selection with continuous wavelet multiscale decomposition and residual deep learning. This combination directly encodes stick–slip energy persistence and transient breakaway bursts while suppressing irrelevant plant variability. The result is improved separability, reduced simulation-to-industrial performance drop, and superior robustness across datasets.
Figure 16 illustrates the proposed HW-CNN method and shows that diagnostic superiority is achieved not by increasing model complexity alone, but by aligning multiscale signal representation with the underlying physics of valve stiction.
7. Conclusions
This study presented an HW-CNN framework for the detection of valve stiction in closed-loop control systems. The approach combines CWT-based scalogram representation with a deep residual network to capture the nonlinear and time-varying characteristics associated with stick–slip behavior. By embedding control system dynamics within the feature extraction process, the proposed method enables improved discrimination between stiction and non-stiction conditions compared to conventional signal-based or purely data-driven approaches.
The results demonstrate that time–frequency representation enhances the visibility of stiction-induced oscillatory and transient patterns, which are not readily distinguishable in the time domain alone. The integration of a fine-tuned residual network further improves classification performance through hierarchical feature learning, resulting in consistent detection performance across both simulated and industrial datasets. The inclusion of repeated experimental evaluation supports the reliability of the reported performance, while the ablation analysis confirms that the effectiveness of the framework arises from the combined contribution of wavelet-based feature encoding and deep network architecture.
Interpretability was addressed through Grad-CAM analysis, which indicates that the model focuses on physically meaningful regions within the scalograms. In particular, the activation patterns correspond to low-frequency oscillations and localized transients consistent with known stiction dynamics, providing confidence that the model captures relevant process behavior rather than spurious features. This aspect is important for practical deployment, where transparency and alignment with engineering understanding are required.
Although the proposed framework demonstrates robust performance, the evaluation also highlights the increased variability present in industrial data due to noise, disturbances, and unmodeled dynamics. Future work will focus on extending the methodology towards the quantification of valve stiction severity, exploring adaptive learning strategies, and validating the approach on larger and more diverse industrial datasets.
In summary, the proposed HW-CNN framework offers a technically robust and practically viable approach for valve stiction detection, contributing to the advancement of intelligent monitoring techniques in industrial control systems.