A Physics-Coupled Deep LSTM Autoencoder for Robust Sensor Fault Detection in Industrial Systems

Jia, Weiwei; Ding, Youcheng; Ye, Xilong; Huang, Xinyi; Wang, Maofa; Miao, Chenglong

doi:10.3390/pr14081213


by Weiwei Jia ¹,³,†, Youcheng Ding ²,†, Xilong Ye ¹,³,†, Xinyi Huang ¹,³, Maofa Wang ¹,³,* and Chenglong Miao ²,*

¹ Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
² Guangxi GIG Beihai Electric Power Co., Ltd., Beihai 536017, China
³ School of Applied Science, Beijing Information Science and Technology University, Beijing 100192, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work as co-first authors.

Processes 2026, 14(8), 1213; https://doi.org/10.3390/pr14081213

Submission received: 26 March 2026 / Revised: 6 April 2026 / Accepted: 8 April 2026 / Published: 10 April 2026

(This article belongs to the Section Process Control and Monitoring)


Abstract

Reliable sensor fault detection is critical for the safe and efficient operation of complex industrial systems, such as thermal power plants. However, traditional data-driven methods and standard deep learning models often struggle to detect incipient gradual drift faults under severe environmental noise, primarily because they ignore the inherent physical correlations among multivariate sensor signals. To address this challenge, this paper proposes a novel Physics-Coupled Deep Long Short-Term Memory Autoencoder (PC-Deep-LSTM-AE). Specifically, we integrate a deep LSTM architecture with an explicit non-linear information compression bottleneck and layer normalization to enhance robust feature extraction in high-noise environments. Furthermore, we innovatively introduce a Physics-Coupling Loss (PCC Loss) that jointly optimizes the mean squared reconstruction error and the Pearson correlation coefficient, forcing the model to strictly preserve the dynamic physical relationships among multivariable signals. Extensive experiments were conducted on a real-world thermal power plant dataset with severe noise injection. The results demonstrate that the proposed PC-Deep-LSTM-AE achieves an outstanding F1-score of over 0.98, significantly outperforming mainstream baseline models, including Vanilla LSTM-AE, GRU-AE, Bi-LSTM-AE, and CNN-AE. The proposed method exhibits exceptional robustness and high interpretability for root-cause analysis, highlighting its immense potential for real-world industrial deployment.

Keywords:

anomaly detection; Long Short-Term Memory (LSTM); autoencoder; physics-informed neural networks; industrial sensor faults; thermal power plants

1. Introduction

With the rapid development of the Industrial Internet of Things (IIoT), modern industrial systems, such as thermal power plants, are equipped with thousands of sensors to monitor operational status in real time [1,2]. The reliability of these sensors is the cornerstone of advanced process control and condition-based maintenance. However, working in harsh industrial environments characterized by high temperatures and electromagnetic interference, sensors are prone to various malfunctions, including hard faults (e.g., short circuits) and soft faults (e.g., gradual drifts) [3]. If undetected, these anomalies can lead to suboptimal system performance, severe safety hazards, and catastrophic economic losses [4]. Therefore, developing highly accurate and robust multivariate sensor anomaly detection methods has become a crucial research focus in industrial prognostics and health management (PHM) [5]. Because sensors are the ‘antennae’ through which control systems perceive the external environment, their degradation not only undermines the stability of closed-loop control but can also trigger cascading system failures [6]. Early fault detection typically relied on quantitative model-based methods [7]; however, establishing accurate mathematical and physical models has become exceedingly difficult in modern complex thermodynamic systems.

Traditionally, statistical multivariate methods, such as Principal Component Analysis (PCA) and Support Vector Machines (SVMs), have been widely utilized for industrial process monitoring [8,9]. However, these shallow learning methods often struggle to capture the highly non-linear dynamics of complex thermodynamic equipment like boilers and turbines [10].

Compared to abrupt hard faults, incipient soft faults, such as gradual sensor drift, are notoriously difficult to detect because their initial deviation amplitudes are often submerged in background noise [11]. Furthermore, sensors in complex industrial equipment (e.g., boilers and turbines) are not isolated; they are governed by underlying thermodynamic and physical processes [12]. A slight drift in one sensor (e.g., main steam temperature) will inevitably cause corresponding fluctuations in physically correlated variables (e.g., desuperheating water flow or exhaust pressure). Accurately capturing these dynamic physical correlations under high-noise conditions is the key to identifying early-stage soft faults [13].

In recent years, deep learning-based reconstruction models, particularly Autoencoders (AEs) and their sequence-to-sequence variants like Long Short-Term Memory Autoencoders (LSTM-AE), have achieved remarkable success in time-series anomaly detection [14,15]. Despite their popularity, standard LSTM-AE models face two major limitations when applied to real-world industrial data. First, traditional architectures often directly pass high-dimensional hidden states to the decoder, making them prone to identity mapping and overfitting to environmental noise, thus generating high false alarm rates [16]. Second, conventional autoencoders rely solely on minimizing the Mean Squared Error (MSE) during training. MSE focuses exclusively on point-wise numerical differences, completely ignoring the structural synchronization and physical coupling relationships among multivariable temporal signals [17].

To bridge these gaps, we propose a Physics-Coupled Deep Long Short-Term Memory Autoencoder (PC-Deep-LSTM-AE) for robust industrial sensor fault detection. First, to prevent the model from overfitting to noise, we design an explicit non-linear bottleneck layer combined with Layer Normalization [18]. This structural design acts as a powerful information filter, forcing the deep LSTM network to discard high-frequency disturbances and extract only the essential steady-state temporal patterns. More importantly, inspired by the recent advances in Physics-Informed Neural Networks (PINNs) [19,20], we design a novel Physics-Coupling Loss (PCC Loss) that integrates the Pearson Correlation Coefficient as a physical constraint penalty alongside the traditional MSE. By explicitly regularizing the covariance matrix of the reconstructed signals, the proposed loss function ensures that the model rigorously adheres to the inter-variable physical dependencies inherent in the normal operating state.

The main contributions of this paper are summarized as follows:

We propose PC-Deep-LSTM-AE, an advanced deep reconstruction architecture that integrates an explicit non-linear bottleneck and layer normalization, significantly enhancing feature extraction robustness against severe industrial noise.

We innovatively introduce a Physics-Coupling Loss (PCC Loss) for unsupervised anomaly detection. This physics-informed constraint forces the model to learn the dynamic inter-variable physical correlations, making it highly sensitive to incipient drift faults that violate normal thermodynamic coupling.

We conducted comprehensive experiments on a real-world thermal power plant dataset under intensive noise injection. The results prove that our method achieves superior detection precision and AUC-ROC compared to state-of-the-art baselines (e.g., Vanilla LSTM-AE, GRU-AE, Bi-LSTM-AE, and CNN-AE), while also providing excellent interpretability for root-cause analysis.

2. Related Work

2.1. Data-Driven Fault Detection in Industrial Systems

With the advent of the Industrial Internet of Things (IIoT) [1] and Industry 4.0 [5], data-driven methods have largely replaced traditional physical-mathematical models for equipment condition monitoring. Recent roadmaps indicate a massive shift toward deep learning algorithms for intelligent industrial fault diagnosis and condition monitoring [2,4]. While current temporal deep learning models perform exceptionally well on abrupt, large-amplitude hard faults, detecting incipient soft faults remains a significant challenge [3]. Yan et al. [11] demonstrated that early-stage gradual drifts are easily masked by operational noise in complex systems. Most existing data-driven classifiers fail to distinguish between normal background fluctuations and these subtle incipient anomalies, leading to either high missed detection rates or excessive false alarms.

2.2. Deep Reconstruction Models for Time Series

To address the lack of massive labeled fault data in real-world industries, unsupervised deep reconstruction models have become the mainstream approach [13]. Autoencoders (AEs) and their sequence-to-sequence variants, built upon the foundation of Long Short-Term Memory (LSTM) networks [14], are particularly popular for multivariable temporal data. For instance, Malhotra et al. [15] successfully applied an LSTM-based encoder-decoder architecture for multivariate time series anomaly detection. In addition to deterministic autoencoders, generative models such as Variational Autoencoders (VAEs) [21] and Generative Adversarial Networks (GANs) [22] have also been explored for modeling data distributions. For instance, the MAD-GAN model utilizes adversarial training to capture spatio-temporal correlations [23]. However, GAN models frequently experience unstable training dynamics and are susceptible to mode collapse, which limits their applicability in industrial scenarios demanding extreme reliability. More recently, advanced architectures like stochastic recurrent networks [16] and USAD [17] have been proposed to enhance robustness. However, these standard deep reconstruction models share a fatal flaw: they are optimized entirely based on the Mean Squared Error (MSE). MSE only measures point-wise numerical distances, completely ignoring the structural synchronization among variables. Furthermore, without an explicit information restriction mechanism, standard LSTM-AEs tend to learn an identity mapping, making them highly vulnerable to severe environmental noise.

Recently, attention-based models like Transformers and Graph Neural Networks (GNNs) have also been introduced to time-series anomaly detection [24,25]. For example, Graph Deviation Networks (GDNs) attempt to learn sensor relationship graphs [26]. However, these advanced models often require massive computational resources and struggle to explicitly integrate physical thermodynamic constraints into their loss functions, making our lightweight, physics-coupled LSTM-AE more suitable for noisy industrial environments.

Additionally, recent advancements in graph-based domain-adaptive structural health monitoring (SHM) have demonstrated significant potential in modeling structured inter-sensor dependencies under varying operating conditions [27].

2.3. Physics-Informed Machine Learning

To overcome the “black-box” limitations of purely data-driven models, Physics-Informed Neural Networks (PINNs) have recently emerged as a groundbreaking paradigm [19,20]. By embedding partial differential equations or physical laws into the loss function, PINNs restrict the neural network’s optimization direction to physically feasible regions. Inspired by this, Cheng and Pecht [12] proposed using physics-informed machine learning for multivariate anomaly detection in industrial systems, highlighting the importance of thermodynamic relationships. Building upon these cutting-edge concepts, our proposed PC-Deep-LSTM-AE explicitly introduces a Physics-Coupling Loss (PCC Loss). Unlike traditional models that rely solely on MSE, our model forces the reconstructed signals to preserve the inherent physical covariance matrix of the normal state, thus becoming acutely sensitive to any physical decoupling caused by incipient sensor drift.

3. Methodology

In this section, we present the detailed architecture of the proposed Physics-Coupled Deep Long Short-Term Memory Autoencoder (PC-Deep-LSTM-AE). As illustrated in Figure 1, the overall framework is driven by two synergistic mechanisms: a robust feature extraction network and a physics-informed constraining module. The extraction network employs a deep LSTM encoder-decoder architecture connected by an explicit non-linear bottleneck (comprising Layer Normalization and a Linear projection layer) to filter out high-frequency environmental noise. During the backpropagation and gradient-based optimization phase, the model is guided by a joint objective function. This function minimizes both the numerical Mean Squared Error ($L_{MSE}$) and a novel Physics-Coupling Loss ($L_{PCC}$). Specifically, $L_{PCC}$ acts as a thermodynamic constraint by calculating the covariance matrices and penalizing the structural discrepancies between the original and reconstructed Pearson correlation coefficients, thereby ensuring the model adheres to the inherent physical dynamics of the industrial system.

3.1. Problem Formulation

In modern industrial systems, sensors continuously generate multivariate time-series data. Let $X \in \mathbb{R}^{T \times M}$ denote the collected dataset, where $T$ is the total number of timestamps and $M$ is the number of sensor variables. To capture the temporal dependencies, we employ a sliding window approach to segment the continuous data into sequences of fixed length $L$. The $i$-th input sequence is defined as $X_{i} = [x_{i}, x_{i + 1}, \dots, x_{i + L - 1}] \in \mathbb{R}^{L \times M}$. The sliding window technique is a standard and effective practice in time-series analysis to transform continuous data streams into sequential inputs suitable for deep learning models [28].
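The segmentation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function name `sliding_windows`, the `stride` argument, and the toy three-channel series are assumptions introduced here, and indices are 0-based.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[Sequence[float]],
                    length: int,
                    stride: int = 1) -> List[List[Sequence[float]]]:
    """Cut a (T x M) series into windows X_i = [x_i, ..., x_{i+L-1}]."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    last_start = len(series) - length  # last index at which a full window fits
    return [list(series[i:i + length])
            for i in range(0, last_start + 1, stride)]


# Toy series (hypothetical): T = 100 timestamps, M = 3 sensor channels.
toy = [[float(t), 2.0 * t, -1.0 * t] for t in range(100)]

overlapping = sliding_windows(toy, length=30, stride=1)       # T - L + 1 windows
non_overlapping = sliding_windows(toy, length=30, stride=30)  # disjoint windows
```

With a stride of 1 the series yields T - L + 1 windows; a stride equal to the window length yields disjoint windows and drops any incomplete tail.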

The objective of our unsupervised autoencoder model is to learn a mapping function $F : X_{i} \to \hat{X}_{i}$, such that the reconstructed sequence $\hat{X}_{i}$ is as close to the input $X_{i}$ as possible under normal operating conditions. Anomalies are subsequently detected by measuring the reconstruction deviation.

3.2. Deep LSTM Encoder-Decoder Architecture

To effectively model the complex long-term temporal dependencies inherent in industrial processes, we adopt a deep LSTM network as the backbone.

Encoder: The encoder takes the sliding window sequence $X_{i}$ as input. We utilize a two-layer LSTM structure to extract hierarchical temporal features. For a given time step $t$, the hidden state $h_t$ and cell state $c_t$ are updated through the standard LSTM gating mechanisms (forget, input, and output gates). The output of the final LSTM layer represents the high-dimensional temporal feature of the sequence.

Decoder: Symmetrically, the decoder aims to reconstruct the original input sequence from the compressed latent representation. It expands the latent vector back to the original hidden dimension and employs a two-layer LSTM followed by a fully connected (Linear) layer to output the reconstructed sequence $\hat{X}_{i} \in \mathbb{R}^{L \times M}$.

3.3. Explicit Non-Linear Bottleneck and Layer Normalization

A common flaw in standard seq2seq autoencoders is their tendency to learn an identity mapping, which leads to poor generalization and high sensitivity to background noise. To overcome this, we design an explicit non-linear bottleneck.

Specifically, after the encoder’s LSTM layers, we first apply Layer Normalization (LayerNorm) to the hidden states. LayerNorm stabilizes the internal hidden dynamics across different sensor features, preventing internal covariate shifts and accelerating convergence:

$$h_{norm} = \frac{h - \mu}{\sqrt{\sigma^{2} + \epsilon}} \odot \gamma + \beta \qquad (1)$$

Following normalization, the features are passed through a fully connected projection layer that acts as an information bottleneck, forcibly reducing the feature dimension by half (e.g., from 128 to 64). A Leaky-ReLU activation function is then applied to introduce non-linearity. Compared to the standard ReLU, Leaky-ReLU effectively mitigates the ‘dying ReLU’ problem, thereby maintaining a more stable gradient flow during the deep network training [29]. This explicit compression forces the network to discard high-frequency environmental noise and retain only the most robust, underlying steady-state physical representations.
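To make the bottleneck concrete, the following sketch applies Equation (1), a halving linear projection (128 to 64), and Leaky-ReLU to a single hidden vector in plain Python. It is an illustration under stated assumptions rather than the authors' implementation: the random weights, the Leaky-ReLU slope of 0.01, the value of epsilon, and applying the bottleneck to one hidden vector at a time are all choices made here, since the paper does not specify them.

```python
import math
import random
from typing import List


def layer_norm(h: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[float]:
    """Equation (1): normalise one hidden vector, then scale and shift it."""
    mu = sum(h) / len(h)
    var = sum((v - mu) ** 2 for v in h) / len(h)  # biased variance over features
    return [(v - mu) / math.sqrt(var + eps) * g + b
            for v, g, b in zip(h, gamma, beta)]


def leaky_relu(v: float, slope: float = 0.01) -> float:
    """Leaky-ReLU; the slope value is an assumption, not given in the paper."""
    return v if v > 0 else slope * v


def bottleneck(h, weight, bias, gamma, beta) -> List[float]:
    """LayerNorm -> linear projection to a smaller dimension -> Leaky-ReLU."""
    normed = layer_norm(h, gamma, beta)
    return [leaky_relu(sum(w * x for w, x in zip(row, normed)) + b)
            for row, b in zip(weight, bias)]


rng = random.Random(0)
hidden, latent = 128, 64                       # dimensions quoted in the text
h = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
gamma, beta = [1.0] * hidden, [0.0] * hidden   # identity scale and shift
weight = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(latent)]
bias = [0.0] * latent

normed = layer_norm(h, gamma, beta)
z = bottleneck(h, weight, bias, gamma, beta)   # compressed 64-dimensional code
```

In a trained network `gamma`, `beta`, `weight`, and `bias` would be learned parameters; here they are fixed only so the shapes and the normalisation can be inspected.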

3.4. Physics-Coupling Loss Function (PCC Loss)

The core innovation of our proposed model is the Physics-Coupling Loss function. Traditional autoencoders rely solely on the Mean Squared Error (MSE), defined as $L_{MSE} = \frac{1}{L \times M} \sum \| X_{i} - \hat{X}_{i} \|_{2}^{2}$, which only minimizes point-wise numerical deviations. However, variables in industrial equipment (e.g., temperature, pressure, flow rate) are physically coupled.

To ensure the reconstructed signals adhere to the same physical covariance structure as the normal data, we introduce a penalty based on the Pearson Correlation Coefficient (PCC). Let $C$ and $\hat{C}$ represent the correlation matrices of the input sequence $X_{i}$ and the reconstructed sequence $\hat{X}_{i}$ along the feature dimension, respectively. The PCC loss is formulated to minimize the discrepancy between these two correlation matrices:

$$L_{PCC} = \frac{1}{M^{2}} \sum_{j = 1}^{M} \sum_{k = 1}^{M} |C_{j, k} - \hat{C}_{j, k}| \qquad (2)$$

The final objective function jointly optimizes the numerical reconstruction accuracy and the physical structure similarity:

$$L_{total} = \alpha L_{MSE} + \beta L_{PCC} \qquad (3)$$

where $\alpha$ and $\beta$ are hyper-parameters balancing the two terms. In this study, they were empirically set to 1.0 and 0.5, respectively, based on a grid search over the validation set to maximize the F1-score. It is worth noting the statistical reliability of calculating the correlation matrix within a sliding window. With a window length of $L = 30$ and $M \approx 20$ variables, the sample size (30) exceeds the feature dimension, so the sample covariance matrix can be full-rank and numerically stable. Physically, a 30-step window (representing 5 h of continuous operation at a 10 min sampling interval) is sufficient to capture the localized thermodynamic steady-state correlations without being overly sensitive to transient noise.
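The loss terms in Equations (2) and (3) can be evaluated on a single window as sketched below. This plain-Python version only computes the loss values; it is not differentiable, so training would need the same arithmetic written with an autograd framework. The helper names and the small `eps` guard against zero-variance channels are assumptions added here.

```python
import math
from typing import List

Window = List[List[float]]  # L rows (time steps) x M columns (sensors)


def pearson_matrix(window: Window, eps: float = 1e-8) -> List[List[float]]:
    """M x M Pearson correlation matrix between the columns of one window."""
    n_steps, n_feat = len(window), len(window[0])
    cols = [[row[j] for row in window] for j in range(n_feat)]
    centred = [[v - sum(c) / n_steps for v in c] for c in cols]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    return [[sum(a * b for a, b in zip(centred[j], centred[k]))
             / (norms[j] * norms[k] + eps)      # eps avoids division by zero
             for k in range(n_feat)] for j in range(n_feat)]


def mse_loss(x: Window, x_hat: Window) -> float:
    """Mean squared error over all L x M entries."""
    n = len(x) * len(x[0])
    return sum((a - b) ** 2 for r, rh in zip(x, x_hat) for a, b in zip(r, rh)) / n


def pcc_loss(x: Window, x_hat: Window) -> float:
    """Equation (2): mean absolute difference between correlation matrices."""
    c, c_hat = pearson_matrix(x), pearson_matrix(x_hat)
    m = len(c)
    return sum(abs(c[j][k] - c_hat[j][k]) for j in range(m) for k in range(m)) / (m * m)


def total_loss(x: Window, x_hat: Window, alpha: float = 1.0, beta: float = 0.5) -> float:
    """Equation (3) with the weights reported in the text."""
    return alpha * mse_loss(x, x_hat) + beta * pcc_loss(x, x_hat)


# Two perfectly correlated channels; the bad reconstruction flips the coupling.
x = [[float(t), 2.0 * t] for t in range(30)]
x_decoupled = [[float(t), 58.0 - 2.0 * t] for t in range(30)]
```

For this toy pair the off-diagonal correlation changes from +1 to -1, so the PCC term evaluates to about 1.0 regardless of how large the point-wise error is, which is the behaviour the loss is designed to expose.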

3.5. Anomaly Scoring and Dynamic Thresholding

During the inference phase, the anomaly score for a new observation is computed based on its reconstruction error. For the sequence at time $t$, the anomaly score $S_{t}$ is calculated as the average squared error across all $M$ variables.

Instead of using a hard-coded threshold, we employ a statistical dynamic thresholding strategy. Using a pure normal validation set, we calculate the mean $\mu_{normal}$ and standard deviation $\sigma_{normal}$ of the anomaly scores. The threshold $\tau$ is determined according to the empirical $3\sigma$-rule:

$$\tau = \mu_{normal} + 3 \cdot \sigma_{normal} \qquad (4)$$

If $S_{t} > \tau$, an anomaly is flagged. This adaptive thresholding ensures robust detection tailored to the specific noise profile of the equipment.
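The scoring and thresholding rule of Equation (4) can be illustrated as follows. This is a sketch with made-up validation scores; the function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator was used.

```python
import statistics
from typing import List, Sequence


def anomaly_score(x_t: Sequence[float], x_hat_t: Sequence[float]) -> float:
    """S_t: squared reconstruction error averaged over the M variables."""
    return sum((a - b) ** 2 for a, b in zip(x_t, x_hat_t)) / len(x_t)


def three_sigma_threshold(normal_scores: List[float], k: float = 3.0) -> float:
    """Equation (4): tau = mu_normal + k * sigma_normal, with k = 3."""
    mu = statistics.mean(normal_scores)
    sigma = statistics.pstdev(normal_scores)  # assumption: population std
    return mu + k * sigma


# Hypothetical anomaly scores from a fault-free validation split.
validation_scores = [0.010, 0.012, 0.011, 0.009, 0.013]
tau = three_sigma_threshold(validation_scores)

# One normal-looking and one suspicious observation (M = 4, toy values).
score_ok = anomaly_score([0.5, 0.5, 0.5, 0.5], [0.5, 0.45, 0.5, 0.55])
score_bad = anomaly_score([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.9])
flags = [score_ok > tau, score_bad > tau]
```

Because the threshold is derived only from normal data, it adapts to the noise floor of each deployment without requiring labeled faults.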

4. Experiments and Results

In this section, we comprehensively evaluate the performance of the proposed PC-Deep-LSTM-AE model. We first introduce the industrial dataset and the experimental settings. Then, we compare our model against several state-of-the-art baselines. Finally, we provide intuitive visualizations to demonstrate the model’s interpretability.

4.1. Dataset Description and Preprocessing

The proposed method is evaluated on a real-world multivariate time-series dataset collected from the sensor network of a thermal power plant. The dataset comprises over 50,000 continuous operational records with a sampling interval of 10 min. Each data point contains more than 20 physical variables, including main steam temperature, desuperheating water flow, exhaust pressure, and flue gas temperature.

To rigorously test the model’s capability in identifying incipient soft faults under harsh industrial conditions, we constructed a synthetic testing scenario. In real-world thermal power plants, massive historical records predominantly consist of normal operations or abrupt hard faults, while precisely labeled, early-stage gradual sensor drifts are extremely scarce. Therefore, we injected synthetic gradual drift faults into specific sensor channels. The drift fault is mathematically defined as follows:

$$x_{fault}(t) = x_{normal}(t) + k \cdot (t - t_{start}) \quad \text{for } t_{start} \leq t \leq t_{end} \qquad (5)$$

where $x_{normal}(t)$ is the original normalized reading, $k$ is the drift slope (set to 0.005 per time step), and $t_{start}$ marks the fault onset. The drift persists for a duration of 200 time steps, reaching a maximum deviation amplitude of 1.0 in the normalized scale. Furthermore, to replicate the severe electromagnetic interference typical in industrial environments, we superimposed Gaussian noise $N(0, \sigma^{2})$ onto the normalized test data, where the noise level $\sigma$ was set to 0.15.

Prior to training, all sensor readings were normalized to the $[0, 1]$ range using Min-Max scaling based purely on the normal training set.
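The fault-injection procedure of Equation (5) and the noise superposition can be sketched as below. The constant toy signal, the onset index, the seed, and the function names are assumptions for illustration; the slope, duration, and noise level are the values quoted in the text. Following Equation (5) as written, samples after $t_{end}$ are left unmodified.

```python
import random
from typing import List


def inject_drift(signal: List[float], t_start: int, t_end: int,
                 slope: float = 0.005) -> List[float]:
    """Equation (5): add slope * (t - t_start) for t_start <= t <= t_end."""
    out = list(signal)
    for t in range(t_start, min(t_end, len(out) - 1) + 1):
        out[t] += slope * (t - t_start)
    return out


def add_gaussian_noise(signal: List[float], sigma: float = 0.15,
                       seed: int = 0) -> List[float]:
    """Superimpose zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [v + rng.gauss(0.0, sigma) for v in signal]


# Toy normalized channel (hypothetical): constant reading of 0.5.
clean = [0.5] * 400
drifted = inject_drift(clean, t_start=100, t_end=300)  # 200-step drift
noisy = add_gaussian_noise(drifted)

max_deviation = max(d - c for d, c in zip(drifted, clean))
```

With a slope of 0.005 and a 200-step duration the deviation peaks at 1.0, matching the amplitude stated in the text, while the early part of the drift stays well below the 0.15 noise level.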

4.2. Experimental Setup and Evaluation Metrics

Baselines: We compared PC-Deep-LSTM-AE with four widely adopted deep reconstruction models: Vanilla LSTM-AE, GRU-AE, Bi-LSTM-AE, and a 1D Convolutional Autoencoder (CNN-AE). To ensure a rigorously fair and methodologically controlled comparison, all baseline models were matched with the proposed model in terms of parameter budget. Specifically, all models were configured with a hidden dimension of 128 and an identical 2-layer depth. The same random seeds and training epochs with early stopping were applied across all comparative experiments.

Metrics: Since anomalies in industrial systems are rare events, the dataset exhibits severe class imbalance. Under such imbalanced learning scenarios, relying solely on ‘Accuracy’ or the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) can be misleading. Therefore, we evaluate the anomaly detection performance using Precision, Recall, F1-Score, and, most importantly, the Area Under the Precision–Recall Curve (AUC-PR) [30,31]. Furthermore, for real-world predictive maintenance, the timeliness of alarming is critical. We introduce Detection Delay (event-level) as a practical metric, defined as the number of time steps elapsed from the actual fault onset $t_{start}$ to the timestamp when the model issues its first consecutive anomaly alarm. A shorter delay indicates a higher sensitivity to incipient degradation.
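The point-wise metrics and the event-level delay can be computed as in the sketch below. The text does not state how many consecutive alarms constitute the "first consecutive anomaly alarm", so the `min_run` parameter (set to 3 here) and the choice to report the step at which the run is confirmed are assumptions of this example.

```python
from typing import List, Optional, Tuple


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Point-wise Precision, Recall and F1 for binary labels (1 = fault)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def detection_delay(alarms: List[int], t_start: int, min_run: int = 3) -> Optional[int]:
    """Steps from fault onset until `min_run` consecutive alarms are confirmed.

    Returns None when no such run occurs after t_start.
    """
    run = 0
    for t in range(t_start, len(alarms)):
        run = run + 1 if alarms[t] else 0
        if run == min_run:
            return t - t_start
    return None


# Toy alarm sequence: fault starts at t = 20, one isolated alarm at t = 22,
# then a sustained run of alarms from t = 25 onward.
alarms = [0] * 40
alarms[22] = 1
for t in range(25, 30):
    alarms[t] = 1

delay = detection_delay(alarms, t_start=20)
p, r, f1 = precision_recall_f1([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
```

Requiring a short run of alarms rather than a single exceedance trades a few steps of delay for robustness against isolated noise spikes.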

Implementation and Data Splitting: All models were implemented in PyTorch 1.10.1 and trained on an NVIDIA GPU. To strictly prevent temporal data leakage, the dataset was partitioned chronologically. The first 80% of the normal sequence was used for training, while the remaining 20% was reserved for validation and threshold computation. A buffer zone equivalent to the window length was discarded between the train and validation splits to ensure absolute isolation. During training, the sliding window length $L$ was set to 30 with a stride of 1 (an overlap ratio of 29/30) to augment the temporal diversity. Conversely, during the validation and testing phases, the sequences were extracted with a non-overlapping stride of 30 to ensure independent, uninflated performance metrics. The Adam optimizer was used with an initial learning rate of 1 × 10⁻³, accompanied by an early-stopping strategy to prevent overfitting.
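The chronological split with a discarded buffer, together with the two stride settings, can be sketched with index arithmetic alone. The record count of 50,000 is illustrative (the text says "over 50,000"), and the function names are hypothetical.

```python
from typing import List, Tuple


def chronological_split(n_samples: int, train_frac: float = 0.8,
                        gap: int = 30) -> Tuple[range, range]:
    """Index ranges (train, validation) with `gap` samples dropped in between."""
    train_end = int(n_samples * train_frac)
    val_start = min(train_end + gap, n_samples)
    return range(0, train_end), range(val_start, n_samples)


def window_starts(index_range: range, length: int = 30, stride: int = 1) -> List[int]:
    """Start indices of all full windows that fit inside one split."""
    last = index_range.stop - length
    return list(range(index_range.start, last + 1, stride))


train_idx, val_idx = chronological_split(50_000)
train_starts = window_starts(train_idx, stride=1)   # overlapping, for training
val_starts = window_starts(val_idx, stride=30)      # disjoint, for evaluation
```

Dropping a buffer of one window length guarantees that no training window shares a timestamp with any validation window, even at a stride of 1.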

4.3. Performance Comparison

Table 1 summarizes the detection performance of all models on the noise-injected testing dataset. To ensure a rigorously fair evaluation, all deep learning models, including the baselines, were strictly configured with identical capacity (a hidden dimension of 128 and a 2-layer architecture) and evaluated under the same non-overlapping sliding window protocol to prevent temporal data leakage.

As shown in Table 1, the proposed PC-Deep-LSTM-AE demonstrates highly effective detection capabilities, consistently outperforming the evaluated baseline models across all metrics. Given the severe class imbalance typical of industrial anomaly detection, the Area Under the Precision–Recall Curve (AUC-PR) serves as a more robust indicator than AUC-ROC. Our method attains an AUC-PR of 0.9934, indicating a strong capability in distinguishing incipient faults from normal background fluctuations.

Despite having identical parameter budgets, the baseline models suffer performance degradation under the noisy drift-fault scenario. Notably, CNN-AE exhibits a high precision but a significantly lower recall (0.8005), indicating a severe missed detection rate for gradual temporal drifts. Because the baseline sequence models (Vanilla LSTM-AE, GRU-AE, Bi-LSTM-AE) lack the explicit physical coupling constraint (PCC Loss), they tend to overfit the background noise, leading to either elevated false alarm rates or delayed responses.

Crucially, from the perspective of predictive maintenance, the proposed model achieves an event-level Detection Delay of only 23 time steps. With the 10 min sampling interval, this means the model issues a reliable alarm roughly 3.8 h after fault onset, when the injected drift has reached only about 0.115 of its final 1.0 amplitude and about 29.5 h before the 200-step drift fully evolves. The baselines lag by 30 to 50 steps. This improvement indicates that the physics-coupled architecture is sensitive to early-stage thermodynamic decoupling, providing a critical time window for engineers to implement preventive interventions.

4.4. Ablation Study

To deeply investigate the individual contributions of the proposed innovative components—specifically, the Physics-Coupling Loss (PCC Loss) and the Non-linear Bottleneck with Layer Normalization (BN)—we conducted a comprehensive ablation study. We designed four model variants for comparison:

Baseline: A standard deep LSTM autoencoder without the explicit bottleneck and LayerNorm, optimized solely using the traditional MSE loss.

NO PCC: This variant incorporates the non-linear bottleneck and LayerNorm to enhance noise robustness, but it is optimized purely with the MSE loss.

NO BN: This variant utilizes the proposed PCC Loss to capture physical dependencies, but it removes the explicit bottleneck and LayerNorm structure.

OURS: The complete PC-Deep-LSTM-AE model, which integrates both the robust bottleneck architecture and the proposed PCC Loss constraint.

The ablation results on the noisy industrial dataset are presented in Table 2.

As observed in Table 2, both the bottleneck design and the PCC Loss contribute positively to the final detection performance. When the Baseline is upgraded with the PCC loss (NO BN), the F1-Score improves from 0.9484 to 0.9792, and the Recall jumps from 0.9050 to 0.9775. This indicates that explicitly constraining the physical covariance structure helps the model become highly sensitive to incipient drift faults that violate normal thermodynamic coupling, thereby significantly reducing missed detections.

Similarly, introducing the bottleneck and LayerNorm (NO PCC) increases the F1-Score to 0.9718 compared to the Baseline. By forcibly compressing the feature space, the model effectively filters out high-frequency environmental noise, preventing overfitting to background disturbances.

Furthermore, to provide a more intuitive comparison, Figure 2 visualizes the performance metrics of all ablation variants. As illustrated, the full model (OURS) consistently dominates across all evaluation criteria, particularly bridging the significant Recall gap present in the Baseline model.

Finally, the full model (OURS) achieves the highest performance across all metrics, with an excellent Precision of 0.9871 and an F1-Score of 0.9898. This firmly validates that combining the robust feature extraction of the bottleneck with the physical relationship tracking of the PCC Loss provides the optimal solution for robust soft-fault detection in complex industrial systems.

4.5. Threshold Sensitivity Analysis

The proposed dynamic thresholding relies on the empirical $3\sigma$-rule. To verify the robustness of this choice and address potential concerns regarding the Gaussian distribution assumption of the reconstruction errors, we conducted a threshold sensitivity analysis on the validation set. We compared the $3\sigma$ heuristic against alternative operating points, including $2\sigma$, $4\sigma$, and a distribution-free 99th-percentile threshold. The detection F1-scores were 0.9685, 0.9898, 0.9512, and 0.9810 for the $2\sigma$, $3\sigma$, $4\sigma$, and 99th-percentile thresholds, respectively. The $2\sigma$ threshold proved overly sensitive, yielding higher false alarms, whereas the $4\sigma$ threshold caused unacceptable missed detections for incipient faults. The 99th-percentile approach performed admirably, confirming that even without strict Gaussian assumptions, our model’s reconstruction errors cleanly separate normal and anomalous states. Ultimately, the $3\sigma$ rule was retained as it provides the optimal balance between precision and recall while remaining computationally lightweight for real-time deployment.
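The candidate operating points compared above can be derived from the same set of normal anomaly scores, as in this sketch. The synthetic scores and the function names are assumptions; the percentile uses the nearest-rank definition, which is one of several common conventions and may differ from the one used in the study.

```python
import statistics
from typing import Dict, List


def sigma_threshold(scores: List[float], k: float) -> float:
    """mu + k * sigma computed on normal anomaly scores."""
    return statistics.mean(scores) + k * statistics.pstdev(scores)


def percentile_threshold(scores: List[float], q: int = 99) -> float:
    """Nearest-rank percentile: smallest score with at least q% of scores <= it."""
    ordered = sorted(scores)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]


# Hypothetical normal scores: 0.001, 0.002, ..., 0.200.
scores = [i / 1000 for i in range(1, 201)]

candidates: Dict[str, float] = {
    "2-sigma": sigma_threshold(scores, 2.0),
    "3-sigma": sigma_threshold(scores, 3.0),
    "4-sigma": sigma_threshold(scores, 4.0),
    "p99": percentile_threshold(scores, 99),
}
```

Each candidate threshold would then be applied to the test scores and ranked by F1; the percentile variant makes no assumption about the shape of the score distribution.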

4.6. Interpretability and Visualization Analysis

Beyond pure detection accuracy, industrial applications demand high interpretability for root-cause analysis. We visualize the internal mechanisms of our model from multiple perspectives.

Confusion Matrix

Figure 3 illustrates the binary classification confusion matrix. The proposed model almost perfectly separates the normal and faulty states, with near-zero false positives and false negatives, demonstrating its reliability for continuous monitoring.

Feature Contribution Heatmap

To achieve fault localization, Figure 4 presents the feature-wise reconstruction error heatmap during a fault transition. The color intensity directly corresponds to the anomaly contribution of each sensor. Engineers can immediately pinpoint the specific variables exhibiting abnormal physical decoupling, thereby greatly accelerating troubleshooting.
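The per-sensor contributions behind such a heatmap can be computed by averaging the squared reconstruction error over time for each channel separately, as sketched below. The sensor identifiers and the toy values are hypothetical, and ranking by mean squared error is an assumption about how the contributions are aggregated.

```python
from typing import List, Sequence, Tuple


def feature_errors(x: Sequence[Sequence[float]],
                   x_hat: Sequence[Sequence[float]]) -> List[float]:
    """Mean squared reconstruction error per sensor over one L x M window."""
    n_steps, n_feat = len(x), len(x[0])
    return [sum((x[t][j] - x_hat[t][j]) ** 2 for t in range(n_steps)) / n_steps
            for j in range(n_feat)]


def rank_sensors(errors: List[float], names: List[str]) -> List[Tuple[str, float]]:
    """Sensors sorted from largest to smallest error contribution."""
    return sorted(zip(names, errors), key=lambda pair: pair[1], reverse=True)


# Hypothetical channel names and a 4-step toy window.
names = ["main_steam_temp", "desuperheating_flow", "exhaust_pressure"]
x = [[0.5, 0.5, 0.5] for _ in range(4)]
x_hat = [[0.3, 0.5, 0.49] for _ in range(4)]  # first channel poorly reconstructed

errors = feature_errors(x, x_hat)
ranking = rank_sensors(errors, names)
```

Stacking these per-sensor error vectors over consecutive windows gives a sensors-by-time matrix that can be rendered directly as a heatmap.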

Signal Reconstruction

Figure 5 displays the original versus reconstructed trajectory of a representative sensor. In the normal region, the reconstructed signal tightly tracks the original signal. However, upon entering the fault region, the model (guided by the PCC loss) refuses to reconstruct the anomalous drift, creating a distinct residual gap used for detection.

Latent Space t-SNE

Figure 6 shows the t-SNE projection of the high-dimensional latent vectors $Z$. A clear and definitive boundary separates the normal samples from the fault samples, proving that the proposed encoder effectively maps complex noisy time-series into a highly discriminative latent space.

5. Discussion

The experimental results presented in Section 4 highlight the exceptional capability of the proposed PC-Deep-LSTM-AE in detecting incipient soft faults under severe noise. The superiority of our model warrants a deeper discussion from both theoretical and practical perspectives.

5.1. Theoretical Implications of Physics-Coupling

From a theoretical standpoint, the conventional LSTM-AE is highly susceptible to overfitting background noise because it maps high-dimensional inputs purely based on minimizing the point-wise Mean Squared Error (MSE). In contrast, the integration of the Physics-Coupling Loss (PCC Loss) acts as a powerful physics-informed regularizer. In complex thermodynamic systems like thermal power plants, variables such as temperature, pressure, and fluid flow are governed by strict physical laws. The PCC Loss explicitly penalizes the violation of these inter-variable structural correlations. Consequently, even if a gradual drift fault is numerically minuscule and submerged in Gaussian noise, the disruption it causes to the localized physical covariance matrix will trigger a distinct anomaly score. This explains why our full model maintains a near-perfect Recall rate while baseline methods struggle.

5.2. Practical Value for Industrial IoT (IIoT)

In practical industrial deployments, the “False Alarm Rate” (False Positives) is a critical pain point. High false alarm rates caused by environmental noise often lead to “alarm fatigue” among human operators, rendering the monitoring system virtually useless. As demonstrated in our ablation study and confusion matrix, the explicit non-linear bottleneck combined with Layer Normalization effectively filters out high-frequency disturbances. This architectural design ensures that our model achieves a remarkably high Precision (0.9871). Furthermore, the ability to generate feature-wise reconstruction error heatmaps transforms the model from a simple “black-box” binary classifier into an interpretable diagnostic tool, allowing maintenance engineers to rapidly isolate root causes and minimize equipment downtime.

To further demonstrate the robustness of our method, Figure 7 illustrates the F1-Score degradation under increasing noise levels ($\sigma$ from 0.05 to 0.25). While the Baseline model experiences a severe performance collapse when $\sigma$ exceeds 0.15, the proposed PC-Deep-LSTM-AE maintains an F1-Score above 0.95 even under extreme noise, indicating the strong filtering capability of the bottleneck and the physical constraints.
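For readers who want to reproduce a noise sweep of this kind, the sketch below shows one way the perturbation could be generated: additive zero-mean Gaussian noise applied to a normalized signal at several values of $\sigma$. The step size of 0.05, the constant test signal, and the fixed seed are assumptions; the source only gives the range 0.05 to 0.25.

```python
import random
import statistics


def add_gaussian_noise(signal, sigma, rng):
    """Return a copy of the signal with zero-mean Gaussian noise of std sigma added."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


rng = random.Random(0)   # fixed seed so the sweep is repeatable
clean = [0.5] * 5000     # hypothetical normalized sensor signal

measured = {}
for sigma in (0.05, 0.10, 0.15, 0.20, 0.25):
    noisy = add_gaussian_noise(clean, sigma, rng)
    measured[sigma] = statistics.pstdev(noisy)
    print(f"sigma={sigma:.2f}  empirical std={measured[sigma]:.3f}")
```

Each noisy copy would then be passed through the detector and scored, giving one F1 value per noise level.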

5.3. Limitations and Open Challenges

Table 3 compares the computational complexity, number of parameters, average inference time per sample, and average training time per epoch (tested on an NVIDIA RTX GPU with a batch size of 64). Although the proposed PCC Loss introduces an additional computational overhead of $O(M^{2})$ to calculate the correlation matrices during training, the training time per epoch remains highly efficient (approximately 42 s), and the inference time per sequence is only 2.1 milliseconds.
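To give a feel for the $O(M^{2})$ term, the short sketch below counts the distinct sensor pairs in an $M \times M$ correlation matrix for a few sensor counts. The sensor counts are arbitrary examples, not figures from the paper.

```python
def correlation_pairs(num_sensors):
    """Number of distinct sensor pairs in an M x M correlation matrix."""
    return num_sensors * (num_sensors - 1) // 2


for m in (10, 50, 1000, 20000):
    print(f"M={m:>6}  pairs={correlation_pairs(m):,}")
```

Going from 50 to 20,000 sensors multiplies the sensor count by 400 but the number of pairs by roughly 160,000.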

Despite its outstanding performance, this study has limitations that warrant further investigation for real-world IIoT deployment. Firstly, scalability in ultra-large sensor networks remains a challenge. The calculation of the Pearson Correlation Coefficient matrix involves a time complexity of $O(M^{2})$. While this is highly efficient for equipment-level diagnosis (e.g., dozens of variables), scaling it directly to plant-wide monitoring with tens of thousands of interconnected sensors could introduce computational bottlenecks. Secondly, our data preprocessing relies on static Min-Max boundaries established from historical normal data. In reality, industrial plants frequently experience legitimate macroscopic operational shifts (e.g., variable seasonal load demands). Such “concept drift” could cause new, valid sensor readings to exceed the static normalization boundaries, potentially triggering false alarms [32,33]. Future iterations must incorporate adaptive normalization techniques or continual learning mechanisms to dynamically update the safe operational envelope. Thirdly, as genuine incipient drift faults with precise temporal labels are extremely scarce in historical plant records, our current evaluation strictly relies on synthetic anomalies injected into real-world background data. While this approach provides a rigorous and controlled environment for benchmarking model sensitivity, future research must prioritize validating the proposed architecture on fully authentic, naturally occurring degradation events as comprehensive industrial fault datasets become available.
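The second limitation can be illustrated with a small sketch of static Min-Max scaling. The readings and function names below are hypothetical; the point is only that bounds fitted once on historical data map a legitimately higher reading outside the [0, 1] range the model was trained on.

```python
def fit_min_max(normal_values):
    """Static bounds taken once from historical normal data."""
    return min(normal_values), max(normal_values)


def scale(value, bounds):
    """Min-Max scaling against fixed bounds; results are not clipped."""
    low, high = bounds
    return (value - low) / (high - low)


def outside_envelope(value, bounds):
    """True if a reading falls outside the range seen during fitting."""
    scaled = scale(value, bounds)
    return scaled < 0.0 or scaled > 1.0


# Hypothetical historical readings of one sensor under normal load.
bounds = fit_min_max([410.0, 425.0, 440.0, 455.0, 470.0])

in_range = scale(440.0, bounds)     # a reading inside the historical range
after_shift = scale(500.0, bounds)  # a valid reading after a load shift
print(in_range, after_shift, outside_envelope(500.0, bounds))
```

A reading scaled to 1.5 is far from anything seen in training, so a reconstruction-based detector is likely to score it as anomalous even though the plant is healthy.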

6. Conclusions

In this paper, we proposed a novel Physics-Coupled Deep LSTM Autoencoder (PC-Deep-LSTM-AE) to address the pressing challenge of robust multivariate sensor anomaly detection in complex industrial systems. Traditional data-driven models frequently fail to detect incipient gradual faults under harsh environmental noise because they ignore the underlying thermodynamic dependencies among multivariable signals.

To bridge this gap, our methodology innovatively integrates three core components: (1) a deep LSTM architecture to capture long-term temporal dependencies; (2) an explicit non-linear information bottleneck coupled with layer normalization to filter out high-frequency noise and prevent identity mapping; and (3) a newly designed Physics-Coupling Loss (PCC Loss) that jointly optimizes the numerical reconstruction error and the structural physical correlations.

Extensive evaluations on a real-world, noise-injected thermal power plant dataset conclusively demonstrated the high effectiveness and robustness of our method. The proposed PC-Deep-LSTM-AE achieved an excellent F1-Score of 0.9898 and a Precision of 0.9871, consistently outperforming the evaluated baselines under harsh conditions. Furthermore, the comprehensive ablation study confirmed that both the bottleneck architecture and the PCC Loss are indispensable for achieving high noise robustness and fault sensitivity. Beyond superior quantitative metrics, the model also provides excellent interpretability. The visualized feature contribution heatmaps allow for precise spatial localization of anomalous sensors, greatly facilitating rapid industrial troubleshooting.

Future work will be directed toward three main avenues to further enhance the model’s applicability. First, we plan to introduce Graph Neural Networks (GNNs) to explicitly model the spatial topological relationships among sensors, moving beyond statistical covariance to true physical graph reasoning. Second, we will investigate continual learning and adaptive thresholding mechanisms to dynamically update the physical correlation constraints, enabling the model to tackle “concept drift” caused by varying operational loads. Finally, we will explore lightweight model compression techniques, such as knowledge distillation, to facilitate the deployment of the proposed physics-coupled algorithm directly on resource-constrained edge computing devices.

Author Contributions

Conceptualization, W.J. and M.W.; methodology, W.J. and Y.D.; software, W.J.; validation, X.H. and M.W.; formal analysis, W.J. and X.Y.; investigation, W.J. and X.H.; resources, M.W. and C.M.; data curation, W.J. and X.Y.; writing—original draft preparation, W.J.; writing—review and editing, M.W. and C.M.; visualization, W.J. and Y.D.; supervision, M.W.; project administration, M.W. and C.M.; funding acquisition, M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Guangxi Key Technologies R&D Program (FN2504240038), the National Natural Science Foundation of China (42164002), and the Innovation Project of Guangxi Graduate Education (2024YCXS048, 2025YCXS080, 2026YCXS057).

Data Availability Statement

The original raw dataset analyzed in this study is openly available to the public on Kaggle at [https://www.kaggle.com/datasets/pavanjitsubash/power-plant-data-steam-turbine-and-boiler-metrics] (accessed on 22 January 2026).

Conflicts of Interest

Authors Chenglong Miao and Youcheng Ding were employed by the company Guangxi GIG Beihai Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Sisinni, E.; Saifullah, A.; Han, S.; Jennehag, U.; Gidlund, M. Industrial internet of things: Challenges, opportunities, and directions. IEEE Trans. Ind. Inform. 2018, 14, 4724–4734.
Zhao, Z.; Li, T.; Wu, J.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Deep learning algorithms for rotating machinery intelligent diagnosis: An open source benchmark study. ISA Trans. 2020, 107, 224–255.
Wang, J.; Li, Y.; Zhao, J.; Wang, C. Early fault detection of industrial processes using temporal deep learning. Reliab. Eng. Syst. Saf. 2021, 215, 107874.
Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587.
Zonta, T.; da Costa, C.A.; da Rosa Righi, R.; de Lima, M.J.; da Trindade, E.S.; Li, G.P. Predictive maintenance in the Industry 4.0: A systematic literature review. Comput. Ind. Eng. 2020, 150, 106889.
Isermann, R. Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance; Springer: Berlin/Heidelberg, Germany, 2006.
Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27, 293–311.
Yin, S.; Ding, S.X.; Xie, X.; Luo, H. A review on basic data-driven approaches for industrial process monitoring. IEEE Trans. Ind. Electron. 2014, 61, 6418–6428.
Widodo, A.; Yang, B.S. Support vector machine in machine condition monitoring and fault diagnosis. Mech. Syst. Signal Process. 2007, 21, 2560–2574.
Smrekar, J.; Pandit, D.; Fast, M.; Assadi, M.; De, S. Prediction of power output of a coal-fired power plant by artificial neural networks. Appl. Energy 2009, 86, 2267–2275.
Yan, K.; Huang, J.; Shen, W.; Ji, Z. Unsupervised learning for fault detection and diagnosis of air handling units. Energy Build. 2020, 210, 109741.
Cheng, Y.; Pecht, M. Multivariate anomaly detection for industrial systems using physics-informed machine learning. Appl. Energy 2022, 312, 118745.
Zhang, C.; Song, D.; Chen, Y.; Feng, X.; Lumezanu, C.; Cheng, W.; Ni, J.; Zong, B.; Chen, H.; Chawla, N.V. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. Proc. AAAI Conf. Artif. Intell. 2019, 33, 1409–1416.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Malhotra, P.; Ramakrishnan, A.; Anand, G.; Vig, L.; Agarwal, P.; Shroff, G. LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv 2016, arXiv:1607.00148.
Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837.
Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. USAD: Unsupervised anomaly detection on multivariate time series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 3395–3404.
Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440.
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2019: Text and Time Series 28th International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Proceedings, Part IV; Springer: Berlin/Heidelberg, Germany, 2019; pp. 703–716.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly Transformer: Time series anomaly detection with association discrepancy. In Proceedings of the International Conference on Learning Representations (ICLR) 2022, Virtual, 25 April 2022.
Deng, A.; Hooi, B. Graph neural network-based anomaly detection in multivariate time series. Proc. AAAI Conf. Artif. Intell. 2021, 35, 4027–4035.
Rezazadeh, N.; De Luca, A.; Perfetto, D.; Lamanna, G.; Annaz, F.; De Oliveira, M. Domain-Adaptive Graph Attention Semi-Supervised Network for Temperature-Resilient SHM of Composite Plates. Sensors 2025, 25, 6847.
Dietterich, T.G. Machine learning for sequential data: A review. In Proceedings of the Structural, Syntactic, and Statistical Pattern Recognition Joint IAPR International Workshops SSPR 2002 and SPR 2002, Windsor, ON, Canada, 6–9 August 2002; pp. 15–30.
Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; Volume 30, p. 3.
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. (CSUR) 2014, 46, 1–37.
Lu, J.; Liu, A.; Dong, F.; Gu, F.; Gama, J.; Zhang, G. Learning under concept drift: A review. IEEE Trans. Knowl. Data Eng. 2018, 31, 2346–2363.

Figure 1. The architecture of the proposed PC-Deep-LSTM-AE. A non-linear bottleneck (Layer Norm + Linear) connects the Deep LSTM encoder and decoder for robust feature extraction. The network is jointly optimized by minimizing the numerical Mean Squared Error ($L_{MSE}$) and the Physics-Coupling Loss ($L_{PCC}$). Specifically, $L_{PCC}$ constrains thermodynamic dependencies by penalizing the structural error between the reconstructed and original Pearson correlation coefficient matrices.

Figure 2. Bar chart comparison of the ablation study results. The full model (OURS) significantly outperforms the variants lacking the physical coupling loss or the structural bottleneck.

Figure 3. Confusion Matrix.

Figure 4. Feature Contribution Heatmap.

Figure 5. Signal Reconstruction (X-axis represents time steps, Y-axis represents normalized sensor values).

Figure 6. Latent Space t-SNE.

Figure 7. Robustness analysis comparing the F1-Score of the full model (OURS) and the Baseline under varying environmental noise levels.

Table 1. Performance comparison of different models for sensor anomaly detection under severe noise.

Model	Precision	Recall	F1-Score	AUC-ROC	AUC-PR	Delay
Vanilla LSTM-AE	0.8842	0.9210	0.9022	0.8845	0.9743	30
GRU-AE	0.9510	0.9150	0.9326	0.9521	0.9750	42
Bi-LSTM-AE	0.9125	0.9540	0.9328	0.9015	0.9834	46
CNN-AE	0.9833	0.8005	0.8825	0.7149	0.7149	50
Ours	0.9871	0.9925	0.9898	0.9971	0.9934	23
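
The F1-Score column can be cross-checked against the Precision and Recall columns, since F1 is the harmonic mean of the two. The sketch below does this for the proposed model's row; the helper name is hypothetical, and only the values printed in Table 1 are used.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and Recall of the proposed model as reported in Table 1.
ours_f1 = round(f1_score(0.9871, 0.9925), 4)
print(ours_f1)
```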

Table 2. Ablation study results demonstrating the effectiveness of the proposed components.

Model Variant	Precision	Recall	F1-Score	AUC-ROC
Baseline	0.9611	0.9050	0.9484	0.9900
NO PCC	0.9777	0.9660	0.9718	0.9920
NO BN	0.9809	0.9775	0.9792	0.9954
Ours	0.9871	0.9925	0.9898	0.9971

Table 3. Comparison of model parameters and inference latency.

Model	Parameters (Approx.)	Training Time per Epoch	Inference Time per Sample	Time Complexity (Loss)
Vanilla LSTM-AE	260 K	~28 s	1.5 ms	$O (L)$
GRU-AE	198 K	~24 s	1.2 ms	$O (L)$
CNN-AE	135 K	~15 s	0.8 ms	$O (L)$
OURS	295 K	~42 s	2.1 ms	$O (L) + O (M^{2})$
