Review

Bridging the Gap in Chemical Process Monitoring: Beyond Algorithm-Centric Research Toward Industrial Deployment

1 School of Chemistry and Chemical Engineering, Huaiyin Normal University, No. 111 West Changjiang Road, Huaian 223300, China
2 College of Chemical Engineering, Beijing University of Chemical Technology, North Third Ring Road 15, Chaoyang District, Beijing 100029, China
* Authors to whom correspondence should be addressed.
Processes 2025, 13(12), 3809; https://doi.org/10.3390/pr13123809
Submission received: 16 September 2025 / Revised: 17 November 2025 / Accepted: 24 November 2025 / Published: 25 November 2025
(This article belongs to the Special Issue Processes in 2025)

Abstract

Over the past few decades, industrial process monitoring has been greatly advanced by machine learning and data analytics, with sophisticated models continuing to be developed and validated on benchmark processes such as the Tennessee Eastman simulation process. Yet, these models face significant barriers in real-world industrial deployment, as their increasing complexity leads to poor generalization, limited transparency, and difficulty in simultaneously achieving fault detection and root cause diagnosis, creating a persistent academia–industry gap. To address this challenge, this review systematically synthesizes three core categories of data-driven process monitoring methods (statistical, probabilistic, and deep learning-based approaches), delves into cutting-edge techniques tailored to mitigate the aforementioned practical constraints, and conducts an in-depth analysis of current obstacles hindering industrial application. Our analysis identifies a predominant algorithm-centric bias in current research, marked by overreliance on simulated benchmarks, insufficient model interpretability, and a misalignment with industrial requirements for lightweight and operable solutions. By integrating theoretical progress with industrial priorities, this study offers a cohesive roadmap for future research, establishing a foundation for developing more efficient, reliable, and secure chemical processes that bring practical utility to both academic and industrial practice.

1. Introduction

In the era of Industry 4.0, smart manufacturing for the chemical process industry aims to maximize economic competitiveness while significantly reducing safety incidents [1]. This evolution is achieved by the integration of cyber-physical systems, big data analytics, and artificial intelligence (AI) into traditional manufacturing frameworks. It enables real-time decision-making and adaptive control. However, chemical processes show inherent complexity, characterized by nonlinear dynamics, multivariable interactions, and stringent safety requirements. These pose unique challenges that necessitate robust monitoring strategies. Generally, a fault can be defined as at least one measurement that deviates from its pre-defined acceptable operating regions [2]. Chemical processes are designed to operate under steady-state conditions. Key parameters such as temperature, pressure, flow rates, and concentrations remain within narrow bounds to ensure product quality and safety. However, real-world production is inherently dynamic. Random factors including feedstock variability, equipment degradation, environmental fluctuations, and operational uncertainties introduce frequent process disturbances. Most of these disturbances do not immediately lead to catastrophic failures. Yet they often obscure the early detection of true faults by introducing noise or masking abnormal trends in process data. This creates a critical challenge for monitoring systems, as distinguishing between normal variability and genuine fault conditions becomes increasingly difficult. To this end, chemical process monitoring deserves substantial emphasis for industrial operations. It serves as the foundation for ensuring operational efficiency, process safety, and product quality. Process monitoring facilitates real-time evaluation of multiple measurements, enabling early detection of abnormal conditions, mitigation of potential hazards, and enhancement of production efficiency.
Over the past decades, massive amounts of historical data, which contain intrinsic process operation details, have become accessible due to the widespread adoption of Distributed Control Systems (DCS) [3,4,5]. However, as chemical facilities grow in scale and complexity, operators struggle to identify abnormal operating conditions by relying solely on raw measurement data from the DCS, and the resulting delay in detection prevents timely interventions to mitigate issues at an early stage [6]. As factory digitization advances rapidly, process monitoring techniques empowered by big data analytics have gained attention within the industrial sector [7,8,9]. Data-driven process monitoring synthesizes the operating status of all available measurements into one or a few statistical metrics, which enables operators to observe abnormal process deviations promptly while effectively reducing false alarms [10].
Feature extraction is pivotal for data-driven process monitoring, as it captures critical process information from raw data to ensure robust performance. Among various data-driven feature extraction approaches, Multivariate Statistical Process Monitoring (MSPM) has emerged as a promising branch, attracting growing attention and widespread industrial application. Since Principal Component Analysis (PCA) and Partial Least Squares (PLS) were introduced to process monitoring in the early 1990s, latent variable models based on multivariate statistics have continuously evolved and given rise to a series of variants to address diverse complex data characteristics in industrial processes [11,12]. Researchers developed dynamic PCA and dynamic latent variable methods to tackle process dynamics [13,14,15,16], while kernel-based techniques were introduced to accommodate process nonlinearity [17,18,19]. Gaussian Mixture Models (GMMs) and Independent Component Analysis (ICA) have been utilized to deal with non-Gaussian processes [20,21,22,23], while stationary subspace analysis and Cointegration Analysis (CA) were adopted for non-stationary process data [24,25,26]. Despite these notable achievements, most studies focus exclusively on fault detection, whereas identifying the root causes of faults, so that operators can promptly take corrective measures, is equally crucial.
To address this gap, MSPM methods have been integrated with probabilistic models to achieve fault detection and root cause diagnosis simultaneously [27]. As a representative probabilistic tool, Bayesian Networks (BN) excel at mapping causal relations and conditional dependencies while capturing uncertainty [28]. Hybrid modeling combining BN with traditional fault detection and diagnosis methods leverages the strengths of each method to overcome individual limitations, demonstrating promising application potential [29,30,31]. However, modern process industries feature increasingly large and complex facilities, with sensors extensively deployed to capture diverse dynamic behaviors, which leads to strong correlations among measurements. When combined with various random operational factors, multivariate statistical methods often fail to fully capture the inherent information embedded in process data.
In recent years, process monitoring approaches based on deep neural networks have emerged as a viable alternative [32,33,34,35]. With the advent of Industry 4.0, instrumentation technology has advanced significantly, making sensor data ubiquitous across industrial settings. Meanwhile, process industries have witnessed substantial growth in facility scale and complexity. In response to these trends, AI methodologies have proven effective in modeling intricate nonlinear relationships among process variables [36]. Consequently, the chemical industry is increasingly incorporating AI into its digital transformation initiatives. Among AI techniques, Deep Neural Networks (DNNs) stand out as a prominent tool, exemplified by the rise of large-scale models like ChatGPT 3.5. These AI-based models hold great promise for various engineering applications, including fault detection. Equipped with numerous trainable parameters, deep learning models exhibit powerful data-fitting capabilities. They have been widely applied to monitor complex chemical processes characterized by significant nonlinear and time-varying dynamics. Over the past decade, advanced deep learning architectures have been continuously proposed in fields such as time series analysis, image recognition, and natural language processing, and have achieved promising results in fault detection tasks. A summary of keywords extracted from the literature is visualized in Figure 1, which illustrates that deep learning has secured a solid position in the field of chemical process monitoring. Nevertheless, despite its remarkable capabilities, current chemical process monitoring still faces challenges that hinder the realization of its full potential.
Previous studies have comprehensively explored process monitoring, with Venkatasubramanian et al. first categorizing it into quantitative [37], qualitative [38], and process history-based models [39] in the 1990s. Subsequently, with the rapid development of DCS, data-driven process monitoring has garnered extensive research attention, where Ge et al. [40] and Yin et al. [3] separately provided comprehensive summaries of basic data-driven process monitoring models. Additionally, numerous successful applications have been gradually reported, using the benchmark industrial Tennessee Eastman (TE) process as the case study. Ge reviewed industrial process applications [4], and Melo et al. aggregated benchmark platforms for process monitoring to support model validation for researchers [41]. Further, Apsemidis et al. [19] and Qin et al. [16] summarized kernel methods and latent variable methods for process monitoring, respectively. Quiñones-Grueiro et al. addressed multi-operating-condition process monitoring [9], and Ji and Sun classified and reviewed process monitoring methods based on diverse complex features of industrial data [7]. However, these existing review articles primarily center on the application of advanced algorithms in process monitoring tasks, with most of their demonstrated use cases confined to simulated processes. Reports on practical engineering applications remain scarce, reflecting a growing disconnect between academic research and industrial implementation despite the rapid evolution of algorithms in this field. Notably, industrial applications face more complex and realistic challenges that have not been the focus of previous reviews, which lack a systematic examination of the shortcomings of process monitoring methods in real industrial settings.
As shown in the flowchart in Figure 2, this review aims to systematically examine the common dilemmas faced by academic research in process monitoring and fault diagnosis, revealing the growing disconnection between theoretical advancements and industrial implementation. Studies indicate that current academic efforts exhibit an excessive “algorithm-centric” bias, manifested through three critical contradictions: First, researchers prioritize constructing complex deep learning architectures or multi-algorithm fusion frameworks while neglecting industrial demands for lightweight, interpretable solutions. Second, case-specific optimizations (e.g., tailored for particular equipment or operating conditions) dominate the field, resulting in algorithms with limited generalization capabilities to address the dynamics and complexity of real-world industrial processes. Third, overreliance on simulated benchmarks like the TE dataset for validation leads to feature engineering detached from actual operational data and ignores dynamic interdependencies among process variables. These methodological biases trap academic innovations in the “laboratory efficacy” realm, failing to bridge the gap to industrial applicability. Therefore, it is imperative for future research to address these contradictions by fostering closer collaboration between academia and industry, ensuring that theoretical advancements are aligned with practical industrial needs and can be effectively translated into real-world applications.

2. Data-Driven Process Monitoring

In this section, traditional data-driven process monitoring methods, along with recent advancements, are overviewed. The selection of papers discussed in this review was guided by a structured framework based on the three established methodological paradigms in data-driven process monitoring: statistical, probabilistic, and deep learning-based approaches. The literature was primarily identified through systematic searches in major academic databases. The aim is to include seminal works that founded each paradigm alongside high-impact contemporary studies that represent significant advancements or state-of-the-art performance, with a focus on research that clearly demonstrates a method’s principles, strengths, and limitations.

2.1. Statistical Process Monitoring Methods

Under ideal steady-state continuous operation, process data typically exhibit static, linear, and Gaussian characteristics. Traditional MSPM techniques, including PCA, ICA, and PLS, adequately perform feature extraction via linear projection [23,42,43]. By analyzing the variance-covariance of process variables, MSPM approaches decompose raw data into a principal component subspace and a residual subspace. Hotelling’s T2 statistic and the squared prediction error (SPE) statistic are computed, respectively, in these subspaces to enable fault detection [44]. However, as modern industrial facilities expand in scale, industrial processes have become highly nonlinear and dynamic. Additionally, during chemical reactions, specific measurements are also associated with spatial relationships arising from mass or heat transfer within the unit. To capture these complex features, data-driven process monitoring has witnessed rapid advancements in analytical models over the last twenty years. To address nonlinearity, kernel PCA (KPCA) and kernel PLS (KPLS) have been developed by leveraging the kernel trick to map original data to a higher-dimensional space where linear separability is achievable [45,46]. For dynamic processes, dynamic PCA (DPCA) employs data augmentation to extract autocorrelation from time series data [13]. As a hybrid of DPCA and KPCA, Dynamic Kernel PCA (DKPCA) has been deployed for monitoring nonlinear dynamic processes [47]. Nevertheless, these methods inevitably induce substantial dimensionality growth, restricting their applicability to large-scale industrial systems. Beyond DPCA, dynamic latent variable methods have been further proposed to capture process dynamics by establishing autoregressive models on latent variables. Within this category, Zhou et al. developed auto-regressive PCA [48], while Dong and Qin introduced the dynamic-inner PCA method [13].
These dynamic latent variable techniques avoid data augmentation but are limited to capturing one-step-delay autocovariance across all variables. They remain impractical for real-world applications, as different variables exhibit distinct dynamic behaviors, particularly those with periodicity, that cannot be fully captured by a unified time delay. Another class of multivariate methods designed for processes with diverse characteristics is the feature-based method. Wang and He proposed Statistics Pattern Analysis (SPA) to monitor the statistical features of process variables rather than the process variables themselves [49,50]. Building on this idea, related methods such as the statistics Mahalanobis distance and the augmented kernel Mahalanobis distance [51,52,53] have been proposed for incipient fault detection. Feature-based methods can consider data characteristics comprehensively, but they may also introduce feature redundancy, resulting in poor fault detectability.
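To make the PCA-based detection scheme described in this subsection concrete, the following is a minimal NumPy sketch. It fits a PCA model on normal operating data and flags new samples whose Hotelling’s T2 or SPE statistic exceeds an empirical control limit. The percentile-based limits and all function names are illustrative simplifications; practical implementations use F- and chi-square-based limits and careful component selection.

```python
import numpy as np

def fit_pca_monitor(X, n_components, alpha=0.99):
    """Fit a PCA monitoring model on normal operating data X (samples x variables)."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma                            # standardize
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T                          # loadings of the principal subspace
    lam = (s[:n_components] ** 2) / (len(X) - 1)     # retained eigenvalues
    # Hotelling's T2 in the principal subspace, SPE in the residual subspace
    T2 = np.sum((Xs @ P) ** 2 / lam, axis=1)
    SPE = np.sum((Xs - (Xs @ P) @ P.T) ** 2, axis=1)
    # Empirical percentile control limits (a simplification of the usual
    # F- and chi-square-distribution-based limits)
    return dict(mu=mu, sigma=sigma, P=P, lam=lam,
                T2_lim=np.quantile(T2, alpha), SPE_lim=np.quantile(SPE, alpha))

def monitor(model, x):
    """Return (T2, SPE, is_fault) for a new sample x."""
    xs = (x - model["mu"]) / model["sigma"]
    t = xs @ model["P"]                              # score in the principal subspace
    T2 = np.sum(t ** 2 / model["lam"])
    SPE = np.sum((xs - t @ model["P"].T) ** 2)
    return T2, SPE, bool(T2 > model["T2_lim"] or SPE > model["SPE_lim"])
```

In use, the model is trained once on a window of normal data; a sample that deviates strongly from the training distribution then violates at least one of the two limits.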

2.2. Probabilistic Process Monitoring Methods

Beyond traditional multivariate statistical techniques, probabilistic models, with Bayesian Networks (BN) as the primary representative, have been extensively documented for applications in risk analysis and process monitoring [28]. These probabilistic frameworks have been widely integrated into Risk-Based Maintenance (RBM) strategies to minimize hazards to humans and the environment caused by unexpected equipment failures [54]. Arunraj and Maiti comprehensively discussed the techniques and applications of RBM [55]. Amin et al. developed a risk-based fault detection and diagnosis method using the R-vine copula probabilistic model [56]. Yu et al. proposed a risk-based fault detection method based on self-organizing map and probabilistic analysis [57]. As another effective probabilistic model, BN excels at mapping causal relations and conditional dependencies while capturing uncertainty. It has been proven superior to methods like Markov chains and fault trees in reliability and risk analysis [58]. Khakzad demonstrated the superior performance of BN in safety analysis through a comparison with the fault tree [59]. Furthermore, BN shows promising potential in hybrid modeling for simultaneous fault detection and propagation path analysis. Amin et al. proposed a hybrid model combining PCA and BN to identify root causes and fault propagation pathways [27]. Yu et al. developed a two-stage hybrid fault diagnosis approach that utilizes modified ICA and BN modeling to effectively detect faults and trace propagation paths [60]. To enhance its performance in nonlinear processes, Gharahbagheri et al. further proposed a hybrid method that incorporated KPCA and BN [61]. Gonzalez and Huang integrated BN with kernel density estimation for industrial process monitoring [62]. Amin et al. verified that integrating multivariate fault probability into fault detection and diagnosis can enhance performance [31]. 
Overall, hybrid modeling of BN with traditional fault detection and diagnosis methods leverages the strengths of each approach to overcome individual limitations, demonstrating promising application prospects. However, probabilistic models require training data containing fault or disturbance information to construct accurate causal network structures for fault reasoning. In contrast, industrial process historical data are typically imbalanced, with scarce fault samples.
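The Bayesian reasoning underlying BN-based diagnosis can be illustrated with a minimal two-node sketch: Bayes’ rule converts a prior fault probability and the likelihoods of an observed alarm into a posterior fault probability. The probabilities below are purely illustrative, but they also show why the class imbalance noted above matters: a rare fault keeps the posterior modest even for a fairly sensitive alarm.

```python
def fault_posterior(prior, p_evidence_given_fault, p_evidence_given_normal):
    """Posterior fault probability after observing evidence (Bayes' rule)."""
    joint_fault = prior * p_evidence_given_fault
    joint_normal = (1.0 - prior) * p_evidence_given_normal
    return joint_fault / (joint_fault + joint_normal)

# Illustrative numbers: a rare fault (1% prior) and an alarm with a 95% true
# positive rate but a 5% false-alarm rate.
post1 = fault_posterior(0.01, 0.95, 0.05)   # still well below 0.5
# A second, conditionally independent alarm updates the belief further,
# using the first posterior as the new prior.
post2 = fault_posterior(post1, 0.95, 0.05)
```

The first alarm raises the fault probability from 1% to only about 16%; evidence must accumulate before diagnosis becomes confident, which is exactly why scarce fault samples in historical data hamper learning accurate BN structures.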

2.3. Deep Learning Methods

Deep learning, a subfield of machine learning, employs multi-layer neural networks to mimic human cognitive processes, extracting meaningful features from raw data for applications such as prediction, fault detection, and diagnosis. Owing to their large number of trainable parameters, deep learning models exhibit strong data-fitting capabilities and have been widely adopted for monitoring complex chemical processes with pronounced nonlinear and time-varying dynamics [63]. A common approach involves using artificial neural networks (ANN) as predictive models, where residuals between predictions and actual values serve as monitoring indicators for fault detection [64,65]. Subsequent studies have further advanced these foundations. For instance, Heo and Lee applied ANN for fault detection and classification [66]. Yu and Yan developed a fault detection technique based on unstable neurons in hidden layers, aggregating multi-level features to enrich fault-related information [67]. The application of ANN in fault detection and diagnosis (FDD) for power systems and the renewable energy industry has also been widely reported [68,69]. These advancements further underscore the enhanced potential of ANN in FDD applications.
Among deep learning methods, unsupervised autoencoders (AE), which are neural networks composed of an encoder and a decoder, are the most widely employed for process monitoring. Their capability to approximate complex nonlinear mappings through activation functions makes them particularly effective for feature extraction [40,70,71]. In an AE, the encoder compresses input data into latent representations, while the decoder attempts to reconstruct the original input. Monitoring is typically performed by evaluating the reconstruction error [72,73]. To enhance performance under varying data characteristics, numerous AE variants have been developed. These include adversarial AEs, which constrain latent variables to specific distributions for improved manifold representation [74]. Yu and Zhao proposed a denoising AE to extract features from noisy input data for robust process monitoring [75]. To ensure that the features extracted by the network conform to a normal distribution, Kingma et al. proposed the variational autoencoder (VAE) [80], and Lee et al. developed VAE-based process monitoring to handle process data with simultaneously nonlinear and nonnormal features [76]. Other innovations incorporate structural constraints to address specific limitations: orthogonal AEs reduce feature redundancy via orthogonality regularization [77], the orthogonal self-attentive VAE jointly enhances detection performance and model interpretability [78], and stacked AEs (SAE) deepen the architecture to improve generalization [79].
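The reconstruction-error monitoring idea can be sketched with a deliberately small NumPy autoencoder: one tanh encoder layer, a linear decoder, and full-batch gradient descent. Real applications use deeper architectures, stochastic training, and principled control limits, so this is only an illustration of the principle, with all names and hyperparameters chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_autoencoder(X, n_latent=2, lr=0.05, epochs=500):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on the reconstruction error."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_latent)); b1 = np.zeros(n_latent)
    W2 = rng.normal(0, 0.1, (n_latent, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # encoder: compress to latent space
        Xh = H @ W2 + b2                    # decoder: reconstruct the input
        E = Xh - X                          # reconstruction residual
        # Backpropagate the mean squared reconstruction error
        gW2 = H.T @ E / n; gb2 = E.mean(axis=0)
        dH = (E @ W2.T) * (1 - H ** 2)      # tanh derivative
        gW1 = X.T @ dH / n; gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def spe(params, X):
    """Squared reconstruction error per sample, used as the monitoring index."""
    W1, b1, W2, b2 = params
    Xh = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.sum((X - Xh) ** 2, axis=1)
```

A sample lying off the manifold learned from normal data reconstructs poorly, so its error exceeds a control limit taken, for instance, as a high percentile of the training errors.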
For capturing dynamic temporal dependencies, recurrent neural networks (RNNs) are commonly employed [81]. Standard RNNs incorporate memory mechanisms to propagate temporal information across hidden states. However, they are prone to gradient instability, either explosion or vanishing, during backpropagation over long sequences [82]. To mitigate this, gated architectures such as long short-term memory (LSTM) and gated recurrent units (GRU) have been introduced. These models use gate mechanisms to regulate information retention and have demonstrated strong performance in modeling process dynamics [83,84]. Subsequent studies have further refined these approaches. Aliabadi et al. incorporated an attention mechanism into LSTM for catalyst activity prediction in methanol reactors [85]. Ren et al. combined LSTM with autoencoders (LSTM-AE) to model time-dependent behavior in batch processes [86]. Cheng et al. developed a variational recurrent AE (VRAE) that merges VAE with RNNs, enabling monitoring through latent distribution shifts [87]. Ji et al. proposed a differential RNN for quality prediction and monitoring of industrial penicillin fermentation processes, in which a difference unit is embedded to capture short-term time-varying information [88]. Furthermore, Hong et al. proposed a multi-order difference embedded LSTM to capture the periodic information of data [89]. Additional hybrid models include the orthogonal self-attentive VAE for simultaneous detection and identification [78] and dynamic-inner AEs with vector autoregressive latent modeling [90]. Yu et al. proposed a convolutional LSTM-AE that leverages forget gates to capture long-term dependencies [91]. Additionally, Zhang and Qiu proposed a semi-supervised LSTM ladder autoencoder that utilizes both labeled and unlabeled data to improve diagnostic accuracy [92].
To further exploit the local and spatial features of data, Convolutional Neural Networks (CNNs) have also been introduced from the field of image processing. In a CNN, neighborhood information is extracted by sliding a convolution kernel over the input, so that spatial relationships can be captured. Ma et al. proposed a residual-based process monitoring method built on CNN and PCA [65]. To consider temporal and spatial features simultaneously, spatiotemporal sequence forecasting networks have been further proposed and widely applied to video recognition. As a generalization of CNNs, three-dimensional convolution and pooling operations have been proposed to extract spatiotemporal features. However, three-dimensional CNNs contain a large number of parameters and consume substantial memory during training, limiting their efficiency in practical applications. As an alternative, Shi et al. proposed the convolutional LSTM network (ConvLSTM) to give precise and timely predictions of rainfall intensity, and ConvLSTM has since become a baseline method [93]. Based on ConvLSTM, Bayram et al. proposed a real-time detection method to monitor acoustic anomalies in industrial processes [94]. Wang et al. proposed a predictive recurrent neural network (PredRNN) to address the limitation of the layer-independent memory mechanism in ConvLSTM [95]. As an extension of this work, Wang et al. proposed an improved predictive recurrent neural network (PredRNN++), which includes causal LSTM and gradient highway units to make the network deeper in time and alleviate the gradient propagation difficulties in deep predictive models [96].
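The sliding-kernel operation at the heart of CNN feature extraction can be sketched in a few lines of NumPy. This "valid" cross-correlation is an illustrative simplification of library implementations, which add padding, strides, channels, and learned kernels.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D cross-correlation: slide the kernel over x and sum the
    element-wise products, so each output value aggregates a neighborhood."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out
```

For example, applying a difference-style kernel to a grid of spatially arranged sensor readings highlights sharp local deviations, which is how spatial relationships among neighboring measurements can be encoded.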
To provide a systematic and comparative overview of existing process monitoring methods, Table 1 summarizes the core implementations, key findings, research gaps, and future directions of statistical, probabilistic, and deep learning-based approaches. Building on this comprehensive analysis, it is evident that process monitoring models rooted in deep learning are trending towards increasingly complex and deeper structures, aiming to thoroughly extract the intricate features embedded within process data. Meanwhile, several limitations have emerged. Specifically, the surge in training parameters, accompanied by a finite number of training samples, compromises the extrapolation performance of a model. Moreover, the opaque nature of connections between nodes, coupled with the randomness inherent in training the neural networks within deep models, renders them black boxes. This opacity and randomness consequently constrain model interpretability and impede reliability in real-world industrial settings. A thorough exploration of these specific limitations, alongside the achievements, will be elaborated in the subsequent section.

3. Current Limitations and Achievements in the Development of Data-Driven Process Monitoring Models

3.1. Overreliance on Simulated Benchmarks

The current landscape of data-driven process monitoring research is heavily characterized by an overreliance on simulated benchmarks, with the TE dataset being a prominent example. As a simulation process, the TE process provides a controlled, steady-state scenario with relatively straightforward data characteristics. This simplicity contrasts with the inherent complexity of real-world industrial processes, which are marked by significant nonlinearities, dynamic behaviors, and frequent stochastic disturbances. These real-world processes are influenced by a multitude of factors, including equipment wear and tear, environmental variations, feed uncertainty, capacity variation and human interventions, all of which contribute to their unpredictability and complexity [97]. Consequently, highly sophisticated models developed specifically for the TE dataset often yield diminishing returns when applied to industrial settings, as the idealized conditions fail to capture the intricate and evolving nature of industrial scenarios.
Consequently, the ability of a model to produce reliable results on the TE dataset does not necessarily equate to its effectiveness in practical industrial applications. While the TE dataset provides a useful starting point for model validation and comparison, it fails to capture the true essence of the challenges faced in industrial process monitoring. Its simulated nature simplifies the data generation process, overlooking the intricate interplay of variables and uncertainties that are omnipresent in real-world settings. This oversimplification leads to a situation where models trained and tested on the TE dataset may demonstrate robust performance in controlled environments but struggle to generalize to the unpredictable and varied conditions encountered in industry.
A comprehensive statistical analysis of literature published in the field of process monitoring over the past decade, as depicted in Figure 3, reveals a concerning trend: a substantial portion of research still relies on the TE dataset and simulation processes as primary benchmarks. This widespread adoption highlights a significant gap between academic research and industrial application. Academic research often prioritizes theoretical advancements and the pursuit of high accuracy on benchmark datasets, sometimes at the expense of practical relevance. In contrast, industrial applications demand models that can perform reliably in real-world scenarios, where data is noisy, incomplete, and subject to constant change. Furthermore, consistent with previous findings, even the limited number of studies applied to industrial processes have failed to make their corresponding datasets publicly accessible [98]. This lack of data sharing poses significant challenges for reproducing research results, conducting comparative studies, and advancing the field through cumulative knowledge building. The absence of standardized, publicly available industrial datasets perpetuates the research community’s dependence on simulated benchmarks like the TE process. This continued reliance hinders the development of models that can effectively address the real-world challenges faced by industries. It perpetuates a cycle where academic research remains disconnected from the practical needs of industry, leading to slow adoption of data-driven process monitoring technologies in real-world settings. This gap not only limits the impact of academic research but also deprives industries of the potential benefits of advanced data-driven monitoring techniques.
To bridge this gap, there is an urgent need for researchers to shift their focus towards real-world industrial data. By incorporating data from actual industrial processes, researchers can develop models that are better equipped to handle the complexities and uncertainties inherent in these environments. Real-world data provides a richer source of information, capturing the nuances and variations that are characteristic of industrial processes. This shift in focus not only enhances the practical relevance of academic research but also accelerates the adoption of data-driven process monitoring technologies in industry. Process monitoring approaches have begun to find practical applications in real-world industrial scenarios. Ma et al. proposed a CNN-based fault detection method that takes into account the spatial distribution of sensors within equipment, enabling the effective extraction of spatial features [99]. Ji et al. introduced information entropy-based methods to better capture the characteristics of normal operating conditions and the broader temporal correlations in the data, facilitating real-time diagnosis of fault propagation paths in industrial processes [100]. Additionally, our previous study has demonstrated remarkable success in various industrial units, such as cracking furnaces, continuous catalytic reforming units, and Dalian Methanol to Olefin (DMTO) processes. For example, in the case of a continuous catalytic reforming unit at a company, the application of this method has resulted in annual cost savings of approximately RMB 87 million (approximately $12.4 million) by reducing fault-related losses. Rojas et al. reviewed the application of AI-driven models in the mining industry [101]. Patil et al. further summarized the potential application scenarios of fault detection methods in industrial systems [102].
Although several studies have reported on the application of process monitoring in industrial processes, none have provided sufficient application details or publicly accessible data. Moreover, no general process monitoring framework has been successfully applied in industrial settings to date. It is evident that industrial data presents both challenges and opportunities for researchers, driving further advancements in process monitoring technologies.
To address these challenges, several pathways can be explored. Promoting the creation of openly accessible industrial-inspired datasets, through anonymization and standardization efforts, would provide valuable resources for the research community. Furthermore, modifying existing benchmarks by introducing realistic disturbances, dynamic operating conditions, and equipment degradation profiles can help bridge the fidelity gap between simulation and real plant data. Such enhanced benchmarks would encourage the development of models that are not only accurate but also robust and transferable, ultimately accelerating the adoption of data-driven monitoring solutions in industrial practice.

3.2. Limitations of Deep Learning Models in Data-Driven Process Monitoring: Complexity, Interpretability, and the Path Forward

The growing complexity of deep learning models in data-driven process monitoring has increasingly become a double-edged sword. These models, characterized by intricate architectures with multiple layers and a vast number of parameters, are designed to capture subtle patterns and complex relationships inherent in data. However, such sophistication brings a significant cost by causing an exponential increase in trainable parameters [103]. This surge in complexity poses a particular challenge when dealing with limited training data, such as the mere hundreds of samples typically available from the TE process dataset. In their relentless pursuit to extract every ounce of information from the data, researchers have resorted to constructing neural networks with thousands, if not millions, of parameters. While these models may demonstrate impressive performance on the training data, they often fall short when it comes to generalization. The consequence is a model that is overfitted to the specific characteristics of the training data, rendering it unreliable for extrapolation to unseen datasets or real-world industrial applications. Such models may perform admirably within the confines of their training environment, but they fail to generalize to the broader, more varied conditions encountered in actual industrial processes. This lack of generalizability is a critical limitation, as it undermines the practical utility of these models in real-world scenarios.
Moreover, the issue of interpretability presents a second, equally formidable barrier. Deep learning models, by their very nature, are often considered as black boxes. They process inputs through numerous nonlinear transformations to produce outputs, without providing any insight into the underlying mechanisms or the relative importance of input features in the final decision. This lack of transparency is not merely an academic concern. It also strikes at the heart of effective process monitoring [104]. In this context, where diagnosing the root causes of faults or process deviations is paramount for safety and quality control, a simple prediction output is insufficient. Operators and engineers require causal understanding to make informed decisions and take timely, corrective actions. For instance, being alerted to an anomaly is of limited value if the model cannot indicate which process variables were the primary contributors. This inability to “debug” a model’s reasoning can foster mistrust and significantly hinder adoption, as the cost of an unexplained error in an industrial setting, such as an unplanned shutdown or a safety incident, can be prohibitively high. Without advancements that bridge the gap between predictive accuracy and interpretability, the widespread deployment of these powerful models in critical industrial settings is likely to remain limited.
While certain research efforts, such as Physics-Informed Neural Networks (PINNs), aim to improve model interpretability by incorporating process knowledge and mechanistic information into the model architecture or loss function, these approaches are still in their nascent stages. They often involve adding mechanistic constraints to the loss function, which, rather than effectively integrating process knowledge, can sometimes hinder the model's fitting ability and overall performance. The challenge lies in finding the right balance between incorporating process knowledge and maintaining the model's flexibility and adaptability. Too much constraint can lead to a model that is too rigid to capture the complexity of real-world data, while too little can result in a model that is too flexible and prone to overfitting. To address the limitations of purely data-driven deep learning models, hybrid modeling approaches have emerged as a promising avenue for future research. These approaches seek to blend the precision and pattern-recognition capabilities of data-driven methods with the interpretability and generalizability of mechanistic models. By combining the strengths of both, hybrid models aim to overcome the limitations of each and provide a more comprehensive solution to process monitoring challenges.
Innovations like PINNs and Graph Neural Networks (GNNs) represent significant steps forward in this regard. PINNs, for instance, incorporate physical laws and principles into the neural network architecture, allowing the model to learn from both data and prior knowledge. This approach not only improves the model accuracy and robustness but also enhances its interpretability, as the model predictions can be traced back to the underlying physical laws. Similarly, GNNs leverage the graph structure of process data to capture the relationships and dependencies among different variables, providing a more intuitive and interpretable representation of the process [105,106,107]. For instance, Zheng et al. demonstrated the effectiveness of incorporating physical information into the loss function of a recurrent neural network, leading to better predictive performance and increased interpretability [108]. By aligning the model predictions with physical laws, they were able to improve the model generalization ability and make it more reliable for real-world applications. Similarly, Bai et al. proposed an enhanced Kalman filter combined with neural networks, leveraging their feature extraction and nonlinear fitting abilities to investigate the relationships among intermediate variables and achieve accurate predictions [109]. This approach not only improved the model accuracy but also provided insights into the underlying process dynamics, enhancing its interpretability.
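To make the physics-informed loss idea concrete, the following minimal sketch (a hypothetical illustration, not the formulation used in the cited works) blends a data-fit term with the residual of an assumed first-order mechanism dy/dt = -k·y; the rate constant `k` and the weight `w_phys` are illustrative parameters that trade off data fidelity against mechanistic consistency.

```python
import numpy as np

def pinn_loss(y_pred, y_obs, t, k=0.5, w_phys=0.1):
    """Physics-informed loss: data-fit MSE plus the squared residual of an
    assumed mechanism dy/dt = -k*y evaluated on the predicted trajectory."""
    data_term = np.mean((y_pred - y_obs) ** 2)
    dydt = np.gradient(y_pred, t)            # finite-difference estimate of dy/dt
    phys_residual = dydt + k * y_pred        # ~0 wherever the physics holds
    phys_term = np.mean(phys_residual ** 2)
    return data_term + w_phys * phys_term

# A trajectory consistent with the mechanism incurs almost no penalty,
# while one that violates it is penalized even if no data misfit exists.
t = np.linspace(0.0, 1.0, 101)
consistent = pinn_loss(np.exp(-0.5 * t), np.exp(-0.5 * t), t, k=0.5)
violating = pinn_loss(np.exp(-2.0 * t), np.exp(-0.5 * t), t, k=0.5)
```

In a full PINN, `phys_term` would be added to the network's training loss and minimized jointly with the data term by backpropagation; the sketch only shows how the two terms are composed.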
However, despite these advancements, hybrid modeling approaches still face significant challenges that impact their practical implementation. They require a comprehensive understanding of the prior knowledge related to chemical process mechanisms during model development. Effectively integrating such knowledge is often complicated by its complex nature, scattered across thermodynamics, kinetics, and control theory, each with its own paradigms and terminologies. This creates a fundamental integration gap, where consolidating multidisciplinary insights into a coherent, computationally tractable model remains a major undertaking. Consequently, the development process often becomes bottlenecked not by algorithmic limitations, but by the difficulty of systematically encoding and reconciling expert knowledge from diverse sources.
Beyond knowledge integration, the practical deployment of these hybrid methods encounters several persistent obstacles. Data quality stands as a critical issue, as real-world industrial data is typically characterized by noise, incompleteness, and inherent uncertainty. In a hybrid framework, which relies on the synergy between data-driven and mechanistic components, poor data quality can severely undermine the learning process, leading to biased parameter estimates and unreliable predictions. This issue is further compounded by the inherent complexity of hybrid models. The fusion of first-principles equations with sophisticated algorithms like deep neural networks, while powerful, often results in architectures that are computationally intensive and difficult to optimize. This complexity not only increases the computational burden but also introduces significant challenges in model training, interpretation, and maintenance.
Finally, the dynamic nature of industrial processes imposes a requirement for continuous model updating and validation, which is a challenge that is often underestimated. A hybrid model is not a static entity. It must adapt to changes in feedstock, catalyst deactivation, and evolving operating conditions. Without a robust framework for online or periodic recalibration and validation, even a well-designed model will experience performance drift over time, gradually losing its accuracy and reliability. This creates a critical need for lifecycle management strategies specifically tailored for hybrid models, ensuring their long-term viability and trustworthiness in real-world production environments.

4. Research on the Establishment of Monitoring Statistics

4.1. Establishment of Monitoring Statistics

In feature extraction models, process variables are transformed into a latent space, a representation that segregates the data into two fundamental subspaces: the Principal Component Subspace (PCS) and the Residual Subspace (RS). This segregation is based on the variance characteristics of the variables: directions exhibiting significant variance are housed within the PCS, which encapsulates the bulk of the informational content of the original variables, while directions with minimal variance are relegated to the RS, which predominantly comprises noise. This partitioning is pivotal for fault detection, as it allows for targeted analysis of the structural integrity of the data.
For fault detection purposes, two primary statistics serve as the bedrock of dissimilarity measures: the $T^2$ statistic and the Squared Prediction Error (SPE) statistic. The $T^2$ statistic operates within the PCS, providing a quantitative assessment of variations that deviate from the normal behavior encapsulated by the significant principal components. The SPE statistic, on the other hand, monitors the RS, alerting to anomalies that manifest as deviations in the residual noise structure.
Given a set of normalized historical data $X \in \mathbb{R}^{n \times m}$ with $n$ samples and $m$ variables, the projection matrix $P \in \mathbb{R}^{m \times k}$ can be derived through the Lagrange multiplier method,

$$X = T P^{T} + E$$

where $T = XP$ is the score matrix, $k$ is the number of principal components, and $E$ represents the residual. The score matrix $T$ represents the projected coordinates of the original data in the PCS, capturing the dominant variation patterns, while $E$ contains the residual information not explained by the first $k$ components. For a new sample $x$, the $T^2$ statistic is used to measure changes in the PCS,

$$T^{2} = \sum_{i=1}^{k} \frac{(p_i^{T} x)^{2}}{\lambda_i}$$

where $p_i$ denotes the $i$th principal component direction and $\lambda_i$ denotes the corresponding eigenvalue. The $T^2$ statistic essentially measures the Mahalanobis distance of the projected sample within the PCS, quantifying how far the current operation deviates from the normal variability captured by the reference model. Comparatively, the SPE statistic is employed to monitor the RS,

$$SPE = \sum_{i=k+1}^{m} (p_i^{T} x)^{2}$$

The SPE statistic, also known as the squared reconstruction error, measures the extent to which the sample deviates from the established principal component model, with large values indicating the presence of abnormal patterns inconsistent with normal operation. Assuming the process data adhere to a multivariate Gaussian distribution, control limits $T_{\alpha}^{2}$ and $\delta_{\alpha}^{2}$ are established for the $T^2$ and SPE statistics, respectively, based on statistical theory [110]. These control limits serve as thresholds to detect anomalies in the respective subspaces.
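The derivation above can be sketched in a few lines of NumPy. The data here are synthetic stand-ins for normalized normal-operation measurements, and the formal control-limit calculation from statistical theory [110] is omitted in favor of the raw statistics.

```python
import numpy as np

def fit_pca_monitor(X, k):
    """Fit a PCA monitoring model on normalized data X (n x m): split the
    eigenvectors of the sample covariance into PCS (first k) and RS (rest)."""
    n, m = X.shape
    cov = (X.T @ X) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort eigenpairs descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :k]                         # loadings spanning the PCS
    P_res = eigvecs[:, k:]                     # directions spanning the RS
    lam = eigvals[:k]
    return P, P_res, lam

def t2_spe(x, P, P_res, lam):
    """T^2 (Mahalanobis distance in the PCS) and SPE (energy in the RS)."""
    t = P.T @ x
    T2 = float(np.sum(t ** 2 / lam))
    SPE = float(np.sum((P_res.T @ x) ** 2))
    return T2, SPE

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # normalize, as assumed in the text
P, P_res, lam = fit_pca_monitor(X, k=2)
T2, SPE = t2_spe(X[0], P, P_res, lam)
```

Because the eigenvectors form an orthonormal basis, the PCS and RS energies of any sample sum to its total squared norm, which is a useful sanity check on an implementation.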
Traditional statistics like $T^2$ and SPE are undeniably pivotal in fault detection frameworks, providing valuable insights into how process variables behave within their corresponding subspaces. Yet, a distinct limitation emerges when these statistics deliver conflicting signals, with one exceeding its control threshold while the other stays within normal range. This inconsistency complicates the process of forming a definitive judgment on whether a fault exists. For example, if SPE exceeds its limit while the $T^2$ statistic shows no abnormality, the source and nature of the fault may remain unclear. Such conflicts arise from multiple factors, such as the intrinsic complexity of the process, noise interference, or nonlinearities that linear principal component analysis fails to fully capture.
Additionally, these statistics differ in their sensitivity to different types of faults. The T2 statistic tends to be more responsive to faults impacting multiple variables at once, while the SPE statistic excels at identifying isolated or subtle anomalies in individual variables. Thus, relying solely on either metric could result in either over-detection (false positives) or under-detection (false negatives) of faults, depending on the specific characteristics of the fault.

4.2. Fusion Statistics

In scenarios where comprehensive fault detection is paramount, the integration of the $T^2$ and SPE statistics into a single metric becomes essential. Initially, a combined statistic $\zeta$ was proposed to amalgamate fault detection results,
$$\zeta = c\,\frac{T^{2}}{T_{\alpha}^{2}} + (1 - c)\,\frac{SPE}{\delta_{\alpha}^{2}}$$
where ζ is the combined statistic, c is a weight coefficient between 0 and 1, and the control limit for this statistic is 1. However, practical applications revealed limitations, particularly in cases where ζ remained below 1 despite either T2 or SPE exceeding their individual thresholds, indicating potential faults that were not adequately captured.
To address this shortcoming, the combined statistic was refined to ensure a more robust fault detection mechanism,
$$\xi = \frac{T^{2}}{T_{\alpha}^{2}} + \frac{SPE}{\delta_{\alpha}^{2}}$$
with the control limit derived under the assumption of a multivariate Gaussian distribution. This optimized statistic, ξ, serves as a comprehensive metric for fault detection through synthesizing anomaly information from both the PCS and RS, delivering a more sensitive and trustworthy signal of potential process faults. With the establishment of a suitable control threshold, it becomes feasible to effectively monitor the process and quickly detect deviations indicative of incipient faults, thus improving operational efficiency and safety.
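A minimal sketch of the two combined statistics, using illustrative numbers with limits normalized to 1, reproduces the failure mode described above: a sample whose SPE alone is out of control can leave ζ below its control limit of 1, while ξ still elevates in response.

```python
def zeta(T2, SPE, T2_lim, SPE_lim, c=0.5):
    """Weighted combined statistic (Equation (4)); its control limit is 1."""
    return c * T2 / T2_lim + (1 - c) * SPE / SPE_lim

def xi(T2, SPE, T2_lim, SPE_lim):
    """Additive combined statistic (Equation (5))."""
    return T2 / T2_lim + SPE / SPE_lim

# SPE alone exceeds its limit (1.4 > 1.0), yet the weighted statistic
# stays below 1 and misses the fault; the additive statistic does not.
z = zeta(0.2, 1.4, 1.0, 1.0, c=0.5)   # 0.5*0.2 + 0.5*1.4 = 0.8 -> no alarm
x = xi(0.2, 1.4, 1.0, 1.0)            # 0.2 + 1.4 = 1.6
```

The numbers are purely illustrative; in practice the limits $T_{\alpha}^{2}$ and $\delta_{\alpha}^{2}$ come from the distributional assumptions discussed in the text, and the control limit of ξ must be derived accordingly.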
The integration of the $T^2$ and SPE statistics into a unified metric for comprehensive fault detection has evolved significantly, driven by the need to address limitations observed in early linear combinations such as the weighted statistic $\zeta$ (Equation (4)). While $\zeta$ aimed to balance contributions from the PCS and RS through a user-defined weight $c$, its linear structure often failed to detect faults localized in a single subspace, as anomalies in one space could be diluted by normal behavior in the other. This led to scenarios where $\zeta$ remained below the control limit even when $T^2$ or SPE individually exceeded their thresholds. To overcome this, the additive statistic $\xi$ (Equation (5)) was proposed, decoupling the weights and summing normalized $T^2$ and SPE values under the assumption of multivariate Gaussianity. This refinement improved sensitivity, as faults in either subspace would directly elevate $\xi$, but it also introduced dependencies on distributional assumptions and control limit derivations that may not hold in real-world applications.
Subsequent research has expanded fusion strategies beyond linear combinations. For instance, nonlinear methods like kernel principal component analysis (KPCA) were employed to map $T^2$ and SPE into high-dimensional spaces, enabling fault detection in nonlinear processes [111]. Probabilistic frameworks, such as Bayesian fusion, integrated likelihood ratios of $T^2$ and SPE to handle uncertainty, offering a more robust approach under noisy conditions [112]. Adaptive weighting mechanisms further emerged, dynamically adjusting $c$ in Equation (4) using online learning or entropy-based criteria to reflect evolving process conditions [16]. Hierarchical fusion strategies were also developed, in which $T^2$ and SPE thresholds were optimized independently before aggregation, reducing interference between PCS and RS anomalies. These advancements highlighted the trade-offs between computational complexity and detection accuracy, particularly in high-dimensional systems.
The advantages of fusion statistics lie in their capacity to integrate insights from complementary subspaces. By combining $T^2$, which captures systematic variations in the PCS, and SPE, which identifies unstructured residuals in the RS, fusion metrics like $\xi$ offer a comprehensive perspective on process operating status. This dual focus reduces blind spots inherent to single-statistic approaches, such as SPE's insensitivity to small shifts in the PCS or $T^2$'s inability to detect RS-specific faults. The additive structure of $\xi$ further enhances sensitivity, ensuring that any subspace anomaly triggers an alert, a critical feature in safety-critical systems like nuclear reactors or chemical plants. Additionally, fusion frameworks offer flexibility; for example, in semiconductor manufacturing, where RS faults (e.g., sensor drifts) dominate, the weighting in $\zeta$ can be tuned to prioritize SPE, while in batch processes with strong PCS correlations, $T^2$ might be emphasized.
Nevertheless, fusion statistics encounter significant limitations. The multivariate Gaussian distribution assumption underlying the control limit frequently fails to hold for industrial data, which is often characterized by skewness, multimodality, or heavy tails, leading to false alarms or missed detections. Nonparametric alternatives, such as Kernel Density Estimation (KDE) or copula-based methods, have been proposed but demand substantial computational resources. High-dimensional data amplify such difficulties, as estimating control limits for both T2 and SPE becomes unstable, necessitating dimensionality reduction techniques like sparse PCA. Another concern is parameter sensitivity. The weight c in ζ or the choice of fusion function (linear versus nonlinear) exerts a profound impact on performance, and suboptimal selections, typically based on historical data, may underperform under novel fault modes or dynamic process shifts. For example, in continuously adaptive catalytic cracking processes, static fusion rules may lag behind operational changes, requiring real-time recalibration. Furthermore, concurrent faults in both the PCS and RS can obscure individual contributions, complicating root cause analysis without auxiliary tools such as contribution plots or Shapley values.
Recent trends integrate machine learning to address these limitations. Deep learning architectures, such as autoencoders, extract latent features from raw data and fuse them with traditional statistics, enhancing detection in complex systems like wind turbines [113]. Reinforcement learning (RL) frameworks dynamically adjust fusion parameters based on real-time feedback, optimizing detection in transient states common in robotics or energy grids. Attention mechanisms have also been applied to weigh $T^2$ and SPE adaptively, prioritizing anomalies correlated with critical faults. Despite these innovations, challenges persist in interpretability, as black-box models obscure the logic behind fusion decisions, hindering trust in safety-critical applications. Future research may focus on hybrid models that marry the transparency of classical statistics with the power of machine learning, or on causal fusion frameworks that distinguish root faults from propagated effects in networked systems. Edge computing implementations could further enable real-time fusion in resource-constrained Industrial IoT environments, while human-in-the-loop systems might integrate operator expertise to refine fusion rules iteratively. Ultimately, the evolution of fusion statistics will hinge on balancing sensitivity, robustness, and interpretability across diverse and dynamic industrial landscapes.

4.3. Residual Statistics of Deep Learning Models

In the realm of process monitoring models based on deep learning, the construction of monitoring statistics plays a pivotal role in ensuring the reliability and accuracy of fault detection systems. Deep learning models, particularly those employing autoencoder architectures, have gained significant traction for their ability to extract intricate features from complex datasets and subsequently reconstruct the input data. The reconstruction error in deep learning-based monitoring, often used as the training objective, functionally corresponds to the SPE statistic in PCA. Unlike the well-defined principal components in PCA, however, the latent features extracted by deep learning models are often difficult to interpret. It is crucial to note that most existing works that leverage autoencoders for this purpose have primarily focused on training with the reconstruction error, while overlooking effective regularization of the distribution of latent representations. The reconstruction error serves as a direct measure of how well the autoencoder can reproduce the input data from its latent representation. In the context of fault detection, a higher reconstruction error often indicates an anomaly or deviation from normal operating conditions, thereby signaling a potential fault. This approach has been widely adopted due to its simplicity and effectiveness in identifying deviations. Nevertheless, relying solely on the reconstruction error without considering the underlying distribution of latent representations can limit the model's ability to generalize and detect subtle faults, especially in dynamic and complex systems [114,115].
The distribution of latent representations in autoencoders reflects the essence of the learned features and plays a critical role in the model performance. Effective regularization of this distribution can enhance the model robustness, improve its generalization capabilities, and enable it to better capture the underlying patterns in the data. By imposing constraints on the latent space, such as encouraging sparsity or enforcing a specific distribution (e.g., Gaussian), the model can learn more meaningful and discriminative features [77,116]. This, in turn, can lead to more accurate fault detection by reducing false positives and negatives, and by enabling the model to distinguish between normal variations and actual faults.
Despite the potential benefits of regularizing the distribution of latent representations, most existing works in the field of fault detection using autoencoders have not adequately addressed this aspect. This oversight can be attributed to various factors, including the complexity of implementing such regularization techniques and the lack of standardized methods for evaluating their effectiveness. However, as the demand for more reliable and accurate fault detection systems grows, it becomes increasingly important to explore and develop novel approaches that can effectively regularize the latent space and leverage its full potential for fault detection.
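As a minimal illustration of reconstruction-error monitoring, the sketch below trains a tied-weight linear autoencoder by plain gradient descent on synthetic stand-in data and uses an empirical quantile of the training-set reconstruction error as the control limit; real deployments would use a deeper network, validated data, and a formal limit estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 6))          # stand-in for normalized normal-operation data
n, m, d = X.shape[0], X.shape[1], 2

# Tied-weight linear autoencoder: encode z = x W, decode x_hat = z W^T,
# trained by gradient descent on the mean squared reconstruction error.
W = 0.1 * rng.standard_normal((m, d))
lr = 0.01
for _ in range(500):
    R = X @ W @ W.T - X                    # reconstruction residual
    grad = 2.0 * (X.T @ R @ W + R.T @ X @ W) / n
    W -= lr * grad

def recon_error(x, W):
    """Per-sample squared reconstruction error, the deep-learning analogue of SPE."""
    return float(np.sum((x - x @ W @ W.T) ** 2))

errors = np.array([recon_error(x, W) for x in X])
limit = np.quantile(errors, 0.99)          # empirical control limit from normal data

fault = 100 * X[0]                         # grossly abnormal sample for illustration
alarm = recon_error(fault, W) > limit      # True -> potential fault flagged
```

Note that nothing in this objective constrains the distribution of the latent codes `X @ W`, which is precisely the gap discussed above.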

5. Evaluation Metrics for Process Monitoring Models

After establishing a process monitoring model, its performance requires evaluation using appropriate metrics. Beyond the conventional metrics of Fault Detection Rate (FDR) and False Alarm Rate (FAR), other performance indicators such as Accuracy, Precision, Recall, and F1-score are commonly employed in machine learning classification tasks. While these metrics provide valuable insights in balanced classification scenarios, process monitoring is fundamentally an anomaly detection problem where data is highly imbalanced, with normal operation samples significantly outnumbering fault samples. In such contexts, FDR and FAR offer more intuitive and operationally relevant assessments. They are defined as follows,
$$FDR = \frac{TP}{TP + FN}$$
$$FAR = \frac{FP}{FP + TN}$$
where $TP$ (true positives) is the number of fault samples that are correctly detected; $FN$ (false negatives) is the number of fault samples that fail to be detected; $FP$ (false positives) is the number of normal samples that are wrongly flagged as faults; and $TN$ (true negatives) is the number of normal samples that are correctly identified as normal.
FDR (equivalent to Recall) indicates the model’s ability to capture true faults, while FAR reflects the rate of false alarms during normal operation. These two metrics often exhibit a trade-off: excessive pursuit of high FDR may lead to an unacceptably high FAR, rendering the model impractical in industrial settings where frequent false alarms can erode operator trust and cause unnecessary disruptions. Moreover, in industrial practice, determining the precise onset time of a fault is often challenging. It is recommended that future studies estimate the Fault Detection Time (FDT) based on factory operating records or DCS event logs, which document operator interventions and system alarms, to establish a consistent and realistic benchmark for evaluating detection timeliness. FDT measures how quickly a model can detect a fault after it occurs, providing a more practical assessment of its effectiveness in real-world scenarios.
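The definitions above translate directly into code. Here `detection_time` is a simple FDT estimate that assumes the fault onset index is known, for example from factory operating records or DCS event logs as recommended above; the label sequences are illustrative.

```python
def fdr_far(y_true, y_alarm):
    """FDR = TP/(TP+FN); FAR = FP/(FP+TN). Inputs are 0/1 sequences of equal length."""
    tp = sum(1 for t, a in zip(y_true, y_alarm) if t == 1 and a == 1)
    fn = sum(1 for t, a in zip(y_true, y_alarm) if t == 1 and a == 0)
    fp = sum(1 for t, a in zip(y_true, y_alarm) if t == 0 and a == 1)
    tn = sum(1 for t, a in zip(y_true, y_alarm) if t == 0 and a == 0)
    return tp / (tp + fn), fp / (fp + tn)

def detection_time(fault_onset, y_alarm):
    """Fault Detection Time: samples elapsed from the known fault onset to the
    first alarm at or after onset; returns None if the fault is never detected."""
    for i in range(fault_onset, len(y_alarm)):
        if y_alarm[i] == 1:
            return i - fault_onset
    return None

# 10-sample record: the fault starts at index 5; one false alarm occurs at index 2.
y_true  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_alarm = [0, 0, 1, 0, 0, 0, 0, 1, 1, 1]
fdr, far = fdr_far(y_true, y_alarm)   # 0.6, 0.2
fdt = detection_time(5, y_alarm)      # first alarm 2 samples after onset
```

Reporting all three numbers together, rather than FDR alone, gives exactly the balanced picture the text argues for.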
Unfortunately, most current studies fail to consider FDT in their evaluations, and a significant portion even omits FAR reporting. This incomplete assessment undermines model reliability and makes it difficult to fully understand practical applicability. The absence of these critical metrics, combined with the inherent scarcity of fault data in industrial environments, where rare fault conditions and collection costs limit comprehensive data acquisition, further complicates model validation. Additionally, dynamic process variations and unexpected disturbances mean that historical data often cannot cover all potential fault scenarios, leading to potential misdiagnosis of novel faults during online operation.
To address these challenges, future research should focus on developing fault detection models that simultaneously achieve high FDR, low FAR, and minimal FDT. This requires incorporating continuous learning mechanisms to adapt to evolving process conditions, alongside advanced data analytics techniques to handle real-world variability. Furthermore, efforts should be made to enhance data collection strategies and explore synthetic data generation methods to improve model robustness and generalizability across diverse industrial scenarios.

6. Future Research Suggestions

Despite the remarkable advancements of deep learning in process monitoring, key challenges such as inadequate model interpretability, limited generalization to industrial scenarios, and superficial mechanism-data fusion remain unresolved. To address these gaps and bridge the academia–industry divide, three targeted future research directions, as shown in Figure 4, are proposed below, focusing on latent space optimization, model structure innovation, and process information integration.

6.1. Advanced Latent Space Constraints for Deep Learning Models

In most applications of deep learning models for process monitoring, the AE structure is adopted, with the reconstruction error serving as the training constraint to ensure that the latent features extracted by the model can reconstruct the training data. However, the distribution of the latent representation has not been effectively regularized, so the latent features extracted by the model are neither interpretable nor directly involved in fault detection.
For fault detection tasks, the latent space can be further constrained to improve the feature extraction capability and fault detection performance of the model. On the one hand, orthogonal constraints can be introduced to make the latent features extracted by the model as independent as possible, thereby removing redundancy in the latent space and capturing more distinctive features [77]. On the other hand, the Siamese network structure can be adopted, which employs a multi-input structure with shared parameters [115]. In this way, training data is input into the network in pairs for training, which not only expands the number of available training samples without increasing the model complexity and the number of parameters but also facilitates the construction of a contrastive loss to constrain the range of normal data samples in the feature space, thereby improving the fault detectability. These strategies of further constraining the latent space hold great promise for advancing fault detection in process monitoring with deep learning models. Future investigations could delve deeper into combining these approaches with other emerging techniques to achieve even more robust and interpretable process safety management systems.
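As an illustrative sketch (not a full training pipeline), the two constraints described above can be written as loss terms to be added to an autoencoder's reconstruction objective: an orthogonality penalty that decorrelates the latent dimensions, and a Siamese-style contrastive term over sample pairs; the `margin` value is an assumed hyperparameter.

```python
import numpy as np

def orthogonality_penalty(Z):
    """Penalize correlated latent dimensions: ||Z^T Z / n - I||_F^2 for a batch
    of latent codes Z (n x d). Driving this to zero makes the extracted
    features as independent (uncorrelated) as possible."""
    n, d = Z.shape
    C = Z.T @ Z / n
    return float(np.sum((C - np.eye(d)) ** 2))

def contrastive_loss(z1, z2, same, margin=1.0):
    """Siamese-style pairwise term: pull same-condition pairs together and
    push different-condition pairs at least `margin` apart in latent space."""
    dist = float(np.linalg.norm(z1 - z2))
    return dist ** 2 if same else max(0.0, margin - dist) ** 2
```

During training, both terms would be weighted and summed with the reconstruction error; the pairwise construction also enlarges the effective number of training examples without adding parameters, as noted above.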

6.2. Advancements in Deep Learning Model Structure for Process Monitoring

Meanwhile, process monitoring models based on deep learning are developing towards deeper networks and more complex architectures. In particular, advanced algorithms from the fields of natural language processing, image recognition, and time series prediction are continuously being introduced. Theoretically, given sufficient depth and neurons, a neural network can capture almost any complex relationship among measurements, and the feature processing capability of the model improves with increasing complexity. However, the significant increase in model parameters imposes higher requirements on the quantity and quality of training samples and also increases the randomness of model training, making the already opaque "black-box" problem even more pronounced.
In fact, flexibly adjusting the structure of the neural network for the fault detection task, rather than increasing the depth of the model and the dimensions of the hidden nodes, may more effectively improve process monitoring performance. For example, the recently proposed Kolmogorov–Arnold Network (KAN) may provide new solutions for interpretability research on data-driven models [117]. The modeling process is similar to conventional data-driven modeling and requires no prior knowledge of the chemical process; once the model is established, pruning and symbolic regression can be used to directly obtain the formula-level mapping between the model's input and output variables. Such advances may provide support for the monitoring of large-scale and complex chemical processes.

6.3. Generalized Models Incorporating Process Information

Based on the above analysis, deep learning-based process monitoring models have become a research hotspot in this domain. However, the complexity of their structures and the nature of being “black-box models” still restrict their application to specific data ranges and prevent them from being generalized to practical industrial applications. Current research has also begun to attempt to incorporate process information into deep learning models, such as physics-informed neural networks to consider physical constraints in the process and graph neural networks to preserve the topological structural relationships in the process [105,108,118,119]. Nevertheless, such methods only achieve shallow coupling of mechanism information, mostly by adding empirical formulas or partial mechanism formulas as constraints to the loss function of neural networks. They still do not belong to true mechanism-data fusion modeling and may even impose additional constraints that limit the convergence and feature extraction performance of data-driven models. Overall, the consideration of process information in existing data-driven modeling efforts remains limited, highlighting an urgent need for a more direct and reasonable approach to achieve deep integration of data-driven models with process information/mechanisms.
Constructing secondary variables informed by chemical engineering mechanism knowledge could be one feasible solution, because the raw variables collected from plants, such as temperature, pressure, liquid level, and flow rate, may not fully capture the complex physical and chemical phenomena underlying chemical process operations. Taking distillation, a classic unit operation, as an example, these variables alone cannot express the mass transfer process at the microscopic level. If mechanism knowledge such as temperature and pressure profiles, reflux ratio, draw ratio, heat transfer relations, and the ideal gas law were integrated on top of these raw variables to establish a paradigm for constructing secondary variables with engineering significance, the applicability of data-driven models in complex industrial environments could be substantially improved, thereby enhancing the generalization performance and interpretability of AI models in practical industrial applications.
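As a minimal illustration of this paradigm, the sketch below derives a few mechanism-informed secondary variables for a hypothetical distillation column from raw tags; the tag names, units, and feature choices are assumptions for demonstration only.

```python
R = 8.314  # universal gas constant, J/(mol*K)

def secondary_variables(T_top, T_bot, P, L, D, F):
    """Hypothetical mechanism-informed features for a distillation column,
    built from raw tags: top/bottom temperatures (K), top pressure (Pa),
    reflux L, distillate D, and feed F (mol/s)."""
    return {
        "reflux_ratio": L / D,             # internal vs. product flow
        "draw_ratio": D / F,               # distillate draw fraction
        "delta_T": T_bot - T_top,          # column temperature gradient
        "vapor_density": P / (R * T_top),  # ideal gas law, mol/m^3
    }

feats = secondary_variables(T_top=350.0, T_bot=390.0, P=1.2e5,
                            L=4.0, D=2.0, F=5.0)
print(feats)
```

Such engineered features would be appended to (or replace) the raw measurements before feeding a data-driven monitoring model, embedding mechanism knowledge directly in the input space rather than in the loss function.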
The proposed future research directions have the potential to bridge the gap between process monitoring theory and industrial practice, with far-reaching implications for research, practice, economic development, public policy, and society. For research, they break the existing “algorithm-centric” bias by promoting the cross-integration of chemical engineering mechanisms, artificial intelligence, and process safety, providing new theoretical frameworks and research paradigms for complex industrial monitoring challenges, enriching the academic body of knowledge, and guiding subsequent studies toward more industry-relevant topics. In practice, the focus on interpretable models, enhanced generalization, and deep mechanism-data fusion directly addresses industrial pain points such as distrust of “black-box” AI, scarcity of labeled data, and poor adaptability to dynamic processes. Their application can significantly reduce fault-related downtime and production losses, as demonstrated by our prior industrial validation achieving RMB 87 million in annual cost savings. Commercially, they can drive the development of tailored smart monitoring systems for the chemical, energy, and carbon capture, utilization, and storage sectors, boosting the competitiveness of industrial enterprises and creating new market opportunities for technology providers. For public policy, the research aligns with national strategies for smart manufacturing, industrial safety, and carbon neutrality. Reliable monitoring systems support stricter enforcement of safety and environmental regulations in high-risk industries, while generalized models for carbon capture, utilization, and storage can inform policies on long-term carbon sequestration safety, facilitating the achievement of emission reduction targets.
Societally, these advancements reduce the risk of industrial accidents, protecting worker and community safety, while improved production efficiency and resource utilization contribute to green and low-carbon development. The emphasis on interpretable AI also promotes responsible innovation, addressing public concerns about “black-box” technologies and fostering trust in smart manufacturing. A structured summary of these interconnected directions and their specific methodologies is provided in Table 2, which outlines a cohesive path forward.

7. Conclusions

Process monitoring is crucial for ensuring industrial safety and efficiency by enabling the prompt detection and mitigation of potential faults. In recent years, the advent of data-driven models, particularly those leveraging deep learning, has shown significant promise in enhancing fault detection capabilities. Previous research has underscored the effectiveness of these models in specific benchmark processes; however, despite their advancements, several challenges remain. These challenges encompass issues related to model interpretability, the ability to generalize to real-world industrial settings, and the concurrent achievement of fault detection alongside root cause diagnosis. Addressing these challenges is essential to fully realize the potential of data-driven process monitoring in practical industrial applications.
To this end, this work provides a comprehensive review of the current state-of-the-art in data-driven process monitoring, analyzing the three prevalent categories of methods and their respective strengths and limitations. While these methods have demonstrated significant potential, our analysis reveals that current research often relies heavily on simulation processes, such as the TE process, which may not fully capture the complexities and variabilities inherent in real industrial environments. This over-reliance underscores the need for more practical and diverse case studies to validate the effectiveness of these models in real-world scenarios. Finally, this review also discusses challenges and outlines future directions concerning the utilization of AI algorithms in industrial process monitoring, which could lay the groundwork for the future development of process monitoring methods targeted at practical industrial applications. Such advancements will not only benefit academic research by promoting innovation and a deeper understanding but also enhance industrial operations by enabling more effective fault detection and management strategies, ultimately contributing to safer and more efficient industrial practices.
It is worth noting that several aspects related to this topic could be further explored in future work, which also reflect the potential extensions of this review. First, future research could expand the literature scope to include more regional studies and emerging subfield publications beyond mainstream databases, achieving a more holistic overview of global research progress in process monitoring. Second, considering the distinct characteristics of different industrial sectors, subsequent investigations could conduct targeted analyses of method applicability and optimization strategies for specific industries, strengthening the practical guidance offered by the review findings. Third, with the gradual standardization of evaluation metrics in the field, future reviews may incorporate systematic quantitative synthesis to objectively compare the performance of different process monitoring methods. Finally, as emerging techniques such as Kolmogorov–Arnold Networks and physics-informed neural networks continue to evolve, future studies could supplement empirical validation and industrial case studies to deeply explore their practical application potential, which could further enrich the discussion on mechanism-data fusion in process monitoring. Given the inherent constraints of literature retrieval methodologies, this review is predicated on the current state of our understanding of the topic and thus may not achieve comprehensive coverage of all pertinent studies in the field. Furthermore, potential inconsistencies or oversights may exist in certain analytical viewpoints presented herein. We anticipate that this work will offer a novel perspective for scholarly discourse on the subject and serve as a valuable point of reference for fellow researchers in the domain.

Author Contributions

Conceptualization, C.J. and W.S.; methodology, C.J. and F.M.; software, C.J.; validation, F.M. and C.J.; formal analysis, W.S.; investigation, C.J. and W.S.; resources, J.W.; data curation, C.J. and J.R.; writing—original draft preparation, C.J.; writing—review and editing, J.W. and W.S.; visualization, C.J. and J.R.; supervision, J.W. and W.S.; project administration, J.W. and W.S.; funding acquisition, W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant number 22278018) and the Natural Science Foundation of Huai’an, China (grant number HAB2025014).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dai, Y.; Wang, H.; Khan, F.; Zhao, J. Abnormal situation management for smart chemical process operation. Curr. Opin. Chem. Eng. 2016, 14, 49–55. [Google Scholar] [CrossRef]
  2. Isermann, R.; Ballé, P. Trends in the application of model-based fault detection and diagnosis of technical processes. Control Eng. Pract. 1997, 5, 709–719. [Google Scholar] [CrossRef]
  3. Yin, S.; Ding, S.X.; Xie, X.; Luo, H. A Review on Basic Data-Driven Approaches for Industrial Process Monitoring. IEEE Trans. Ind. Electron. 2014, 61, 6418–6428. [Google Scholar] [CrossRef]
  4. Ge, Z. Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemom. Intell. Lab. Syst. 2017, 171, 16–25. [Google Scholar] [CrossRef]
  5. Alauddin, M.; Khan, F.; Imtiaz, S.; Ahmed, S. A Bibliometric Review and Analysis of Data-Driven Fault Detection and Diagnosis Methods for Process Systems. Ind. Eng. Chem. Res. 2018, 57, 10719–10735. [Google Scholar] [CrossRef]
  6. Zhang, H.-D.; Zheng, X.-P. Characteristics of hazardous chemical accidents in China: A statistical investigation. J. Loss Prev. Process Ind. 2012, 25, 686–693. [Google Scholar] [CrossRef]
  7. Ji, C.; Sun, W. A Review on Data-Driven Process Monitoring Methods: Characterization and Mining of Industrial Data. Processes 2022, 10, 335. [Google Scholar] [CrossRef]
  8. Jiang, Q.; Yan, X.; Huang, B. Review and Perspectives of Data-Driven Distributed Monitoring for Industrial Plant-Wide Processes. Ind. Eng. Chem. Res. 2019, 58, 12899–12912. [Google Scholar] [CrossRef]
  9. Quiñones-Grueiro, M.; Prieto-Moreno, A.; Verde, C.; Llanes-Santiago, O. Data-driven monitoring of multimode continuous processes: A review. Chemom. Intell. Lab. Syst. 2019, 189, 56–71. [Google Scholar] [CrossRef]
  10. Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234. [Google Scholar] [CrossRef]
  11. Kresta, J.V.; MacGregor, J.F.; Marlin, T.E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35–47. [Google Scholar] [CrossRef]
  12. MacGregor, J.F. Statistical Process Control of Multivariate Processes. IFAC Proc. Vol. 1994, 27, 427–437. [Google Scholar] [CrossRef]
  13. Dong, Y.; Qin, S.J. A novel dynamic PCA algorithm for dynamic data modeling and process monitoring. J. Process Control 2018, 67, 1–11. [Google Scholar] [CrossRef]
  14. Ku, W.; Storer, R.H.; Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179–196. [Google Scholar] [CrossRef]
  15. Li, G.; Liu, B.; Qin, S.J.; Zhou, D. Dynamic latent variable modeling for statistical process monitoring. IFAC Proc. Vol. 2011, 44, 12886–12891. [Google Scholar] [CrossRef]
  16. Qin, S.J.; Dong, Y.; Zhu, Q.; Wang, J.; Liu, Q. Bridging systems theory and data science: A unifying review of dynamic latent variable analytics and process monitoring. Annu. Rev. Control 2020, 50, 29–48. [Google Scholar] [CrossRef]
  17. Schölkopf, B.; Smola, A.; Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 1998, 10, 1299–1319. [Google Scholar] [CrossRef]
  18. Cheng, H.; Liu, Y.; Huang, D.; Cai, B.; Wang, Q. Rebooting kernel CCA method for nonlinear quality-relevant fault detection in process industries. Process Saf. Environ. Prot. 2021, 149, 619–630. [Google Scholar] [CrossRef]
  19. Apsemidis, A.; Psarakis, S.; Moguerza, J.M. A review of machine learning kernel methods in statistical process monitoring. Comput. Ind. Eng. 2020, 142, 106376. [Google Scholar] [CrossRef]
  20. Choi, S.W.; Park, J.H.; Lee, I.-B. Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis. Comput. Chem. Eng. 2004, 28, 1377–1387. [Google Scholar] [CrossRef]
  21. Liu, J.; Liu, T.; Chen, J. Sequential local-based Gaussian mixture model for monitoring multiphase batch processes. Chem. Eng. Sci. 2018, 181, 101–113. [Google Scholar] [CrossRef]
  22. Tong, C.; Lan, T.; Shi, X. Ensemble modified independent component analysis for enhanced non-Gaussian process monitoring. Control Eng. Pract. 2017, 58, 34–41. [Google Scholar] [CrossRef]
  23. Lee, J.-M.; Yoo, C.; Lee, I.-B. Statistical process monitoring with independent component analysis. J. Process Control 2004, 14, 467–485. [Google Scholar] [CrossRef]
  24. Chen, J.; Zhao, C. Exponential Stationary Subspace Analysis for Stationary Feature Analytics and Adaptive Nonstationary Process Monitoring. IEEE Trans. Ind. Inf. 2021, 17, 8345–8356. [Google Scholar] [CrossRef]
  25. Yu, W.; Zhao, C.; Huang, B. Recursive cointegration analytics for adaptive monitoring of nonstationary industrial processes with both static and dynamic variations. J. Process Control 2020, 92, 319–332. [Google Scholar] [CrossRef]
  26. Rao, J.; Ji, C.; Wang, J.; Sun, W.; Romagnoli, J.A. High-Order Nonstationary Feature Extraction for Industrial Process Monitoring Based on Multicointegration Analysis. Ind. Eng. Chem. Res. 2024, 63, 9489–9503. [Google Scholar] [CrossRef]
  27. Amin, M.T.; Imtiaz, S.; Khan, F. Process system fault detection and diagnosis using a hybrid technique. Chem. Eng. Sci. 2018, 189, 191–211. [Google Scholar] [CrossRef]
  28. Weber, P.; Medina-Oliva, G.; Simon, C.; Iung, B. Overview on Bayesian networks applications for dependability, risk analysis and maintenance areas. Eng. Appl. Artif. Intell. 2012, 25, 671–682. [Google Scholar] [CrossRef]
  29. Amin, M.T.; Khan, F.; Imtiaz, S. Fault detection and pathway analysis using a dynamic Bayesian network. Chem. Eng. Sci. 2019, 195, 777–790. [Google Scholar] [CrossRef]
  30. Amin, M.T.; Khan, F.; Ahmed, S.; Imtiaz, S. A data-driven Bayesian network learning method for process fault diagnosis. Process Saf. Environ. Prot. 2021, 150, 110–122. [Google Scholar] [CrossRef]
  31. Amin, M.T.; Khan, F.; Ahmed, S.; Imtiaz, S. A novel data-driven methodology for fault detection and dynamic risk assessment. Can. J. Chem. Eng. 2020, 98, 2397–2416. [Google Scholar] [CrossRef]
  32. Kong, X.; Ge, Z. Deep Learning of Latent Variable Models for Industrial Process Monitoring. IEEE Trans. Ind. Inf. 2021, 18, 6778–6788. [Google Scholar] [CrossRef]
  33. Yuan, X.; Li, L.; Shardt, Y.A.W.; Wang, Y.; Yang, C. Deep Learning with Spatiotemporal Attention-Based LSTM for Industrial Soft Sensor Model Development. IEEE Trans. Ind. Electron. 2021, 68, 4404–4414. [Google Scholar] [CrossRef]
  34. Arunthavanathan, R.; Khan, F.; Ahmed, S.; Imtiaz, S. A deep learning model for process fault prognosis. Process Saf. Environ. Prot. 2021, 154, 467–479. [Google Scholar] [CrossRef]
  35. Adedigba, S.A.; Khan, F.; Yang, M. Dynamic failure analysis of process systems using neural networks. Process Saf. Environ. Prot. 2017, 111, 529–543. [Google Scholar] [CrossRef]
  36. Venkatasubramanian, V. The promise of artificial intelligence in chemical engineering: Is it here, finally. AlChE J. 2019, 65, 466–478. [Google Scholar] [CrossRef]
  37. Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of process fault detection and diagnosis Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27, 293–311. [Google Scholar] [CrossRef]
  38. Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S.N. A review of process fault detection and diagnosis Part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27, 313–326. [Google Scholar] [CrossRef]
  39. Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S.N.; Yin, K. A review of process fault detection and diagnosis Part III: Process history based methods. Comput. Chem. Eng. 2003, 27, 327–346. [Google Scholar] [CrossRef]
  40. Ge, Z.; Song, Z.; Gao, F. Review of Recent Research on Data-Based Process Monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543–3562. [Google Scholar] [CrossRef]
  41. Melo, A.; Câmara, M.M.; Clavijo, N.; Pinto, J.C. Open benchmarks for assessment of process monitoring and fault diagnosis techniques: A review and critical analysis. Comput. Chem. Eng. 2022, 165, 107964. [Google Scholar] [CrossRef]
  42. Lee, J.M.; Qin, S.J.; Lee, I.B. Fault detection of non-linear processes using kernel independent component analysis. Can. J. Chem. Eng. 2007, 85, 526–536. [Google Scholar] [CrossRef]
  43. Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a new soft sensor method using independent component analysis and partial least squares. AlChE J. 2009, 55, 87–98. [Google Scholar] [CrossRef]
  44. Kano, M.; Hasebe, S.; Hashimoto, I.; Ohno, H. Statistical process monitoring based on dissimilarity of process data. AlChE J. 2002, 48, 1231–1240. [Google Scholar] [CrossRef]
  45. Fazai, R.; Mansouri, M.; Abodayeh, K.; Nounou, H.; Nounou, M. Online reduced kernel PLS combined with GLRT for fault detection in chemical systems. Process Saf. Environ. Prot. 2019, 128, 228–243. [Google Scholar] [CrossRef]
  46. Deng, X.; Tian, X. Nonlinear process fault pattern recognition using statistics kernel PCA similarity factor. Neurocomputing 2013, 121, 298–308. [Google Scholar] [CrossRef]
  47. Choi, S.W.; Lee, I.-B. Nonlinear dynamic process monitoring based on dynamic kernel PCA. Chem. Eng. Sci. 2004, 59, 5897–5908. [Google Scholar] [CrossRef]
  48. Zhou, L.; Li, G.; Song, Z.; Qin, S.J. Autoregressive Dynamic Latent Variable Models for Process Monitoring. IEEE Trans. Control Syst. Technol. 2017, 25, 366–373. [Google Scholar] [CrossRef]
  49. Wang, J.; He, Q.P. Multivariate statistical process monitoring based on statistics pattern analysis. Ind. Eng. Chem. Res. 2010, 49, 7858–7869. [Google Scholar] [CrossRef]
  50. He, Q.P.; Wang, J. Statistics pattern analysis: A new process monitoring framework and its application to semiconductor batch processes. AlChE J. 2011, 57, 107–121. [Google Scholar] [CrossRef]
  51. Ji, H. Statistics Mahalanobis distance for incipient sensor fault detection and diagnosis. Chem. Eng. Sci. 2021, 230, 116233. [Google Scholar] [CrossRef]
  52. Shang, J.; Chen, M.; Zhang, H. Fault detection based on augmented kernel Mahalanobis distance for nonlinear dynamic processes. Comput. Chem. Eng. 2018, 109, 311–321. [Google Scholar] [CrossRef]
  53. Ji, C.; Ma, F.; Wang, J.; Sun, W. Orthogonal projection based statistical feature extraction for continuous process monitoring. Comput. Chem. Eng. 2024, 183, 108600. [Google Scholar] [CrossRef]
  54. Khan, F.; Haddara, M. Risk-based maintenance (RBM): A quantitative approach for maintenance/inspection scheduling and planning. J. Loss Prev. Process Ind. 2003, 16, 561–573. [Google Scholar] [CrossRef]
  55. Arunraj, N.S.; Maiti, J. Risk-based maintenance—Techniques and applications. J. Hazard. Mater. 2007, 142, 653–661. [Google Scholar] [CrossRef] [PubMed]
  56. Amin, M.T.; Khan, F.; Ahmed, S.; Imtiaz, S. Risk-based fault detection and diagnosis for nonlinear and non-Gaussian process systems using R-vine copula. Process Saf. Environ. Prot. 2021, 150, 123–136. [Google Scholar] [CrossRef]
  57. Yu, H.; Khan, F.; Garaniya, V. Risk-based fault detection using Self-Organizing Map. Reliab. Eng. Syst. Saf. 2015, 139, 82–96. [Google Scholar] [CrossRef]
  58. Khakzad, N.; Khan, F.; Amyotte, P. Safety analysis in process facilities: Comparison of fault tree and Bayesian network approaches. Reliab. Eng. Syst. Saf. 2011, 96, 925–932. [Google Scholar] [CrossRef]
  59. Khakzad, N.; Khan, F.; Amyotte, P. Risk-based design of process systems using discrete-time Bayesian networks. Reliab. Eng. Syst. Saf. 2013, 109, 5–17. [Google Scholar] [CrossRef]
  60. Yu, H.; Khan, F.; Garaniya, V. Modified Independent Component Analysis and Bayesian Network-Based Two-Stage Fault Diagnosis of Process Operations. Ind. Eng. Chem. Res. 2015, 54, 2724–2742. [Google Scholar] [CrossRef]
  61. Gharahbagheri, H.; Imtiaz, S.A.; Khan, F. Root Cause Diagnosis of Process Fault Using KPCA and Bayesian Network. Ind. Eng. Chem. Res. 2017, 56, 2054–2070. [Google Scholar] [CrossRef]
  62. Gonzalez, R.; Huang, B.; Lau, E. Process monitoring using kernel density estimation and Bayesian networking with an industrial case study. ISA Trans. 2015, 58, 330–347. [Google Scholar] [CrossRef] [PubMed]
  63. Bi, X.; Wu, D.; Xie, D.; Ye, H.; Zhao, J. Large-scale chemical process causal discovery from big data with transformer-based deep learning. Process Saf. Environ. Prot. 2023, 173, 163–177. [Google Scholar] [CrossRef]
  64. Cervantes-Bobadilla, M.; García-Morales, J.; Saavedra-Benítez, Y.I.; Hernández-Pérez, J.A.; Adam-Medina, M.; Guerrero-Ramírez, G.V.; Escobar-Jímenez, R.F. Multiple fault detection and isolation using artificial neural networks in sensors of an internal combustion engine. Eng. Appl. Artif. Intell. 2023, 117, 105524. [Google Scholar] [CrossRef]
  65. Ma, F.; Ji, C.; Wang, J.; Sun, W. Early identification of process deviation based on convolutional neural network. Chin. J. Chem. Eng. 2023, 56, 104–118. [Google Scholar] [CrossRef]
  66. Heo, S.; Lee, J.H. Fault detection and classification using artificial neural networks. IFAC PapersOnLine 2018, 51, 470–475. [Google Scholar] [CrossRef]
  67. Yu, J.; Yan, X. Whole Process Monitoring Based on Unstable Neuron Output Information in Hidden Layers of Deep Belief Network. IEEE Trans. Cybern. 2020, 50, 3998–4007. [Google Scholar] [CrossRef]
  68. Barreto, N.E.M.; Rodrigues, R.; Schumacher, R.; Aoki, A.R.; Lambert-Torres, G. Artificial Neural Network Approach for Fault Detection and Identification in Power Systems with Wide Area Measurement Systems. J. Control. Autom. Electr. Syst. 2021, 32, 1617–1626. [Google Scholar] [CrossRef]
  69. Li, B.; Delpha, C.; Diallo, D.; Migan-Dubois, A. Application of Artificial Neural Networks to photovoltaic fault detection and diagnosis: A review. Renew. Sustain. Energy Rev. 2021, 138, 110512. [Google Scholar] [CrossRef]
  70. Cordoni, F.; Bacchiega, G.; Bondani, G.; Radu, R.; Muradore, R. A multi–modal unsupervised fault detection system based on power signals and thermal imaging via deep AutoEncoder neural network. Eng. Appl. Artif. Intell. 2022, 110, 104729. [Google Scholar] [CrossRef]
  71. Han, S.; Yang, L.; Duan, D.; Yao, L.; Gao, K.; Zhang, Q.; Xiao, Y.; Wu, W.; Yang, J.; Liu, W.; et al. A novel fault detection and identification method for complex chemical processes based on OSCAE and CNN. Process Saf. Environ. Prot. 2024, 190, 322–334. [Google Scholar] [CrossRef]
  72. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
  73. Sakurada, M.; Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, Australia, 2 December 2014; pp. 4–11. [Google Scholar]
  74. Jang, K.; Hong, S.; Kim, M.; Na, J.; Moon, I. Adversarial Autoencoder Based Feature Learning for Fault Detection in Industrial Processes. IEEE Trans. Ind. Inf. 2022, 18, 827–834. [Google Scholar] [CrossRef]
  75. Yu, W.; Zhao, C. Robust Monitoring and Fault Isolation of Nonlinear Industrial Processes Using Denoising Autoencoder and Elastic Net. IEEE Trans. Control Syst. Technol. 2020, 28, 1083–1091. [Google Scholar] [CrossRef]
  76. Lee, S.; Kwak, M.; Tsui, K.-L.; Kim, S.B. Process monitoring using variational autoencoder for high-dimensional nonlinear processes. Eng. Appl. Artif. Intell. 2019, 83, 13–27. [Google Scholar] [CrossRef]
  77. Cacciarelli, D.; Kulahci, M. A novel fault detection and diagnosis approach based on orthogonal autoencoders. Comput. Chem. Eng. 2022, 163, 107853. [Google Scholar] [CrossRef]
  78. Bi, X.; Zhao, J. A novel orthogonal self-attentive variational autoencoder method for interpretable chemical process fault detection and identification. Process Saf. Environ. Prot. 2021, 156, 581–597. [Google Scholar] [CrossRef]
  79. Wan, F.; Guo, G.; Zhang, C.; Guo, Q.; Liu, J. Outlier Detection for Monitoring Data Using Stacked Autoencoder. IEEE Access 2019, 7, 173827–173837. [Google Scholar] [CrossRef]
  80. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  81. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  82. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318. [Google Scholar]
  83. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  84. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  85. Aliabadi, M.M.; Emami, H.; Dong, M.; Huang, Y. Attention-based recurrent neural network for multistep-ahead prediction of process performance. Comput. Chem. Eng. 2020, 140, 106931. [Google Scholar] [CrossRef]
  86. Ren, J.; Ni, D. A batch-wise LSTM-encoder decoder network for batch process monitoring. Chem. Eng. Res. Des. 2020, 164, 102–112. [Google Scholar] [CrossRef]
  87. Cheng, F.; He, Q.P.; Zhao, J. A novel process monitoring approach based on variational recurrent autoencoder. Comput. Chem. Eng. 2019, 129, 106515. [Google Scholar] [CrossRef]
  88. Ji, C.; Ma, F.; Wang, J.; Sun, W. Profitability related industrial-scale batch processes monitoring via deep learning based soft sensor development. Comput. Chem. Eng. 2023, 170, 108125. [Google Scholar] [CrossRef]
  89. Hong, F.; Ji, C.; Rao, J.; Chen, C.; Sun, W. Hourly ozone level prediction based on the characterization of its periodic behavior via deep learning. Process Saf. Environ. Prot. 2023, 174, 28–38. [Google Scholar] [CrossRef]
  90. Zhang, S.; Qiu, T. A dynamic-inner convolutional autoencoder for process monitoring. Comput. Chem. Eng. 2022, 158, 107654. [Google Scholar] [CrossRef]
  91. Yu, J.; Liu, X.; Ye, L. Convolutional Long Short-Term Memory Autoencoder-Based Feature Learning for Fault Detection in Industrial Processes. IEEE Trans. Instrum. Meas. 2021, 70, 1–15. [Google Scholar] [CrossRef]
  92. Zhang, S.; Qiu, T. Semi-supervised LSTM ladder autoencoder for chemical process fault diagnosis and localization. Chem. Eng. Sci. 2022, 251, 117467. [Google Scholar] [CrossRef]
  93. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; Woo, W.-C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9. [Google Scholar]
  94. Bayram, B.; Duman, T.B.; Ince, G. Real time detection of acoustic anomalies in industrial processes using sequential autoencoders. Expert Syst. 2021, 38, e12564. [Google Scholar] [CrossRef]
  95. Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms. Adv. Neural Inf. Process. Syst. 2017, 30, 4965597. [Google Scholar]
  96. Wang, Y.; Zhang, J.; Zhu, H.; Long, M.; Wang, J.; Yu, P.S. Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9154–9162. [Google Scholar]
  97. Ji, C.; Ma, F.; Wang, J.; Sun, W.; Zhu, X. Statistical method based on dissimilarity of variable correlations for multimode chemical process monitoring with transitions. Process Saf. Environ. Prot. 2022, 162, 649–662. [Google Scholar] [CrossRef]
  98. Leite, D.; Andrade, E.; Rativa, D.; Maciel, A.M.A. Fault Detection and Diagnosis in Industry 4.0: A Review on Challenges and Opportunities. Sensors 2024, 25, 60. [Google Scholar] [CrossRef]
  99. Ma, F.; Ji, C.; Xu, M.; Wang, J.; Sun, W. Spatial Correlation Extraction for Chemical Process Fault Detection Using Image Enhancement Technique aided Convolutional Autoencoder. Chem. Eng. Sci. 2023, 278, 118900. [Google Scholar] [CrossRef]
  100. Ji, C.; Ma, F.; Wang, J.; Wang, J.; Sun, W. Real-Time Industrial Process Fault Diagnosis Based on Time Delayed Mutual Information Analysis. Processes 2021, 9, 1027. [Google Scholar] [CrossRef]
  101. Rojas, L.; Peña, Á.; Garcia, J. AI-Driven Predictive Maintenance in Mining: A Systematic Literature Review on Fault Detection, Digital Twins, and Intelligent Asset Management. Appl. Sci. 2025, 15, 3337. [Google Scholar] [CrossRef]
  102. Patil, A.; Soni, G.; Prakash, A. Data-driven approaches for impending fault detection of industrial systems: A review. Int. J. Syst. Assur. Eng. Manag. 2022, 15, 1326–1344. [Google Scholar] [CrossRef]
  103. Aghaee, M.; Mishra, A.; Krau, S.; Tamer, I.M.; Budman, H. Artificial intelligence applications for fault detection and diagnosis in pharmaceutical bioprocesses: A review. Curr. Opin. Chem. Eng. 2024, 44, 101025. [Google Scholar] [CrossRef]
  104. Bi, J.; Wang, H.; Yan, E.; Wang, C.; Yan, K.; Jiang, L.; Yang, B. AI in HVAC fault detection and diagnosis: A systematic review. Energy Rev. 2024, 3, 100071. [Google Scholar] [CrossRef]
  105. Liu, Y.; Jafarpour, B. Graph attention network with Granger causality map for fault detection and root cause diagnosis. Comput. Chem. Eng. 2024, 180, 108453. [Google Scholar] [CrossRef]
  106. Lv, F.; Bi, X.; Xu, Z.; Zhao, J. Causality-embedded reconstruction network for high-resolution fault identification in chemical process. Process Saf. Environ. Prot. 2024, 186, 1011–1033. [Google Scholar] [CrossRef]
107. Yao, K.; Shi, H.; Song, B.; Tao, Y. A Temporal correlation topology graph model based on Mode-intrinsic and Mode-specific characterization using differentiated decoupling module with Distribution independence constraints for Multimode adaptive process monitoring. Process Saf. Environ. Prot. 2025, 199, 107207.
108. Zheng, Y.; Hu, C.; Wang, X.; Wu, Z. Physics-informed recurrent neural network modeling for predictive control of nonlinear processes. J. Process Control 2023, 128, 103005.
109. Bai, Y.-t.; Wang, X.-y.; Jin, X.-b.; Zhao, Z.-y.; Zhang, B.-h. A Neuron-Based Kalman Filter with Nonlinear Autoregressive Model. Sensors 2020, 20, 299.
110. Jackson, J.E.; Mudholkar, G.S. Control Procedures for Residuals Associated with Principal Component Analysis. Technometrics 1979, 21, 341–349.
111. Zhang, H.; Tian, X.; Deng, X. Batch Process Monitoring Based on Multiway Global Preserving Kernel Slow Feature Analysis. IEEE Access 2017, 5, 2696–2710.
112. Li, G.; Qin, S.J.; Yuan, T. Data-driven root cause diagnosis of faults in process industries. Chemom. Intell. Lab. Syst. 2016, 159, 1–11.
113. Jiang, Q.; Wang, Z.; Yan, S.; Cao, Z. Data-driven Soft Sensing for Batch Processes Using Neural Network-based Deep Quality-Relevant Representation Learning. IEEE Trans. Artif. Intell. 2022, 4, 602–611.
114. Zhou, X.; Liang, W.; Shimizu, S.; Ma, J.; Jin, Q. Siamese Neural Network Based Few-Shot Learning for Anomaly Detection in Industrial Cyber-Physical Systems. IEEE Trans. Ind. Inf. 2021, 17, 5790–5798.
115. Ji, C.; Ma, F.; Wang, J.; Sun, W.; Palazoglu, A. Industrial Process Fault Detection Based on Siamese Recurrent Autoencoder. Comput. Chem. Eng. 2025, 192, 108887.
116. Shao, J.-D.; Rong, G.; Lee, J.M. Generalized orthogonal locality preserving projections for nonlinear fault detection and diagnosis. Chemom. Intell. Lab. Syst. 2009, 96, 75–83.
117. Koenig, B.C.; Kim, S.; Deng, S. KAN-ODEs: Kolmogorov–Arnold network ordinary differential equations for learning dynamical systems and hidden physics. Comput. Methods Appl. Mech. Eng. 2024, 432, 117397.
118. Shen, S.; Lu, H.; Sadoughi, M.; Hu, C.; Nemani, V.; Thelen, A.; Webster, K.; Darr, M.; Sidon, J.; Kenny, S. A physics-informed deep learning approach for bearing fault detection. Eng. Appl. Artif. Intell. 2021, 103, 104295.
119. Jia, M.; Hu, J.; Liu, Y.; Gao, Z.; Yao, Y. Topology-Guided Graph Learning for Process Fault Diagnosis. Ind. Eng. Chem. Res. 2023, 62, 3238–3248.
Figure 1. Keywords extracted from the literature investigated in this work.
Figure 2. The structure and logical flow of this review paper.
Figure 3. A statistical overview of the literature on chemical process monitoring over the past decade and of the case studies adopted therein.
Figure 4. Flow chart of future research directions in deep learning-based process monitoring.
Table 1. Cross-analysis table of process monitoring methods reviewed in this work.
| Category | Specific Method Implementation | Key Findings | Research Gaps | Way Forward |
|---|---|---|---|---|
| Statistical Process Monitoring Methods | PCA (linear projection via variance-covariance analysis) | Effective for steady, linear, Gaussian-distributed process data | Fails to handle nonlinearity, dynamics, and non-Gaussian data | Integrate kernel tricks or dynamic constraints for complex data characteristics |
| | ICA (linear projection for extracting independent components) | Superior to PCA for non-Gaussian process data | Limited to linear relationships; unable to capture process dynamics | Combine with dynamic modeling to address time-series autocorrelation |
| | PLS (linear projection focusing on input-output variable correlation) | Reliable for feature extraction in input-output associated processes | Inadequate for nonlinear industrial processes | Apply kernel transformation to extend to nonlinear scenarios |
| | KPCA (kernel trick to map data to high-dimensional linear space) | Addresses nonlinearity in process data | Causes significant dimensionality increase; unsuitable for large-scale processes | Optimize kernel function to reduce computational complexity |
| | KPLS (kernel trick integrated with PLS for nonlinear input-output correlation) | Enhances nonlinear feature extraction for input-output processes | High computational cost limits real-time industrial application | Design lightweight kernel variants for efficient online monitoring |
| | DKPCA (hybrid of DPCA and KPCA for nonlinear dynamic processes) | Simultaneously handles process nonlinearity and dynamics | Dimensionality explosion in large-scale industrial systems | Incorporate feature selection to reduce redundant dimensions |
| | Auto-regressive PCA (autoregression model built on latent variables) | Captures process dynamics without data augmentation | Only considers unified time delay; ignores variable-specific dynamic differences | Adapt time-delay parameters for variables with distinct dynamic/periodic features |
| | DPCA (data augmentation to capture process autocorrelation) | Improves monitoring of dynamic processes via time-series extension | Limited to 1-step delay; unable to model complex dynamics | Extend to multi-step delay adaptation for periodic processes |
| | Dynamic-inner PCA (autoregression model on latent variables) | Avoids data augmentation while capturing dynamics | Fails to adapt to variable-specific dynamic performances | Develop variable-adaptive autoregressive coefficients |
| | SPA (Statistics Pattern Analysis, monitoring statistical features of variables) | Enables incipient fault detection via feature-level monitoring | Prone to feature redundancy; reduces fault detectability | Integrate feature selection to eliminate redundancy and enhance sensitivity |
| | Statistics Mahalanobis distance; augmented kernel Mahalanobis distance | Improves incipient fault detection accuracy | Feature redundancy compromises monitoring performance | Optimize feature fusion strategies to balance comprehensiveness and conciseness |
| Probabilistic Process Monitoring Methods | BN (Bayesian Network for risk-based maintenance, RBM) | Excels at uncertainty handling and risk quantification | Requires fault/disturbance data for causal network construction | Develop unsupervised BN training for imbalanced industrial data |
| | RBM methodology (risk estimation, evaluation, maintenance planning) | Provides systematic risk-based process management framework | Lacks direct integration with real-time fault detection | Integrate real-time data streams for dynamic risk updating |
| | RBM applied to power-generating plants | Validates RBM feasibility in energy sector processes | Limited generalizability to other industrial domains (e.g., chemical) | Customize risk metrics for domain-specific process characteristics |
| | RBM applied to ethylene oxide production facilities | Demonstrates RBM effectiveness in chemical process safety management | Relies on sufficient fault data for risk model calibration | Incorporate transfer learning to address fault data scarcity |
| | R-vine copula (probabilistic model for risk-based FDD) | Enhances risk-based fault detection/diagnosis (FDD) accuracy | Requires sufficient fault samples for model training | Develop semi-supervised copula models for imbalanced data scenarios |
| | Self-organizing map, probabilistic analysis (risk-based fault detection) | Combines clustering and probability for risk-oriented fault detection | Limited adaptability to multimodal process data | Integrate adaptive clustering to handle multimodal industrial dynamics |
| | Hybrid PCA and BN | Identifies root causes and fault propagation pathways | Shallow fusion of statistical and probabilistic advantages | Develop deep fusion frameworks for enhanced fault tracing |
| | Modified ICA and BN (two-stage hybrid FDD) | Improves fault detection and propagation tracing accuracy | Limited to linear process features (via ICA) | Combine nonlinear feature extractors (e.g., KICA) with BN |
| | Hybrid KPCA and BN | Addresses nonlinear processes in probabilistic FDD | Computational complexity of KPCA limits real-time application | Optimize KPCA-BN integration for efficient online monitoring |
| | Hybrid HMM and BN for fault prognosis | Enables fault progression prediction via temporal probabilistic modeling | HMM suffers from gradient issues in long-time-series training | Integrate LSTM units to enhance temporal feature learning |
| | Multivariate fault probability integration in FDD | Improves FDD performance via comprehensive probability quantification | Over-reliance on fault data for probability calibration | Develop unsupervised probability estimation for unlabeled industrial data |
| Deep Learning Methods | Deep learning (multi-layer neural networks for nonlinear/time-varying chemical processes) | Strong data fitting ability for complex industrial processes | High parameter demand; requires large-scale training data | Design lightweight networks to reduce data/parameter dependency |
| | ANN (residual-based fault detection via neural network prediction) | Enables fault detection via prediction residual analysis | Poor interpretability; "black-box" limitation | Incorporate attention mechanisms to enhance feature interpretability |
| | Residual-based CNN and PCA | Captures local spatial features and enables dimensionality reduction | CNN's local feature focus ignores global process relationships | Combine CNN with global feature extractors (e.g., transformer) |
| | ANN with unstable hidden layer neurons (multi-layer feature integration) | Enhances fault detection via comprehensive multi-layer feature fusion | Unstable neurons increase model training randomness | Optimize hidden layer activation functions for stability |
| | Hybrid KPCA and DNN | Combines nonlinear feature extraction (KPCA) and deep fitting (DNN) | High computational cost of hybrid model | Simplify model structure while retaining complementary advantages |
| | Adaptive BN and ANN | Tackles nonlinear/non-Gaussian/multimodal process FDD | Adaptive mechanism lacks robustness to extreme process deviations | Improve adaptability via online parameter updating |
| | AE (encoder–decoder structure for nonlinear feature extraction) | Models almost any nonlinearity via activation functions | Relies on reconstruction error; limited interpretability | Introduce orthogonal constraints to enhance feature interpretability |
| | AE (process monitoring via reconstruction error measurement) | Enables effective fault detection via input-output reconstruction | Insensitive to subtle incipient faults | Optimize loss function to enhance sensitivity to minor deviations |
| | Adversarial AE (latent variables constrained to specific distribution) | Improves latent feature representation of original data manifold | Adversarial training increases computational complexity | Simplify adversarial mechanism for industrial applicability |
| | Denoising AE (feature extraction from noisy input data) | Enhances robust monitoring in noisy industrial environments | Denoising process may lose fault-related weak signals | Balance denoising and fault signal preservation |
| | VAE (variational AE for nonlinear/nonnormal process data) | Simultaneously handles process nonlinearity and nonnormality | Complex variational inference limits real-time monitoring | Optimize inference algorithm for efficient online application |
| | Orthogonal AE (orthogonal constraints in loss function) | Reduces feature redundancy via orthogonalization | Orthogonal constraints may increase training difficulty | Adjust constraint strength to balance redundancy reduction and training stability |
| | Orthogonal self-attentive VAE | Enhances model interpretability and fault identification via self-attention | Complex structure increases computational cost | Simplify attention mechanism for lightweight deployment |
| | SAE (Stacked Autoencoder, multilayer structure for generalization) | Improves model generalization via deep feature extraction | Deep structure exacerbates "black-box" problem | Incorporate symbolic regression (e.g., KAN) for interpretability |
| | VAE (latent features constrained to normal distribution) | Stabilizes latent feature distribution for reliable monitoring | Normal distribution assumption may mismatch complex industrial data | Adopt flexible distribution constraints (e.g., nonparametric) |
| | RNN (memory unit for capturing time-series dynamic features) | Models process dynamics via temporal state transfer | Prone to gradient explosion/vanishing in long time-series | Replace with LSTM/GRU to address gradient issues |
| | LSTM (gate units for gradient preservation in dynamic feature extraction) | Solves RNN gradient issues; captures long-term temporal dependencies | High computational cost for large-scale time-series | Optimize LSTM cell structure for lightweight deployment |
| | GRU (simplified gate units for dynamic feature extraction) | Balances monitoring performance and computational efficiency | Less effective for ultra-long time-series than LSTM | Extend with attention to enhance long-range dependency capture |
| | Attention-based LSTM (catalyst activity prediction for methanol reactor) | Improves targeted feature extraction via attention mechanism | Limited to specific process (catalyst activity prediction) | Generalize attention weights for diverse industrial processes |
| | LSTM-AE (LSTM integrated with AE for batch process time dependencies) | Captures temporal dependencies in batch process data | Less effective for continuous process dynamics | Adapt to continuous processes via sliding window time-series processing |
| | VRAE (Variational Recurrent AE, VAE + RNN/GRU for nonlinearity/dynamics) | Simultaneously handles process nonlinearity and temporal dynamics | Complex structure increases training complexity | Simplify variational component for industrial applicability |
| | Differential RNN (embedded difference unit for short-term time-varying info) | Captures short-term dynamic changes in industrial processes | Ignores long-term temporal dependencies | Combine with LSTM to balance short/long-term dynamic feature extraction |
| | Multi-order difference embedded LSTM | Enhances capture of data periodicity in dynamic processes | Multi-order difference increases computational load | Optimize difference order selection for efficiency-performance balance |
| | Dynamic inner AE (vector autoregressive model on latent variables) | Captures process dynamics via latent variable autoregression | Limited to linear autoregressive relationships | Extend to nonlinear autoregression for complex dynamics |
| | Convolutional LSTM AE (forget gates for long-term time dependence) | Extracts long-term temporal dependencies via gated convolution | High parameter count increases memory consumption | Prune redundant parameters for lightweight deployment |
| | Semi-supervised LSTM ladder AE (labeled + unlabeled data training) | Boosts diagnostic accuracy via semi-supervised learning | Relies on sufficient labeled data for performance | Enhance unsupervised learning capability for scarce labeled data scenarios |
| | ConvLSTM (convolution + LSTM for spatiotemporal feature extraction) | Captures both spatial relationships and temporal dynamics | 3D convolution variant has excessive parameters | Optimize convolution kernel size to reduce parameter count |
| | ConvLSTM (real-time acoustic anomaly detection in industrial processes) | Validates ConvLSTM for industrial anomaly detection | Limited to acoustic data; lacks generalization to multi-variable processes | Extend input adaptation for diverse industrial sensor data |
| | PredRNN (addresses layer-independent memory in ConvLSTM) | Improves spatiotemporal feature integration via enhanced memory mechanism | Deep structure increases training difficulty | Simplify memory mechanism for stable industrial application |
| | PredRNN++ (causal LSTM + gradient highway unit for deep time modeling) | Alleviates gradient propagation issues in deep spatiotemporal models | High complexity limits real-time deployment | Optimize highway unit for efficient gradient flow and low latency |
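Several of the statistical methods summarized in Table 1 (PCA, DPCA, and their kernel variants) share the same monitoring recipe: project the data onto a retained latent subspace and track Hotelling's T² together with the squared prediction error (SPE, or Q statistic) against control limits. The following minimal NumPy sketch illustrates that recipe for plain PCA. It is an illustration only: the function names are ours, and the control limits are set from empirical quantiles of the training statistics rather than the F- and chi-square-based approximations of Jackson and Mudholkar [110].

```python
import numpy as np

def fit_pca_monitor(X, n_components, alpha=0.99):
    """Fit a PCA monitoring model on normal operating data.

    Returns loadings, retained eigenvalues, scaling parameters, and
    empirical control limits for the T^2 and SPE (Q) statistics.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    Xs = (X - mu) / sigma
    # Eigendecomposition of the sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1]             # descending variance
    P = eigvecs[:, order][:, :n_components]       # retained loadings
    lam = eigvals[order][:n_components]           # retained eigenvalues
    t2_train, spe_train = monitor(Xs, P, lam, scaled=True)
    return dict(mu=mu, sigma=sigma, P=P, lam=lam,
                t2_limit=np.quantile(t2_train, alpha),
                spe_limit=np.quantile(spe_train, alpha))

def monitor(X, P, lam, mu=None, sigma=None, scaled=False):
    """Compute T^2 and SPE for each sample (row) of X."""
    Xs = X if scaled else (X - mu) / sigma
    T = Xs @ P                                    # latent scores
    t2 = np.sum(T**2 / lam, axis=1)               # Hotelling's T^2
    resid = Xs - T @ P.T                          # residual subspace part
    spe = np.sum(resid**2, axis=1)                # SPE (Q statistic)
    return t2, spe
```

On normal operating data both statistics stay below their limits roughly (1 − alpha) of the time by construction; a gross sensor bias typically violates the SPE limit first, because it pushes the sample out of the plane spanned by the retained principal components.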
Table 2. A framework for future research directions in industrial process monitoring.
| Research Direction | Key Challenges | Potential Solutions | Expected Outcomes |
|---|---|---|---|
| Latent Space Optimization | Avoiding feature redundancy; defining constrained latent features without fault labels | Orthogonal constraints; Siamese network structures | To extract more discriminative and interpretable features for improved fault detection |
| Model Structure Innovation | High computational cost; limited interpretability | KAN; symbolic regression and pruning | To develop inherently interpretable deep learning models |
| Process-Information Fusion | Systematic encoding of domain knowledge; balancing physical constraints with data-driven flexibility | Construction of secondary variables; PINN; GNN | To create models with superior generalization for industrial environments |
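The first row of Table 2 can be made concrete with a small example. One common way to discourage latent feature redundancy, in the spirit of the orthogonality-constrained autoencoders discussed in this review, is to penalize the deviation of the latent covariance from the identity matrix and add that penalty to the reconstruction loss. The NumPy sketch below is an illustrative assumption rather than an implementation from any reviewed work; the function name and the beta weighting are hypothetical.

```python
import numpy as np

def orthogonality_penalty(Z):
    """Frobenius-norm penalty driving a batch of latent features toward
    zero-mean, unit-variance, mutually uncorrelated components.

    Z : array of shape (n_samples, n_latent), one latent vector per row.
    """
    Zc = Z - Z.mean(axis=0)                     # center each latent dim
    cov = Zc.T @ Zc / (Z.shape[0] - 1)          # sample latent covariance
    eye = np.eye(Z.shape[1])
    return float(np.sum((cov - eye) ** 2))      # || Cov(Z) - I ||_F^2

# In an autoencoder training loop this term would be added to the
# reconstruction loss, e.g. (beta is a hypothetical trade-off weight):
#   loss = mse(x, x_hat) + beta * orthogonality_penalty(z)
```

The penalty is near zero for uncorrelated, standardized latent features and grows as features become redundant, so minimizing it alongside the reconstruction error pushes the encoder toward the more discriminative, less redundant representations that Table 2 targets.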
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ji, C.; Ma, F.; Rao, J.; Wang, J.; Sun, W. Bridging the Gap in Chemical Process Monitoring: Beyond Algorithm-Centric Research Toward Industrial Deployment. Processes 2025, 13, 3809. https://doi.org/10.3390/pr13123809
