Article

Variable Structure Learning-Based Spatio-Temporal Graph Convolutional Networks for Chemical Process Quality Prediction with SHAP-Enhanced Interpretability

1 School of Mathematics, Hangzhou Normal University, Hangzhou 311121, China
2 College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
3 Baima Lake Laboratory Hydrogen Energy (ChangXing) Co., Ltd., Huzhou 313117, China
4 Zhejiang Baima Lake Laboratory Co., Ltd., Hangzhou 310051, China
5 Huzhou Key Laboratory of Intelligent Sensing and Optimal Control for Industrial Systems, School of Engineering, Huzhou University, Huzhou 313000, China
* Authors to whom correspondence should be addressed.
Processes 2025, 13(11), 3751; https://doi.org/10.3390/pr13113751
Submission received: 22 October 2025 / Revised: 15 November 2025 / Accepted: 19 November 2025 / Published: 20 November 2025

Abstract

Product quality control in chemical processes faces challenges from dynamic, non-stationary data, underutilized spatial correlations among variables, and overreliance on prior knowledge. This paper addresses these issues by proposing an enhanced Spatio-Temporal Graph Convolutional Network (STGCN) for chemical process soft sensing. In this method, a spatio-temporal graph attention mechanism is integrated into the Graph Convolutional Network, enabling dynamic weighting of neighboring nodes to improve spatio-temporal feature mining and accelerate convergence. Unlike traditional STGCN models that rely on predefined graph structures and prior domain knowledge, the proposed Variable Structure Learning-based Spatio-Temporal Graph Convolutional Network (VSL-STGCN) autonomously learns variable relational structures via end-to-end gradient descent and uses the SHAP algorithm to select critical variables, reducing computational burden and overfitting risk. Finally, the proposed VSL-STGCN is validated on two real chemical processes, outperforming baseline models in prediction accuracy: it achieves about 15% lower RMSE and about 10% higher R² compared to baseline STGCN models. The learned adjacency matrix aligns with actual process mechanisms, ensuring interpretability.

1. Introduction

Product quality is a critical factor in chemical processes, significantly impacting market competitiveness, brand value, and customer satisfaction. High-quality products not only enhance a company’s reputation but also drive economic growth and industrial upgrading. In the current global market, consumers increasingly demand products that meet stringent quality standards, making quality control an essential aspect of manufacturing processes [1]. Traditional quality control methods, such as sampling inspection and destructive testing, have several limitations. Sampling inspection provides only a snapshot of product quality and may miss defects in the overall production batch. Destructive testing is not suitable for all products, especially those that cannot be damaged. Furthermore, these methods often rely on manual operations, which are time-consuming, labor-intensive, and prone to human error. Soft sensing technology has emerged as an innovative solution to overcome the limitations of traditional quality control methods [2]. It utilizes data-driven models and machine learning algorithms to predict product quality in real time without physical measurements [3]. By integrating various data sources, such as process parameters and sensor data, soft sensing enables continuous monitoring and timely detection of quality deviations. It enhances the efficiency and accuracy of quality control, reducing costs and improving product consistency [4].
Data-driven soft sensor modeling has emerged as a powerful tool in industrial quality control, particularly for complex processes where traditional measurement methods are either impractical or too costly. These models leverage historical process data to predict quality indicators without requiring direct physical measurements, thus offering significant advantages in terms of cost, speed, and adaptability. Recent advancements in machine learning and data analytics have further enhanced the capabilities of data-driven soft sensors, enabling them to handle increasingly complex and high-dimensional datasets [5]. Generally, the variables in industrial processes exhibit not only temporal correlations but also spatio-temporal correlation characteristics. Traditional machine learning or deep learning methods can only model data based on the temporal correlations of variables and are unable to effectively extract spatio-temporal correlation characteristics. In recent years, Graph Neural Networks (GNNs), as an effective spatio-temporal feature extraction model, have been widely applied in industrial processes and have shown remarkable potential in addressing the challenges posed by the spatio-temporal dynamics inherent in many chemical and industrial processes [6]. GNNs excel at capturing the intricate relationships and dependencies within graph-structured data, making them particularly suited for scenarios where process variables are interconnected in complex ways. For instance, in semiconductor manufacturing, GNNs can model the interactions between various parameters across different stages of the production process, thereby improving the accuracy of quality predictions [7]. Similarly, in chemical processes, GNNs can effectively handle the spatial correlations between sensors distributed throughout a reactor, providing more reliable estimates of critical quality attributes.
To address the challenges that complex chemical process data with spatio-temporal dynamics pose for GNNs, several strategies are employed. Firstly, data preprocessing and feature extraction are essential to reduce noise and extract meaningful features. Advanced techniques like variational mode decomposition (VMD) can be used to preprocess raw data, capturing both temporal variations and spatial correlations [8,9]. Secondly, integrating physical models and domain knowledge into GNNs can enhance their ability to handle complex processes. Dynamic graph construction methods are also crucial, as they allow GNNs to adaptively update graph structures based on real-time data [10,11]. Additionally, hybrid model architectures combining GNNs with other techniques like RNNs or attention mechanisms can better capture temporal dynamics. Transfer learning and domain adaptation leverage knowledge from related domains, while self-learning and online updating enable continuous model refinement [12]. Finally, benchmarking and validation establish standards for evaluating GNN performance in industrial settings. These strategies collectively improve the effectiveness and reliability of GNNs in modeling complex industrial processes with spatio-temporal dynamics.
Despite the progress made in data-driven soft sensor modeling, several gaps remain. Existing models often struggle with the dynamic and non-stationary nature of industrial processes, requiring frequent retraining to maintain performance [13,14]. This limitation highlights the need for self-learning models that can autonomously update and refine their structures in response to changing process conditions. Self-learning graph neural networks, capable of continuous adaptation, would significantly enhance the practicality and reliability of soft sensors in real-world industrial settings. Furthermore, the selection of appropriate input variables is crucial for the effectiveness of soft sensor models [15]. Irrelevant or redundant variables can introduce noise and reduce prediction accuracy. Current methods for input selection include statistical analysis, such as correlation coefficients and principal component analysis, which help identify the most influential process variables [16]. Machine learning-based approaches, including feature importance ranking using random forests or gradient boosting, have also gained popularity [17]. Moreover, domain knowledge and expert judgment play vital roles in guiding the selection process, ensuring that the chosen variables are not only statistically significant but also physically meaningful within the context of the specific chemical processes [18].
This paper summarizes the current state and challenges of industrial soft sensing research. It uses Graph Convolutional Networks (GCNs) to extract spatio-temporal features from industrial data, which is dynamic and nonlinear. Addressing the high nonlinearity in quality variable detection during industrial production and the common neglect of spatial distribution features among variables, this paper optimizes the traditional GCN model by introducing convolution kernels and proposes an industrial quality prediction model that can effectively capture and utilize the spatial correlations among variables. Furthermore, because relying solely on current-time-point information is insufficient for modeling practical industrial processes, this paper enhances the traditional GCN model by incorporating graph attention mechanisms, resulting in a spatio-temporal graph attention (STGAT)-based industrial quality prediction model.
In modern chemical settings, obtaining prior knowledge about sensor relationships is difficult due to complex working conditions, and constructing variable graphs requires subjective prior knowledge and mechanistic insight. To tackle these issues, this paper proposes a variable structure learning spatio-temporal GCN (VSL-STGCN) model for industrial quality prediction with a Shapley value-based variable structure learning (VSL) method. This model reveals variable relationships directly from data using an end-to-end gradient descent algorithm, without needing prior knowledge. It should be mentioned that the SHAP framework was chosen over alternative interpretability methods such as Local Interpretable Model-agnostic Explanations (LIME) [19] and Integrated Gradients (IG) [20], primarily due to its foundation in cooperative game theory. This theoretical basis provides a consistent and fair attribution of feature importance across coalitions, offering both local and global explanations that are inherently additive. In contrast to the perturbation-sensitive local approximations characteristic of LIME or the path-dependent nature of Integrated Gradients, SHAP values demonstrate superior stability. These attributes make SHAP particularly well suited for elucidating complex variable relationships within the high-dimensional, graph-structured datasets common in chemical processes, thereby mitigating potential interpretation bias. Finally, the model is trained and tested on a real industrial high-low temperature transformer unit and a pre-decarburization unit from an ammonia synthesis chemical process, confirming its effectiveness and reliability in soft sensing of quality variables.
The main contributions of the present work can be summarized in three aspects. The first contribution is the optimization of the traditional GCN architecture, which effectively captures the spatial distribution features among variables that are often neglected by conventional methods, thereby improving feature extraction fidelity. Building on this, the STGAT-based quality prediction model incorporates a graph attention mechanism, enabling the model to leverage rich temporal history and achieve a comprehensive understanding of spatio-temporal dynamics, which leads to superior prediction accuracy. The most substantial innovation, however, is the development of the VSL-STGCN model, empowered by a novel Shapley value-based Variable Structure Learning (VSL) method. This approach allows the model to autonomously identify and construct the variable graph directly from raw data, eliminating the need for subjective and often unobtainable prior knowledge about sensor topology and correlations. Collectively, these advancements provide a powerful and practical tool for product quality prediction in complex chemical processes.
The remainder of this article is organized as follows. Section 2 introduces the preliminaries on the GCNs. In Section 3, the proposed VSL-STGCN model is introduced in detail. Then, the proposed model is applied to build the chemical soft sensing model in Section 4. Section 5 provides two real chemical processes, including the high-low transformer unit and the pre-decarburization unit process, to verify the validity of the proposed model. Finally, Section 6 summarizes the full text and proposes future research directions.

2. Preliminaries

2.1. Graph Convolutional Neural Networks (GCNs)

Graph Convolutional Neural Networks (GCNs) [21] have emerged as a powerful class of deep learning models designed to handle graph-structured data. Unlike traditional Convolutional Neural Networks (CNNs) that operate on grid-like data structures such as images, GCNs are tailored to work with non-Euclidean data (information structured as graphs, e.g., networks of interconnected variables), making them suitable for a wide range of applications including social networks, molecular structures, and, as detailed in this work, industrial process systems [22]. To apply this powerful framework to the domain of industrial process modeling, a critical step is to formalize the physical plant as a graph. An industrial process can be formally represented as a graph $G = (V, E)$, where the node set $V = \{v_1, \ldots, v_n\}$ corresponds to $n$ distinct process variables (e.g., temperature, pressure, or flowrate sensors). Each node $v_i$ is associated with a feature vector $x_i \in \mathbb{R}^d$. These features constitute a node feature matrix $X \in \mathbb{R}^{n \times d}$, where each row represents the measurements for the variable at $v_i$, such as sensor readings at a specific time interval. The edge set $E \subseteq V \times V$ encapsulates the known correlations or physical dependencies between variables. The structure of the graph is described by an adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $A_{ij} > 0$ if a significant dependency exists from $v_i$ to $v_j$, and $A_{ij} = 0$ otherwise. This linkage demonstrates how GCN operations aggregate these interconnected variables to model real-world chemical dynamics.
The architecture of GCNs (shown in Figure 1) typically consists of multiple layers, each responsible for refining node embeddings (vector representations that encode a node’s features and relationships for machine learning processing) by aggregating information from neighbors. The core of this operation is implemented in the graph convolution layers. A single-layer GCN propagation rule is defined as:
$H = \sigma\left(\tilde{A} X W\right)$ (1)
In this formulation, $\tilde{A}$ is the symmetrically normalized adjacency matrix, $\tilde{A} = \tilde{D}^{-1/2} (A + I) \tilde{D}^{-1/2}$, where $I$ is the identity matrix and $\tilde{D}$ is the diagonal degree matrix of $A + I$. This normalization ensures stable aggregation by accounting for varying node degrees. The product $\tilde{A} X$ performs the core aggregation step, where each node’s feature vector is updated by combining its own features with those of its immediate neighbors. This mathematically represents how a process variable’s state is influenced by its directly correlated counterparts. $W \in \mathbb{R}^{d \times d'}$ is a trainable weight matrix that linearly transforms the aggregated features, and $\sigma$ is a nonlinear activation function (e.g., ReLU) applied to the output of each convolutional layer to introduce nonlinearity into the model.
The process begins with an Input Layer that initializes the node features from the raw measurement data (the matrix X ). The Graph Convolutional Layers are then stacked sequentially. After k layers, a node’s representation incorporates information from its k-hop neighborhood, thereby modeling system-wide effects critical for capturing the true behavior of an industrial chemical process. Depending on the task, optional Pooling Layers can be used to reduce the graph’s dimensionality and capture hierarchical structures. Finally, the Output Layer produces the final node embeddings or predictions for tasks such as fault diagnosis or product quality regression.
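To make the propagation rule concrete, the following minimal PyTorch sketch implements a single layer of Equation (1). It assumes a dense adjacency matrix over $n$ process variables; the class and method names (GCNLayer, normalize) are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One propagation step H = sigma(A_tilde @ X @ W), as in Equation (1)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # trainable weight matrix W

    @staticmethod
    def normalize(A: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization A_tilde = D_tilde^{-1/2} (A + I) D_tilde^{-1/2};
        # adding I guarantees every node has degree at least 1.
        A_self = A + torch.eye(A.size(0), device=A.device)
        d_inv_sqrt = A_self.sum(dim=1).pow(-0.5)
        return d_inv_sqrt[:, None] * A_self * d_inv_sqrt[None, :]

    def forward(self, X: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # Aggregate each node's neighborhood, then apply the linear map and ReLU
        return torch.relu(self.normalize(A) @ self.W(X))
```

Stacking $k$ such layers gives each node a $k$-hop receptive field, which is exactly the system-wide effect described above.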

2.2. Graph Attention Mechanism

The attention mechanism is a technique used in sequence models to enhance model performance by enabling the model to focus on the most important parts of the input data [23]. The core idea of the attention mechanism is to assign a weight to each element in the input sequence of the model, indicating the importance of each element for the current task. In practice, these weights are typically obtained through a learning process, allowing the model to adaptively focus on the key information in the input sequence. Given a Query and a set of key-value pairs (Key, Value) in the source, the essence of the attention mechanism is to compute a weighted sum based on the query and the corresponding keys, calculated as follows:
$\mathrm{Attention}(\mathit{Query}, \mathit{Source}) = \sum_{i=1}^{L_x} \mathrm{Similarity}(\mathit{Query}, \mathit{Key}_i) \cdot \mathit{Value}_i$ (2)
Here, $L_x$ is the length of the sequence. First, the similarity between the query and each key is calculated; this is referred to as the attention coefficient. By performing a weighted sum using these coefficients, the query result is obtained. Graph Attention (GAT) networks update the representation of each node based on the attention weights assigned to its neighboring nodes (shown in Figure 2). After introducing the attention mechanism, the attention computation involves only adjacent nodes, i.e., nodes that share an edge, without requiring information from the entire graph [24]. For the GAT, consider a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. For a node $i \in V$, its feature representation is $h_i \in \mathbb{R}^F$, where $F$ is the feature dimension. The attention coefficient $\alpha_{ij}$ from node $i$ to its neighboring node $j$ is calculated by the following formula:
$\alpha_{ij} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(a^{T} \left[ W h_i \,\|\, W h_j \right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{T} \left[ W h_i \,\|\, W h_k \right]\right)\right)}$ (3)
where $W \in \mathbb{R}^{F' \times F}$ is the weight matrix that maps input features from $F$ to $F'$ dimensions, $a \in \mathbb{R}^{2F'}$ is the parameter vector of the attention mechanism, LeakyReLU is the leaky rectified linear unit, $N_i$ is the set of neighboring nodes of node $i$, and $\|$ denotes the vector concatenation operation. The new feature representation $h_i'$ of node $i$ is obtained by a weighted summation of the features of neighboring nodes:
$h_i' = \sum_{j \in N_i} \alpha_{ij} W h_j$ (4)
To address the limitations of traditional models in capturing temporal dynamics, the integration of Long Short-Term Memory (LSTM) networks with graph-based models has gained attention. LSTMs are capable of learning long-term dependencies in sequential data, making them suitable for modeling time-series information in industrial processes. By incorporating LSTM networks, the spatio-temporal graph attention model can effectively capture temporal variations and evolution patterns. Additionally, the introduction of the Graph Attention Mechanism further enhances the model’s flexibility and expressive power. GAT allows nodes to assign different weights to their neighbors through an attention mechanism, enabling the model to focus on the most relevant nodes for each prediction task. This attention-based weighting scheme not only improves the model’s ability to handle complex graph structures but also provides interpretability by highlighting important relationships within the data.
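As an illustration of Equations (3) and (4), the sketch below computes single-head graph attention in PyTorch. The dense pairwise scoring and the masking of non-neighbors are one straightforward realization, not necessarily the authors' implementation, and the initialization of the attention vector is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention following Equations (3) and (4)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)        # shared projection W
        self.a = nn.Parameter(torch.randn(2 * out_dim) * 0.1)  # attention vector a

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        Wh = self.W(h)                                  # (n, out_dim)
        n = Wh.size(0)
        # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for every ordered node pair
        pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                           Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(pairs @ self.a)                # (n, n) raw coefficients
        e = e.masked_fill(adj == 0, float('-inf'))      # attend only to neighbors
        alpha = torch.softmax(e, dim=1)                 # normalization of Eq. (3)
        return alpha @ Wh                               # weighted sum of Eq. (4)
```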

3. Methodology

3.1. Spatial Temporal GCN with Variable Structure Learning (VSL)

In chemical processes, the relationships between variables are often complex and dynamic, making it challenging to construct an accurate and objective graph structure that captures these relationships. Traditional methods for building such structures rely heavily on prior knowledge and domain expertise, which can introduce subjectivity and limit the model’s adaptability to new or evolving processes. Moreover, obtaining reliable prior knowledge about the relationships between variables can be difficult, especially in highly complex and nonlinear industrial systems. This challenge underscores the need for a more data-driven and adaptive approach to graph structure learning.
To solve these problems, this paper proposes the variable structure learning Spatial Temporal GCN (VSL-STGCN) model. The diagram of the VSL-STGCN model is shown in Figure 3. It includes the embedding mapping layer, the variable graph structure learning module, the graph attention network, the residual connection, the STGCN module, and the final output layer. In this model, the adjacency matrix $A$ is calculated from the similarity of the embedding vector of each node. In a non-Euclidean graph structure, adjacent nodes have similar graph-embedding representations, so the adjacency entry $A_{i,j}$ is larger for closer node pairs $(i, j)$. First, two node embedding matrices $\Psi_1, \Psi_2 \in \mathbb{R}^{N \times c}$ are randomly initialized, where $N$ is the number of nodes and $c$ is the embedding dimension. Then, the adjacency matrix $\tilde{A}_{adp}$ can be learned as follows:
$A_{mid} = \Psi_1 \Psi_2^{T}, \qquad \tilde{A}_{adp} = \mathrm{softmax}\left(\mathrm{ReLU}\left(A_{mid}\right)\right)$ (5)
where $A_{mid}$ is computed elementwise as $(A_{mid})_{i,j} = \sum_{k=1}^{c} (\Psi_1)_{i,k} (\Psi_2^{T})_{k,j}$. The ReLU activation function and the softmax function are used to normalize the adjacency matrix. In the self-learning module of the graph structure, the embedding matrix $\Psi_1$ of the source nodes and $\Psi_2$ of the end nodes are learned by the gradient descent algorithm. It should be mentioned that the GAT in the VSL-STGCN can be separated into three steps:
(1) Calculate the attention coefficients. To obtain better expressive ability, the layer-$(L-1)$ node features $h_i^{L-1}$ are first mapped by shared weights $W^L$. Then, the attention coefficient $e_{ij}$ between neighboring nodes $j$ and $i$ can be calculated as:
$e_{ij} = \varphi\left(W^{L} h_i^{L-1}, \, W^{L} h_j^{L-1}\right)$ (6)
where the mapping function φ is a shared attention mechanism (set as a fully connected network).
(2) Normalize attention coefficients. The normalization is realized by the softmax function:
$\alpha_{ij} = \mathrm{softmax}_j\left(e_{ij}\right) = \dfrac{\exp\left(e_{ij}\right)}{\sum_{k \in N(i)} \exp\left(e_{ik}\right)}$ (7)
(3) Obtain the aggregation features. The characteristic representation of node i at the L layer can be obtained by the weighted summation of neighbor node features:
$h_i^{L} = \sigma\left(\sum_{j \in N(i)} \alpha_{ij} W^{L} h_j^{L-1}\right)$ (8)
Here, a residual connection is implemented for the STGCN module. It is implemented as additive layers, where the output of each graph attention or STGCN module is added to its input via skip connections, helping mitigate vanishing gradients in deep training. As a regression problem, the loss function of the proposed model is defined as follows:
$\min_{\theta} \mathrm{Loss} = \dfrac{1}{N} \sum_{i=1}^{N} \left\| \hat{y}_i - y_i \right\|_F^2$ (9)
where $y_i$ and $\hat{y}_i$ are the real and predicted values of the quality variable, respectively, and $\theta$ denotes the network parameters.
While building the soft sensing model, the input time-sequence data is fed into the first layer of the VSL-STGCN model. The spatio-temporal features and the variable relations are then learned, yielding a describable relationship between variables. Finally, the output prediction $\hat{y}_i$ is obtained from the last layer.
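A minimal sketch of the structure-learning step of Equation (5) is given below. Because the two embedding matrices are ordinary trainable parameters, the adjacency is refined by the same gradient descent that trains the rest of the network; the class name is illustrative.

```python
import torch
import torch.nn as nn

class VariableStructureLearner(nn.Module):
    """Learns A_adp = softmax(ReLU(Psi1 @ Psi2^T)) from Equation (5)."""
    def __init__(self, n_nodes: int, embed_dim: int):
        super().__init__()
        # Randomly initialized source and end node embeddings (N x c)
        self.psi1 = nn.Parameter(torch.randn(n_nodes, embed_dim))
        self.psi2 = nn.Parameter(torch.randn(n_nodes, embed_dim))

    def forward(self) -> torch.Tensor:
        a_mid = self.psi1 @ self.psi2.T                  # node-similarity scores
        return torch.softmax(torch.relu(a_mid), dim=1)   # row-normalized adjacency
```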

3.2. Variable Importance Based on SHAP Algorithm

In industrial processes, a multitude of process variables are typically involved, each capturing different aspects of the process dynamics. While these variables provide a comprehensive view of the process, their sheer number poses significant challenges. Utilizing all available variables not only increases the computational burden on the model but also risks overfitting, as the model may capture noise rather than underlying patterns. This issue is particularly pronounced in complex industrial settings where data dimensionality is high, and the relationships between variables are intricate. To address these challenges, it is essential to identify and retain only the most informative variables that contribute significantly to the model’s predictive power. This selective approach not only reduces computational complexity but also enhances the model’s generalizability by focusing on the most relevant features.
To enhance the interpretability of the proposed VSL-STGCN model, the SHAP (SHapley Additive exPlanations) algorithm is employed to evaluate the importance of each input variable [25,26]. The flowchart is given in Figure 4. The SHAP algorithm leverages the concept of Shapley values from cooperative game theory to provide local, model-agnostic explanations for the model’s predictions, helping identify which parts of the graph are most influential in driving the model’s output. It should be mentioned that this study employs SHAP instead of alternative methodologies such as LIME [19] or IG [20], primarily owing to its rigorous theoretical underpinnings in Shapley values. SHAP provides consistent and additive feature attributions that precisely quantify marginal contributions across all possible feature coalitions, rendering it particularly suited for identifying critical variables within complex graph structures. Conversely, LIME generates exclusively local approximations that exhibit significant sensitivity to sampling variations, while permutation importance frequently fails to capture higher-order interactions inherent in nonlinear models. Consequently, SHAP is well suited to delivering robust, model-agnostic analytical insights in chemical process investigations, aligning with the requirements of this research domain.
The Shapley value for a feature or node j , denoted as ϕ j , is given by the average marginal importance of j over all possible coalitions of features and nodes:
$\phi_j = \sum_{S \subseteq \{1, \ldots, F\} \setminus \{j\}} \dfrac{|S|! \, (F - |S| - 1)!}{F!} \left[ f\left(X_{S \cup \{j\}}\right) - f\left(X_S\right) \right]$ (10)
where $F$ is the total number of features (including nodes and edges), $X_S$ is the feature vector with all features in $S$ set to 1 and the rest set to 0, and $f$ is the prediction function of the proposed model. When applying the above Shapley value to GNNs, a perturbation step can be included by adding noise to the original dataset, which helps isolate the effect of individual nodes on the predictions. The algorithm ensures that the computed Shapley values are fair, meaning they satisfy the properties of efficiency, dummy, symmetry, and additivity. These properties are fundamental to the concept of Shapley values and ensure that the explanation is coherent and unbiased. Furthermore, visualization tools can be combined with these values to help understand the importance of nodes in the proposed VSL-STGCN-based soft sensor.
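As a rough illustration of how such attributions are obtained in practice, the sketch below uses the open-source shap package’s KernelExplainer. The stand-in predictor, data shapes, sample counts, and the top-k cutoff are illustrative assumptions, not values from the paper.

```python
import numpy as np
import shap  # open-source SHapley Additive exPlanations package

rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, 20))        # stand-in for historical process data

def model_predict(x: np.ndarray) -> np.ndarray:
    # Stand-in for the trained soft sensor's forward pass
    return 0.8 * x[:, 0] + np.tanh(x[:, 3]) - 0.2 * x[:, 7]

background = shap.sample(X_hist, 100)      # reference data for coalition sampling
explainer = shap.KernelExplainer(model_predict, background)
shap_values = explainer.shap_values(X_hist[:50], nsamples=200)  # local attributions

mean_abs = np.abs(shap_values).mean(axis=0)   # global importance ranking
top_k = np.argsort(mean_abs)[::-1][:14]       # indices of the retained variables
```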

4. Soft Sensing of Quality Variable Based on the VSL-STGCN

In this section, we provide a detailed description of the Variable Structure Learning-based Spatio-Temporal Graph Convolutional Network (VSL-STGCN) model structure and explain the end-to-end gradient descent algorithm used for training. The VSL-STGCN model is designed to address the challenges of traditional quality control methods by leveraging advanced machine learning techniques to predict industrial quality variables accurately and efficiently. The core of the model lies in its ability to identify the most critical sensor data for the prediction task through feature importance learning and the introduction of Shapley values. Without relying on prior knowledge, it actively discovers and learns the complex spatio-temporal relationships between variables in the data using an end-to-end gradient descent algorithm.

4.1. VSL-STGCN Model Structure

The proposed Variable Structure Learning-based Spatio-Temporal Graph Convolutional Network (VSL-STGCN) is designed to capture dynamic spatio-temporal dependencies in industrial processes for accurate quality prediction. The model consists of the following key components:
(1) Process variable selection module
By calculating the Shapley values of all input variables, the relative importance of each input variable to the model output is obtained. The Shapley value for a feature or node $j$ is calculated by Equation (10).
(2) Embedding mapping layer
The input data is first passed through the embedding mapping layer, which maps the raw features into a higher-dimensional space. This step can be represented as:
$E = f_{embed}(X)$ (11)
where $X$ is the input feature matrix, $f_{embed}$ is the embedding function, and $E$ is the resulting embedding matrix. This step helps the model capture complex relationships between variables and lays the foundation for subsequent graph structure learning.
(3) Variable graph structure learning module
In the variable graph structure learning module, the model dynamically constructs the adjacency matrix based on the similarity of the embedding vectors, computed by Equation (5). This process relies on no prior knowledge but learns the relationships between variables directly from the data. The construction of the adjacency matrix is key to the model’s adaptive capture of industrial process dynamics.
(4) Graph attention network
The Graph Attention Network (GAT) allows the model to assign different attention weights to the neighbors of each node, focusing on the most critical information for the prediction task. The GAT operates by Equations (6)–(8). This mechanism enhances the model’s ability to capture important features and improves prediction accuracy.
(5) Residual connection
To address the vanishing gradient problem in deep network training, the model incorporates residual connections. These connections facilitate the flow of information through the network, making the training of deep networks more stable.
(6) STGCN module
The STGCN module integrates spatial and temporal features, effectively capturing the complex dynamics of industrial processes. This module uses the learned graph structure and attention mechanisms to model spatio-temporal relationships. A minimal wiring of these components is sketched below.
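The following sketch wires the six components together in the order just described, reusing the GCNLayer, GATLayer, and VariableStructureLearner sketches from Sections 2 and 3. It is an illustrative assembly under those assumptions, not the authors’ exact architecture; the variable selection module of step (1) runs offline before this model is built.

```python
import torch
import torch.nn as nn

class VSLSTGCNSketch(nn.Module):
    """Illustrative wiring of the components in Section 4.1."""
    def __init__(self, n_nodes: int, in_dim: int, embed_dim: int = 16):
        super().__init__()
        self.embed = nn.Linear(in_dim, embed_dim)          # (2) embedding mapping layer
        self.vsl = VariableStructureLearner(n_nodes, 8)    # (3) graph structure learning
        self.gat = GATLayer(embed_dim, embed_dim)          # (4) graph attention network
        self.stgcn = GCNLayer(embed_dim, embed_dim)        # (6) STGCN-style convolution
        self.out = nn.Linear(n_nodes * embed_dim, 1)       # output: quality variable

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (n_nodes, in_dim)
        adj = self.vsl()                                   # learned adjacency, Eq. (5)
        h = self.embed(x)                                  # Eq. (11)
        h = h + self.gat(h, adj)                           # (5) residual connection
        h = h + self.stgcn(h, adj)                         # residual over the STGCN block
        return self.out(h.flatten())
```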

4.2. Explanation of the End-to-End Gradient Descent Algorithm Used for Training

The VSL-STGCN model is trained using an end-to-end gradient descent algorithm, which optimizes the model parameters to minimize prediction errors. The training process involves the following steps:
(1) Initialization. The model parameters, including the embedding matrices and the weights of the graph attention network, are randomly initialized.
(2) Forward propagation. The input data passes through the layers of the model, including the embedding mapping layer, graph structure learning module, graph attention network, residual connections, and STGCN module, ultimately generating predictions at the output layer.
(3) Loss calculation. The model’s predictions are compared with the actual values of the quality variables to calculate the loss function (typically the mean squared error, MSE).
(4) Backpropagation. The gradients of the loss function with respect to the model parameters are calculated using the backpropagation algorithm as $\nabla_\theta L = \partial \mathrm{Loss} / \partial \theta$, where $\theta$ represents the network parameters.
(5) Parameter update. The model parameters are updated using the calculated gradients through the gradient descent algorithm to reduce prediction errors:
$\theta = \theta - \eta \nabla_\theta \mathrm{Loss}$ (12)
where $\eta$ is the learning rate.
(6) Iteration. The steps of forward propagation, loss calculation, backpropagation, and parameter update are repeated until the model converges or a preset number of training epochs is reached.
The flowchart of the training process of the proposed VSL-STGCN is shown in Figure 5. Through this end-to-end training approach, the VSL-STGCN model can learn the optimal parameters for predicting quality variables directly from the data, effectively capturing the dynamic characteristics of industrial processes and adapting to the unique attributes of specific industrial processes. To facilitate implementation, the pseudo-code of the training process of the VSL-STGCN model is presented in Algorithm 1. It should be mentioned that the graph structure learning of the proposed VSL-STGCN incurs $O(N^2)$ computational complexity due to the adjacency matrix computation; the case studies therefore also compare the training times of the presented models to analyze time and space complexity. The evaluation metrics Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²) are used to quantify the prediction performance of the proposed models and are defined as follows:
$RMSE = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}$ (13)
$MAE = \dfrac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$ (14)
$R^2 = 1 - \dfrac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}$ (15)
where $N$ denotes the number of samples, $y_i$ is the actual value of the quality variable, $\hat{y}_i$ is the model-predicted value, and $\bar{y}$ represents the average of the actual quality variable values.
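For reference, the three metrics can be computed with a few lines of NumPy; this is a direct transcription of Equations (13)–(15).

```python
import numpy as np

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    # Root mean square error, Equation (13)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    # Mean absolute error, Equation (14)
    return float(np.mean(np.abs(y - y_hat)))

def r2(y: np.ndarray, y_hat: np.ndarray) -> float:
    # Coefficient of determination, Equation (15)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
```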
Algorithm 1: Training Process of the VSL-STGCN Model
Input:
Process data $X = \{x_1, x_2, \ldots, x_T\}$, where $x_t \in \mathbb{R}^N$.
Quality variable data $Y$.
Number of epochs $E$, learning rate $\eta$.
Output:
A trained VSL-STGCN model with optimized parameters $\theta$.
/* 1. Variable Selection (Offline) */
1: Calculate feature importance for all variables in $X$ using the SHAP algorithm.
2: Select the top-$k$ most important variables to form the input data $X_s$.
/* 2. Model Initialization */
3: Initialize model parameters $\theta$, including node embedding matrices $\Psi_1$, $\Psi_2$ and weights of the GNN modules.
/* 3. End-to-End Training */
4: for epoch = 1 to $E$ do
/* Forward Propagation */
5:   // Variable graph structure learning
6:   $A_{mid} \leftarrow \Psi_1 \Psi_2^{T}$
7:   $\tilde{A}_{adp} \leftarrow \mathrm{softmax}(\mathrm{ReLU}(A_{mid}))$
8:   // Spatio-temporal feature extraction
9:   $H_{embed} \leftarrow f_{embed}(X_s)$
10:  $H_{gat} \leftarrow \mathrm{GAT}(H_{embed}, \tilde{A}_{adp})$
11:  $H_{stgcn} \leftarrow \mathrm{STGCN}(H_{gat}, \tilde{A}_{adp})$
12:  $\hat{y} \leftarrow \mathrm{OutputLayer}(H_{stgcn})$
/* Loss Calculation */
13:  $\mathrm{Loss} \leftarrow \frac{1}{N} \| \hat{y} - y \|^2$
/* Backpropagation and Parameter Update */
14:  Compute gradients $\nabla_\theta \mathrm{Loss}$.
15:  Update all trainable parameters $\theta$ (including $\Psi_1$, $\Psi_2$) using gradient descent:
16:  $\theta \leftarrow \theta - \eta \nabla_\theta \mathrm{Loss}$
17: end for
18: return Trained model with parameters $\theta$.
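A PyTorch realization of the training loop in Algorithm 1 might look like the following minimal sketch. The synthetic tensors, the batch-wise per-sample forward loop, and the reuse of the VSLSTGCNSketch class above are illustrative assumptions; the Adam optimizer and learning rate follow the settings reported in Section 5.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 3000 windows of 13 selected variables, window length 10
X = torch.randn(3000, 13, 10)
y = torch.randn(3000, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True)

model = VSLSTGCNSketch(n_nodes=13, in_dim=10)   # sketch from Section 4.1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for epoch in range(200):                             # Algorithm 1, line 4
    for xb, yb in loader:
        optimizer.zero_grad()
        y_hat = torch.stack([model(x) for x in xb])  # forward pass, lines 5-12
        loss = loss_fn(y_hat, yb)                    # loss calculation, line 13
        loss.backward()                              # backpropagation, line 14
        optimizer.step()                             # parameter update, line 16
```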

5. Case Studies

5.1. The High-Low Transformer Unit Case

The High-Low Transformer (HLT) unit stands as a linchpin in the ammonia synthesis chemical process, a component directly adapted from industrial production configurations to address the critical challenge of converting recalcitrant CO to CO2, an intermediate that is seamlessly absorbed in downstream CO2 capture units [27]. Its operational efficacy hinges on minimizing residual CO in the process gas while sustaining energy efficiency, making real-time monitoring of the outlet CO content (a key quality metric) indispensable. However, industrial practice relies on offline laboratory analysis, which suffers from inherently low sampling frequency, creating a pressing need for robust soft sensing solutions. A schematic of the HLT unit, encompassing 26 process variables (U1–U26), is depicted in Figure 6, with detailed specifications provided in Table 1.
To develop and validate the proposed approach, 4000 historical industrial data samples were acquired, partitioned into training (3000 samples), validation (500 samples), and testing (500 samples) subsets. Recognizing the dynamic nature of industrial processes, a moving window technique was employed to transform static raw data into time-series sequences of uniform length, enabling the capture of temporal dependencies critical for accurate soft sensing. The architectural parameters of the VSL-STGCN model, foundational to subsequent experiments, are detailed in Table 2.
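A minimal sketch of the moving-window transformation is shown below; the window length of 10 is an illustrative choice, not the value used in the paper.

```python
import numpy as np

def moving_window(data: np.ndarray, window: int) -> np.ndarray:
    """Slice a (T, n_vars) record into overlapping (window, n_vars) sequences."""
    return np.stack([data[t:t + window] for t in range(len(data) - window + 1)])

raw = np.random.randn(4000, 26)            # stand-in for the 4000 HLT samples
sequences = moving_window(raw, window=10)  # shape: (3991, 10, 26)
```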
A three-stage iterative optimization framework is designed to enhance model efficiency and interpretability. (1) Baseline Model (VSL-STGCNv1): Initialized with all 26 variables to establish a performance benchmark, leveraging the full input space for training. (2) Variable Pruning via SHAP (VSL-STGCNv2): Variable importance is quantified using SHAP values, visualized in Figure 7, which plots mean SHAP values for the top 20 variables and their sample-level contribution patterns. A threshold of 0.0003 is applied to retain 13 high-importance variables, with the model retrained on this subset to learn an initial variable interaction graph. (3) Iterative Refinement (VSL-STGCNv3): To further optimize performance, variables with progressively lower importance are iteratively excluded, and the model is retrained at each step. This process converged to VSL-STGCNv3, the configuration yielding optimal validation performance, alongside its learned variable interaction graph. Consistent training protocols are applied across all models, i.e., a batch size of 256, the Adam optimizer, and a fixed learning rate of 0.001. These settings are mirrored for comparative models (STGCN, GCN, and LSTM) to ensure fair performance evaluation.
Key insights emerged from the optimization process. The learned adjacency matrices for VSL-STGCNv2 and VSL-STGCNv3, visualized as heatmaps in Figure 8 and Figure 9, respectively, revealed meaningful variable interaction patterns. Notably, these data-driven graphs aligned with the actual industrial process flow, validating the model’s ability to capture mechanistic relationships. Prediction results for the test set (Figure 10, Figure 11 and Figure 12) demonstrated that superfluous variables hindered performance, while strategic pruning via SHAP and iterative refinement yielded significant improvements.
To benchmark its superiority, the proposed VSL-STGCNv3 is compared against STGCN, GCN, and LSTM models using RMSE, MAE, and R² metrics (Table 3). Results confirmed that VSL-STGCNv3, equipped with the optimally pruned variable set and learned interaction graph, outperformed all counterparts, underscoring the value of integrating dynamic variable selection with graph-structured learning in industrial soft sensing. The SHAP pruning technique successfully reduced the feature variables from 20 to 11 by eliminating low-importance variables. This optimization strategy effectively mitigated model overfitting and reduced computational load, ultimately achieving a 10% improvement in RMSE and a 15% enhancement in training efficiency.
In this case, all model structures are implemented in Python 3.9 with PyTorch 1.2.0 and executed on a computing system equipped with dual RTX 3080Ti GPUs. The total number of parameters and the training time for the compared models are presented in Table 3. It can be seen that the models with fewer inputs can be trained with high efficiency.

5.2. The Pre-Decarburization Unit Case

The Pre-decarburization (PD) unit is also a critical production component derived from an actual ammonia synthesis chemical process [28,29]. Its primary function is to maximize the removal of carbon dioxide (CO2) from the original process gas. The core reaction occurs in a CO2 absorption column: as the process gas flows through the column, CO2 is absorbed by the amine liquid. Therefore, the primary and most important step is to measure the residual CO2 content at the unit’s outlet pipe, which serves as a key quality variable for production. In practical operations, residual CO2 content is measured using a costly online process analyzer. Our objective is to develop a predictive model to replace this analyzer. Based on the process design, a simplified flowchart of the Pre-decarburization unit, including all process instruments, is presented in Figure 13. In this figure, 20 process variables are marked with light green boxes, while the quality variable (residual CO2 content) is labeled with a yellow box. Detailed descriptions of the process variables in the flowchart are provided in Table 4.
A total of 10,000 historical data samples were collected from an industrial database. These samples are partitioned into three subsets: the first 7000 samples for model training, the subsequent 2500 samples for validation, and the remaining 500 samples for testing. To capture dynamic characteristics, a moving window technique is employed to extract sequences of uniform length. Table 5 outlines the network structure parameters and experimental settings for the proposed soft sensing model.
Initially, the VSL-STGCN model (denoted as VSL-STGCNv1) is trained using all variables from the original dataset. Variable importance is then evaluated using the SHAP algorithm, with results visualized in Figure 14. A threshold is set to select variables with higher importance, and the VSL-STGCN model is retrained using these selected variables (VSL-STGCNv2), enabling the learning of a variable structure graph. Further, variables with lower importance are iteratively removed, and the model is retrained until the variant with optimal validation performance (VSL-STGCNv3) is obtained, along with its learned variable structure graph. For model training, a batch size of 256 is adopted, and the Adam optimizer is used with a learning rate of 0.001. These hyperparameters are consistent across all VSL-STGCN variants, as well as the comparative models.
For VSL-STGCNv1 (trained on all input variables), SHAP values are computed for all samples. Figure 14 displays the mean SHAP values of the input variables, focusing on the top 20 variables, which indicates their relative importance to the model output. Additionally, the figure illustrates the impact of each input variable on predictions by quantifying SHAP values for individual samples and their contribution to the output. A SHAP value threshold of 0.0002 is applied to select the top 14 variables with the highest importance, which are used to train VSL-STGCNv2. The learned adjacency matrix for VSL-STGCNv2 is visualized via a heatmap in Figure 15, and the corresponding variable structure graph is derived from this matrix. Through iterative removal of variables with lower importance, VSL-STGCNv3 (exhibiting optimal validation performance) is obtained. The adjacency matrix heatmap and variable structure graph for VSL-STGCNv3 are presented in Figure 16. Notably, the model-learned variable relationships demonstrate strong alignment with fundamental physicochemical principles. Specifically, the pronounced pressure-temperature correlations observed in both processes accurately reflect thermodynamic coupling mechanisms. This aligns with core thermodynamic principles, which establish an intrinsic relationship between pressure and temperature in PD systems. The model’s successful capture of this fundamental physical law validates its capacity to identify essential thermodynamic characteristics.
Prediction results on the test set for the three VSL-STGCN variants are illustrated in Figure 17, Figure 18 and Figure 19, respectively. Analysis of these figures reveals that superfluous variables failed to enhance model performance; instead, selecting the most important variables improved soft sensing accuracy. Further refinement of the variable set yielded the optimal prediction performance for VSL-STGCNv3. To validate the superiority of the proposed method, comparative experiments are conducted with baseline deep learning models for soft sensing. Table 6 summarizes the prediction performance of all models, evaluated using RMSE, MAE, and R². Results indicate that the proposed VSL-STGCN model with the optimally selected variable set (VSL-STGCNv3) achieved the best performance. The number of parameters and the training time for the compared models are also presented in Table 6; it can be seen that the proposed VSL-STGCNv2 can be trained with the highest efficiency.

6. Conclusions and Future Work

This paper tackles critical challenges in chemical process quality prediction, including the inherently dynamic and nonlinear nature of industrial data, the underutilization of spatial correlations among chemical process variables, and overreliance on domain-specific prior knowledge. To address these issues, three enhanced GNN-based models are proposed, each with distinct innovations. First, a GCN is introduced to explicitly model spatial interdependencies between variables, overcoming the limitations of traditional methods in capturing complex nonlinear features within industrial processes. Second, a graph attention mechanism is integrated to replace static convolutional operations in conventional GCNs, enabling dynamic weighting of neighboring nodes. This enhancement not only improves spatio-temporal feature mining capability but also accelerates model convergence and enhances scalability for large-scale variable systems. Third, the VSL-STGCN model represents a key advancement: it integrates feature importance analysis via SHAP values and learns variable relational structures through end-to-end gradient descent. This eliminates the need for predefined prior knowledge, enabling data-driven discovery of meaningful variable interactions. Experimental validation on an ammonia synthesis chemical process demonstrated the superior performance of VSL-STGCN. Notably, the learned adjacency matrix of VSL-STGCN aligned closely with the actual process mechanism, confirming both its predictive effectiveness and physical interpretability. Nevertheless, the proposed VSL-STGCN still has constraints: scalability challenges in ultra-large graphs (e.g., >1000 variables), the trade-off whereby SHAP computation adds run time in exchange for interpretability, and the reliance on high-quality sensor data to avoid noise propagation. Addressing these constraints is a promising route to further improving the performance of the proposed model.
Our future research will focus on three directions in advanced industrial soft sensing: (1) Multimodal Data Fusion: Integrating heterogeneous data streams (e.g., sensor signals, visual monitoring images, and textual process logs) into GCN frameworks to enhance prediction robustness in complex industries. (2) Interpretability Enhancement: Developing fine-grained explainable mechanisms to clarify model decision logic, thereby strengthening trust in predictions for industrial practitioners. (3) Robustness and Generalization: Improving model resilience against noisy/missing data and graph structure anomalies, while enhancing adaptability across diverse industrial datasets to facilitate broader practical deployment. In addition, transfer learning for adapting models across plants (e.g., fine-tuning on new datasets) can also be applied to the proposed model.

Author Contributions

Conceptualization, S.T., Z.Y. and L.Y.; methodology, S.T., Z.Z. and L.Y.; software, S.T., Z.Y. and L.Y.; validation, Z.Z. and B.S.; resources, Y.Z. and Z.S.; writing—original draft preparation, S.T. and L.Y.; writing—review and editing, S.T., Z.S. and L.Y.; visualization, Z.S. and L.Y.; supervision, L.Y. and Z.Z.; funding acquisition, Y.Z., Z.Z. and L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the National Natural Science Foundation of China (NSFC) under Grant 62473121, 62403416 and 62403359, in part by Zhejiang Provincial Natural Science Foundation of China under Grant No. LZ25F030006, and in part by the Natural Science Foundation of Shanghai under Grant 24ZR1472600.

Data Availability Statement

The datasets presented in this article are not readily available due to confidentiality restrictions. The data used in this study are derived from an actual chemical enterprise. Because these data contain important production process information of the enterprise, which has privacy and confidentiality requirements and is protected by law, we have signed a confidentiality agreement with the enterprise. During the project implementation period, the real data underlying the research results therefore cannot be disclosed.

Conflicts of Interest

Author Ziyan Shen was employed by Baima Lake Laboratory Hydrogen Energy (ChangXing) Co., Ltd. and Zhejiang Baima Lake Laboratory Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Lyu, Y.; Zhou, L.; Cong, Y.; Zheng, H.; Song, Z. Multirate mixture probability principal component analysis for process monitoring in multimode processes. IEEE Trans. Autom. Sci. Eng. 2023, 21, 2027–2038. [Google Scholar] [CrossRef]
  2. Zhai, R.; Zhang, X.; Song, Z.; Kano, M. Enhancing reliability of data-driven soft sensors with stable loss function and sample graph. Comput. Chem. Eng. 2025, 202, 109303. [Google Scholar] [CrossRef]
  3. Jia, M.; Yang, C.; Pan, Z.; Liu, Q.; Liu, Y. Adversarial relationship graph learning soft sensor via negative information exclusion. J. Process Control 2025, 145, 103354. [Google Scholar] [CrossRef]
  4. Shen, B.; Jiang, X.; Yao, L.; Zeng, J. Gaussian mixture TimeVAE for industrial soft sensing with deep time series decomposition and generation. J. Process Control 2025, 147, 103355. [Google Scholar] [CrossRef]
  5. Yeo, W.S.; Saptoro, A.; Kumar, P.; Kano, M. Just-in-time based soft sensors for process industries: A status report and recommendations. J. Process Control 2023, 128, 103025. [Google Scholar] [CrossRef]
  6. Rittig, J.G.; Ben Hicham, K.; Schweidtmann, A.M.; Dahmen, M.; Mitsos, A. Graph neural networks for temperature-dependent activity coefficient prediction of solutes in ionic liquids. Comput. Chem. Eng. 2023, 171, 108153. [Google Scholar] [CrossRef]
  7. Guo, J.; Sun, M.; Zhao, X.; Shi, C.; Su, H.; Guo, Y.; Pu, X. General graph neural network-based model to accurately predict cocrystal density and insight from data quality and feature representation. J. Chem. Inf. Model. 2023, 63, 1143–1156. [Google Scholar] [CrossRef]
  8. Ahmed, M.J.; Mozo, A.; Karamchandani, A. A survey on graph neural networks, machine learning and deep learning techniques for time series applications in industry. PeerJ Comput. Sci. 2025, 11, e3097. [Google Scholar] [CrossRef]
  9. Simão, C.; Hugo, T. GNN-Representation Enabled Adaptive Weighting Algorithm for Mechanical Performance Tuning. J. Comput. Methods Eng. Appl. 2024, 4, 1–15. [Google Scholar] [CrossRef]
  10. Bui, K.H.N.; Cho, J.; Yi, H. Spatial-temporal graph neural network for traffic forecasting: An overview and open research issues. Appl. Intell. 2022, 52, 2763–2774. [Google Scholar] [CrossRef]
  11. Guo, S.; Lin, Y.; Wan, H.; Li, X.; Cong, G. Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting. IEEE Trans. Knowl. Data Eng. 2021, 34, 5415–5428. [Google Scholar] [CrossRef]
  12. Gupta, V.; Liao, W.-K.; Choudhary, A.; Agrawal, A. Combining transfer learning and representation learning to improve predictive analytics on small materials data. In Proceedings of the 2024 International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 18–20 December 2024; pp. 981–984. [Google Scholar] [CrossRef]
  13. Wang, X.; Zhang, H.; Zhang, Y.; Wang, M.; Song, J.; Lai, T.; Khushi, M. Learning nonstationary time-series with dynamic pattern extractions. IEEE Trans. Artif. Intell. 2022, 3, 778–787. [Google Scholar] [CrossRef]
  14. Kim, C.S.; Kim, H.B.; Lee, J.M. Self-Explanatory Fault Diagnosis Framework for Industrial Processes Using Graph Attention. IEEE Trans. Ind. Inform. 2025, 21, 3396–3405. [Google Scholar] [CrossRef]
  15. Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022, 137, 107818. [Google Scholar] [CrossRef]
  16. Omuya, E.O.; Okeyo, G.O.; Kimwele, M.W. Feature selection for classification using principal component analysis and information gain. Expert Syst. Appl. 2021, 174, 114765. [Google Scholar] [CrossRef]
  17. Alsahaf, A.; Petkov, N.; Shenoy, V.; Azzopardi, G. A framework for feature selection through boosting. Expert Syst. Appl. 2022, 187, 115895. [Google Scholar] [CrossRef]
  18. Graziani, S.; Xibilia, M.G. Multiple correlation analysis for finite-time delay estimation for soft sensor design in the presence of noise. IEEE Trans. Instrum. Meas. 2023, 72, 3307748. [Google Scholar] [CrossRef]
  19. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
  20. Qi, Z.; Khorram, S.; Fuxin, L. Visualizing Deep Networks by Optimizing with Integrated Gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11890–11898. [Google Scholar] [CrossRef]
  21. Winkler, N.P.; Neumann, P.P.; Albizu, N.; Schaffernicht, E.; Lilienthal, A.J. GNN-DM: A Graph Neural Network Framework for Real-World Gas Distribution Mapping. IEEE Sens. J. 2025, 25, 42171–42179. [Google Scholar] [CrossRef]
  22. Phan, H.T.; Nguyen, N.T.; Hwang, D. Aspect-level sentiment analysis: A survey of graph convolutional network methods. Inf. Fusion 2023, 91, 149–172. [Google Scholar] [CrossRef]
  23. Sun, C.; Li, C.; Lin, X.; Zheng, T.; Meng, F.; Rui, X.; Wang, Z. Attention-based graph neural networks: A survey. Artif. Intell. Rev. 2023, 56 (Suppl. 2), 2263–2310. [Google Scholar] [CrossRef]
  24. Shao, P.; He, J.; Li, G.; Zhang, D.; Tao, J. Hierarchical graph attention network for temporal knowledge graph reasoning. Neurocomputing 2023, 550, 126390. [Google Scholar] [CrossRef]
  25. Antwarg, L.; Miller, R.M.; Shapira, B.; Rokach, L. Explaining anomalies detected by autoencoders using Shapley Additive Explanations. Expert Syst. Appl. 2021, 186, 115736. [Google Scholar] [CrossRef]
  26. Yao, L.; Yang, Z.; Zhang, Z.; Tang, S.; Shen, B.; Zeng, J. Input factor selection based on interpretable neural network for industrial virtual sensing application. IEEE Trans. Instrum. Meas. 2023, 72, 3323006. [Google Scholar] [CrossRef]
  27. Li, Y.; Han, W.; Shao, W.; Zhao, D. Virtual sensing for dynamic industrial process based on localized linear dynamical system models with time-delay optimization. ISA Trans. 2023, 133, 505–517. [Google Scholar] [CrossRef]
  28. Yao, L.; Ge, Z. Distributed parallel deep learning of hierarchical extreme learning machine for multimode quality prediction with big process data. Eng. Appl. Artif. Intell. 2019, 81, 450–465. [Google Scholar] [CrossRef]
  29. Zhou, L.; Zheng, J.; Ge, Z.; Song, Z.; Shan, S. Multimode process monitoring based on switching autoregressive dynamic latent variable model. IEEE Trans. Ind. Electron. 2018, 65, 8184–8194. [Google Scholar] [CrossRef]
Figure 1. Illustration of the Graph Convolutional Neural Networks.
Figure 1. Illustration of the Graph Convolutional Neural Networks.
Processes 13 03751 g001
Figure 2. Illustration of the Multi-Head Graph Attention Mechanism.
Figure 2. Illustration of the Multi-Head Graph Attention Mechanism.
Processes 13 03751 g002
Figure 3. Spatial Temporal GCN with Variable Structure Learning (VSL).
Figure 3. Spatial Temporal GCN with Variable Structure Learning (VSL).
Processes 13 03751 g003
Figure 4. The proposed VSL-STGCN with variable selection mechanism.
Figure 4. The proposed VSL-STGCN with variable selection mechanism.
Processes 13 03751 g004
Figure 5. Flowchart of the training process of the proposed VSL-STGCN.
Figure 5. Flowchart of the training process of the proposed VSL-STGCN.
Processes 13 03751 g005
Figure 6. The flowchart of the High-Low Transformer (HLT) Unit.
Figure 6. The flowchart of the High-Low Transformer (HLT) Unit.
Processes 13 03751 g006
Figure 7. The mean SHAP value of the input variables (top 20) and the impacts on predictions of input variables in the HLT case.
Figure 7. The mean SHAP value of the input variables (top 20) and the impacts on predictions of input variables in the HLT case.
Processes 13 03751 g007
Figure 8. Heatmap of the adjacency matrix and the corresponding graph structure for the selected variable set in the HLT case.
Figure 9. Heatmap of the adjacency matrix and the corresponding graph structure for the optimal variable set in the HLT case.
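Figures 8 and 9 (and 15 and 16 for the PD case) visualize learned adjacency matrices. One common way to make the graph structure trainable end to end, consistent in spirit with the VSL idea although the paper's exact parameterization may differ, is to derive edge weights from trainable node embeddings; the class and names below are illustrative, with dimensions mirroring Table 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableAdjacency(nn.Module):
    """Illustrative adaptive-graph module: each process variable gets a
    trainable embedding, and pairwise scores define directed edge weights."""

    def __init__(self, num_nodes: int, embed_dim: int):
        super().__init__()
        self.src = nn.Parameter(torch.randn(num_nodes, embed_dim))
        self.dst = nn.Parameter(torch.randn(num_nodes, embed_dim))

    def forward(self) -> torch.Tensor:
        scores = F.relu(self.src @ self.dst.T)  # suppress negative scores
        return F.softmax(scores, dim=1)         # row-normalize for aggregation

# 26 nodes and embedding size 8 mirror the HLT settings in Table 2.
adj = LearnableAdjacency(num_nodes=26, embed_dim=8)()
print(adj.shape)  # torch.Size([26, 26]) -- the matrix the heatmaps visualize
```

Because the embeddings receive gradients from the prediction loss, the adjacency matrix is learned jointly with the rest of the network rather than fixed from prior knowledge.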
Figure 10. Prediction plots of VSL-STGCNv1 in the HLT case.
Figure 11. Prediction plots of VSL-STGCNv2 in the HLT case.
Figure 12. Prediction plots of VSL-STGCNv3 in the HLT case.
Figure 13. Flowchart of the Pre-decarburization (PD) Unit.
Figure 14. Mean SHAP values of the top 20 input variables and their impacts on the predictions in the PD case.
Figure 15. Heatmap of the adjacency matrix and the corresponding graph structure for the selected variable set in the PD case.
Figure 16. Heatmap of the adjacency matrix and the corresponding graph structure for the optimal variable set in the PD case.
Figure 17. Prediction plots of VSL-STGCNv1 in the PD case.
Figure 18. Prediction plots of VSL-STGCNv2 in the PD case.
Figure 19. Prediction plots of VSL-STGCNv3 in the PD case.
Table 1. Descriptions of the 26 process variables in the HLT Unit.

| Tag | Description | Tag | Description |
| --- | --- | --- | --- |
| U1 | Flowrate to HTT | U14 | Temp. of HTT down level |
| U2 | Content of Ar to HTT | U15 | Pressure at the exit of LTT |
| U3 | Content of CO to HTT | U16 | Exit process gas temp. of HTT |
| U4 | Content of CH4 to HTT | U17 | Temp. of BFW at E2 |
| U5 | Content of H2 to HTT | U18 | Exit process gas temp. of E2 |
| U6 | Flowrate to LTT | U19 | Temp. of LTT up level |
| U7 | Content of Ar to LTT | U20 | Temp. of LTT middle level |
| U8 | Content of CO2 to LTT | U21 | Temp. of LTT down level |
| U9 | Content of CH4 to LTT | U22 | Level of E3 |
| U10 | Content of H2 to LTT | U23 | Pressure of process gas at the exit |
| U11 | Content of N2 to LTT | U24 | Exit process gas temp. of LTT |
| U12 | Temp. of HTT up level | U25 | Temp. of recycled N2 at condenser |
| U13 | Temp. of HTT middle level | U26 | Entrance process gas temp. of LTT |
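To make the table concrete: before training, the 26 tags above are typically arranged as graph nodes whose features are short sliding time windows. The snippet below is a minimal sketch under that assumption; `make_windows`, the window length, and the random placeholder data are all illustrative, not the authors' preprocessing code.

```python
import numpy as np

def make_windows(data: np.ndarray, window: int = 10) -> np.ndarray:
    """Stack sliding windows into (batch, num_nodes, window) tensors so each
    of the 26 variables becomes one graph node with `window` time steps."""
    return np.stack([data[t:t + window].T for t in range(len(data) - window)])

data = np.random.randn(1000, 26)  # placeholder for the real U1-U26 records
X = make_windows(data)
print(X.shape)                    # (990, 26, 10)
```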
Table 2. The settings of the network structure for the HLT case study.

| Hyper-Parameter | Value | Hyper-Parameter | Value |
| --- | --- | --- | --- |
| Graph-Embedding Dim | 16 | Channels | 3, 3 |
| Node-Embedding Dim | 8 | Convolution Kernels | 4, 4 |
| Heads | 8 | Convolution Stride | 1, 1 |
| Nodes | 26 | Pooling Kernels | 2, 2 |
| Learning Rate | 0.001 | Epoch Num. | 200 |
| Batch Size | 256 | Optimizer | Adam |
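For reproducibility, such settings are often collected in a single configuration object. The dictionary below simply transcribes Table 2; the key names are illustrative and not taken from the authors' code.

```python
# Hypothetical configuration object mirroring Table 2 (HLT case).
hlt_config = {
    "graph_embedding_dim": 16,
    "node_embedding_dim": 8,
    "attention_heads": 8,
    "num_nodes": 26,           # one node per variable U1-U26
    "channels": (3, 3),
    "conv_kernels": (4, 4),
    "conv_stride": (1, 1),
    "pool_kernels": (2, 2),
    "learning_rate": 1e-3,
    "epochs": 200,
    "batch_size": 256,
    "optimizer": "Adam",
}
```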
Table 3. Prediction performances of the compared models.

| Index \ Model | VSL-STGCNv1 | VSL-STGCNv2 | VSL-STGCNv3 | STGCN | GCN | LSTM |
| --- | --- | --- | --- | --- | --- | --- |
| Num. of Paras. | 15,638 | 12,445 | 9114 | 7091 | 1253 | 745 |
| RMSE | 0.00381 (±0.0006) | 0.00345 (±0.0004) | 0.00317 (±0.0002) | 0.00392 (±0.0012) | 0.00425 (±0.0016) | 0.00484 (±0.0010) |
| MAE | 0.00317 (±0.0011) | 0.00298 (±0.0007) | 0.00279 (±0.0004) | 0.00326 (±0.0014) | 0.00375 (±0.0013) | 0.00413 (±0.0013) |
| R2 | 0.6453 (±0.0210) | 0.6972 (±0.0171) | 0.7740 (±0.0123) | 0.6277 (±0.0211) | 0.5982 (±0.0118) | 0.5250 (±0.0125) |
| Training Time (s) | 646 (±12) | 559 (±10) | 475 (±10) | 373 (±6) | 216 (±5) | 94 (±5) |
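The indices in Tables 3 and 6 follow their standard definitions, e.g. RMSE = sqrt(mean((y − ŷ)²)). A minimal sketch, with random placeholder arrays standing in for the held-out targets and model outputs:

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.random.rand(100)                    # placeholder targets
y_pred = y_true + 0.003 * np.random.randn(100)  # toy predictions
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```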
Table 4. Descriptions of the process variables in the Pre-decarburization Unit.

| Tag | Description | Tag | Description |
| --- | --- | --- | --- |
| U1 | Flow-rate of Feed Natural Gas | U11 | Temperature of Process Gas at Absorption Column |
| U2 | Level of Feed Gas Separator | U12 | Level #1 of Absorption Column |
| U3 | Pressure Difference of Feed Gas Separator | U13 | Pressure of Process Gas to Absorption Column |
| U4 | Pressure of Feed NG | U14 | Level #2 of Absorption Column |
| U5 | Temperature of Feed NG | U15 | Temperature in the Middle of Absorption Column |
| U6 | Level of Process Gas Separator | U16 | Level #3 of Absorption Column |
| U7 | Pressure Difference of Absorption Column | U17 | Pressure of Process Gas at the Top of Absorption Column |
| U8 | Pressure of Feed Gas Separator | U18 | Temperature of Amine Liquor to Absorption Column |
| U9 | Temperature of Process Gas Separator | U19 | Temperature of Process Gas at the Top of Absorption Column |
| U10 | Pressure of Process Gas Separator | U20 | Level of Regeneration Column |
| Y | Content of Residual CO2 in the Process Gas | | |
Table 5. The settings of the network structure for the PD case study.

| Hyper-Parameter | Value | Hyper-Parameter | Value |
| --- | --- | --- | --- |
| Graph-Embedding Dim | 10 | Channels | 3, 3 |
| Node-Embedding Dim | 6 | Convolution Kernels | 3, 3 |
| Heads | 4 | Convolution Stride | 1, 1 |
| Nodes | 20 | Pooling Kernels | 2, 2 |
| Learning Rate | 0.001 | Epoch Num. | 200 |
| Batch Size | 256 | Optimizer | Adam |
Table 6. Prediction performances of the compared models in the PD case.

| Index \ Model | VSL-STGCNv1 | VSL-STGCNv2 | VSL-STGCNv3 | STGCN | GCN | LSTM |
| --- | --- | --- | --- | --- | --- | --- |
| Num. of Paras. | 14,442 | 11,507 | 8423 | 5898 | 1135 | 697 |
| RMSE | 0.0261 (±0.0012) | 0.0195 (±0.0010) | 0.0163 (±0.0009) | 0.0305 (±0.0021) | 0.0384 (±0.0026) | 0.0391 (±0.0016) |
| MAE | 0.0202 (±0.0007) | 0.0149 (±0.0006) | 0.0127 (±0.0005) | 0.0256 (±0.0017) | 0.0291 (±0.0022) | 0.0312 (±0.0011) |
| R2 | 0.9567 (±0.0025) | 0.9757 (±0.0020) | 0.9831 (±0.0018) | 0.9489 (±0.0028) | 0.9312 (±0.0032) | 0.9265 (±0.0019) |
| Training Time (s) | 636 (±8) | 594 (±6) | 429 (±5) | 336 (±5) | 248 (±3) | 82 (±3) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
