Article

A Hybrid Multi-Step Forecasting Approach for Methane Steam Reforming Process Using a Trans-GRU Network

1
China Fire and Rescue Institute, Beijing 102202, China
2
State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
3
School of Electrical and Control Engineering, Shenyang Jianzhu University, Shenyang 110819, China
4
Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education and the School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China
*
Author to whom correspondence should be addressed.
Processes 2025, 13(7), 2313; https://doi.org/10.3390/pr13072313
Submission received: 23 June 2025 / Revised: 14 July 2025 / Accepted: 17 July 2025 / Published: 21 July 2025
(This article belongs to the Section Chemical Processes and Systems)

Abstract

During the steam reforming of methane (SRM) process, elevated CH4 levels after the reaction often signify inadequate heat supply or incomplete reactions within the reformer, jeopardizing process stability. In this paper, a novel multi-step forecasting method using a Trans-GRU network is proposed for predicting the methane content at the outlet of the SRM reformer. First, a novel feature selection method based on the maximal information coefficient (MIC) is applied to identify critical input variables and determine their optimal input order. Additionally, the Trans-GRU network enables the simultaneous capture of multivariate correlations and the learning of global sequence representations. Experimental results based on time-series data from a real SRM process demonstrate that the proposed approach significantly improves the accuracy of multi-step methane content prediction. Compared to benchmark models, including the TCN, Transformer, GRU, and CNN-LSTM, the Trans-GRU consistently achieves the lowest root mean squared error (RMSE) and mean absolute error (MAE) values across all prediction steps (1–6). Specifically, at the one-step horizon, it yields an RMSE of 0.0120 and an MAE of 0.0094, and this performance remains robust across the 2–6-step predictions. The improved predictive capability supports the stable operation and predictive optimization strategies of the steam reforming process in hydrogen production.

1. Introduction

Hydrogen energy is recognized as a potential energy carrier offering clean combustion, high energy efficiency, renewability, and environmental friendliness [1,2]. The development and utilization of hydrogen foster a shift away from dependence on fossil fuels (coal, oil, and natural gas), contributing to a transformation of the energy structure toward net-zero carbon emissions in the context of global carbon neutrality [3]. The steam reforming of methane (SRM) process is a significant industrial method for hydrogen production, accounting for 48 percent of the global hydrogen manufacturing market [4,5]. The SRM process involves two fundamental reactions and is overall endothermic and reversible: the steam methane reforming reaction in Equation (1) and the water–gas shift reaction in Equation (2) [6]. Additionally, some side reactions (i.e., methane cracking, carbon deposition, and carbon monoxide disproportionation) also occur in the SRM reformer [7].
CH4 + H2O ↔ 3H2 + CO, ΔH = +204 kJ/mol (1)
CO + H2O ↔ CO2 + H2, ΔH = −41 kJ/mol (2)
The steam methane reformer is the central device in the SRM process, containing hundreds of reforming tubes and a combustion chamber [8,9]. Feed gas (e.g., methane) and overheated steam react at high temperature (800–1000 °C) and high pressure (1.5–2.0 MPa) over nickel-based catalysts in the reforming tubes [10]. The thermal energy required by the SRM process is supplied by the combustion of fuel and oxidizer streams outside the tubes in the combustion chamber. Many operating parameters, such as the reaction temperature, pressure, steam-to-carbon ratio, and fuel stream concentrations, have a critical impact on SRM reaction efficiency [5,11,12,13,14]. The CH4 content at the reformer outlet is therefore selected as a crucial monitoring variable for determining whether the operating parameters remain within their normal ranges [11,15,16]. A rising CH4 outlet content indicates a reduction in CH4 conversion in the upstream reformer tubes and a decrease in effective H2 production in the downstream pressure swing adsorption (PSA) units, signaling that some operating parameters are undergoing substantial fluctuations or are in an abnormal state. At present, first-line plant operators judge whether methane exceeds its limit mainly on the basis of operating experience, process understanding, and risk awareness, which leads to delayed operational responses [17,18,19,20]. It is therefore necessary to make advance (multi-step) predictions of methane content in the hydrogen production process so that operators can take corrective measures promptly. Because hydrogen production is a highly complex process with long time constants and a large number of operating parameters, it resembles a "black box", i.e., a nonlinear system with an unknown structure and input order, making it difficult to characterize with a physical model [20,21].
In recent years, with the development of data acquisition systems, deep learning methods have provided new approaches for long-term time-series modeling of such black-box systems [22]. For complex industrial processes in fields such as chemical engineering, energy, and environmental science, researchers have proposed various deep learning-based predictive models to enhance system stability and optimize operations. Yang et al. proposed an attention-based Bi-LSTM model for real-time risk prediction in chemical processes, validated on the Tennessee Eastman process (TEP) [23]. A DiPLS-LSTM method was applied to predict time-series data from a 660 MW coal-fired boiler, effectively incorporating significant features from data far back in the history [24]. Zhao et al. designed an operating variable-based graph convolutional network (OV-GCN) that accounts for the correlations among operating variables and achieves superior prediction accuracy in the simulated moving bed (SMB) process for p-xylene separation [25]. Liu et al. developed a key sample location and distillation Transformer network (KSLD-TNet), extending the traditional Transformer, for multi-step-ahead prediction in mixed potassium washing and hydrocracking processes [26]. Yuan et al. proposed a multiscale attention-based CNN (MSACNN) soft sensing model for extracting multiscale local spatiotemporal features from chemical process data; its effectiveness was validated by predicting quality variables in the hydrocracking and debutanizer column processes [27]. Li et al. introduced a Light Attention-Mixed Based Target Autoregression Unit (LAMB-TAU) network for high-accuracy multi-step prediction of hard-to-measure variables, with applications in esterification and formaldehyde production processes [28]. To tackle feature selection for unequal-length process variable series, the Correlation-Similarity Conjoint Algorithm (CSCA) was developed in [29], realizing multi-step prediction of flow rates, pressures, and temperatures in an actual de-ethanization process. However, multi-step-ahead forecasting faces the challenges of capturing long-term spatial–temporal correlations and accumulating errors. Hybrid models integrating different neural network architectures have been widely adopted for multi-step prediction in other fields to tackle these issues. For example, Liu et al. proposed a novel deep convolutional recurrent network, the K-shape and K-means guided convolutional neural network integrating gated recurrent units (KK-CNN-GRU), for short-term multi-step-ahead prediction of wind turbine power generation [30]. Zhou et al. proposed a seq2seq long short-term memory model with an attention mechanism (seq2seq-at-LSTM) and a Transformer model consisting only of the attention mechanism to forecast multi-step-ahead solar radiation [31]. Dai et al. developed a multi-step prediction model that integrates reversible instance normalization (RevIN), a CNN, and a Transformer for predicting furnace temperature in the regenerative aluminum smelting process [32].
In practice, a Transformer model leverages the attention mechanism to assign varying weights to input variables, effectively capturing the intricate relationships between inputs and outputs in complex dynamic systems [33]. The gated recurrent unit (GRU), a variant of the recurrent neural network (RNN), effectively addresses the gradient vanishing and gradient explosion problems commonly encountered in RNNs by utilizing gated units for long-term time-series processing [34]. Its internal update and reset gates retain critical historical information, such as the influence of a device's prior operational status on its current state. Moreover, the GRU has one fewer gating unit than the LSTM; while maintaining similar temporal modeling capability, it uses fewer parameters and trains faster, making it more suitable for practical industrial needs. In summary, the Transformer and GRU are effective for long-term time-series modeling and multi-step forecasting [35,36]. However, the unknown order of model inputs can introduce additional uncertainty, potentially affecting predictive performance. To address this, the maximal information coefficient (MIC) method was applied to determine the optimal input order for the data-driven model, as demonstrated in demand forecasting for the fused magnesia smelting process [37].
This work presents novel contributions to multi-step forecasting of methane content in the hydrogen production process via steam reforming. Our contributions are summarized as follows:
  • We introduce a novel feature selection method based on the MIC. This approach effectively identifies key input variables and their optimal input order, significantly improving both the efficiency and accuracy of feature selection compared to traditional methods.
  • A novel Trans-GRU network is proposed for multi-step forecasting. This network combines the strengths of Transformer models in capturing long-sequence dependencies with the capabilities of GRU models in modeling local temporal features. The Trans-GRU network enables the simultaneous capture of multivariate correlations and the learning of global sequence representations, addressing the limitations of GRU models in modeling long sequences and overcoming the weakness of Transformer models in industrial forecasting applications.
  • The proposed approach significantly enhances the accuracy of multi-step methane content prediction. This improved predictive capability supports the stable operation and optimization of the steam reforming process in hydrogen production. Additionally, it provides a valuable reference for multi-step prediction of dynamic, non-stationary industrial data in various industrial processes.

2. Process Description

This paper considers a practical SRM process, as shown in Figure 1. Feed gas, composed of natural gas and refinery gas, is introduced into the feed gas purification units, where hydrogenation, dechlorination, and desulfurization reactions take place. The steam methane reformer serves as the core reaction device, equipped with hundreds of reformer tubes and a combustion chamber. The mixture of purified gas and overheated steam is transported to and reacts in the pre-reformer to enhance the CH4 concentration of the mixture gas, and is then delivered through the convection section of the reformer, where it absorbs heat. The mixed gas and overheated steam, in a certain proportion, then enter the reformer tubes and undergo the complicated SRM reactions, which are overall endothermic. The high-temperature flue gas, generated by burning fuel gas in the combustion chamber, supplies the heat required by the SRM reactions through radiation and convection. Finally, the gas products are transported to the shift reactor and pressure swing adsorption (PSA) units to separate the H2 product.
A series of process parameters are monitored and collected in real time; their locations are marked in Figure 1. For example, the feed flow rates of natural gas, refinery gas, and overheated steam are recorded once per second via a distributed control system. In addition, large numbers of process parameters distributed over the reformer unit are monitored in real time, such as the temperature at the top of the reformer tubes, the inlet temperature of the reaction gas, and the temperature of the exhaust flue gas. Importantly, the CH4 content of the SRM reformer product flow, measured at the heat exchanger outlet, is selected as the judgment variable for determining whether the process parameters are in a fluctuating or abnormal state. Two examples of excessive CH4 content illustrate why the outlet CH4 content is chosen as the multi-step forecasting variable. First, a decrease in combustion gas directly leads to a heat deficit, as fuel combustion cannot provide enough energy to sustain the SRM reaction, resulting in high methane content at the heat exchanger outlet. Second, at a low steam-to-carbon ratio, carbon deposition reactions inevitably occur in the reformer tubes, covering the active sites and reducing the activity of the catalysts; this likewise results in high methane content at the heat exchanger outlet.
Based on the actual SRM process, 25 variables acquired in real time are defined as the input variables xi, and the CH4 content at the outlet of the heat exchanger is defined as the output variable y. As listed in Table 1, x1 and x2 are the feed gas volume flow rates of refinery gas and natural gas, respectively. x3 and x4 represent the mass flow rates of overheated steam before and after the pre-reforming reactors, respectively. x5 is the volume flow rate of fuel methane in the combustion gas. x6 is the oxygen content monitored in the combustion chamber. x7–x9 represent the furnace temperature at different locations. x10 is the inlet temperature of the reactants entering the reformer tubes. x11–x15 are the temperatures at different regions at the top of the reformer. x16–x25 represent the temperatures of flue gas generated in the combustion chamber, monitored in each row (1–10) of the SRM reformer.

3. Intelligent Model for Multi-Step Forecasting

3.1. Structure Overview

To tackle multi-step forecasting, we propose a novel deep learning framework that integrates the MIC with a hybrid Transformer-GRU model. As shown in Figure 2, the MIC optimizes the input sequence order, enhancing the model’s capability to capture dynamic dependencies. The Transformer-based encoder is then used to extract intricate cross-correlations from long-sequence inputs, allowing the model to learn complex multivariate relationships. Finally, a GRU model is employed to perform multi-step forecasting, effectively predicting future values.

3.2. Model Order Determination

Methane steam reforming is a complex, nonlinear process involving multiple reaction steps that evolve over time. The system’s state is influenced by both current operating conditions and historical processes, resulting in non-stationary dynamics in the methane content at the reformer outlet. This dynamic relationship is highly nonlinear and time-varying due to the intricate internal mechanisms of the reaction process and frequent external disturbances. The MIC, a statistical method based on information theory, effectively measures the nonlinear correlations between variables. By calculating the MIC value between each input sequence and the prediction target, we can quantify their interdependence and identify historical data that significantly affects the prediction target. This MIC-based feature selection method extracts dynamic features from the data, providing more representative inputs for deep learning models and improving the accuracy in predicting variations in methane content.
Specifically, we employ the MIC value as a crucial metric to assess the contribution of historical data to future predictions. By ranking the MIC values, we can identify the historical moments with the strongest correlation to the prediction target and select them as input features for the model. Given inputs $x_o \in \mathbb{R}^{(o+1) \times 1}$ and $y_o \in \mathbb{R}^{(o+1) \times 1}$ at model order $(o+1)$, the MIC is calculated as follows [38]:

$$\mathrm{mic}_o = \max \frac{I(x_o, y_o)}{\log_2 \min\{n_{x_o}, n_{y_o}\}}$$

where

$$I(x_o, y_o) = \sum_{j=1}^{n_x} p(x_o^j) \log_2 \frac{1}{p(x_o^j)} + \sum_{q=1}^{n_y} p(y_o^q) \log_2 \frac{1}{p(y_o^q)} - \sum_{j=1}^{n_x} \sum_{q=1}^{n_y} p(x_o^j y_o^q) \log_2 \frac{1}{p(x_o^j y_o^q)}$$

with $n_x n_y < (n+1)^{0.6}$. Here, $I(x_o, y_o)$ denotes the mutual information between $x_o$ and $y_o$, the parameter $(o+1)$ denotes both the number of data points and the order of the model, and $n_{x_o}$ and $n_{y_o}$ are the numbers of bins for $x_o$ and $y_o$, respectively. For each value of $o$, the corresponding MIC value $\mathrm{mic}_o$ can be calculated under the respective model order.

When the model order is $(o+1)$, the average MIC value across all $m$ input variables is calculated and serves as the basis for selecting the optimal model order:

$$MIC_o = \frac{1}{m} \sum_{j=1}^{m} \mathrm{mic}_o^j$$

where $\mathrm{mic}_o^j$ represents the MIC value of the $j$th input variable under the $(o+1)$th model order.

The optimal input order for the model is determined by $MIC_o$, and the input to the subsequent Trans-GRU model, $X(k) \in \mathbb{R}^{(n+1) \times (m+1)}$, is constructed as

$$X(k) = \left[ X_1(k), \ldots, X_m(k), Y(k) \right]$$

where $Y(k) = [y(k-n), \ldots, y(k-1), y(k)]^T$ and $X_v(k) = [x_v(k-n), \ldots, x_v(k-1), x_v(k)]^T$, $v = 1, 2, \ldots, m$.
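As an illustration, the order-selection scan above can be sketched in a few lines of Python. This is a minimal sketch, assuming the `minepy` package for MIC computation; the function and variable names are ours, and the paper's exact binning scheme may differ.

```python
import numpy as np
from minepy import MINE  # assumed MIC implementation (MINE statistics library)

def average_mic_for_lag(X, y, o):
    """Average MIC between each input variable lagged by o steps and the target.

    X: (N, m) array of the m input series; y: (N,) target series.
    """
    mine = MINE(alpha=0.6, c=15)  # alpha=0.6 mirrors the n_x * n_y < N^0.6 constraint
    mics = []
    for j in range(X.shape[1]):
        # pair x_j(k - o) with y(k); the slices keep the two series aligned
        mine.compute_score(X[: -o or None, j], y[o:])
        mics.append(mine.mic())
    return float(np.mean(mics))

def select_order(X, y, max_order=1000):
    """Return the lag that maximizes the average MIC (model order = lag + 1)."""
    scores = [average_mic_for_lag(X, y, o) for o in range(max_order)]
    return int(np.argmax(scores)), scores
```

On the plant data of Section 4, a scan of this kind produces the order-versus-average-MIC curve shown in Figure 7.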

3.3. Feature Extraction Based on Transformer-Encoding Modeling

Given the input $X(k)$, the first step is embedding, for which we use a multi-layer perceptron (MLP):

$$X_{in}(k) = \mathrm{MLP}\left( X(k) \right)$$

where $X_{in}(k) \in \mathbb{R}^{q \times d}$, with $q < (n+1)$ and $d < (m+1)$.

The embedded representation $X_{in}(k)$ is then fed into a multi-head self-attention network:

$$Q_i = X_{in}(k) W_i^Q, \quad K_i = X_{in}(k) W_i^K, \quad V_i = X_{in}(k) W_i^V, \quad \text{for } i = 1, \ldots, p_h$$

where $W_i^Q, W_i^K, W_i^V \in \mathbb{R}^{d \times d_k}$ denote the weight matrices for the $i$th head, $p_h$ is the number of attention heads, and $d_k$ is the projected dimension. The $Q_i, K_i, V_i \in \mathbb{R}^{q \times d_k}$ are the linear transformations of $X_{in}(k)$ performed $p_h$ times.

The output of the attention mechanism for the $i$th head is

$$\mathrm{head}_i = \mathrm{softmax}\left( \frac{Q_i K_i^T}{\sqrt{d_k}} \right) V_i$$

The multi-head attention output is then obtained by concatenating the results from all attention heads and applying a linear transformation:

$$H_h(k) = \mathrm{Concat}\left( \mathrm{head}_1, \ldots, \mathrm{head}_{p_h} \right) W^O$$

where $W^O \in \mathbb{R}^{(p_h d_k) \times d}$ is the output weight matrix and $H_h(k) \in \mathbb{R}^{q \times d}$.

To improve training stability, a residual connection and layer normalization are applied to $X_{in}(k)$ and $H_h(k)$, yielding

$$H_h^{out}(k) = \mathrm{LayerNorm}\left( X_{in}(k) + H_h(k) \right) = \frac{X_{in}(k) + H_h(k) - \mathrm{E}\left[ X_{in}(k) + H_h(k) \right]}{\sqrt{\mathrm{Var}\left[ X_{in}(k) + H_h(k) \right] + \varepsilon}} \times \gamma + \beta$$

where $\varepsilon$ is a small constant that prevents division by zero, and $\gamma$ and $\beta$ are learnable parameters.

The output is then passed through a feed-forward network with ReLU activation:

$$H_{FFN}(k) = \mathrm{ReLU}\left( H_h^{out}(k) W_1^F + b_1^F \right) W_2^F + b_2^F$$

where $W_1^F$, $W_2^F$, $b_1^F$, and $b_2^F$ are the weight and bias parameters of the feed-forward network.

Finally, another residual connection and layer normalization are applied to the output of the feed-forward network:

$$X_j^G(k) = \mathrm{LayerNorm}\left( H_h^{out}(k) + H_{FFN}(k) \right)$$

where $j = 1, 2, \ldots, L_1$ indexes the Transformer-encoding layers. When $j = 1$, the input to the multi-head attention network is $X_{in}(k)$, the initial embedding of the input data; when $j > 1$, the input is the output of the previous Transformer layer, $X_{j-1}^G(k)$.

At the $L_1$th Transformer layer, the final output $X^G(k) \in \mathbb{R}^{q \times d}$ represents the extracted feature sequence, obtained through the inverted Transformer-based encoding mechanism.
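For concreteness, a minimal PyTorch sketch of this encoding stage is given below. The class name, the MLP embedding, and the dimensions (taken loosely from Section 4.3) are our assumptions; `nn.TransformerEncoderLayer` bundles the attention, residual, LayerNorm, and feed-forward steps written out above (its default post-norm arrangement matches the equations).

```python
import torch
import torch.nn as nn

class TransEncoder(nn.Module):
    """MLP embedding followed by a stack of Transformer encoder layers."""

    def __init__(self, n_vars, d_model=128, n_heads=8, n_layers=3, d_ff=256):
        super().__init__()
        # MLP embedding of the raw input window (the X_in(k) step)
        self.embed = nn.Sequential(
            nn.Linear(n_vars, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        # one layer = multi-head attention + residual/LayerNorm + FFN + residual/LayerNorm
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                    # x: (batch, seq_len, n_vars)
        return self.encoder(self.embed(x))   # X^G: (batch, seq_len, d_model)
```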

3.4. Multi-Step Forecasting Based on GRU Model

Given $X^G(k)$ as

$$X^G(k) = \left[ X_1^G, X_2^G, \ldots, X_d^G \right]$$

where $X_t^G = [x_{t1}^G, x_{t2}^G, \ldots, x_{tq}^G]^T$, $t = 1, 2, \ldots, d$, is the input to the GRU, obtained from the Transformer encoding described above.

As shown in Figure 3, a GRU with $L_2$ layers can be characterized as follows:

$$\begin{aligned} z_t^l &= \sigma\left( W_z^l \left[ h_{t-1}^l, X_t^{G,l} \right] \right) \\ r_t^l &= \sigma\left( W_r^l \left[ h_{t-1}^l, X_t^{G,l} \right] \right) \\ \tilde{h}_t^l &= \tanh\left( W_h^l \left[ r_t^l \odot h_{t-1}^l, X_t^{G,l} \right] \right) \\ h_t^l &= \left( 1 - z_t^l \right) \odot h_{t-1}^l + z_t^l \odot \tilde{h}_t^l \end{aligned}$$

where $l = 1, 2, \ldots, L_2$ indexes the GRU layers. If $l = 1$, then $X_t^{G,1} = X_t^G$; otherwise, $X_t^{G,l} = h_t^{l-1}$. $\sigma(\cdot)$ is the logistic sigmoid function and $\odot$ denotes element-wise multiplication.

The output of the $N_g$ GRU units is $h_t^{L_2} \in \mathbb{R}^{N_g \times 1}$. The GRU-based multi-step forecast of the CH4 content at time $k$ is

$$\hat{Y}(k+1) = W_d h_t^{L_2}$$

where $\hat{Y}(k+1) = [\hat{y}(k+1), \hat{y}(k+2), \ldots, \hat{y}(k+s)]^T$ and $W_d \in \mathbb{R}^{s \times N_g}$.

The loss function is defined as

$$L = \frac{1}{N_n} \sum_{k=1}^{N_n} \left\| \hat{Y}(k) - Y(k) \right\|^2$$

where $N_n$ is the number of training samples.
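A minimal PyTorch sketch of this forecasting head follows. The two GRU widths (160 and 100) and the six-unit dense output follow Section 4.3; the class and variable names are ours, and for simplicity the GRU is run along the sequence axis of the encoder output, which may differ from the paper's exact token arrangement.

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    """Two stacked GRUs of different widths and a dense multi-step output head."""

    def __init__(self, d_model=128, h1=160, h2=100, steps=6):
        super().__init__()
        self.gru1 = nn.GRU(d_model, h1, batch_first=True)
        self.gru2 = nn.GRU(h1, h2, batch_first=True)
        self.head = nn.Linear(h2, steps)  # plays the role of W_d above

    def forward(self, feats):             # feats: (batch, seq_len, d_model)
        out, _ = self.gru1(feats)
        _, h_n = self.gru2(out)           # final hidden state: (1, batch, h2)
        return self.head(h_n.squeeze(0))  # (batch, steps) = [y(k+1), ..., y(k+s)]

# the loss above is plain MSE over the s-step output vector
feats = torch.randn(8, 640, 128)          # e.g., a batch of Transformer-encoded windows
y_hat = GRUForecaster()(feats)            # (8, 6) multi-step CH4 forecasts
loss = nn.MSELoss()(y_hat, torch.zeros_like(y_hat))
```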

4. Results and Discussion

4.1. Data Description

This study utilizes a comprehensive dataset spanning six months of real-world industrial data, collected at a one-minute frequency from a hydrogen production process via steam reforming. The dataset consists of 260,640 data points. A critical operational constraint for this process is maintaining the CH4 content at the converter outlet below 7% to ensure the stability of downstream processes and the completion of upstream reactions.
Statistical analysis of the historical dataset reveals frequent exceedances of the CH4 content limit of 7%, as shown in Figure 4. Over the past six months, the limit was exceeded more than 120 times per month, averaging 4–5 instances per day. This significant number of exceedances highlights a critical operational challenge.
Following each exceedance of the 7% CH4 threshold, on-site operators manually intervene to adjust the process and restore normal operation. The distribution of adjustment times over the past six months is illustrated in the boxplot shown in Figure 5. Statistical analysis indicates that over 80% of exceedances are corrected within six minutes. Based on this observation, a multi-step forecasting horizon of six minutes (s = 6) is selected for this study. All experiments were conducted on an NVIDIA GeForce RTX 4080 GPU with 16 GB of memory.

4.2. Model Order Selection Mechanism

To determine the optimal model order, we employ the MIC for feature selection. Initially, the MIC values for all 25 input variables are calculated at order 0, as shown in Figure 6, with values ranging from 0.1 to 0.7, confirming their correlation with the output variable.
Subsequently, the model order is systematically searched from 1 to 1000, meaning that $MIC_o$, defined in Section 3.2, is evaluated for $o = 0, 1, 2, \ldots, 999$. As depicted in Figure 7, the average MIC exhibits an initial increase followed by a decrease, with a peak at an order of 640, marked with a red six-pointed star. This indicates that a model order of 640 maximizes the mutual dependence between the input and output data, and it is therefore selected as the optimal order.

4.3. Implementation Details

To evaluate the effectiveness of the proposed method, we used the root mean squared error (RMSE) and mean absolute error (MAE) as performance metrics. These metrics are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N_n} \sum_{k=1}^{N_n} \left( y(k) - \hat{y}(k) \right)^2}$$

$$\mathrm{MAE} = \frac{1}{N_n} \sum_{k=1}^{N_n} \left| y(k) - \hat{y}(k) \right|$$

where $y(k)$ and $\hat{y}(k)$ represent the true and predicted values, respectively, and $N_n$ is the total number of evaluated samples.
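Computed per prediction step, these metrics reduce to a few lines of NumPy (a minimal sketch; the array names are ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over all evaluated samples."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error over all evaluated samples."""
    return float(np.mean(np.abs(y_true - y_pred)))
```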
For comparative analysis, we conducted experiments using several benchmark models, including temporal convolutional networks (TCNs) [39], gated recurrent units (GRUs) [40], CNN-LSTM [41], and Transformers [33]. The input sequence length was set to 640. To ensure the robustness and reliability of the results, each model was trained for 100 epochs and evaluated 20 times, and the average performance across all evaluations is reported. Hyperparameters for all models were tuned via grid search. For the convolutional models (TCN and CNN), the search covered the convolutional kernel size (3–10) and the stride (1–10). For the recurrent models (LSTM and GRU), the number of hidden units was searched between 50 and 200 and the number of layers between 1 and 5. The adjustable parameters of the Transformer model were the number of attention heads (2, 4, 8), the number of encoder layers (1–5), and the hidden dimension (64, 128, 256). After hyperparameter optimization, the benchmark models were configured as follows: the TCN adopts a six-layer structure with a convolution kernel size of 5 and a stride of 2; the GRU consists of two layers with 160 hidden units each; the CNN-LSTM includes one convolutional layer (kernel size 9, stride 3) followed by two LSTM layers with 120 hidden units each; and the Transformer comprises two encoder layers with an eight-head multi-head attention mechanism, each head having a dimension of 256. Our proposed architecture uses an input sequence length of 640, three stacked Transformer encoder layers with eight-head self-attention (each head with a dimension of 128), and two GRU layers with 160 hidden units in the first layer and 100 in the second. The final output is generated through a dense layer with six units for multi-step prediction. This architecture is designed to effectively capture the long-range dependencies and temporal patterns inherent in multi-rate industrial time-series data. The total number of parameters in the model is 578,406.
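The grid search described above can be organized as a simple exhaustive loop over the reported ranges. The sketch below shows the Transformer branch; `train_and_eval` is a hypothetical stand-in for the actual training-and-validation routine.

```python
from itertools import product
import random

# search ranges reported above (Transformer branch shown as the example)
GRID = {"n_heads": [2, 4, 8], "n_layers": [1, 2, 3, 4, 5], "d_hidden": [64, 128, 256]}

def train_and_eval(n_heads, n_layers, d_hidden):
    """Hypothetical stand-in: train a candidate model with these settings and
    return its validation RMSE; replace with an actual training loop."""
    return random.random()  # placeholder score

best_cfg, best_rmse = None, float("inf")
for cfg in product(*GRID.values()):
    score = train_and_eval(*cfg)
    if score < best_rmse:
        best_cfg, best_rmse = dict(zip(GRID, cfg)), score
print(best_cfg, best_rmse)
```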

4.4. Multi-Step-Ahead Forecasting Results

This section analyzes the multi-step forecasting performance of various methods for industrial data. Figure 8 and Figure 9 present the RMSE and MAE comparison results of different methods at prediction steps 1 to 6. Figure 8 illustrates the RMSE values for each method at each prediction step, while Figure 9 shows the corresponding MAE values. Detailed numerical results are provided in Table 2.
Experimental results demonstrate the superiority of the Trans-GRU model for multi-step CH4 content prediction in industrial processes. Compared to the benchmark models, including the TCN, GRU, CNN-LSTM, and Transformer, the Trans-GRU consistently achieves the lowest RMSE and MAE values across all prediction steps (1–6). Specifically, the Trans-GRU outperforms the best benchmark, the GRU, by an average of 53.89% in the RMSE and 53.32% in the MAE. Furthermore, in comparison with the TCN, CNN-LSTM, and Transformer models, the Trans-GRU surpasses these models by an average of 82.22%, 74.52%, and 76.2% in the RMSE, respectively, and by an average of 82.48%, 74.72%, and 77.09% in the MAE, respectively. Notably, this performance advantage remains stable as the prediction horizon increases, with reductions of 57.31% in the RMSE and 55.42% in the MAE at the sixth step.
While the Transformer demonstrates strong capabilities in long-sequence feature extraction and maintains stable prediction accuracy across different prediction horizons, its performance does not fully meet the required accuracy levels. In contrast, the GRU effectively captures local time features but suffers from a gradual decline in accuracy as the prediction horizon increases. At the sixth step, the GRU’s RMSE and MAE increase by 19.77% and 25.27%, respectively, compared to the first step. The hybrid approach mitigates the individual biases of each model, balancing the Transformer’s tendency toward global overgeneralization and the GRU’s local sensitivity, resulting in more accurate and convergent predictions. This improvement can be attributed to the complementary nature of their modeling mechanisms.
Ablation studies, in which the Transformer and GRU were evaluated independently, further validate the synergistic advantages of the Trans-GRU architecture. Neither model alone achieves the performance of the combined Trans-GRU model, emphasizing the critical role of integrating both components for optimal performance.
Figure 10 and Figure 11 illustrate the overall prediction results and the prediction results during 1500–2000 min in detail. The red dashed line in Figure 10 and Figure 11 represents the upper limit of CH4 outlet content in the actual SRM reformer. Figure 12 displays the error distribution for each model. The Trans-GRU model demonstrates significantly smaller prediction errors and a more concentrated error distribution compared to other models, highlighting its superior robustness. These findings underscore the potential of the Trans-GRU as an innovative and efficient solution for multi-step CH4 content prediction in industrial processes.

5. Conclusions

In this paper, we presented a novel multi-step forecasting approach for predicting the CH4 content at the outlet of the SRM reformer, utilizing a Trans-GRU network. Our approach introduces an innovative feature selection method based on the MIC, which effectively identifies the optimal order of the key input variables, thereby enhancing feature selection accuracy. The proposed Trans-GRU network leverages the strengths of Transformer models in capturing long-sequence dependencies while incorporating GRUs to model local temporal features, overcoming the limitations of traditional forecasting methods in industrial applications. The experimental results demonstrate that the proposed approach significantly improves the accuracy of multi-step methane content prediction: the Trans-GRU achieves an RMSE of 0.0120 and an MAE of 0.0094 at 1-step prediction and maintains this performance advantage across the 2–6-step predictions. The synergistic advantages of the Trans-GRU architecture were verified by ablation studies. Compared to the TCN, Transformer, CNN-LSTM, and GRU, our proposed Trans-GRU network shows smaller prediction errors and a more concentrated error distribution, highlighting its superior robustness. These improvements in prediction accuracy provide promising opportunities for more efficient process control, anomaly detection, and optimization in industrial operations.
Future work will extend the proposed approach to a broader range of data periods, covering diverse production conditions throughout the entire SRM cycle. In particular, data samples that reflect conditions before and after equipment adjustments and process optimizations will be deliberately included. We will also focus on enhancing the robustness and applicability of the proposed approach in real-world industrial settings, facilitating improved process control and decision-making.

Author Contributions

Conceptualization, Q.Z. and X.H.; methodology, X.H.; software, X.H.; validation, Q.Z. and J.Z.; formal analysis, Q.Z.; investigation, Q.Z.; data curation, X.H.; writing—original draft preparation, Q.Z.; writing—review and editing, X.H. and P.Q.; supervision, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Midilli, A.; Ay, M.; Dincer, I.; Rosen, M.A. On hydrogen and hydrogen energy strategies: I: Current status and needs. Renew. Sustain. Energy Rev. 2005, 9, 255–271. [Google Scholar] [CrossRef]
  2. Winter, C.-J. Hydrogen energy—Abundant, efficient, clean: A debate over the energy-system-of-change. Int. J. Hydrogen Energy 2009, 34, S1–S52. [Google Scholar] [CrossRef]
  3. Gunathilake, C.; Soliman, I.; Panthi, D.; Tandler, P.; Fatani, O.; Ghulamullah, N.A.; Marasinghe, D.; Farhath, M.; Madhujith, T.; Conrad, K.; et al. A comprehensive review on hydrogen production, storage, and applications. Chem. Soc. Rev. 2024, 53, 10900–10969. [Google Scholar] [CrossRef] [PubMed]
  4. Chaubey, R.; Sahu, S.; James, O.O.; Maity, S. A review on development of industrial processes and emerging techniques for production of hydrogen from renewable and sustainable sources. Renew. Sustain. Energy Rev. 2013, 23, 443–462. [Google Scholar] [CrossRef]
  5. Amini, A.; Bagheri, A.A.H.; Sedaghat, M.H.; Rahimpour, M.R. CFD simulation of an industrial steam methane reformer: Effect of burner fuel distribution on hydrogen production. Fuel 2023, 352, 129008. [Google Scholar] [CrossRef]
  6. Xu, J.; Froment, G.F. Methane steam reforming, methanation and water-gas shift: I. Intrinsic kinetics. AIChE J. 1989, 35, 88–96. [Google Scholar] [CrossRef]
  7. Peng, X.; Jin, Q. Molecular simulation of methane steam reforming reaction for hydrogen production. Int. J. Hydrogen Energy 2022, 47, 7569–7585. [Google Scholar] [CrossRef]
  8. Proell, T.; Lyngfelt, A. Steam methane reforming with chemical-looping combustion: Scaling of fluidized-bed-heated reformer tubes. Energy Fuel 2022, 36, 9502–9512. [Google Scholar] [CrossRef]
  9. Amirshaghaghi, H.; Zamaniyan, A.; Ebrahimi, H.; Zarkesh, M. Numerical simulation of methane partial oxidation in the burner and combustion chamber of autothermal reformer. Appl. Math. Model. 2010, 34, 2312–2322. [Google Scholar] [CrossRef]
  10. Noh, Y.S.; Lee, K.-Y.; Moon, D.J. Hydrogen production by steam reforming of methane over nickel based structured catalysts supported on calcium aluminate modified SiC. Int. J. Hydrogen Energy 2019, 44, 21010–21019. [Google Scholar] [CrossRef]
  11. Ighalo, J.O.; Amama, P.B. Recent advances in the catalysis of steam reforming of methane (SRM). Int. J. Hydrogen Energy 2024, 51, 688–700. [Google Scholar] [CrossRef]
  12. Wang, M.; Tan, X.; Motuzas, J.; Li, J.; Liu, S. Hydrogen production by methane steam reforming using metallic nickel hollow fiber membranes. J. Membr. Sci. 2020, 620, 118909. [Google Scholar] [CrossRef]
  13. Kumar, A.; Baldea, M.; Edgar, T.F. A physics-based model for industrial steam-methane reformer optimization with non-uniform temperature field. Comput. Chem. Eng. 2017, 105, 224–236. [Google Scholar] [CrossRef]
  14. Olivieri, A.; Vegliò, F. Process simulation of natural gas steam reforming: Fuel distribution optimisation in the furnace. Fuel Process. Technol. 2008, 89, 622–632. [Google Scholar] [CrossRef]
  15. Lai, G.-H.; Lak, J.H.; Tsai, D.-H. Hydrogen production via low-temperature steam-methane reforming using Ni–CeO2–Al2O3 Hybrid Nanoparticle Clusters as Catalysts. ACS Appl. Energy Mater. 2019, 2, 7963–7971. [Google Scholar] [CrossRef]
  16. Rogers, J.L.; Mangarella, M.C.; D’Amico, A.D.; Gallagher, J.R.; Dutzer, M.R.; Stavitski, E.; Miller, J.T.; Sievers, C. Differences in the nature of active sites for methane dry reforming and methane steam reforming over nickel aluminate catalysts. ACS Catal. 2016, 6, 5873–5886. [Google Scholar] [CrossRef]
  17. Severson, K.; Chaiwatanodom, P.; Braatz, R.D. Perspectives on process monitoring of industrial systems. Annu. Rev. Control 2016, 42, 190–200. [Google Scholar] [CrossRef]
  18. Zhu, Z.; Lei, Y.; Qi, G.; Chai, Y.; Mazur, N.; An, Y.; Huang, X. A review of the application of deep learning in intelligent fault diagnosis of rotating machinery. Measurement 2023, 206, 112346. [Google Scholar] [CrossRef]
  19. Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S.N. A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27, 313–326. [Google Scholar] [CrossRef]
  20. Kumar, A.; Bhattacharya, A.; Flores-Cerrillo, J. Data-driven process monitoring and fault analysis of reformer units in hydrogen plants: Industrial application and perspectives. Comput. Chem. Eng. 2020, 136, 106756. [Google Scholar] [CrossRef]
  21. Kano, M.; Nakagawa, Y. Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32, 12–24. [Google Scholar] [CrossRef]
  22. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436. [Google Scholar] [CrossRef] [PubMed]
  23. Wang, Q.; Han, J.; Chen, F.; Zhang, X.; Yun, C.; Dou, Z.; Yan, T.; Yang, G. Real-time risk prediction of chemical processes based on attention-based Bi-LSTM. Chin. J. Chem. Eng. 2024, 75, 131–141. [Google Scholar] [CrossRef]
  24. Wang, Y.; Qian, C.; Qin, S.J. Attention-mechanism based DiPLS-LSTM and its application in industrial process time series big data prediction. Comput. Chem. Eng. 2023, 176, 108296. [Google Scholar] [CrossRef]
  25. Ding, C.; Yang, M.; Zhao, Y.; Du, W. Graph convolutional network for axial concentration profiles prediction in simulated moving bed. Chin. J. Chem. Eng. 2024, 73, 270–280. [Google Scholar] [CrossRef]
  26. Liu, D.; Wang, Y.; Liu, C.; Yuan, X.; Wang, K. KSLD-TNet: Key sample location and distillation transformer network for multistep ahead prediction in industrial processes. IEEE Sens. J. 2024, 24, 1792–1802. [Google Scholar] [CrossRef]
  27. Yuan, X.; Huang, L.; Ye, L.; Wang, Y.; Wang, K.; Yang, C.; Gui, W.; Shen, F. Quality prediction modeling for industrial processes using multiscale Attention-Based Convolutional Neural network. IEEE T. Cybern. 2024, 54, 2696–2707. [Google Scholar] [CrossRef] [PubMed]
  28. Li, Y.; Cao, H.; Li, Z.; Du, W.; Shen, W. A flexible multi-step prediction architecture for process variable monitoring in chemical intelligent manufacturing. Chem. Eng. Sci. 2025, 316, 121943. [Google Scholar] [CrossRef]
  29. Li, Y.; Cao, H.; Wang, X.; Yang, Z.; Li, N.; Shen, W. A new Correlation-Similarity Conjoint Algorithm for developing Encoder-Decoder based deep learning multi-step prediction model of chemical process. Chem. Eng. Sci. 2024, 288, 119748. [Google Scholar] [CrossRef]
  30. Liu, X.; Yang, L.; Zhang, Z. Short-term multi-step ahead wind power predictions based on a novel deep convolutional recurrent network method. IEEE Trans. Sustain. Energy 2021, 12, 1820–1833. [Google Scholar] [CrossRef]
  31. Zhou, Y.; Li, Y.; Liu, W. A multi-step ahead global solar radiation prediction method using an attention-based transformer model with an interpretable mechanism. Int. J. Hydrogen Energy 2023, 48, 15317–15330. [Google Scholar] [CrossRef]
  32. Dai, J.; Ling, P.; Shi, H.; Liu, H. A multi-step furnace temperature prediction model for regenerative aluminum smelting based on reversible instance Normalization-Convolutional Neural Network-Transformer. Processes 2024, 12, 2438. [Google Scholar] [CrossRef]
  33. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  34. Xia, M.; Shao, H.; Ma, X.; De Silva, C. A stacked GRU-RNN-Based approach for predicting renewable energy and electricity load for smart grid operation. IEEE Trans. Ind. Inf. 2021, 17, 7050–7059. [Google Scholar] [CrossRef]
  35. Zeng, A.; Chen, M.-H.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023. [Google Scholar]
  36. Dey, R.; Salemt, F.M. Gate-variants of Gated Recurrent Unit (GRU) neural networks. In Proceedings of the IEEE International Midwest Symposium on Circuits & Systems, Boston, MA, USA, 6–9 August 2017. [Google Scholar]
  37. Yang, J.; Chai, T.Y.; Luo, C.M.; Yu, W. Intelligent demand forecasting of smelting process using data-driven and mechanism model. IEEE Trans. Ind. Electron. 2019, 66, 9745–9755. [Google Scholar] [CrossRef]
  38. Zhang, Y.; Jia, S.L.; Huang, H.Y.; Qiu, J.Q.; Zhou, C.J. A novel algorithm for the precise calculation of the maximal information coefficient. Sci. Rep. 2014, 4, 6662. [Google Scholar] [CrossRef] [PubMed]
  39. Lea, C.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks: A unified approach to action segmentation. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar]
  40. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
  41. Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of the SRM process.
Figure 2. Overall architecture of multi-step forecasting model.
Figure 3. GRU cell structure.
Figure 4. Exceedance count of CH4 content limit in past 6 months.
Figure 5. Exceedance duration distribution in past 6 months.
Figure 6. Variable MIC values.
Figure 7. Model order selection result based on mean of MIC.
Figure 8. RMSE comparison across multiple time horizons.
Figure 9. MAE comparison across multiple time horizons.
Figure 10. Forecasting results of different models from 1-step to 6-step: (a) 1-step; (b) 2-step; (c) 3-step; (d) 4-step; (e) 5-step; (f) 6-step.
Figure 11. Forecasting results of different models during 1500–2000 min from 1-step to 6-step: (a) 1-step; (b) 2-step; (c) 3-step; (d) 4-step; (e) 5-step; (f) 6-step.
Figure 12. Boxplots of forecasting errors for different models from 1-step to 6-step: (a) 1-step; (b) 2-step; (c) 3-step; (d) 4-step; (e) 5-step; (f) 6-step.
Table 1. List of variables in the SRM process.

Variable | Parameter | Unit
x1 | Refinery gas volume flow rate | Nm3/h
x2 | Natural gas volume flow rate | Nm3/h
x3 | Overheated steam mass flow rate before pre-reforming reactions | t/h
x4 | Overheated steam mass flow rate after pre-reforming reactions | t/h
x5 | Fuel gas volume flow rate | Nm3/h
x6 | Oxygen content in the combustion chamber | %
x7–x9 | Furnace temperature at locations 1–3 | °C
x10 | Inlet temperature at reformer tubes | °C
x11–x15 | Top temperature at reformer, regions 1–5 | °C
x16–x25 | Temperature of flue gas discharged from reformer, rows 1–10 | °C
y | Methane content at heat exchanger outlet | %
Table 2. Comparison of multi-step forecasting performance across different models (RMSE / MAE).

Model | 1-Step | 2-Step | 3-Step | 4-Step | 5-Step | 6-Step
TCN | 0.0870 / 0.0692 | 0.0849 / 0.0676 | 0.1165 / 0.0884 | 0.1029 / 0.0812 | 0.0941 / 0.0736 | 0.1018 / 0.0829
GRU | 0.0354 / 0.0273 | 0.0374 / 0.0294 | 0.0392 / 0.0311 | 0.0338 / 0.0264 | 0.0358 / 0.0281 | 0.0424 / 0.0342
CNN-LSTM | 0.0685 / 0.0538 | 0.0678 / 0.0533 | 0.0675 / 0.0526 | 0.0686 / 0.0538 | 0.0682 / 0.0534 | 0.0688 / 0.0540
Transformer | 0.0749 / 0.0602 | 0.0785 / 0.0628 | 0.0723 / 0.0585 | 0.0706 / 0.0572 | 0.0724 / 0.0585 | 0.0714 / 0.0579
Trans-GRU (ours) | 0.0120 / 0.0094 | 0.0168 / 0.0131 | 0.0189 / 0.0146 | 0.0190 / 0.0149 | 0.0187 / 0.0145 | 0.0189 / 0.0146
