Hybrid Data-Driven and Mechanistic CO2 Soft Sensor with MHE-Imputed Labels and Covariance-Weighted Fusion in a Pilot-Scale Absorber

Chai, Sida; Guo, Siyu; Mercangöz, Mehmet

doi:10.3390/pr14060916

by Sida Chai *, Siyu Guo and Mehmet Mercangöz

Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK

* Author to whom correspondence should be addressed.

Processes 2026, 14(6), 916; https://doi.org/10.3390/pr14060916

Submission received: 13 January 2026 / Revised: 12 February 2026 / Accepted: 18 February 2026 / Published: 12 March 2026

(This article belongs to the Section Energy Systems)

Abstract

Gas analyzers in post-combustion CO₂ capture plants are accurate but slow and sequential, yielding sparse, non-synchronous concentration records across absorber stages. We address this missing-data problem by reconstructing continuous CO₂ profiles with Moving Horizon Estimation (MHE) constrained by a mechanistic absorber model and available measurements; these MHE reconstructions are used as supervisory labels to train an end-to-end Stacked Denoising Autoencoder–Gated Recurrent Unit (SDAE-GRU) model. At run time, we deploy a hybrid soft sensor using the SDAE-GRU together with the mechanistic model and fuse their open-loop predictions via covariance-weighted blending with Gaspari-Cohn localization. We validate this approach on a pilot-scale MEA absorber using data from seven pilot runs conducted at distinct operating conditions, using datasets 1–5 for training/tuning and 6–7 for blind validation. On the blind validation runs, the hybrid estimator achieves a MAPE of 3.79% for stage-wise CO₂ predictions (averaged over all stages and time samples), outperforming both constituents evaluated standalone: 7.86% for the GRU-only soft sensor and 6.79% for the mechanistic model. Because MHE is used only offline to generate labels and to estimate model-error covariances, the deployed estimator is lightweight and suitable for online monitoring.

Keywords:

moving horizon estimation; hybrid model; denoising autoencoder; carbon capture; data fusion; dimension reduction

1. Introduction

The increasing emissions of CO₂ and other greenhouse gases (GHGs) caused by anthropogenic activities, mostly the combustion of fossil fuels, have been shown to be responsible for global warming [1,2]. Net-zero emission plans project that by 2050 oil, coal, natural gas and other fossil fuels will still account for around 20% of energy production [3]. Fossil fuel power plants are now integrating carbon capture and storage technologies to address the intermittency of weather-dependent renewable energy sources. This initiative aims to enhance the reliability of the energy system by providing alternative power sources during periods of instability in renewable energy supply, thereby facilitating a smooth energy transition [4]. Monoethanolamine (MEA)-based post-combustion carbon capture (PCC) technology has evolved into a mature and widely implemented solution for mitigating greenhouse gas emissions from fossil fuel power plants and various industrial processes [5,6]. This method aims to efficiently absorb CO₂ from exhaust gases, preventing its release into the atmosphere and facilitating its secure storage downstream [7].

Steady-state and dynamic models for PCC absorption columns are well established [8,9], but most works emphasize temperature profiles, and estimation of CO₂ concentration profiles remains limited [10]. Accurate CO₂ profiles enable optimal MEA circulation-rate selection, improving capture efficiency and reducing energy use, and provide key states for controller design and safe operation [11].

Gas absorption processes frequently face missing CO₂ measurements because gas analyzers sample locations sequentially and are subject to delays; gaps in the data degrade training quality and model generalization. Conversely, high-dimensional plant data introduce redundancy that obscures the most informative variables. Robust estimation therefore benefits from (i) principled handling of missing data and (ii) dimensionality reduction and noise filtering. Soft sensing is a mature approach for estimating unmeasured or asynchronously measured states [12] using mechanistic or data-driven models and has strong potential for use in PCC processes.

We consider the development of a soft sensor for PCC absorber columns using a hybrid framework. First, we develop a simplified mechanistic model of the absorber. Second, we design a moving-horizon estimator (MHE) that uses plant measurements to reconstruct continuous, physically consistent CO₂ concentration profiles despite missing measurements. Third, these MHE estimates provide labels for supervised training of an end-to-end SDAE–GRU soft sensor that learns compact features and temporal dynamics. Finally, we combine mechanistic and data-driven predictions within a unified hybrid predictor to forecast CO₂ profiles across the column.

1.1. Related Work

Accurate real-time monitoring starts with a dynamic model that reflects the process physics. Mechanistic models are attractive because they encode conservation laws, thermodynamics, and kinetics. They can predict reliably across operating regimes, but their complexity can hinder control and optimization. On the mechanistic side, Morgan et al. developed a thermodynamic framework for VLE, enthalpy, and solution chemistry; they regressed e–NRTL parameters in Aspen Plus® to VLE, heat-capacity, and heat-of-absorption data and reduced complexity using information-theoretic criteria [13]. Putta et al. used a two-dimensional, rate-based absorber model to study how choices in thermodynamics (VLE, Henry’s law constant), reaction-kinetic correlations, and CO₂ diffusivity affect predicted capture performance [14].

Data-driven models, especially neural networks, are faster to develop and cheaper to evaluate. Their main limitation is distribution shift: performance degrades when conditions depart from the training set. A hybrid strategy mitigates these trade-offs. It combines physics (for extrapolation and interpretability) with learning (for flexibility and speed). For example, Xing et al. coupled extended adaptive hybrid functions with a multiphysics model for data-driven, multi-objective optimization and techno-economics of the GDE-based eCO₂RR process [15]. Caprio et al. benchmarked hybrid regressors (ridge, decision-tree, SVM, and feed-forward ANN) on CO₂ capture spray columns [16]. Tian et al. constructed performance prediction models for a carbon capture experimental system embedded with physical constraints, utilizing three machine learning algorithms: Random Forest (RF), Back Propagation Neural Network (BPNN) and Convolutional Neural Network (CNN) [17]. Jiang et al. proposed a model named ICEEMDAN-Inception-Transformer to explore the relationship between power data and carbon emissions, providing precise hourly carbon emission acquisition for power enterprises [18]. Although these AI-based methods demonstrate certain advantages in prediction accuracy compared with alternative approaches, their computational cost is relatively high due to model complexity. Moreover, these models rely strongly on high-frequency and high-accuracy measurements, which are difficult to obtain in practical industrial settings. In addition, legacy carbon capture systems are often not readily compatible with purely AI-based models.

Both pure and hybrid models can drift during deployment because of disturbances, noise, and modeling error. State estimation closes this gap. State estimation methods such as Kalman filtering and MHE fuse models with measurements to deliver consistent state trajectories [19]. MHE formulates a constrained optimization over a sliding window to minimize the mismatch between predicted and measured outputs while enforcing the process dynamics [20]. It updates the estimate at each sampling time, handles nonlinear models and constraints, and has been applied to many noisy process settings [19,21]. Its ability to incorporate nonlinear state and measurement equations makes it well suited to absorption columns [22]. Wang et al. illustrated this by pairing an LSTM surrogate with an MHE layer to robustly monitor key CO₂ capture variables under unknown disturbances and sensor faults [11]. However, in their study, the proposed methodology is relatively complex and computationally expensive, involving the calculation and tuning of the MHE covariance matrices, frequent updates of the machine learning models, and a strong reliance on high-density and high-accuracy raw measurement data. In our study, the entire system can be simulated and its states estimated using fewer and coarser measurements, while the update procedure of the model covariance matrices is simplified to reduce the overall computational burden by adding covariance localization.

High-dimensional plant data introduce additional challenges: noise, redundancy, and asynchronous sampling. Feature learning and dimensionality reduction address these issues. Autoencoders learn compact latent variables that preserve salient structure [23,24]. Denoising autoencoders (DAE) improve robustness to corrupted inputs [25]; stacked DAEs (SDAE) add depth, enabling hierarchical feature extraction and better generalization in high-noise, high-dimensional settings [26]. However, features optimized only for reconstruction are not always ideal for forecasting [27]. End-to-end training—jointly optimizing a (S)DAE front-end with a time-series predictor—aligns representation learning with the prediction loss and often improves accuracy and simplicity [28]. Hybrid encoder–decoder models with GRU decoders have also advanced sequence forecasting in related energy applications [29].

Covariance localization further stabilises estimation in the presence of limited data. The Gaspari–Cohn taper is a compactly supported, fifth-order correlation function used to attenuate spurious long-range correlations via a Schur (Hadamard) product with the background covariance. It preserves positive semidefiniteness and is standard in EnKF and variational assimilation [30]. Stanley et al. extended localization to multivariate settings while maintaining positive semidefiniteness, enabling consistent updates across strongly coupled variables [31].
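As an illustration of the localization step, the following is a minimal sketch of the standard Gaspari–Cohn fifth-order taper and its Schur product with a covariance matrix. The stage positions, the half-width `c`, and the random sample covariance are illustrative assumptions and are not values from this study.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order taper; c is the half-width, support ends at 2*c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

def localize(cov, positions, c):
    """Element-wise (Schur) product of cov with the taper on pairwise distances."""
    positions = np.asarray(positions, dtype=float)
    dist = np.abs(positions[:, None] - positions[None, :])
    return cov * gaspari_cohn(dist, c)

# Hypothetical example: five stages indexed 1..5, covariance from only 8 samples.
stages = np.arange(1, 6)
rng = np.random.default_rng(0)
raw_cov = np.cov(rng.normal(size=(8, 5)), rowvar=False)
loc_cov = localize(raw_cov, stages, c=1.5)  # c = 1.5 is an assumed half-width
```

The taper equals one at zero separation, so variances on the diagonal are unchanged, while entries for stages separated by at least `2*c` are set to zero.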

1.2. Contributions of the Paper

This paper extends Zhuang’s work [32]. Building on the background outlined in the previous subsection, it makes the following key contributions:

1. Development of a simplified mechanistic model: A more compact mechanistic model is developed for the absorption column in the carbon capture pilot plant. The compact mechanistic model can accurately predict the CO₂ concentration and temperature at different locations of the absorption column, making it suitable for real-time applications.

2. Development of an MHE framework: A moving horizon estimator is designed based on the mechanistic model. This framework integrates real-time measurements from the pilot plant, providing a more robust and accurate estimation of the CO₂ concentration profile compared to open-loop mechanistic model predictions. Compared with prior work, the use of Moving Horizon Estimation (MHE) in this study is essential due to the discontinuous nature of CO₂ concentration measurements in absorber systems caused by limitations of gas analysis equipment. While the training of downstream machine learning models requires complete concentration trajectories, previous work relied on simple linear interpolation to reconstruct missing measurements. However, linear interpolation has no physical basis and does not reflect the actual dynamic evolution of CO₂ concentration in the absorber, and is therefore inconsistent with practical operating conditions.

In contrast, the proposed MHE framework is built upon a mechanistic model and incorporates additional external measurements to reconstruct physically consistent CO₂ concentration profiles. The resulting concentration trajectories provide a realistic representation of the underlying process dynamics and serve as reliable training data for subsequent machine learning models.

3. Training of SDAE-GRU models: Considering the high computational cost associated with running the MHE in real-time, we used the offline results from MHE runs as labels to train a robust machine learning model representing a more practical and efficient alternative.

4. Hybrid modeling framework: The predictions from the SDAE-GRU model are integrated with the predictions generated by the standalone mechanistic model. This fusion is performed based on the respective error covariance matrices of their predictions, while the Gaspari-Cohn method is simultaneously applied for covariance localization. Error covariance matrices are updated in batches by running the MHE on windows of selected past data.

1.3. Organization of the Paper

The rest of the paper is structured as follows: Section 2 provides a description of the carbon capture system and the data sets under study. Section 3 outlines the methodology, including the mechanistic model, SDAE-GRU framework, the MHE formulation, and the data fusion method. Section 4 presents the results obtained from the implementation of the proposed solution using pilot plant data mimicking online operation of the state estimator, and finally Section 6 presents the conclusions and future research directions.

2. System Under Study

Figure 1 shows a simplified flowsheet of the pilot-scale carbon capture plant utilized in this study. The input gas stream is generated by mixing heated CO₂ and N₂ in varying ratios to simulate flue gas produced by upstream natural gas or coal-fired power plants. To simplify operation and minimize potential corrosion of the equipment, sulfur oxides, nitrogen oxides, and particulate matter are excluded from the gas mixture. The prepared gas stream is first cooled and then introduced at the bottom of the absorption column. The flue gas then rises through the packed sections of the column, where dissolved CO₂ is removed via a reversible reaction between CO₂ and monoethanolamine (MEA) in a lean aqueous MEA solution supplied from the top of the column. This process removes more than 95% of the CO₂. The N₂-rich gas exiting the absorption column is collected and reused for flue gas preparation. The CO₂-rich MEA solution is directed through a heat recovery system to a stripper column, where the reaction is reversed and CO₂ is released by applying heat through a reboiler. The regenerated lean MEA solution is recycled back to the absorption column, while the recovered CO₂ is collected and, along with the N₂, reused for preparing the input gas stream.

Figure 2 illustrates the overall configuration of the carbon capture pilot plant, as well as the structural details of a single stage within the absorption column. Due to limitations in the shooting angle and the internal structure of the building, it is not possible to capture all five stages in a single image. The left image therefore shows only the third to fifth floors. Gas flows upward while liquid flows downward between stages, forming a countercurrent pattern through the packing structure. The right image provides a detailed view of an individual stage. The left section corresponds to the absorber column, while the central part represents the condenser located at the top of the stripper tower, where MEA solvent vapors are condensed and recirculated back into the tower via dedicated piping.

Figure 3 illustrates the carbon dioxide detection system. Gas from within the column stage is directed into a dedicated measurement pipeline and subsequently delivered to the gas analyzer (AT400) located at the top of the tower. Before entering the analyzer, the gas passes through a separator designed to prevent any entrained liquid from reaching the instrument. The control panel on the analyzer allows the operator to select specific sampling points for gas composition analysis. Additionally, due to the large size and high cost of the gas analyzer, simultaneous installation of multiple units at different locations is impractical. As a result, carbon dioxide concentration measurements are discrete rather than continuous, and it is not possible to obtain real-time data from all locations simultaneously.

The developed model is validated with experimental data collected from the PCC pilot plant. The measurements used for validation consist of temperature, pressure, liquid and gas flow rates, liquid level, pH and the previously mentioned CO₂ concentrations. The seven datasets used in this study contain around 1500 samples (data points) in total. After removing outliers, the data are divided into seven sets corresponding to different operating conditions, determined by the average gas–liquid volumetric ratio of inlet flue gas and the input CO₂ volumetric concentration (%), as shown in Figure 4.

As noted earlier, only a single gas analyzer is available for CO₂ concentration measurement. The resulting measurements are presented as scatter plots in Figure 5 and Figure 6. Data are collected sequentially from sampling point 5 to point 1, as illustrated in Figure 1, with two to four samples acquired at each location. Because a single analyzer is shared across multiple sampling points, measurements are not obtained simultaneously, and the CO₂ readings at each location are therefore temporally discontinuous. However, effective control and optimization of the overall system require complete, real-time information on the CO₂ concentration profile. To address this limitation, the proposed method described in the next section was applied to the carbon capture system to provide reliable real-time estimates of the CO₂ concentration profile.

3. Methodology

In an earlier work [32], the estimation objective described in the previous section was tackled using a combination of mechanistic and data-driven models, but without any direct feedback from measurements in online operation. In addition, the raw data used to train the LSTM in that work were obtained by linear interpolation between scattered data points, so the training data were inherently distorted. In this work, our aim is to add a state estimator to this approach, which can be seen as an advancement of the mechanistic model component that provides optimal data for training the machine learning model.

The proposed framework integrates a first-principles mechanistic model, a machine learning model, and a Moving Horizon Estimation (MHE) scheme to achieve accurate and robust state estimation under realistic operating conditions. The MHE framework utilizes real measurement data collected from the pilot plant to generate optimal continuous estimates of CO₂ concentrations, effectively addressing the challenges posed by unknown disturbances, process noise, and missing or unreliable sensor measurements. These high-fidelity state estimates are then used as supervisory target values for training a robust hybrid data-driven model, implemented as a Stacked Denoising Autoencoder–Gated Recurrent Unit (SDAE-GRU) network. The choice of employing a hybrid model rather than directly using MHE with the mechanistic model for real-time prediction stems from computational limitations. Since the MHE must solve a large number of nonlinear ODEs, its execution speed is only marginally faster than the operation of the absorption column. Moreover, the time required for system sampling further hinders the feasibility of running the MHE in synchrony with the system. Running the MHE simultaneously with the plant may therefore be marginally feasible in this study; however, to enhance the generalization capability of the proposed approach, we instead run the MHE periodically to generate optimal estimates, which are then used to periodically update the error matrix of the hybrid model. By learning from the optimal estimates provided by the MHE, the SDAE-GRU model is able to capture both the underlying process dynamics and the temporal correlations present in the measurement data.

To further improve prediction accuracy and reliability, the open-loop predictions obtained from the trained data-driven model and the standalone mechanistic model are fused through a covariance-weighted blending strategy. This fusion combines the strengths of both modeling approaches while mitigating their individual weaknesses, with the blending weights determined based on the respective deviation covariance matrices of each model’s predictions. Regarding the update of the covariance matrices for both the mechanistic model and the machine learning model, the MHE cannot operate in real time with the hybrid model due to the large number of nonlinear ODEs involved. Therefore, in our test dataset, we enlarged the data window and updated the covariance matrices once every dataset interval. This provided the MHE with sufficient computation time. In practical operation, each update introduces a delay of approximately 20–30 min. However, since the system itself is highly periodic, such a delay does not lead to any significant distortion. The resulting hybrid estimation framework provides a computationally efficient and scalable solution for real-time CO₂ concentration monitoring in industrial carbon capture processes, offering enhanced adaptability to changing operating conditions and improved robustness against measurement uncertainties.
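The idea of covariance-weighted blending can be sketched with the textbook minimum-variance combination of two estimates whose errors are assumed independent. This is an illustrative assumption about the form of the fusion rule, not necessarily the exact expression used in this work; the function name `fuse` and all numerical values are hypothetical.

```python
import numpy as np

def fuse(x_mech, x_ml, P_mech, P_ml):
    """Blend two predictions of the same profile, weighted by error covariance.

    Assumes the two prediction errors are unbiased and mutually independent.
    """
    S = P_mech + P_ml
    # Gain K = P_mech @ inv(S), computed with a linear solve instead of an inverse.
    K = np.linalg.solve(S.T, P_mech.T).T
    x_fused = x_mech + K @ (x_ml - x_mech)
    P_fused = P_mech - K @ P_mech
    return x_fused, P_fused

# Hypothetical five-stage CO2 profile (vol%) from each model.
x_mech = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
x_ml = np.array([5.0, 3.0, 1.0, -1.0, -3.0])
P_mech = 4.0 * np.eye(5)   # assumed mechanistic-model error covariance
P_ml = 1.0 * np.eye(5)     # assumed data-driven-model error covariance
x_fused, P_fused = fuse(x_mech, x_ml, P_mech, P_ml)
```

With these numbers the data-driven prediction receives a weight of 0.8 because its assumed error variance is four times smaller, and the fused variance falls below that of either model. A localized covariance matrix could be passed in place of `P_mech` or `P_ml` without changing the function.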

3.1. Mechanistic Model

The proposed mechanistic model is a simplified well-mixed approximation of the absorption process derived for real-time calculations. The absorption process is assumed to happen only within compartments comprising the five packing stages of the absorption column. As shown in Figure 7, the whole column is modeled as five continuous stirred tank reactors (CSTRs). The model has eight measured inputs. Four relate to the MEA solvent and enter at the top: the mass flow rate, MEA molar concentration, MEA_CO₂ concentration, and solvent temperature. The other four relate to the flue gas and enter at the bottom: the mass flow rate, N₂ concentration, CO₂ concentration, and gas temperature. The model is based on the mass and energy balances given by Equations (1)–(10).

All five stages share the same equations, where $i \in \{1, 2, 3, 4, 5\}$ is the stage index of the absorption column. $k$ denotes the reaction rate coefficient, $R_{i}$ denotes the reaction rate, and $C$ represents the molar concentration of the different components. The subscripts $liq$, $gas$, $in$ and $out$ denote the liquid phase, gas phase, inlet and outlet, respectively. $c_{p}$ stands for the heat capacity within the stage, and $L_{v}$ and $G_{v}$ stand for the liquid and gas volumes of a single stage.

The reaction mechanism is described by Equations (1) and (2). The parameters A and b (123,147 and 41,236.7, respectively) were fitted with fmincon, the MATLAB routine for constrained nonlinear minimization, by minimizing the prediction error for the column temperature. $R_{g}$ is the gas constant. Under these conditions the effect of the reverse reaction is very small compared with the forward reaction, so it is not modeled by a separate rate equation; instead, it is reflected in a smaller forward reaction rate constant.

$$k_{i} = A \exp\left(-\frac{b}{R_{g} T_{liq,h,i}}\right) \tag{1}$$

$$R_{i} = k_{i} C_{MEA,i} C_{CO_{2},i} \tag{2}$$

The ordinary differential equations that describe the energy balance are given as Equations (3)–(5). In these equations, $T$ denotes the temperature within the stage, and $\Delta H$ denotes the CO₂ absorption heat. In this study, $\Delta T_{i}$ denotes the temperature change within the stage. The liquid and gas phases are considered as a single combined system; therefore, they are assumed to share the same temperature change.

$$\Delta T_{i} = \frac{\left(F_{liq,in,i} T_{liq,in,i} - F_{liq,out,i} T_{liq,h,i}\right) c_{p,sol} + \left(F_{gas,in,i} T_{gas,in,i} - F_{gas,out,i} T_{gas,h,i}\right) c_{p,gas} + R_{i} (-\Delta H) L_{v}}{c_{p,mix} \left(L_{v} \rho_{L} + G_{v} \rho_{G}\right)} \tag{3}$$

$$\frac{d T_{liq,h,i}}{d t} = \Delta T_{i} \tag{4}$$

$$\frac{d T_{gas,h,i}}{d t} = \Delta T_{i} \tag{5}$$

The ordinary differential equations that describe the mass balance are given as Equations (6)–(10). $F$ denotes the flow rate and $M$ denotes the holdup of liquid and gas at each stage. $M_{CO_{2}}$ denotes the molar mass of carbon dioxide.

$$\frac{d M_{liq,i}}{d t} = F_{liq,in} - F_{liq,out} + R_{i} L_{v} M_{CO_{2}} \tag{6}$$

$$\frac{d M_{gas,i}}{d t} = F_{gas,in} - F_{gas,out} - R_{i} L_{v} M_{CO_{2}} \tag{7}$$

$$\frac{d C_{N_{2},i}}{d t} = \frac{C_{N_{2},gas,in} F_{gas,in} - C_{N_{2},i} F_{gas,out}}{G_{v}} \tag{8}$$

$$\frac{d C_{MEA,i}}{d t} = \frac{C_{MEA,liq,in} F_{liq,in} - R_{i} L_{v} - F_{liq,out} C_{MEA,i}}{L_{v}} \tag{9}$$

$$\frac{d C_{CO_{2},i}}{d t} = \frac{C_{CO_{2},gas,in} F_{gas,in} - R_{i} L_{v} - F_{gas,out} C_{CO_{2},i}}{G_{v}} \tag{10}$$
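To make the stage equations concrete, the following is a minimal sketch of the right-hand sides of Equations (9) and (10) for a single well-mixed stage, using the rate law of Equations (1) and (2). The fitted values of A and b are those quoted in the text; the SI value of the gas constant, the function names, and every inlet and state value in the example are assumptions chosen only for illustration.

```python
import numpy as np

R_GAS = 8.314                      # gas constant; SI units are an assumption
A_FIT, B_FIT = 123147.0, 41236.7   # A and b as reported in the text

def rate_constant(T_liq, A=A_FIT, b=B_FIT):
    """Arrhenius-type rate coefficient, Eq. (1)."""
    return A * np.exp(-b / (R_GAS * T_liq))

def stage_concentration_rhs(c_mea, c_co2, T_liq, inlet, L_v, G_v):
    """Time derivatives of the MEA and CO2 concentrations, Eqs. (9) and (10)."""
    R = rate_constant(T_liq) * c_mea * c_co2          # Eq. (2)
    d_mea = (inlet["c_mea_in"] * inlet["F_liq_in"] - R * L_v
             - inlet["F_liq_out"] * c_mea) / L_v      # Eq. (9)
    d_co2 = (inlet["c_co2_in"] * inlet["F_gas_in"] - R * L_v
             - inlet["F_gas_out"] * c_co2) / G_v      # Eq. (10)
    return d_mea, d_co2

# Hypothetical stage: no CO2 present yet, so the reaction term vanishes.
inlet = {"c_mea_in": 3.0, "F_liq_in": 0.1, "F_liq_out": 0.1,
         "c_co2_in": 2.0, "F_gas_in": 0.5, "F_gas_out": 0.5}
d_mea0, d_co2_0 = stage_concentration_rhs(3.0, 0.0, 313.15, inlet,
                                          L_v=0.5, G_v=0.25)
```

In this example the CO₂ derivative reduces to the inlet term of Equation (10) because the stage starts empty of CO₂, and the MEA balance is at rest because its inflow and outflow match.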

3.2. Moving Horizon Estimation

In this subsection, the aforementioned mechanistic model is shown as the nonlinear process model in the following form:

$$x_{i} = F(x_{i-1}, u_{i}) + w_{i}, \tag{11}$$

$$y_{i} = H(x_{i}) + v_{i}, \tag{12}$$

$$x_{i} \in X, \tag{13}$$

$$w_{i} \sim N(0, Q), \tag{14}$$

$$\begin{gathered} v_{i} \sim N(0, R), \\ x_{i} \in \mathbb{R}^{n_{x}}, \; u_{i} \in \mathbb{R}^{n_{u}}, \; w_{i} \in \mathbb{R}^{n_{x}}, \; y_{i} \in \mathbb{R}^{n_{y}}, \; v_{i} \in \mathbb{R}^{n_{y}}, \\ F : \mathbb{R}^{n_{x} \times n_{u} \times n_{w}} \to \mathbb{R}^{n_{x}}, \quad H : \mathbb{R}^{n_{x} \times n_{v}} \to \mathbb{R}^{n_{y}} \end{gathered} \tag{15}$$

where: (1) The index $i$ denotes the discrete time instant. (2) $n_{x}$, $n_{u}$, and $n_{y}$ represent the dimensions of the state, input, and measurement vectors, respectively. (3) The function $F$ corresponds to the mechanistic model introduced in Section 3.1, and $H$ is the measurement model that projects the system state onto the observable space. Real-time temperature measurements obtained from the column are incorporated into the MHE framework to refine the internal state estimates, with particular emphasis on the CO₂ concentration profiles. (4) The process disturbance $w_{i}$ is modeled as a zero-mean multivariate Gaussian variable with covariance $Q$, i.e., $w_{i} \sim N(0, Q)$. Likewise, the measurement noise $v_{i}$ follows a zero-mean Gaussian distribution with covariance $R$, $v_{i} \sim N(0, R)$. The two noise sources are assumed to be statistically independent, and both $Q$ and $R$ are taken as diagonal matrices. (5) The admissible state set $X$ encodes the physical feasibility constraints, including upper and lower bounds on the states.

In the MHE configuration designed for the absorber model described in Section 3.1, the process is assumed to operate under the nominal conditions. To avoid unrealistic variations arising solely from the optimization routine, the disturbance $w_{i}$ is restricted to lie within the interval $[0, 1]$.

For the nonlinear system specified by Equations (11)–(15), the standard MHE formulation [19,20] is expressed as the optimization problem in (16), where the state trajectory and the associated disturbances over the estimation horizon are determined by minimizing the following objective function:

$$\begin{aligned} \min_{\hat{x}_{k-N}, \dots, \hat{x}_{k-1}} \quad & \sum_{i=k-N}^{k-1} \| v_{i} \|_{R^{-1}}^{2} + \sum_{i=k-N+1}^{k} \| w_{i} \|_{Q^{-1}}^{2} + \varphi_{k-N} \\ \text{s.t.} \quad & \hat{x}_{i} = F(\hat{x}_{i-1}, u_{i}) + w_{i}, \qquad i = k-N, \dots, k-1, \\ & \hat{y}_{i} = H(\hat{x}_{i}) + v_{i}, \qquad i = k-N, \dots, k, \\ & \hat{x}_{i} \in X, \qquad i = k-N, \dots, k-1, \\ & w_{i} \in W, \qquad i = k-N, \dots, k-1, \\ & v_{i} \in V, \qquad i = k-N, \dots, k, \end{aligned} \tag{16}$$

where: (1) $\hat{x}_{i}$ denotes the estimated system state at time $i$; (2) $\hat{y}_{i}$ represents the corresponding measurement obtained from the plant, while $y_{i}$ is the estimated output generated through the observation model; (3) $N$ denotes the length of the estimation horizon, which is set to 20 following common practice; (4) The arrival cost term $\varphi_{k-N}$ in Equation (17) incorporates information from system behavior prior to the current MHE window [27]. This term ensures that the estimator accounts for dynamics that occurred before the present horizon; (5) $\hat{x}_{k-N}$ is the estimated initial state at the beginning of the horizon, whereas $x_{k-N}$ is the a priori prediction obtained by propagating the mechanistic model together with the previously estimated state at time $k-N-1$. The matrix $P_{k-N}$ denotes the covariance of the approximated posterior distribution of the states at time $k-N$ [33].

$$\varphi_{k-N} = \left\| \hat{x}_{k-N} - x_{k-N} \right\|_{P_{k-N}^{-1}}^{2} \tag{17}$$

The last term of the objective function penalizes deviations in the initial state at the beginning of the estimation window. In our implementation, the covariance $P_{k-N}$ is selected to be sufficiently small so that the reconstructed state trajectory remains smooth. The process disturbance covariance $Q$ contributes to the second component of the cost function, whereas the measurement noise covariance $R$ enters the observation-mismatch term. All three covariance matrices $P$, $R$, and $Q$ are assumed to be diagonal.

In this work, the distributions—or suitable approximations of the distributions—for the process disturbances, measurement noise, and the arrival-cost uncertainty are presumed to be known in advance. This assumption is consistent with applications in which the system operates close to its nominal conditions, as is the case for the plant considered here. The explicit numerical values adopted for these covariance matrices within the state estimation framework are provided in Section 4.

With respect to the optimization variables in the MHE formulation, the initial state at the beginning of each estimation window appears in the arrival-cost term, where it represents the prior information entering the horizon. For the remaining terms of the objective function, the decision variables correspond to the sequence of states within the window, which evolve according to the system dynamics. In the measurement-mismatch term, the observation model is applied to these same in-window states. Thus, across all components of the cost function, the optimization variables consist of the state trajectory over the estimation horizon.
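The structure of the objective in Equations (16) and (17) can be sketched as a function that scores one candidate state trajectory over a window. This is only an evaluation of the cost, not a solver. The toy scalar model, the weights, and the convention that a missing measurement is stored as NaN and skipped are assumptions made for illustration.

```python
import numpy as np

def mhe_cost(x_traj, y_meas, u_seq, x_prior, F, H, Q_inv, R_inv, P_inv):
    """Weighted least-squares MHE cost for a candidate trajectory.

    x_traj: (N+1, nx) states in the window; y_meas: (N+1, ny) measurements;
    u_seq: (N, nu) inputs, where u_seq[i-1] drives the step from i-1 to i.
    """
    e0 = x_traj[0] - x_prior
    cost = e0 @ P_inv @ e0                       # arrival cost, Eq. (17)
    for i in range(len(x_traj)):
        if np.any(np.isnan(y_meas[i])):          # assumed: NaN marks a missing sample
            continue
        v = y_meas[i] - H(x_traj[i])             # measurement mismatch
        cost += v @ R_inv @ v
    for i in range(1, len(x_traj)):
        w = x_traj[i] - F(x_traj[i - 1], u_seq[i - 1])   # process disturbance
        cost += w @ Q_inv @ w
    return float(cost)

# Toy scalar model (hypothetical): x_i = 0.9 x_{i-1} + u_i, y_i = x_i.
F = lambda x, u: 0.9 * x + u
H = lambda x: x
x_prior = np.array([1.0])
u_seq = np.zeros((3, 1))
x_traj = [x_prior]
for u in u_seq:
    x_traj.append(F(x_traj[-1], u))
x_traj = np.array(x_traj)
y_meas = x_traj.copy()
y_meas[2] = np.nan                               # one sample not acquired
I1 = np.eye(1)
cost_exact = mhe_cost(x_traj, y_meas, u_seq, x_prior, F, H, I1, 4.0 * I1, I1)
y_shift = y_meas.copy()
y_shift[1] += 0.1
cost_shift = mhe_cost(x_traj, y_shift, u_seq, x_prior, F, H, I1, 4.0 * I1, I1)
```

A trajectory that follows the model exactly and matches every available measurement has zero cost, while a 0.1 offset in one measurement adds the squared offset weighted by the inverse measurement covariance. An MHE solver would minimize this quantity over `x_traj` subject to the state bounds.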

3.3. Machine Learning Framework

The raw data collected from the plant consist of 53 temperature transmitters, 11 pressure transmitters, 19 flow transmitters, 4 level transmitters, 2 pH analyzers, and the previously discussed gas concentration analyzer. To some extent, these variables are all related to the CO₂ concentration profile in the absorption column. However, to prevent excessive input dimensionality and to reduce redundancy and noise while retaining the dominant process information, an SDAE is first employed to compress the data. Furthermore, since the CO₂ concentration profile is a time series, a GRU network is connected downstream of the SDAE to perform sequential prediction. The SDAE and GRU are trained in an end-to-end integrated manner to ensure model consistency and predictive accuracy.

3.3.1. Stacked Denoising Autoencoder

An autoencoder (AE) is an unsupervised neural architecture designed to learn a compact representation of the input data by reconstructing the original signal from a lower-dimensional embedding. The network consists of two parts: an encoder that transforms the input into a latent feature space, and a decoder that maps this latent representation back to the input domain. When used for dimensionality reduction, the latent layer typically contains fewer neurons than the input layer, forcing the model to extract the most informative structure of the data. The general form of the AE can be expressed as:

\begin{cases} h = f_{1} (W_{1} \tilde{x} + b_{1}) \\ \hat{x} = f_{2} (W_{2} h + b_{2}) \end{cases}

(18)

where

x = {[x_{1}, x_{2}, \dots, x_{n}]}^{T} \in R^{n}

represents the n-dimensional input vector, and

h = {[h_{1}, h_{2}, \dots, h_{m}]}^{T} \in R^{m}

denotes the m-dimensional latent representation. The noise-corrupted input is written as

\tilde{x} = {[{\tilde{x}}_{1}, {\tilde{x}}_{2}, \dots, {\tilde{x}}_{n}]}^{T}

, while the reconstructed output is expressed as

\hat{x} = {[{\hat{x}}_{1}, {\hat{x}}_{2}, \dots, {\hat{x}}_{n}]}^{T}

. The encoder uses parameters

W_{1} \in R^{m \times n}

and

b_{1} \in R^{m}

, and the decoder is parameterized by

W_{2} \in R^{n \times m}

and

b_{2} \in R^{n}

. The activation functions of the encoder and decoder are denoted by

f_{1} (\cdot)

and

f_{2} (\cdot)

, respectively. Typical activation functions include tanh, ReLU, and linear mappings. The full encoder is parameterized by

Θ_{E}

.

Denoising autoencoders (DAEs) extend the conventional autoencoder architecture and can be interpreted as a nonlinear generalization of Principal Component Analysis [34]. However, the feature extraction capacity of a single-layer DAE is inherently limited. To address this, the hidden representation from one DAE is used as the input to the next, forming a multi-layer architecture known as the Stacked Denoising Autoencoder (SDAE) [35]. Two implementation strategies are possible, depending on the availability of clean and noisy data. If both are available, the SDAE can be structured as a standard neural network comprising separate encoding and decoding stages. When only clean data are available, a noise injection layer can be employed to artificially corrupt the input using additive white Gaussian noise (AWGN) with zero mean and unit variance, as shown in Figure 8. This approach is motivated by the assumption that measurement noise in plant sensors follows a white, zero-mean Gaussian distribution, which, after normalization, has unit variance [36]. The SDAE and GRU are trained end to end. The GRU parameters are specified in Section 3.3.2, and this section presents only the SDAE parameters, which are listed in Table 1.
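The noise-injection variant can be illustrated with a short forward pass. The sketch below is not the trained network from this work: the layer widths (89, 32, 8), the tanh activations, the linear output layer, and the random untrained weights are all assumptions chosen for illustration, and the actual SDAE configuration is the one given in Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out):
    """Random, untrained dense layer (W, b); a placeholder for learned weights."""
    return rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out)

def sdae_forward(x, enc_layers, dec_layers, noise_std=1.0, train=True):
    """Corrupt x with AWGN, encode through stacked layers, then decode."""
    x_tilde = x + noise_std * rng.standard_normal(x.shape) if train else x
    h = x_tilde
    for W, b in enc_layers:            # h = f1(W x~ + b), layer by layer
        h = np.tanh(W @ h + b)
    x_hat = h
    for W, b in dec_layers[:-1]:
        x_hat = np.tanh(W @ x_hat + b)
    W, b = dec_layers[-1]              # linear output layer: x^ = W h + b
    return h, W @ x_hat + b

sizes = [89, 32, 8]                    # hypothetical layer widths
enc = [make_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
dec = [make_layer(b, a) for a, b in zip(sizes[:-1], sizes[1:])][::-1]

x = rng.standard_normal(89)            # one normalized input sample
latent, x_hat = sdae_forward(x, enc, dec)
loss = np.mean((x - x_hat) ** 2)       # reconstruction target is the clean x
```

The reconstruction loss is computed against the clean input rather than the corrupted one, which is what drives a denoising autoencoder to learn noise-robust features. At inference time the noise layer is disabled (`train=False`) and only the encoder output `latent` would be passed to the downstream GRU.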

3.3.2. Gated Recurrent Unit

Given that the measurements obtained from the pilot plant form time-series data, a Gated Recurrent Unit (GRU) network is adopted as the learning model in this study. The GRU is a streamlined variant of the Long Short-Term Memory (LSTM) architecture [37], offering comparable predictive capability while requiring considerably fewer computational resources [38]. This characteristic makes the GRU particularly suitable for real-time MHE applications, where lightweight models are preferred due to strict computational constraints.

Compared with more complex deep learning architectures, GRUs provide an effective compromise between representational power and computational efficiency, allowing them to capture temporal dependencies without excessive model size. This balance also motivates their use in our ongoing development of control-oriented frameworks, including model predictive control designs where real-time feasibility is essential. In a GRU cell, the hidden state is used to carry temporal information across time steps, and its internal dynamics are governed by two gating mechanisms: the reset gate and the update gate. The structure of a GRU unit is illustrated in Figure 9.

The computations in the GRU cell are as follows. The reset gate

r_{t}

(sigmoid layer):

r_{t} = σ (U_{r} h_{t - 1} + W_{r} x_{t} + b_{r})

(19)

the update gate

z_{t}

(sigmoid layer):

z_{t} = σ (U_{z} h_{t - 1} + W_{z} x_{t} + b_{z})

(20)

the candidate hidden state

{\hat{h}}_{t}

(tanh layer):

{\hat{h}}_{t} = \tanh (r_{t} ⊙ (U_{h} h_{t - 1}) + W_{h} x_{t} + b_{h})

(21)

hidden state:

h_{t} = z_{t} ⊙ h_{t - 1} + (1 - z_{t}) ⊙ {\hat{h}}_{t}

(22)

where

W_{r}

,

W_{z}

,

W_{h}

,

U_{r}

,

U_{z}

, and

U_{h}

denote the weight matrices, while

b_{r}

,

b_{z}

, and

b_{h}

are the corresponding bias vectors. The operator ⊙ represents the Hadamard (elementwise) product. The GRU models used for the drying and regeneration processes are first trained in Python using Keras/TensorFlow [39], with their hyperparameters tuned via Optuna [40]. After training, the resulting network is exported to MATLAB (2023a) using the Deep Learning Toolbox Converter [41].
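For concreteness, the gate computations can be written as a single update function. The following is a minimal NumPy sketch, assuming the convention in which the update gate weights the previous hidden state, that is, h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ ĥ_t; the input width, hidden width, and random weights are hypothetical and do not correspond to the trained Keras model.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update; p is a dict of weight matrices and bias vectors."""
    r = sigmoid(p["U_r"] @ h_prev + p["W_r"] @ x_t + p["b_r"])   # reset gate
    z = sigmoid(p["U_z"] @ h_prev + p["W_z"] @ x_t + p["b_z"])   # update gate
    h_cand = np.tanh(r * (p["U_h"] @ h_prev) + p["W_h"] @ x_t + p["b_h"])
    return z * h_prev + (1.0 - z) * h_cand                       # new hidden state

rng = np.random.default_rng(0)
n_in, n_hid, seq_len = 8, 16, 20       # hypothetical sizes
params = {}
for g in "rzh":
    params[f"W_{g}"] = rng.normal(scale=0.3, size=(n_hid, n_in))
    params[f"U_{g}"] = rng.normal(scale=0.3, size=(n_hid, n_hid))
    params[f"b_{g}"] = np.zeros(n_hid)

h = np.zeros(n_hid)
for x_t in rng.standard_normal((seq_len, n_in)):
    h = gru_step(x_t, h, params)       # hidden state carries past information
```

Under this convention an update gate close to one leaves the hidden state almost unchanged, which is how the cell retains information over many time steps, while a gate close to zero overwrites the state with the candidate value.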

The GRU architecture preserves temporal information by carrying past state information through its hidden states. Based on sequential measurements and previous observations, the model predicts the CO₂ concentration at each time step. The network architecture consists of an input layer followed by two stacked GRU layers. Dropout layers are inserted between these recurrent layers to mitigate overfitting by randomly deactivating a subset of neurons during training. The detailed configuration of the GRU model is summarized in Table 2.

3.4. Data Fusion Using the Hybrid Model

Data assimilation (DA) provides a framework for combining model-based predictions with process measurements to obtain an improved estimate of the system state [42]. In this study, we employ an approach similar to the update stage of a Kalman-type estimator, in which information from both the model outputs and the measured values is merged. The fusion is achieved by weighting the contributions of the mechanistic model and the machine learning model according to their respective error covariance matrices.

y = y_{mec} + B {(R + B)}^{- 1} (y_{GRU} - y_{mec})

(23)

B, R \in R^{5 \times 5}

represent the covariance matrices associated with the mechanistic model and the SDAE–GRU model, respectively, and quantify the uncertainty in each model’s predicted concentration vector

y

. The subscripts “mec” and “GRU” indicate the outputs of the mechanistic and data-driven components. In dynamic systems, the predictive accuracy of these models may fluctuate over time, implying that their error covariances are inherently time-varying. Because the available process data in this study are limited,

B

and

R

are treated as constant matrices rather than time-dependent quantities. Their values are determined by comparing the model-generated predictions with the optimal MHE-based estimates obtained from historical data.

\begin{matrix} R & = Cov (y_{GRU} - y_{MHE}), \\ B & = Cov (y_{mec} - y_{MHE}) \end{matrix}

(24)
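Equations (23) and (24) translate almost directly into code. The sketch below uses synthetic data as a stand-in for the historical records (the noise levels, sample count, and random seed are assumptions), estimates B and R as sample covariances of the residuals against the MHE reference, and applies the blend sample by sample.

```python
import numpy as np

def error_covariance(y_model, y_mhe):
    """Sample covariance of model-minus-MHE residuals; rows are time samples."""
    return np.cov(y_model - y_mhe, rowvar=False)

def fuse(y_mec, y_gru, B, R):
    """Covariance-weighted blend: y = y_mec + B (R + B)^{-1} (y_GRU - y_mec)."""
    return y_mec + B @ np.linalg.solve(R + B, y_gru - y_mec)

# Synthetic stand-ins for historical data (5 stages, 500 samples).
rng = np.random.default_rng(1)
y_mhe = 5.0 + rng.standard_normal((500, 5))
y_mec = y_mhe + 0.3 * rng.standard_normal((500, 5))   # noisier model
y_gru = y_mhe + 0.2 * rng.standard_normal((500, 5))   # less noisy model

B = error_covariance(y_mec, y_mhe)
R = error_covariance(y_gru, y_mhe)
y_fused = np.array([fuse(m, g, B, R) for m, g in zip(y_mec, y_gru)])

# In-sample check only: the same records are used to estimate B, R and to score.
rmse = lambda y: float(np.sqrt(np.mean((y - y_mhe) ** 2)))
print(rmse(y_mec), rmse(y_gru), rmse(y_fused))
```

Two limiting cases help to interpret the gain B(R + B)^{-1}: when B and R are equal the fused value is the plain average of the two predictions, and when R is much smaller than B the result moves towards the GRU prediction. Using `np.linalg.solve` instead of forming the explicit inverse is a common numerical preference rather than something prescribed by the method.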

The Gaspari–Cohn function is adopted as the covariance localization operator to suppress artificial long-range correlations in the error statistics. In practice, measurements at one spatial location typically exhibit meaningful correlation only with nearby points, while correlations with points farther upstream or downstream are negligible. The characteristic correlation length L specifies the radius over which adjacent measurements influence each other. The localized correlation coefficient, expressed as a function of the normalized distance

ρ

, is given by

G (ρ) = \begin{cases} 1 - \frac{5}{3} ρ^{2} + \frac{5}{8} ρ^{3} + \frac{1}{2} ρ^{4} - \frac{1}{4} ρ^{5}, & \text{if } 0 \leq ρ \leq 1 \\ 4 - 5 ρ + \frac{5}{3} ρ^{2} + \frac{5}{8} ρ^{3} - \frac{1}{2} ρ^{4} + \frac{1}{12} ρ^{5} - \frac{2}{3 ρ}, & \text{if } 1 < ρ \leq 2 \\ 0, & \text{if } ρ > 2 \end{cases}

(25)

with

\begin{matrix} \text{Localized } B: & B_{i j} \cdot G (ρ), & B_{i j} \in B \\ \text{Localized } R: & R_{i j} \cdot G (ρ), & R_{i j} \in R, \quad ρ = \frac{| i - j |}{L} \end{matrix}

where

i, j

indicate the indices of the entries

B_{i j}

and

R_{i j}

, and L denotes the correlation length, which is set to 2 in this work. Although the matrices

B

and

R

are not updated in real time, incorporating such time-varying behavior into the framework would be straightforward. Let

N_{cov}

represent the number of data records used to compute these covariance matrices, and

f_{mec}

and

f_{GRU}

denote the outputs of the mechanistic and GRU-based models, respectively. The overall procedure, from SDAE encoding of the input records to covariance-weighted fusion, is summarized in Algorithm 1. Because direct linear interpolation of CO₂ readings between the present measurement and future measurements is not feasible, the most recent estimates of

B

and

R

used in Algorithm 1 are obtained from earlier cycles of process data.
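The localization step amounts to an elementwise product between a covariance matrix and a taper built from the Gaspari–Cohn function. A minimal sketch is given below, assuming stage indices are used directly as the distance measure with L = 2; the example covariance matrix is synthetic.

```python
import numpy as np

def gaspari_cohn(rho):
    """Fifth-order piecewise-rational Gaspari-Cohn taper, zero beyond rho = 2."""
    rho = abs(rho)
    if rho <= 1.0:
        return 1 - 5/3 * rho**2 + 5/8 * rho**3 + 1/2 * rho**4 - 1/4 * rho**5
    if rho < 2.0:
        return (4 - 5 * rho + 5/3 * rho**2 + 5/8 * rho**3
                - 1/2 * rho**4 + 1/12 * rho**5 - 2 / (3 * rho))
    return 0.0

def localize(C, L=2.0):
    """Schur (elementwise) product of C with the taper G(|i - j| / L)."""
    n = C.shape[0]
    G = np.array([[gaspari_cohn(abs(i - j) / L) for j in range(n)]
                  for i in range(n)])
    return C * G

# Synthetic 5 x 5 covariance with spurious long-range entries.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
C = A @ A.T
C_loc = localize(C, L=2.0)
print(np.round([gaspari_cohn(d / 2.0) for d in range(5)], 4))
```

With five stages and L = 2, the normalized distance takes the values 0, 0.5, 1, 1.5, and 2, so the diagonal is left unchanged, neighbouring stages are damped, and the entry linking stage 1 to stage 5 is set exactly to zero. The two branches of the taper agree at ρ = 1, where both evaluate to 5/24.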

Algorithm 1 Pseudocode of the whole control strategy

1: Inputs: Initial records sequence $[x (t_{0}), \dots, x (t_{N_{seq}})]$
2: Encode original records sequence: $[\tilde{x} (t_{0}), \dots, \tilde{x} (t_{N_{seq}})] \leftarrow E ([x (t_{0}), \dots, x (t_{N_{seq}})], Θ_{E})$
3: Initialize matrices $B$ and $R$ : $B (t_{N_{current}}) \leftarrow I_{5 \times 5}$ , $R (t_{N_{current}}) \leftarrow I_{5 \times 5}$
4: Initialize current time step: $N_{current} \leftarrow N_{seq} + 43$
5: Initialize update window size: $N_{window}$
6: Initialize the last update time step: $N_{record} \leftarrow N_{seq}$
7: Record mechanistic model prediction: $y_{mec} (t_{N_{current}}) \leftarrow f_{mec} ([x (t_{0}), \dots, x (t_{N_{current}})])$
8: Record data-driven model prediction: $y_{GRU} (t_{N_{current}}) \leftarrow f_{GRU} ([\tilde{x} (t_{0}), \dots, \tilde{x} (t_{N_{current}})])$
9: Record MHE estimation: $y_{MHE} (t_{N_{current}}) \leftarrow f_{MHE} ([\tilde{x} (t_{0}), \dots, \tilde{x} (t_{N_{current}})])$
10: if $N_{current} - N_{record} \geq N_{window}$ then
11:     Compute mechanistic model deviation:
12:     $[y_{mec} (T) - y_{MHE} (T)], \forall T \in [t_{N_{record}}, t_{N_{current}}]$
13:     Compute GRU model deviation:
14:     $[y_{GRU} (T) - y_{MHE} (T)], \forall T \in [t_{N_{record}}, t_{N_{current}}]$
15:     Update mechanistic model covariance matrix:
16:     $B (t_{N_{current}}) \leftarrow Cov ([y_{mec} (T) - y_{MHE} (T)]) \cdot G (ρ)$
17:     Update GRU model covariance matrix:
18:     $R (t_{N_{current}}) \leftarrow Cov ([y_{GRU} (T) - y_{MHE} (T)]) \cdot G (ρ)$
19:     $N_{record} \leftarrow N_{current}$
20: end if
21: $y (t_{N_{current}}) \leftarrow y_{mec} (t_{N_{current}}) + B (t_{N_{current}}) {(R (t_{N_{current}}) + B (t_{N_{current}}))}^{- 1} (y_{GRU} (t_{N_{current}}) - y_{mec} (t_{N_{current}}))$
22: $N_{current} \leftarrow N_{current} + 43$
23: Output: $y (t_{N_{current}})$ Corrected prediction.
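The windowed update logic of Algorithm 1 can be sketched as a simple loop over sample indices. This is a hedged illustration rather than the implementation used in this work: the three prediction streams are assumed to be precomputed arrays, the covariances are refreshed from past records only, the optional `taper` argument stands in for the Gaspari–Cohn weights, and the small `ridge` term is an added numerical safeguard that is not part of the algorithm.

```python
import numpy as np

def run_fusion(y_mec, y_gru, y_mhe, n_window, taper=None, ridge=1e-9):
    """Fuse two prediction streams, refreshing B and R every n_window samples.

    y_mec, y_gru, y_mhe : (T, n) arrays of model predictions and MHE estimates
    taper               : optional (n, n) localization weights (assumed precomputed)
    ridge               : small diagonal term for numerical safety (an assumption)
    """
    T, n = y_mec.shape
    taper = np.ones((n, n)) if taper is None else taper
    B, R = np.eye(n), np.eye(n)              # identity initialization
    last_update = 0
    fused = np.empty_like(y_mec)
    for k in range(T):
        if k - last_update >= n_window:      # enough new records collected
            win = slice(last_update, k)      # past records only
            B = np.cov(y_mec[win] - y_mhe[win], rowvar=False) * taper
            R = np.cov(y_gru[win] - y_mhe[win], rowvar=False) * taper
            last_update = k
        innov = np.linalg.solve(R + B + ridge * np.eye(n), y_gru[k] - y_mec[k])
        fused[k] = y_mec[k] + B @ innov      # covariance-weighted blend
    return fused

# Synthetic demonstration data (5 stages, 400 samples).
rng = np.random.default_rng(3)
y_mhe = 5.0 + rng.standard_normal((400, 5))
y_mec = y_mhe + 0.3 * rng.standard_normal((400, 5))
y_gru = y_mhe + 0.1 * rng.standard_normal((400, 5))
n_window = 100
fused = run_fusion(y_mec, y_gru, y_mhe, n_window)
```

Until the first window has elapsed, the identity initialization makes the output the plain average of the two models. After the first update the weights reflect the observed residual statistics, so in this synthetic case the blend leans towards the less noisy stream.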

4. Implementation and Results

Datasets 1–5 were used for the development, training, and parameter tuning of the mechanistic model, MHE, SDAE-GRU, and hybrid model, while datasets 6–7 were used for validation. The overall framework is shown in Figure 10.

4.1. Mechanistic Model Validation

For the validation of the mechanistic model, the historical measurements from dataset 5 are used. First, the column temperature profiles at sampling points 1 to 5, which provide continuous and complete measurements, were used for the model validation. As illustrated in Figure 11, the blue line denotes the real measurement, and the red line denotes the predictions from the standalone mechanistic model. The overall prediction is reasonably accurate; however, due to the lack of an accurate initial estimate for the internal system states, the initial states were optimized within a reasonable range. As a result, the model exhibits a certain degree of deviation at the beginning of the prediction period.

To correct the deviation caused by the model inaccuracies and measurement noise, methods such as SDAE-GRU, MHE, and the hybrid model are employed.

4.2. State Estimation Results

The real plant provides measurements at 43-s intervals, and therefore the mechanistic model and the MHE estimator are both operated using this same sampling period. The estimation horizon is selected as

N = 20

time steps to remain consistent with the sequence length used in the GRU model. The nonlinear optimization subproblems that arise in the two moving-horizon estimators are solved using IPOPT through the CasADi interface [43], which offers computation times suitable for online implementation of the state estimation framework.

Due to physical constraints of the system, several key states in the carbon capture process are restricted within predefined bounds. This section demonstrates the performance of the proposed estimation approach under these assumptions. The covariance matrices Q, R, and P are all chosen to be diagonal. The process disturbance covariance, measurement noise covariance, and state error covariance for the MHE are specified as Q = diag([1 0.5 0.0001 … 0.5 0.0001 1 0.5 0.0001]), R = diag([0.1 0.1 0.1 0.001 0.1]) and P = diag([1 1 1 1 1 … 1 1 1 1]) ×

10^{- 7}

.

Based on the mechanistic model presented in Section 3.1, a Moving Horizon Estimation (MHE) framework was developed to refine the estimation of carbon dioxide concentration using temperature measurement data. The performance of the proposed MHE approach was validated using Dataset 6. At the initial stage of operation, a large amount of

{CO}_{2}

enters from the bottom of the column, resulting in an initial increase in

{CO}_{2}

concentration. Subsequently, the concentration decreases and eventually stabilizes, exhibiting a relatively steady trend over time. The results present the estimated

{CO}_{2}

concentration in the bottom stage (stage 1). The result is illustrated in Figure 12. It can be observed that, after incorporating the measurement corrections, the model’s predictions of carbon dioxide concentration align more closely with the actual measured data points. The MHE approach provides more accurate estimation results than the mechanistic model alone. Therefore, in the subsequent SDAE-GRU framework, the optimal estimations generated by the MHE will be used as target values for training.

Figure 13 presents the

{CO}_{2}

concentration profiles of datasets 1–5. During plant operation, the amount of MEA solvent is significantly greater than the amount of

{CO}_{2}

present; consequently, more than 95% of the

{CO}_{2}

is absorbed in stage 1. As a result, the

{CO}_{2}

volumetric concentrations in the upper stages (stages 2–5) are nearly zero, and their corresponding curves are essentially flat lines close to the x-axis, which do not contribute to validating the predictive capability of the model.

4.3. Prediction Result from the SDAE-GRU

As illustrated in Figure 14, data from dataset 7 were used to evaluate the predictive performance of the GRU model. In the figure, the blue line represents the optimal estimate obtained by MHE, while the red line shows the prediction of the GRU model. The GRU model is able to predict the

{CO}_{2}

concentration with good accuracy.

4.4. Hybrid-Model Prediction Results for CO₂ Concentration

As illustrated in Figure 15, the previously mentioned covariance-weighted blending and the Gaspari–Cohn method were employed to fuse the data. After fusion, it can be observed that the hybrid model provides more accurate estimates of

{CO}_{2}

concentration. Additionally, both the GRU and the mechanistic model, when used independently, are susceptible to over- or underestimation and excessively rapid fluctuations, owing to input data noise and inherent limitations such as parameter inaccuracies in the mechanistic equations. The hybrid approach integrates the strengths of both models, mitigating their individual weaknesses and enhancing overall prediction performance. The Mean Absolute Percentage Error (MAPE) was used to quantify model performance, and the detailed results of the different models on the testing datasets are provided in Table 3. The error distributions remain relatively consistent across datasets, as the overall operating conditions of the absorber are stable. When the flue gas initially enters the absorber, the CO₂ concentration rises rapidly to a peak and then gradually decreases and stabilizes as the absorbent is introduced. The obtained results demonstrate the advantages of the proposed state estimation framework. In addition to point-wise error metrics, statistical analyses were conducted to assess the robustness of the results. The Wilcoxon signed-rank test is a non-parametric statistical method that evaluates whether the paired differences between two sets of errors are symmetrically distributed around zero, thereby assessing whether one model consistently outperforms another without assuming any specific error distribution [44]. The test (p < 0.05) indicates that the proposed method significantly outperforms both the standalone mechanistic model and the GRU-based model.
Furthermore, bootstrap analysis of the prediction errors shows that the obtained MAE and RMSE values remain stable across different datasets, confirming the robustness of the proposed framework.
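For reference, the MAPE metric can be computed as in the sketch below. The small floor `eps` on the denominator is an assumption added here to keep the ratio finite when the reference concentration is close to zero; how near-zero stages are handled in the reported results is not specified in this section, and the numbers in the example are made up.

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error, averaged over all stages and samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Floor the denominator so near-zero references do not produce inf/nan.
    denom = np.maximum(np.abs(y_true), eps)
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))

# Made-up reference and predicted concentrations (vol%).
y_ref = [10.0, 8.0, 5.0]
y_hat = [9.0, 8.4, 5.5]
score = mape(y_ref, y_hat)   # relative errors of 10%, 5%, and 10%
print(round(score, 2))
```

Because the error is normalized by the reference value, samples with very small concentrations can dominate the average, which is worth keeping in mind when most of the CO₂ is absorbed in the bottom stage.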

5. Discussion

The proposed hybrid mechanistic–MHE–machine learning framework was evaluated using multiple datasets collected from the carbon capture pilot plant. Figure 12, Figure 13 and Figure 14 present representative comparisons between the measured

{CO}_{2}

concentration profiles and the predictions generated by the proposed model. Quantitative performance metrics, including RMSE, MAE, and MAPE, are summarized in Table 3.

Across all datasets, the proposed framework achieves consistently lower prediction errors compared with the baseline interpolation-based approach. In particular, the reconstructed concentration trajectories exhibit smoother temporal evolution and improved agreement with measured trends, indicating that the estimator effectively mitigates noise and discontinuities in the raw analyzer data.

The observed improvement in prediction accuracy is especially pronounced during transient operating conditions, where

{CO}_{2}

concentration measurements are sparse and intermittently available. Under such conditions, linear interpolation fails to capture the true dynamic behavior of the absorber, resulting in physically inconsistent concentration trajectories.

In contrast, the incorporation of Moving Horizon Estimation (MHE) enables the reconstruction of concentration profiles that remain consistent with the underlying process dynamics and physical constraints. By explicitly accounting for system dynamics and measurement uncertainty, the proposed framework provides more realistic state trajectories, which serve as higher-quality training data for the downstream machine learning models. This explains the enhanced robustness and stability observed in the prediction results.

Compared with prior hybrid soft sensing approaches that rely on simple linear interpolation to fill missing concentration measurements, the proposed framework introduces a fundamentally different strategy for data reconstruction. Interpolation-based methods lack physical justification and are unable to represent the actual evolution of

{CO}_{2}

concentration under varying operating conditions, particularly during rapid transients.

By contrast, the proposed MHE-based reconstruction explicitly enforces physical consistency through a mechanistic model and external measurements. This difference is reflected in the improved prediction accuracy and reduced sensitivity to measurement sparsity observed in the results. The comparison highlights that the performance gain is not merely due to a more complex learning model, but rather to the improved physical fidelity of the training data.

The results also underline the critical role of MHE within the proposed framework. Without the MHE-based reconstruction step, the machine learning model would be trained directly on interpolated concentration trajectories that are physically unjustified. Previous studies have reported that such distorted training data can lead to degraded prediction accuracy and reduced generalization capability.

The present results suggest that MHE acts as an essential intermediate layer that bridges sparse measurements and data-driven learning, ensuring that the learning model operates on physically meaningful inputs. This role cannot be readily replaced by purely data-driven architectures, particularly in industrial settings where dense and high-precision measurements are difficult to obtain.

Despite the demonstrated advantages, several limitations of the proposed framework should be acknowledged. First, the accuracy of the reconstructed concentration profiles depends on the fidelity of the mechanistic model and the assumption that the system operates near nominal conditions. Significant model mismatch may reduce estimation accuracy. Second, the computational cost of the MHE increases with the length of the estimation horizon, which may limit real-time deployment in large-scale systems. While the covariance update procedure has been simplified to reduce computational burden, further optimization and parallelization strategies may be required for industrial-scale implementation.

6. Conclusions

This paper presents a state estimation framework for predicting the

{CO}_{2}

concentration in the absorption column of a carbon capture plant by integrating an SDAE, a GRU, a mechanistic model, and moving horizon estimation. The main conclusions are as follows:

1. A compact dynamic model of the PCC absorber was developed and validated using pilot-plant data across multiple operating conditions. The datasets were split into training and testing groups. The mechanistic model provides full ${CO}_{2}$ concentration profiles with an average prediction error (MAPE) of 6.79%.
2. The MHE solution was developed based on the dynamic model. Physical constraints and a disturbance matrix are implemented for both input and output data to further improve the robustness of the system under complex operating conditions. By utilizing measured temperature for external correction, the MHE yields ${CO}_{2}$ concentration profiles with improved accuracy and robustness compared to using the mechanistic model alone. These optimal estimates subsequently serve as the training targets for the SDAE–GRU model.
3. A Stacked Denoising Autoencoder-Gated Recurrent Unit framework was developed using measured data from the plant as input and the ${CO}_{2}$ concentrations at five monitoring points as target outputs. The SDAE and GRU components were trained end to end: the SDAE denoises and compresses the input data, and the GRU performs the temporal prediction. The prediction error (MAPE) relative to the optimal estimates obtained from Moving Horizon Estimation (MHE) was approximately 7.86%.
4. After developing the two models, their prediction results were fused using a covariance-weighted blending approach, incorporating the covariance matrices of the model prediction errors. Additionally, the Gaspari-Cohn weighting scheme was applied to the error covariance matrices based on spatial distances, in order to mitigate unnecessary interference from distant locations. The fused model achieved a prediction error (MAPE) of approximately 3.79%.

The proposed method establishes a state estimation framework suitable for complex systems with missing data. Since Moving Horizon Estimation (MHE) is applied only during the training phase to provide supervisory signals, it does not affect the real-time performance of the deployed model. Therefore, the resulting state estimation framework is well-suited for subsequent control and further operational optimization of the system.

Author Contributions

S.C.: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Methodology, Formal analysis. S.G.: Methodology, Data curation, Software. M.M.: Writing – review & editing, Supervision, Project administration, Methodology, Formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that has been used is confidential.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Nomenclature

MHE	Moving horizon estimation
${CO}_{2}$	Carbon dioxide
PCC	Post-combustion carbon capture
DAE	Denoising autoencoder
SDAE	Stacked denoising autoencoder
GRU	Gated Recurrent Unit
GHGs	Greenhouse gases
MEA	Monoethanolamine
VLE	Vapor-liquid equilibrium
E-AHF	Extended adaptive hybrid functions
AI	Artificial intelligence
VC	Volumetric concentration
DTr	Decision tree regressor
SVMr	Support vector machine regressor
ANN	Artificial neural network
LSTM	Long Short-Term Memory
AE	Autoencoder
ENKF	Ensemble Kalman filtering
KF	Kalman filtering
CSTR	Continuous stirred tank reactors
AWGN	Additive white Gaussian noise
DA	Data assimilation
MAPE	Mean absolute percentage error
a,b,c	Reaction rate parameters
$k_{i}$	Reaction rate coefficient
$R_{i}$	Reaction rate
$C_{M E A, i}$	MEA concentration
$C_{C O_{2}, i}$	Carbon dioxide concentration
$Δ T_{i}$	Temperature change within a stage
$M_{l i q, i}$	Hold up of liquid phase
$M_{g a s, i}$	Hold up of gas phase
$Δ H$	${CO}_{2}$ absorption heat
$c_{p, s o l}$	Heat capacity of solvent (kJ/kg·°C)
$c_{p, g a s}$	Heat capacity of gas (kJ/kg·°C)
$T_{l i q, h, i}$	Temperature of liquid phase (°C)
$T_{g a s, i n, i}$	Temperature of gas phase (°C)
$L_{v}$	Liquid volume
$G_{v}$	Gas volume
i	Number of stages i = {1,2,3,4,5}

References

Koyama, K. 2019 Global Energy Situation Indicated by BP Statistics. 2020. Available online: https://scholar.google.com/scholar?hl=zh-CN&as_sdt=0%2C5&q=K.+Koyama%2C+2019+global+energy+situation+indicated+by+bp+statistics+%282020%29.&btnG= (accessed on 30 January 2026).
Fu, S.; Zou, J.; Zhang, X.; Qi, Y. Review on the latest conclusions of working group III contribution to the fifth assessment report of the intergovernmental panel on climate change. Chin. J. Urban Environ. Stud. 2015, 3, 1550005.
Newell, R.; Raimi, D.; Villanueva, S.; Prest, B. Global energy outlook 2021: Pathways from Paris. Resour. Future 2021, 8, 39.
Heuberger, C.F.; Staffell, I.; Shah, N.; Mac Dowell, N. Quantifying the value of CCS for the future electricity system. Energy Environ. Sci. 2016, 9, 2497–2510.
Luis, P. Use of monoethanolamine (MEA) for CO₂ capture in a global scenario: Consequences and alternatives. Desalination 2016, 380, 93–99.
Kittel, J.; Idem, R.; Gelowitz, D.; Tontiwachwuthikul, P.; Parrain, G.; Bonneau, A. Corrosion in MEA units for CO₂ capture: Pilot plant studies. Energy Procedia 2009, 1, 791–797.
Liu, T.; Tian, Z.; Chen, S.; Wang, K.; Harris, C.J. Deep Cascade Gradient RBF Networks With Output-Relevant Feature Extraction and Adaptation for Nonlinear and Nonstationary Processes. IEEE Trans. Cybern. 2022, 53, 4908–4922.
Zhang, Y.; Chen, H.; Chen, C.C.; Plaza, J.M.; Dugas, R.; Rochelle, G.T. Rate-based process modeling study of CO₂ capture with aqueous monoethanolamine solution. Ind. Eng. Chem. Res. 2009, 48, 9233–9246.
Mahapatra, P.; Ma, J.; Ng, B.; Bhattacharyya, D.; Zitney, S.E.; Miller, D.C. Integrated dynamic modeling and advanced process control of carbon capture systems. Energy Procedia 2014, 63, 1354–1367.
Zhang, Y.; Chen, C.C. Modeling CO₂ absorption and desorption by aqueous monoethanolamine solution with Aspen rate-based model. Energy Procedia 2013, 37, 1584–1596.
Wang, Q.; Zheng, C.; Wu, X.; Wang, M. Robust monitoring of solvent based carbon capture process using deep learning network based moving horizon estimation. Fuel 2022, 321, 124071.
Hanzelik, P.P.; Kummer, A.; Ipkovich, Á.; Abonyi, J. Fusion and integrated correction of chemometrics and machine learning models based on data reconciliation. Comput. Aided Chem. Eng. 2023, 52, 1379–1384.
Morgan, J.C.; Chinen, A.S.; Omell, B.; Bhattacharyya, D.; Tong, C.; Miller, D.C. Thermodynamic modeling and uncertainty quantification of CO₂-loaded aqueous MEA solutions. Chem. Eng. Sci. 2017, 168, 309–324.
Putta, K.R.; Svendsen, H.F.; Knuutila, H.K. CO₂ absorption into loaded aqueous MEA solutions: Impact of different model parameter correlations and thermodynamic models on the absorption rate model predictions. Chem. Eng. J. 2017, 327, 868–880.
Xing, L.; Jiang, H.; Tian, X.; Yin, H.; Shi, W.; Yu, E.; Pinfield, V.J.; Xuan, J. Combining machine learning with multi-physics modelling for multi-objective optimisation and techno-economic analysis of electrochemical CO₂ reduction process. Carbon Capture Sci. Technol. 2023, 9, 100138.
Di Caprio, U.; Wu, M.; Vermeire, F.; Van Gerven, T.; Hellinckx, P.; Waldherr, S.; Kayahan, E.; Leblebici, M.E. Predicting overall mass transfer coefficients of CO₂ capture into monoethanolamine in spray columns with hybrid machine learning. J. CO2 Util. 2023, 70, 102452.
Tian, Z.; Gu, Y.; Bolat, P.; Zhang, Y.; Gao, W. Prediction and multi-objective optimization of pilot-scale carbon capture system based on multi-source monitoring information and novel data-driven model. Energy Convers. Manag. 2026, 350, 120937.
Jiang, Y.; Mao, Z. A novel carbon emission monitoring method for power generation enterprises based on hybrid transformer model. Sci. Rep. 2025, 15, 2598.
Nikoofard, A.; Johansen, T.A.; Molaei, A. Reservoir characterization in under-balanced drilling with nonlinear moving horizon estimation with manual and automatic control conditions. J. Pet. Sci. Eng. 2020, 192, 107248.
Liu, S.; Yin, X.; Liu, J. State estimation of a carbon capture process through POD model reduction and neural network approximation. arXiv 2023, arXiv:2304.05514.
Zhang, W.; Wang, Z.; Zou, C.; Drugge, L.; Nybacka, M. Advanced vehicle state monitoring: Evaluating moving horizon estimators and unscented Kalman filter. IEEE Trans. Veh. Technol. 2019, 68, 5430–5442.
Andersson, L.E.; Scibilia, F.; Imsland, L. An estimation-forecast set-up for iceberg drift prediction. Cold Reg. Sci. Technol. 2016, 131, 88–107.
Wang, C.; Zhao, W.; Luan, Z.; Gao, Q.; Deng, K. Decoupling control of vehicle chassis system based on neural network inverse system. Mech. Syst. Signal Process. 2018, 106, 176–197.
Yuan, X.; Huang, B.; Wang, Y.; Yang, C.; Gui, W. Deep learning-based feature representation and its application for soft sensor modeling with variable-wise weighted SAE. IEEE Trans. Ind. Inform. 2018, 14, 3235–3243.
Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
Chen, S.; Jiang, Q. Distributed Robust Process Monitoring Based on Optimized Denoising Autoencoder With Reinforcement Learning. IEEE Trans. Instrum. Meas. 2022, 71, 3503411.
Erhan, D.; Courville, A.; Bengio, Y.; Vincent, P. Why does unsupervised pre-training help deep learning? In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 201–208.
Liu, P.; Zheng, P.; Chen, Z. Deep learning with stacked denoising auto-encoder for short-term electric load forecasting. Energies 2019, 12, 2445.
Rai, A.; Shrivastava, A.; Jana, K.C. A robust auto encoder-gated recurrent unit (AE-GRU) based deep learning approach for short term solar power forecasting. Optik 2022, 252, 168515.
Roh, S.; Jun, M.; Szunyogh, I.; Genton, M.G. Multivariate localization methods for ensemble Kalman filtering. Nonlinear Process. Geophys. 2015, 22, 723–735.
Stanley, Z.; Grooms, I.; Kleiber, W. Multivariate localization functions for strongly coupled data assimilation in the bivariate Lorenz’96 system. Nonlinear Process. Geophys. Discuss. 2021, 28, 565–583.
Zhuang, Y.; Liu, Y.; Ahmed, A.; Zhong, Z.; del Rio Chanona, E.A.; Hale, C.P.; Mercangöz, M. A hybrid data-driven and mechanistic model soft sensor for estimating CO₂ concentrations for a carbon capture pilot plant. Comput. Ind. 2022, 143, 103747.
Liu, J. Moving horizon state estimation for nonlinear systems with bounded uncertainties. Chem. Eng. Sci. 2013, 93, 376–386.
Goodfellow, I.; Bengio, Y.; Courville, A.; Bengio, Y. Deep Learning; MIT Press: Cambridge, UK, 2016; Volume 1. [Google Scholar]
Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.A.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408. [Google Scholar]
Alex, J.; Benedetti, L.; Copp, J.; Gernaey, K.; Jeppsson, U.; Nopens, I.; Pons, M.N.; Rieger, L.; Rosen, C.; Steyer, J.; et al. Benchmark Simulation Model No. 1 (BSM1); IWA Publishing: London, UK, 2008. [Google Scholar]
Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv 2014, arXiv:1409.1259. [Google Scholar] [CrossRef]
Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org/ (accessed on 30 January 2026).
Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. arXiv 2019, arXiv:1907.10902. [Google Scholar] [CrossRef]
Paluszek, M.; Thomas, S.; Paluszek, M.; Thomas, S. MATLAB machine learning toolboxes. In Practical MATLAB Deep Learning: A Project-Based Approach; Apress: New York, NY, USA, 2020; pp. 25–41. [Google Scholar]
Cheng, S.; Lucor, D.; Argaud, J.P. Observation data compression for variational assimilation of dynamical systems. J. Comput. Sci. 2021, 53, 101405. [Google Scholar] [CrossRef]
Andersson, J.A.E.; Gillis, J.; Horn, G.; Rawlings, J.B.; Diehl, M. CasADi—A software framework for nonlinear optimization and optimal control. Math. Program. Comput. 2019, 11, 1–36. [Google Scholar] [CrossRef]
Divine, G.; Norton, H.J.; Hunt, R.; Dienemann, J. A review of analysis and sample size calculation considerations for Wilcoxon tests. Anesth. Analg. 2013, 117, 699–710. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Simplified flowsheet of the post-combustion carbon capture (PCC) pilot plant, including the absorption–regeneration loop and the gas sampling points along the absorption column.

Figure 2. The overall structure (left) and the top stage of the PCC pilot plant (right).

Figure 3. Images of the CO₂ analyzer sampling system, highlighting the sampling locations and the measurement setup used for CO₂ concentration acquisition.

Figure 4. Illustration of the datasets covering different operating regions, represented over the gas-liquid volumetric ratio and the inlet CO₂ concentration.

Figure 5. A snapshot of the CO₂ concentration measurement record, illustrating the intermittent and discontinuous nature of the analyzer data and the selected zoomed-in segment.

Figure 6. A plot of two CO₂ analyzer measurement cycles, illustrating the discrete and intermittent nature of the concentration measurements over time.

Figure 7. The mechanistic model is divided into five stages, which represent the five packing sections of the absorber column. Flue gas flows from the bottom (right side) and MEA solution flows from the top (left side).

Figure 8. The architecture of the SDAE-GRU model.

Figure 9. The model structure of a GRU cell.

Figure 10. Overview of the proposed hybrid methodology for CO₂ concentration prediction, illustrating the offline training stage and the online deployment framework.

Figure 11. Comparison of the mechanistic model predictions with the measured temperature profiles at sampling points 1–5 of Dataset 6.

Figure 12. Comparison of CO₂ concentration profiles at the bottom sampling layer of Dataset 6, including mechanistic model prediction, MHE estimation, and real measurement points.

Figure 13. Comparison of CO₂ concentration profiles at all five packing layers of Dataset 6, including mechanistic model prediction, MHE estimation, and real measurements.

Figure 14. Comparison of CO₂ concentration profiles at the bottom sampling layer of Dataset 7 between the GRU prediction and the MHE estimation.

Figure 15. Comparison of CO₂ concentration profiles of Dataset 7, including mechanistic model prediction, fused estimation results, GRU prediction, and real measurements.

Table 1. The detailed setup of the SDAE model.

Parameters	Values
Input shape	90
Neurons in the first layer	32
Neurons in the second layer	16
Latent space dimension	8
Dropout rate	0.1075
MaxEpochs	150
Mini batch size	32
Learning Rate	0.0006785
Optimization algorithm	Adam
Training dataset	Dataset 1–5
Testing dataset	Dataset 6–7
Input features	The measurement data from all sensors
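
As a rough illustration of the layer sizes in Table 1, the sketch below collects the listed hyperparameters in a small configuration object and counts trainable parameters. It assumes fully connected layers with biases and a decoder that mirrors the encoder; neither assumption is stated in the table, and the names `SDAEConfig`, `dense_params`, and `count_params` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDAEConfig:
    """Hyperparameters copied from Table 1 (field names are illustrative)."""
    input_dim: int = 90
    hidden_dims: tuple = (32, 16)
    latent_dim: int = 8
    dropout_rate: float = 0.1075
    max_epochs: int = 150
    batch_size: int = 32
    learning_rate: float = 0.0006785


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer (assumed layer type)."""
    return n_in * n_out + n_out


def count_params(sizes: list) -> int:
    """Total parameters of a chain of dense layers with the given widths."""
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


cfg = SDAEConfig()
enc_sizes = [cfg.input_dim, *cfg.hidden_dims, cfg.latent_dim]  # 90 -> 32 -> 16 -> 8
dec_sizes = enc_sizes[::-1]  # assumption: decoder mirrors the encoder

encoder_params = count_params(enc_sizes)
decoder_params = count_params(dec_sizes)
print(f"encoder widths: {enc_sizes}, parameters: {encoder_params}")
print(f"decoder widths: {dec_sizes}, parameters: {decoder_params}")
```

Under these assumptions the encoder compresses each 90-dimensional measurement vector to 8 latent features with a few thousand parameters, which is consistent with the lightweight deployment described in the abstract.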

Table 2. The detailed setup of the GRU model.

Parameters	Values
Input shape	10
Sequence length	20
Number of neurons	44
Dropout rate	0.1075
Epochs	150
Mini batch size	32
Learning rate	0.0006785
Optimization algorithm	Adam
Training dataset	Dataset 1–5
Testing dataset	Dataset 6–7
Input features	The compressed features from SDAE
Output features	CO₂ concentration at sampling points 1–5
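
To make the "Sequence length" entry of Table 2 concrete, the following minimal sketch shows one common way to turn a feature time series into overlapping windows of 20 steps for a recurrent model. It assumes each window is paired with the label at its final time step; the alignment actually used in the paper is not given in the table. The function name `make_windows` and the toy data are hypothetical.

```python
def make_windows(features, labels, seq_len=20):
    """Slice a time series into overlapping windows of length seq_len.

    features: list of per-time-step feature vectors (10 values each in Table 2)
    labels:   list of per-time-step targets (5 CO2 concentrations in Table 2)
    Each window is paired with the label at its last time step (assumption).
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    windows, targets = [], []
    for end in range(seq_len, len(features) + 1):
        windows.append(features[end - seq_len:end])
        targets.append(labels[end - 1])
    return windows, targets


# Toy data: 25 time steps, 10 features per step, 5 targets per step.
features = [[float(t)] * 10 for t in range(25)]
labels = [[0.01 * t] * 5 for t in range(25)]

X, y = make_windows(features, labels, seq_len=20)
print(len(X), len(X[0]), len(X[0][0]), len(y[0]))
```

With 25 time steps and a window of 20, the sketch yields 6 training samples, each of shape 20 × 10 with a 5-element target.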

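Table 3. Prediction performance of the mechanistic, GRU, and hybrid models on the blind validation datasets ("all" denotes Datasets 6 and 7 combined).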

Model	Dataset	MAPE	MAE	RMSE
Mechanistic model	all	6.79%	0.0041	0.0043
GRU model	all	7.86%	0.005	0.0054
Hybrid model	all	3.79%	0.00224	0.003
Mechanistic model	6	6.64%	0.0039	0.0043
GRU model	6	8.52%	0.0057	0.0062
Hybrid model	6	4.3%	0.00267	0.0032
Mechanistic model	7	6.94%	0.0046	0.0052
GRU model	7	7.2%	0.0047	0.0052
Hybrid model	7	3.28%	0.00181	0.0021
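
For readers who want to reproduce the error measures in Table 3, the sketch below implements the standard definitions of MAPE, MAE, and RMSE and computes the relative MAPE reduction of the hybrid model from the "all" rows of the table. The sample values `y_true` and `y_pred` are hypothetical, and the averaging over stages and time samples described in the abstract is assumed to be a plain mean over all points.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical measurements and predictions, for illustration only.
y_true = [0.10, 0.08, 0.05]
y_pred = [0.11, 0.08, 0.04]
print(f"MAPE = {mape(y_true, y_pred):.2f}%")
print(f"MAE  = {mae(y_true, y_pred):.5f}")
print(f"RMSE = {rmse(y_true, y_pred):.5f}")

# MAPE values taken from the "all" rows of Table 3.
vs_mechanistic = relative_reduction(6.79, 3.79)
vs_gru = relative_reduction(7.86, 3.79)
print(f"Hybrid vs mechanistic: {vs_mechanistic:.1f}% lower MAPE")
print(f"Hybrid vs GRU:         {vs_gru:.1f}% lower MAPE")
```

Applied to the reported values, the hybrid model's MAPE is roughly 44% lower than that of the mechanistic model and roughly 52% lower than that of the GRU model.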
