Virtual Sensing of Key Variables in the Hydrogen Production Process: A Comparative Study of Data-Driven Models

Hydrogen is an ideal energy carrier manufactured mainly by the natural gas steam reforming hydrogen production process. The concentrations of CH4, CO, CO2, and H2 in this process are key variables related to product quality, which thus need to be controlled accurately in real-time. However, conventional measurement methods for these concentrations suffer from significant delays or huge acquisition and upkeep costs. Virtual sensors effectively compensate for these shortcomings. Unfortunately, previously developed virtual sensors have not fully considered the complex characteristics of the hydrogen production process. Therefore, a virtual sensor model, called “moving window-based dynamic variational Bayesian principal component analysis (MW-DVBPCA)” is developed for key gas concentration estimation. The MW-DVBPCA considers complicated characteristics of the hydrogen production process, involving dynamics, time variations, and transportation delays. Specifically, the dynamics are modeled by the finite impulse response paradigm, the transportation delays are automatically determined using the differential evolution algorithm, and the time variations are captured by the moving window method. Moreover, a comparative study of data-driven virtual sensors is carried out, which is sporadically discussed in the literature. Meanwhile, the performance of the developed MW-DVBPCA is verified by the real-life natural gas steam reforming hydrogen production process.


Introduction
Hydrogen is an ideal energy storage medium with zero carbon emissions, essential in energy systems' low-carbon transformation [1].Currently, over 90% of the world's large-scale industrial hydrogen production process is the fossil energy reforming process, of which more than 50% is the natural gas steam reforming process [2,3].The natural gas steam reforming hydrogen production process consists of feedstock purification, steam reforming, medium temperature conversion, and pressure swing adsorption.In this process, CH 4 , CO, CO 2 , and H 2 are primary process gases, and the concentrations of these gases are key variables (KVs) related to the product quality.Infrequent and inaccurate measurements of these KVs risk substantial production losses, poor control performance, and even hidden dangers to safety.Consequently, real-time measurements of these KVs are essential and desirable.
Currently, measurements of the hydrogen production process' KVs are classified into offline laboratory analyses, hardware sensors, and virtual sensors.The offline laboratory analyses give accurate measurements of the KVs, but result in significant measurement delays [4].The hardware sensors measure the KVs in real-time, but need colossal investment and maintenance costs [5,6].Virtual sensors are actually predictive mathematical models which use explanatory variables (EVs, i.e., easily measurable variables like pressure and flow rate) as inputs and estimates of the KVs as outputs, having the benefits of no measurement delays and low costs [7,8].Therefore, the virtual sensor effectively compensates for the shortcomings of the offline laboratory analysis and hardware sensor [9,10].As a result, virtual sensors have been intensively studied and widely used in the hydrogen production process.
Virtual sensors can generally be grouped into first-principle-based and data-driven virtual sensors.The first-principle-based virtual sensor is established by analyzing and explaining the physicochemical mechanisms of the hydrogen production process.For example, Yang et al. established a predictive model of ammonia pyrolysis decomposition rate through theoretical derivation for the reaction in the hydrogen production process [11].Huang et al. developed a mathematical model between the EVs and the conversion rate of methane by coupling conduction, convection, thermal radiation, and chemical reaction kinetics [12].Zhou et al. simulated the chemical looping hydrogen production process by analyzing the process mechanisms [13].However, due to the complex reaction dynamics, precise first-principle-based models of the hydrogen production process are challenging to obtain.Usually, laboratory-scale studies or ideal steady-state analyses are the main applications of these first-principle-based virtual sensors.On the contrary, a data-driven virtual sensor is constructed by process-measured data, and does not rely on accurate mechanisms [14,15].Therefore, it is closer to reality, and better describes the actual operations of the hydrogen production process [16,17].As a result, data-driven virtual sensors are widely applied in the hydrogen production process.For example, Tong et al. built the multi-layer perceptron (MLP) neural network to estimate the CH 4 concentration in the hydrogen production process [18].Zamaniyan et al. employed the MLP model to predict the concentration of H 2 and CO of the hydrogen production process [19].Ögren et al. established deep neural networks (DNNs) to estimate the concentration of CH 4 , CO, CO 2 , and H 2 , which have large network layers [20].Considering the challenges of artificial neural network models in model selection and parameter adjustment with limited data, small-data-oriented models have been applied to the hydrogen production process.Fang et al. used a support vector machine (SVM) to predict the hydrogen yield [21].Zhao et al. proposed a predictive model based on the least squares SVM (LSSVM) employed in wind power hydrogen production [22].
Despite these achievements on virtual sensors for the KVs, the hydrogen production process is usually complicated because of practical operations.Firstly, due to the feedstock variations and external disturbances, the hydrogen production process shows strong dynamics [23].Secondly, the hydrogen production process involves many series-wound devices (e.g., pre-reformers and pre-heaters), which requires significant time to deliver the energies and feedstocks to the reformer [24].That means the EVs and the KVs are recorded simultaneously and are mismatched (i.e., there are transportation delays between the EVs and the KVs).Thirdly, the hydrogen production process exhibits time-varying properties because of the process drifts caused by mechanical abrasions, catalyst deactivation, or even climatic variations [25].
The above complex properties of the hydrogen production process make it challenging to develop high-precision virtual sensors for the KVs.As far as we know, the existing virtual sensors developed for the hydrogen production process do not account for these complicated characteristics.Moreover, the hydrogen production process has multiple KVs to estimate, and the existing virtual sensors constructed multiple independent single-output virtual sensors (one for each KV), ignoring the correlations between the KVs.To this end, we first develop a multi-output virtual sensor model called "dynamic variational Bayesian principal component analysis (DVBPCA)" for real-time prediction of the KVs in the hydrogen production process.As a further contribution, a moving window-based DVBPCA (MW-DVBPCA) is developed to improve estimation performance while considering the time variations of the hydrogen production process.The main contributions we make in the article are organized as follows.
(1) The developed DVBPCA considers both process dynamics and transportation delays of energies and materials.Concretely, the finite impulse response (FIR) method is employed to model the dynamics of the hydrogen production process.And the transportation delays related to the EVs are automatically determined by differential evolution (DE).Moreover, the DVBPCA is able to make full use of the correlations between KVs for performance enhancement.(2) The moving window (MW) approach is employed for updating the DVBPCA with the latest online process information, which effectively captures the time-varying characteristics of the hydrogen production process in real-time.
(3) A comparative study of data-driven virtual sensors is implemented for the hydrogen production process, which is sparsely mentioned in the predecessors' research.Furthermore, the performance of the developed MW-DVBPCA is verified by the real-life natural gas steam reforming hydrogen production process.
This article is structured as follows.The variational inference (VI) is introduced in Section 2. Section 3 describes the developed DVBPCA and MW-DVBPCA in detail.In Section 4, the performance of the DVBPCA and MW-DVBPCA is evaluated using the operational data from the distributed control system (DCS) database of a real-life hydrogen production process.Finally, Section 5 concludes the article.

Variational Inference
The VI and Monte Carlo-Markov Chain are common methods to study posterior distributions over RVs in probabilistic models [26].And the VI was used to learn the developed DVBPCA, considering its advantages in convergence diagnosis and computational efficiency.
Denote the random variables (RVs) with unknown posterior distribution as Θ.The equation shown as Equation (1) decomposes the log of model evidence, as follows: where KL(q p) means the Kullback-Leibler (KL) divergence between q(Θ) and p(Θ | D), D is training dataset, q(Θ) represents any form of probability distribution over Θ, p(Θ | D) represents the actual posterior distribution over Θ, and L(q) means the variational evidence lower bound (ELBO) of the log of model evidence ln P(D) as KL(q p) ≥ 0. Therefore, the posterior distributions over Θ can be found by maximizing the ELBO L(q) as shown in Equation (2), i.e., where q * (Θ) represents the variational solution of q(Θ).
Assume q(Θ) can factorize into the product ∏ J j=1 q j θ j , where Θ = {θ 1 , θ 2 , • • • , θ J }.The general rule to find the variational distribution q j θ j is obtained as Equation (3): where E q(θ) [ f (θ)] is the expectation of a function f (θ) of the RV θ with respect to the distribution q(θ) over θ, θ i =j means factors in Θ except θ j , and C means constant terms.

Time-Delayed Moving Average Model
Due to the flexibility of combining various regression models, the FIR paradigm is generally employed for capturing dynamics.Moreover, the KVs are measured at a lower frequency than the EVs in the hydrogen production process, which implies that the model structure based on autoregression from the FIR family is unsuitable.More importantly, as mentioned earlier, the transportation delays between the EVs and the KVs are unknown, and require consideration.Based on these considerations, a time-delayed moving average (MA) model structure is embedded in the virtual sensor, which is given by where y(n) ∈ R K y are the n-th measurements of the K y -dimensional KVs, x i (n) is n-th measurement of the i-th EV, S is the number of the EVs, l i and p i refer to the transportation delay and the order of x i , respectively, and both are non-negative integers, and f (•) means the mathematical relationship between the EVs and KVs (but there may exist strong colinearities between the EVs of the model

Dynamic Variational Bayesian Principal Component Analysis
Variational Bayesian principal component analysis (VBPCA) is a widely used model to deal with the colinearities between the EVs, owing to its advantage in the automatic determination of appropriate model dimensionality [27].Therefore, it is ideally suited as the model f (•).In this article, name the VBPCA combined with the model structure shown in Equation (4) as the dynamic VBPCA (DVBPCA) due to the capability of capturing process dynamics.Moreover, the DVBPCA can handle the overfitting resulting from highly dimensional variable augmentation in Equation (4) through penalizing parameter values, which are detailed as follows.
Denote the observed sequential dataset as are inputs and outputs, respectively, and As illustrated in Equation (5), the observed data point d n is assumed to be generated by where ε n is the measurement noise.As shown in Equation ( 6), assign conjugate prior to the hidden variables where N (•) represents the normal distribution probability density function, 0 M represents an M-dimensional column vector with element 0, and I M represents an M-dimensional unit matrix.According to Figure 1, which shows that the samples are independently collected, the conditional distributions of the observed samples are designed as shown in Equations (7); that is, where I K represents a K-dimensional unit matrix.
To conform with common sense and to simplify subsequent calculations, the distributions over all RVs are selected as conjugate priors as detailed in Equation ( 8)- (11); that is, where 0 K represents a K-dimensional column vector with element 0, β is the accuracy parameter of the mean, a α , b α , a τ , b τ are the hyper-parameters of the distributions over these RVs, and G(•) is the gamma distribution probability density function.Particularly, when little or no prior information is obtained for these priors, it is suggested to set noninformative priors for the RVs to minimize the effect on the posterior distribution.That is, the values of hyperparameters a α , b α , a τ , and b τ are set as small as possible so that the posteriors do not rely on the priors, but only on the data.

Remark 1. To facilitate model training, rewrite Equation
Each inverse variance of the w m is controlled by the corresponding α m .Thus, if the posterior distribution of a particular α m is focused on larger values, the corresponding w m will converge to be tiny, effectively deactivating this particular direction in the latent space.Therefore, the effective dimensionality of the potential space is determined automatically.
According to Figure 1, the equation shown as Equation ( 12) represents the joint probability density function between training data D and the RVs Θ as follows: The VI is employed to train the DVBPCA to obtain the posterior distributions q(Θ) of the RVs Θ = {Z, W, µ, α, τ}.
Assume the posterior distribution over Θ, q(Θ), can factorize into q(Θ) = q(Z)q(W )q(µ)q(α)q(τ) According to Equation (2), the posterior distribution q * (z n ) over z n is detailed as where m Similarly, by using the general rule given by Equation ( 2), we can find the variational posterior distributions over other RVs, as shown on the right side of Equation ( 13) one by one, which are given as follows.
The posterior distribution q * (W ) over W is obtained as where , and m where where where 2 + a τ .The ELBO L(q) is calculated using the updated posterior distributions, i.e., The convergence of the training procedure can be diagnosed with ELBO.The termination criteria for the parameter learning process are defined as detailed in Equation ( 20), as follows: where L iter+1 and L iter are the values of ELBO at the (iter + 1)-th and iter-th iterative steps, respectively, and δ is the threshold.The main steps in training the DVBPCA model are summarized in Algorithm 1.
Algorithm 1 Pseudocode for the DVBPCA.

Moving Window-Based Dynamic Variational Bayesian Principal Component Analysis
The moving window (MW) method is used to trace time variations and reject disturbances [28].It captures the latest process information by discarding the oldest sample after the latest sample is obtained and rebuilding a model with all data in the window.Specially, for the first local model, as mentioned above, set noninformation priors for the RVs.For subsequent local modeling, the prior distributions of the RVs in the MW-DVBPCA are updated one by one and replaced with the trained posterior distributions of the RVs at the previous moment.The MW-DVBPCA is detailed in this subsection.
Firstly, the training dataset (i.e., historical data) is preset in the MW, and the DVBPCA is trained on the training dataset to obtain the posterior distributions of the RVs Θ.When the test samples (i.e., online samples) are acquired, the DVBPCA generated earlier is used to estimate the KVs, and the posterior distributions of RVs are used as the prior for the new DVBPCA model, which is elaborated as follows.
Let x n and y n denote the observations of inputs and outputs at certain online time instant n , respectively, where y n is unknown.The hidden RVs z n are first introduced.
Since the outputs y n are unknown, the posterior distribution over z n is calculated with the observation of x n which, based on Equation (14), is given by the equation as shown in Equation ( 21) as follows: where Then, the conditional distribution of y n given x n is obtained as where m y w and Σ w are the mean and covariance calculated from the posterior distribution q * (W y ), respectively, and m y µ and Σ yy µ are the mean values and covariance matrix calculated from the posterior distribution q * (µ y ), respectively.
By replacing the distribution of τ with its expectation, the probability distribution of the predicted KVs is approximated as Therefore, based on Equation ( 26), the estimations ŷn of y n are given by and the prediction uncertainty is quantified by the variance matrix of y n denoted by V p(y n |x n ) [y n ], which is given by Equation ( 28), i.e., Once the KVs predictions for the new sample are completed, the historical data window is slid down to include the latest measured sample set {x n , y n } and to eliminate the oldest sample.The DVBPCA is then retrained according to the current window of the historical dataset for online prediction.This process repeats once new samples are obtained online.

Differential Evolutionary-Based Model Selection
The DE is an effective and simple intelligent optimization algorithm commonly used for solving NP-hard problems in real number fields [29].Associated studies have demonstrated that DE has fast convergence performance as an excellent global optimization algorithm.Therefore, the DE is utilized for model selection (i.e., selecting the best parameter) of the DVBPCA and MW-DVBPCA, involving the dynamic orders p and time delays l given in Equation (4).Equation (29) shows the optimization problem, as follows: min g(p, l) (29) where g(•) is the fitness function, p min and p max are the lower and upper bounds of the dynamic orders p, respectively, and l min and l max are the lower and upper bounds of the delays l, respectively.
Based on Equation (4), {p, l} are non-negative integers.Therefore, the individuals generated by the DE need to be rounded.And the DE algorithm contains three core evolutionary operations, i.e., mutation, crossover and selection, which are introduced in detail as follows.
Initialization.Set the population size NP, dimension of individual (2S), and the upper bounds (ρ ) and the lower bounds (ρ ) of the dynamic orders and time delays, where individuals (ρ 1 , ρ 2 , • • • , ρ 2S ) in the population are randomly generated in integer field by Equation (30), i.e., Mutation.In the r-th generation, DE generates a mutation vector v e for the e-th individual, given by Equation (31); that is, where ρ r e 0 , ρ r e 1 , and ρ r e 2 are individuals that randomly selected from population with e 0 = e 1 = e 2 = e, and F is the mutation rate.
Crossover.A trial individual vector u e is generated by the original individual vector or the mutation vector based on the crossover rate CR ranged in (0, 1), which is shown in Equation (32), i.e., where rand e is a random integer ranged in (1, 2S).
Selection.According to the fitness function g(•), select the individual with a lower value as detailed in Equation ( 33) as follows: Termination.The evolution continues until it reaches the given maximum generation or the change in the fitness value is small.Otherwise, the above three operators should be performed repeatedly.
Figure 2 shows the flowchart of the DVBPCA and MW-DVBPCA combined with the DE for parameter optimization.Note that for the multi-output models, DVBPCA and MW-DVBPCA, the average prediction error of four KVs is set as the fitness function.

Case Studies and Comparisons
A comparative study of data-driven virtual sensors upon a real-life natural gas steam reforming hydrogen production process is provided in this section.Meanwhile, a case study is useful for understanding the hydrogen production processes and data features.The state-of-the-art virtual sensors and developed models (DVBPCA and MW-DVBPCA) are investigated, including partial least squares (PLS) [30], VBPCA, long short-term memory (LSTM) [31], echo state networks (ESN) [32], and dynamic PLS (DPLS) [33].The PLS is a classic static model with the advantages of modeling data colinearities and simple data structure.The VBPCA is a static probabilistic model.Different from the PLS, the VBPCA determines the number of principal components automatically, which is also the basis of our developed models (DVBPCA and MW-DVBPCA).The LSTM and ESN are advanced dynamic neural network models that consider the dynamics.The DPLS is the improvement of the PLS based on the FIR paradigm, which also accounts for the dynamics of the hydrogen production process.

Natural Gas Steam Reforming Hydrogen Production Process
The natural gas steam reforming hydrogen production process is the largest industrial source of hydrogen, and it consists of four main processes: feedstock purification, steam reforming, medium temperature conversion, and pressure swing adsorption.Among these processes, steam reforming is the dominant reaction process, which is schematically shown in Figure 3 [34].Processed gases are mixed with steam in the pre-reformer, where all hydrocarbons and some CH 4 are converted to CO, CO 2 and H 2 O.The temperature then decreases, which is not conducive to promoting hydrogen generation.The pre-reformer output is heated in a pre-heater and then continuously fed to the primary reformer for complete reforming.
The main chemical reactions of this process are shown as follows: According to Equation (34), the exit gases of the hydrogen production process consist of CH 4 , CO, CO 2 , and H 2 , and the concentrations of these gases are KVs (as labeled "Y" in Figure 3) related to product quality.Therefore, these KVs need to be strictly monitored.In practice, offline laboratory analysis and hardware sensors are traditional methods for measuring the KVs, but have delays and high investment.Meanwhile, series-wound devices (such as pre-reformers and pre-heaters) introduce considerable transportation delays, which must be considered in virtual sensor modeling.Therefore, for giving realtime predictions of the KVs, a virtual sensor considering transportation delays is desirable.

Explanatory Variable Selection and Data Collection
A total of 13 EVs for the virtual sensor development are selected based on the first principles and the operation experience of the engineers.These EVs are easily measured variables, closely linked to the KVs and highly correlated with process variations, as shown by labels U1-U13 in Figure 3.The descriptions of these EVs are listed in Table 1.
Table 1.EVs in the hydrogen production process.

U1
Flow of fuel natural gas into primary reformer U2 Flow of fuel off gas into primary reformer U3 Pressure of fuel off gas at the exit of heat exchanger 3 U4 Pressure of furnace flue gas at primary reformer's exit U5 Temperature of fuel off gas at the exit of heat exchanger 3 U6 Temperature of fuel natural gas at pre-heater's exit U7 Temperature of process gas at primary reformer's entrance U8 Temperature of furnace flue gas at primary reformer's top left U9 Temperature of furnace flue gas at primary reformer's top right U10 Temperature of mixed furnace flue gas at primary reformer's top U11 Temperature of transformed gas at primary reformer's left exit U12 Temperature of transformed gas at primary reformer's right exit U13 Temperature of transformed gas at primary reformer's exit The samples used to develop the virtual sensor of the KVs were obtained from the DCS database.The KVs' sampling rate is 10 min, and 2500 samples were collected.These were divided into three parts in order of collection time: the training set, the validation set, and the testing set, which consist of 1500 samples, 300 samples, and 700 samples, respectively.Figures 4 and 5 show sample division and the changing trend of these EVs and four KVs, respectively.In this article, we use scaled dimensionless data for modeling.

Evaluation Metrics
The root mean squares error (RMSE), the coefficient of determination (R 2 ), and the mean absolute error (MAE) are chosen to quantify different models' performance, which are defined as where ŷn and y n are the estimated values and true values of the n th testing sample, respectively, N is the size of the testing dataset, and ȳ = ∑ N n =1 y n /N are the mean of the KVs in the testing dataset.The RMSE and MAE indexes characterize the average and largest prediction errors on the test set, respectively, and the R 2 index characterizes the correlation between the predicted and true values.Therefore, the smaller the RMSE and MAE or the bigger the R 2 , the higher the predictive accuracy.

Parameter Selection
All models' optimal parameters need to be chosen to obtain different models' best prediction performance.Select the DE algorithm to minimize the average RMSE (i.e., the mean of the RMSE of the four KVs) on the validation set for parameter optimization of different models.For the PLS, the number of principal components and the time delays are the parameters to be optimized.For the DPLS, the number of principal components, the time delays and the dynamic orders are the parameters to be optimized.For the VBPCAbased models, the time delays and the dynamic orders are the parameters to be optimized.For the LSTM and ESN, the time delays are the parameters to be optimized.
For the VBPCA-based models, the key issue of automatic parameter determination of the principal component number is whether each column of the loading matrix W is either insignificant or significant, i.e., w m = 0 or w m = 0, which can be controlled by α according to Equation (8).Take DVBPCA for example, the fact that the variance of each w m can be quantified by 1/E(α m ) for m = 1, ..., M, which is displayed in Figure 6.The color distinction of the points in Figure 6 means the value of 1/E(α m ) is either insignificant or significant by setting threshold, where red means significant and blue means insignificant.As shown in Figure 6, the appropriate dimensionality of the principal component subspace is selected as 39.For the ESN, set the input regulation scale to 0.1, set the reservoir size to 50, and set the spectral radius to 0.8.Considering that the ESN involves the random weights in the reservoir computing step, 20 modeling tests were performed.For each trial, the random weights are saved and fixed, and the DE algorithm is used to optimize the model delays further.Then, the best results from 20 tests are picked for model performance comparison.Note that the ESN is a single-output model; we therefore construct four ESN models, one for each KV.For the LSTM, based on the debugging experience, set the time step to 1, set the learning rate to 0.1, set the hidden layer to 1, and set the neuron number in the hidden layer to 100, which performs favorably empirically in each of the replicated experiments of this work.
For the MW-DVBPCA, the impact of MW size is detailed in Figure 7.The MW size is in units of 10 mins, i.e., setting the MW size to 50 means that the MW contains 500 min of data.As shown in Figure 7, for small MW sizes, the data in the window may not appropriately represent the relationship between process variables.In contrast, excessive MW size covers too much outdated sample data so that the MW-DVBPCA fails to track the process change adequately.Therefore, according to Figure 7, the MW size was selected as 50.

Results and Analysis
The estimations on the test set of the KVs obtained by the investigated seven virtual sensors are visualized in Figures 8-10.In Figure 8, obviously, the PLS and VBPCA have the poor estimation performance.Because of the significant dynamics of the hydrogen production process, the estimation accuracy of the static models, PLS and VBPCA, is not as satisfactory as that of the other five dynamic models (such as around the 120-th sample of the CO 2 concentration).Figure 9 shows that the estimated values of the DVBPCA tracks real values better than the other three models (such as around the 300-th sample of the CO concentration and around the 300-th sample of the CO 2 concentration).That is because, on the one hand, the DVBPCA can deal with the colinearities between the EVs compared with the LSTM and ESN.On the other hand, the DVBPCA can tackle the overfitting results from high-order variable augmentation compared with the DPLS.Moreover, the ESN constructs four virtual sensors, one for each KV for estimation, but the DVBPCA is a multi-output model that considers the inherent relationships between the KVs. Figure 10 illustrate that the estimations of the four KVs by the MW-DVBPCA match the real values much better (particularly in the localized area of the CH 4 concentration around the 440-th sample) than other models, revealing the importance of considering time variation properties in the virtual sensor modeling of the hydrogen production process.Moreover, although the overall predicted values do match well with the true ones when the proposed MW-DVBPCA method was used, some discrepancies can be observed between test sample number 200 and 300 for all the four KVs.The possible reasons for these differences in the predicted and true values are as follows.Firstly, the characteristics of the samples between test sample number 200 and 300 are changing rapidly, and model learning does not accommodate such changes in time.Secondly, the samples between test sample number 200 and 300 have nonlinearities, but the local model constructed by MW-DVBPCA is linear.Overall, it is recognized that the proposed models show noticeable advantages over the benchmark models.Figure 11 compares the seven models in terms of scatter plots.Based on Figure 11, a further comparison of the prediction of the data-driven models can be made.As shown in Figure 11a, the predictions of the PLS for CH 4 component deviate obviously from the real values.Figure 11b,c shows that the VBPCA model improves the prediction accuracy of CO and CO 2 concentrations somewhat compared to the PLS model.However, the overall prediction accuracy is still very low. Figure 11d reveals that all models have relatively poor prediction accuracy for the H 2 concentration, but the MW-DVBPCA presents a better result.As indicated in Figure 11, the scatters by MW-DVBPCA are more closely and clearly located around the diagonal line than those of other models, thus illustrating better performance.
-  The estimation performance of all data-driven virtual sensors is quantitatively tabulated in Tables 2 and 3.For a further comparison, Tables 2 and 3 show the estimations of the data-driven virtual sensors not considering transportation delays.These models' parameter selection is consistent with the corresponding model considering transportation delays.Overall, the estimation results in Tables 2 and 3 provide an initial validation of the effectiveness of the data-driven virtual sensors.However, the performances of the different data-driven virtual sensors vary considerably.The performance of the models that consider delays is better than that of the corresponding models that do not, as shown in Tables 2 and 3. Take the DVBPCA as an instance.The predictive performance based on the RMSE index of the four KVs by the model accounting for the time delays is improved by 18.6%, 13.0%, 2.3%, and 5.5%, respectively, compared to the model ignoring time delays.This is because the hydrogen production process has substantial time delays, and it has been proven that ignoring the delays could result in significantly deteriorated performance [29].The R 2 values for two static models, i.e., the PLS and VBPCA, are as low as below 0.5; in contrast, the dynamic models, such as LSTM, ESN, and DPLS, show significantly better performance than the PLS and VBPCA, indicating the dynamic model hypothesis should be rejected; that is, statistically the median values of two virtual sensors are different.
The Wilcoxon test results are given in Table 4, where E

Computational Efficiency Analysis
Since this article is concerned with real-time estimation, examining the runtime of the model is desirable.The offline and online computational efficiency of the virtual sensors is evaluated using the average over 10 independent simulations, including the CPU time consumed offline for parameter optimization (CPT opt ) and the CPU time consumed online (CPT online ).All experiments were computed on a Core i5 (2.90 GHz × 2) with 8 GB RAM, Windows 10 and R2021a version of MATLAB.
Table 5 lists the time taken by each virtual sensor on parameter determination.As can be observed, the CPT opt index for the DVBPCA model is much smaller than that for the LSTM, due to its more concise structure.The CPT opt indices for other dynamic global models are almost the same as those for the DVBPCA, but other dynamic global models have lower accuracy than the DVBPCA given in Table 2 and Table 3.Note that the CPT opt index for the MW-DVBPCA is much larger than that for the DVBPCA, which is because the MW-DVBPCA needs to rebuild the model each time it predicts a new valid sample.Fortunately, the parameter determination processes are carried out offline.In other words, this process hardly affects the online calculative efficiency of the MW-DVBPCA.The last column of Table 5 illustrates the online computation time of MW-DVBPCA.Consequently, the online computational efficiency of the developed models is not an issue.In practice, the CPT opt indices for all virtual sensors are less than 0.1 s/sample, significantly faster than the minimum sampling period for the KVs in the hydrogen production process.The results show that all data-driven virtual sensors meet the time requirements for real-time estimation, including the developed DVBPCA and MW-DVBPCA.

Conclusions and Outlook
Considering the complicated properties of the hydrogen production process, we develop virtual sensors for the KVs of the hydrogen production process in this article.The MW-DVBPCA is developed to model complicated properties of the hydrogen production process, including dynamics, time variations, and transportation delays.To be specific, the FIR paradigm and MW technique are employed to extract process dynamics and to deal with time variations, respectively.The time delays are determined automatically by the DE.A comparative study of developed virtual sensors and other state-of-the-art virtual sensors is carried out.Meanwhile, the performance of MW-DVBPCA is demonstrated by the real-life natural gas steam reforming hydrogen production process.
From the industrial point of view, the online estimations of the KVs of the hydrogen production process need further research.Some future works are given to further improve the estimation performance of the KVs.

•
Robust methods.The probabilistic model in this article is based on the traditional Gaussian distribution assumption, which is susceptible to outliers.Therefore, the training set must be cleaned to remove outliers.However, some outliers are indistinctive and challenging to detect and remove.To this end, finding a probability distribution insensitive to the noise and outliers can help improve the generalization performance of predictive models.Typically, Student's t distribution with heavier tails is a candidate choice.As a result, designing a robust virtual sensor based on Student's t distribution is worth investigating.

•
Data-driven approaches fused with process knowledge.In fact, the states (or the hidden variables) of the system are influenced by variables characterizing materials and energies fed into the process.Conventional virtual sensors take all observed variables as inputs and the KVs as outputs, which makes it difficult to describe the true causality between variables of the hydrogen production process, weakening the interpretability and generalization abilities.A causal virtual sensor can better reflect the process mechanism and thus estimate the KVs more accurately.Therefore, equipping the MW-DVBPCA model with the causality of process variables of the hydrogen production process is desirable.

Figure 3 .
Figure 3.The flowchart of steam reforming process.

Figure 5 .
Figure 5.The diagram of data partition of four KVs.

Figure 6 .
Figure 6.Variance of each w m by DVBPCA.
Figure 1graphically represents the DVBPCA.where, W x ∈ R K x ×M is the loading matrix of the inputs x n , W y ∈ R K y ×M DVBPCA mean the median values of squared estimated errors obtained by the PLS, VBPCA, LSTM, ESN, DPLS, DVBPCA, and MW-DVBPCA, respectively.Additionally, set the significance level η at 5%.As shown in Table4, all hypothesized pvalues are far below η.Hence, all hypotheses are rejected.In other words, there is statistical significance in comparing the MW-DVBPCA with other virtual sensors in the hydrogen production process.

Table 4 .
Results of the Wilcoxon test.

Table 5 .
Comparison of computational efficiency of various models (seconds).