Dynamic Feature Extraction and Semi-Supervised Soft Sensor Model Based on SCINet for Industrial and Transportation Processes

Wang, Jun; Qi, Changjian; Luo, Xing; Deng, Shihao; Lei, Qi

doi:10.3390/asi8030073

Open AccessArticle

Dynamic Feature Extraction and Semi-Supervised Soft Sensor Model Based on SCINet for Industrial and Transportation Processes

by

Jun Wang

^1,*,

Changjian Qi

¹,

Xing Luo

²,

Shihao Deng

² and

Qi Lei

^2,*

¹

CRRC Nanjing Puzhen Vehicle Co., Ltd., Nanjing 210031, China

²

Main Campus, School of Automation, Electrical EngineeringCentral South University, Changsha 410083, China

^*

Authors to whom correspondence should be addressed.

Appl. Syst. Innov. 2025, 8(3), 73; https://doi.org/10.3390/asi8030073

Submission received: 25 March 2025 / Revised: 30 April 2025 / Accepted: 23 May 2025 / Published: 29 May 2025

Download

Browse Figures

Versions Notes

Abstract

In industrial processes, dynamic changes are one of the factors restricting the performance of soft sensor models. Meanwhile, the inconsistency of sensor sampling rates often leads to the problem of mismatch between process variables and quality variables. This paper proposes a semi-supervised soft sensor modeling method based on sample convolution and interactive networks (SCINet). To extract the dynamic information of industrial processes more fully, an unsupervised time series dynamic feature extractor was designed based on SCINet and an autoencoder, and the feature extractor was trained using complete data. The dynamic features encoded by the dynamic feature extractor were transferred to the eXtreme Gradient Boosting (XGBoost) ensemble model with strong generalization ability. The semi-supervised soft measurement model SSCI-XGBoost was established. The effectiveness of dynamic feature transfer and model performance improvement was verified on the industrial process dataset.

Keywords:

soft sensor; sample convolution and interactive networks (SCINet); dynamic feature extractor

1. Introduction

There are many complex problems in the soft sensor modeling of industrial processes, such as process nonlinearity, process dynamics, quality variables lagging behind process variables, and quality variables not matching process variables [1,2,3]. When dealing with complex industrial processes and massive industrial data, shallow neural networks are not enough to deal with these high-dimensional data and dynamic process scenarios, which leads to problems such as insufficient prediction accuracy and poor generalization of soft sensor models [4,5].

With the development of artificial intelligence, the feature representation ability of deep neural networks has become more and more prominent [6]. The deep neural network transforms the original input into higher dimensional and more complex features, which can better simulate the nonlinearity and dynamics of the industrial process so as to obtain a more accurate soft sensor model [7]. Deep neural networks can be divided into two categories: static networks represented by deep belief network (DBN) [8] and stacked auto-encoder (SAE) [9] and dynamic networks represented by recurrent neural network (RNN) [10]. In a variety of static networks, it is generally assumed that there is no time-dynamic dependence between samples in the industrial process. In order to establish a soft sensor model, the static model constructs an augmented matrix according to the input dimension of the data to overcome the process dynamics [11], but determining how to reasonably construct the augmented matrix can only rely on expert experience.

The correlation of each sample in the process data hides the dynamic information of the industrial process, and capturing this information is the key to improving the accuracy of the soft sensor model. Sequential model RNN has been applied to industrial processes [12]. To solve the long-term dependence of RNN, soft sensor models using long short-term memory (LSTM) [13] or gated recurrent unit (GRU) [14,15] have achieved good prediction results in some industrial scenarios. Based on LSTM, Yuan et al. constructed a supervised LSTM unit for learning the hidden dynamic information related to quality in the industrial process and verified the effectiveness of the proposed supervised LSTM soft sensor model on multiple datasets [16].

In the research of the sequential model, Liu et al. proposed sample convolution and interaction networks (SCINet) [17] for time series forecasting. SCINet enhances the original sequence’s predictability by capturing features’ temporal dependence at different temporal resolutions [18]. It has a stronger feature extraction ability and superior long-term and short-term prediction ability in wind power prediction and stock trading [19].

All the above sequential models are designed as end-to-end models. Although these models have powerful dynamic feature extraction capabilities, their regressors are generally single- or multilayer, fully connected networks. When performing regression fitting on the extracted features, the performance of the models is limited. Ensemble models in machine learning have strong generalization ability by constructing multiple sub-regressors, but such ensemble models cannot be trained directly with deep networks.

Therefore, to combine the advantages of both, some researchers have used deep neural networks for extracting complex dynamic features and used ensemble models as regressors to transfer dynamic features [20,21]. Lian et al. used DBN combined with a particle swarm optimization algorithm to extract features as the support vector regression (SVR) input to establish a soft sensor model, achieving good results [22]. Wang et al. used DBN to extract the features of auxiliary variables and input them into the extreme learning machine for training to obtain the soft sensor model [23]. Fan et al. proposed a hidden layer feature extractor based on continuous restricted Boltzmann machine (CRBM) and established the CRBM-SVR soft sensor model [24].

In the historical data of many industrial processes, due to the inconsistency of sampling frequency, the high cost of quality variable analysis, and the restriction of field environment, the number of process variables that are easy to collect is generally more than that of quality variables that are difficult to collect and analyze. If only the labeled samples are used, and the valuable information contained in the unlabeled samples is ignored, the performance of the soft sensor model will also be restricted due to the limited prior information.

Semi-supervised soft sensor modeling evolves from generating pseudo-labels to using unsupervised pre-training and supervised fine-tuning. To use a large number of unlabeled samples to improve the performance of the model, Li et al. proposed a semi-supervised ensemble SVR soft sensor model by using the idea of generating pseudo-labels in ensemble learning and used an extended pseudo-labeled dataset to improve the performance of the soft sensor model [25]. Considering the dynamic characteristics of industrial processes and the mismatch between quality samples and process samples caused by irregular sampling, Tang et al. proposed a historical feature fusion attention semi-supervised LSTM (HFFA-SSLSTM) soft sensor model to extract the historical dynamic information of samples, which significantly improved the prediction accuracy [26].

To obtain a better soft sensor model, it is important to consider the process dynamic characteristics of the samples and the generalization ability of the model. Combined with the mismatch problem between the process samples and the quality samples, these existing problems and the shortcomings of the existing methods motivate this research.

Considering the existing problems of industrial processes, this paper proposes a semi-supervised soft sensor modeling method based on SCINet dynamic feature extraction. First, the sample convolution and interaction network with an encoder–decoder structure extracted effective dynamic features unsupervised. Then, the dynamic features were transferred to the XGBoost model [27], which had a strong generalization ability to establish a soft sensor model. The feature extraction ability of the deep network and the strong model’s generalization ability were considered. Finally, the effectiveness of the proposed method was verified by the example analysis of the debutane column dataset [28] and the sulfur recovery dataset [29].

The main contributions of this paper are as follows:

(1) A dynamic feature extractor based on SCINet and autoencoder was designed to extract the dynamic features of all samples in an unsupervised manner, which makes full use of the process information contained in unlabeled samples. (2) The dynamic features of labeled samples were transferred to the XGBoost model to train a regressor, which fully combines the feature extraction ability of deep neural network and the generalization ability of ensemble model.

The rest of the paper is organized as follows: The network structure of SCINet is introduced in Section 2. Section 3 describes the XGBoost model. Section 4 presents a dynamic feature based XGBoost model and explains how dynamic features are transferred to XGBoost and the modeling process. In Section 5, full experimental validation is carried out on industrial datasets for performance evaluation. Finally, the conclusion is given in Section 6.

2. SCINet

SCINet [17] is a hierarchical network that enhances the predictability of an original time series by capturing the time dependence of features at multiple time resolutions. SCINet has a binary tree structure, as shown in Figure 1a. The basic component of SCINet is the SCI-Block, as shown in Figure 1b. In each SCI-Block, the original sequence is decomposed into two sub-sequences. Different convolution modules extract homogeneous and heterogeneous information from the decomposed sub-sequences. Through interactive learning and information complementarity, new sequence representations are formed. To capture the dynamic features at different time granularities, SCINet downsamples the input sequence F in the time dimension to obtain F_odd and F_even. These two sub-sequences have a relatively coarse time resolution and retain most of the information of the original sequence. In each SCI-Block, four different convolution modules, namely

ψ

, ϕ, η, and ρ, are employed to extract the features of F_odd and F_even. To deal with the potential information loss problem caused by downsampling, an interactive learning strategy is introduced in SCI-Block to facilitate the information exchange between subsequences. Firstly, F_odd and F_even are mapped to the hidden layer states through two convolution modules,

ψ

and ϕ. This stage can be regarded as performing scaling transformations on F_odd a F_even, as in Formula (1). Then, using the other two convolution modules η and ρ, the scaled features after the first stage are further mapped and interacted to obtain two updated features,

F_{o d d}^{'}

and

F_{e v e n}^{'}

, as shown in Formula (2).

\begin{matrix} F_{e v e n}^{s} = F_{e v e n} ⊙ \exp (ψ (F_{o d d})), \\ F_{o d d}^{s} = F_{o d d} ⊙ \exp (ϕ (F_{e v e n})) \end{matrix}

(1)

\begin{matrix} F_{e v e n}^{'} = F_{e v e n}^{s} - η (F_{o d d}^{s}) \\ F_{o d d}^{'} = F_{o d d}^{s} + ρ (F_{e v e n}^{s}) \end{matrix},

(2)

where

ψ

, ϕ, η, and ρ are convolutional filters, and exp is the exponential transformation.

The input data can be divided into k (k = 1, 2, …, K) time sequence;

X_{s e q} = [X^{1}, X^{2}, \dots, X^{K}]

. For each time sequence,

X^{k} = [x_{1}^{k}, x_{2}^{k}, \dots, x_{T}^{k}]

, where T is the window length, is decomposed layer by layer, and is processed by different levels of SCI-Blocks, which can effectively learn the features at different temporal resolutions. Feature information from previous layers is accumulated, meaning deeper layers contain information from time-scale features at shallower layers. The concatenated features are accumulated with the original sequence X_seq to obtain the hidden layer dynamic features, hv. The hv is fed to the fully connected layer for decoding to obtain the predicted sequence

{\hat{X}}^{k} = [{\hat{x}}_{1}^{k}, {\hat{x}}_{2}^{k}, \dots, {\hat{x}}_{τ}^{k}]

, where

τ

is the step length of the refined sequence. Then, the absolute error loss function between the predicted value and the true value is given as follows:

L o s s = \frac{1}{τ} \sum_{i = 1}^{τ} ‖ {\hat{x}}_{i}^{τ} - x_{i}^{τ} ‖

(3)

3. XGBoost Model Analysis

The eXtreme Gradient Boosting ensemble tree model (XGBoost) is a widely used method in ensemble learning and has achieved excellent results in many regression and classification problems [27]. A common method to train a good model is to minimize the loss function on the training data, that is, the empirical risk minimization function. The model trained with this objective function has a relatively high complexity. To avoid this problem, model complexity, namely the structural risk minimization function, is usually introduced into the objective function of the integrated model, as shown in Equation (4).

\begin{matrix} o b j (θ) = \frac{1}{N} \sum_{n = 1}^{N} L o s s (y_{n}, f (x_{n})) + \sum_{m = 1}^{M} Ω (f_{m}) \end{matrix}

(4)

where n represents the number of samples, and x_n and y_n are the inputs and outputs of the samples, respectively. In Equation (4), the objective function consists of two parts: the training error of the model and the regularization term that controls the complexity of the model. The regularization term is obtained by summing the regularization terms of all subtrees, and

Ω (f_{m})

is the regularization term of the mth tree.

The XGBoost model sets different weights for the CART tree using the scoring function. These CART trees are combined in a weighted form to form a strong learner, effectively reducing the model error and variance. The regularization term is introduced to control the complexity of the model, which is obtained by adding the regularization terms of all subtrees. The complexity of each tree is shown as Equation (5):

Ω (f) = γ p + \frac{1}{2} λ \sum_{j = 1}^{p} ω_{j}^{2}

(5)

where p represents the number of leaf nodes, γ is the regularization coefficient controlling the number of leaf nodes, λ is the regularization coefficient controlling the weight of leaf nodes, and

ω_{j}

is the weight of the jth leaf node.

4. Dynamic Feature Based XGBoost Model

4.1. The Structure

In an end-to-end sequential neural network, the network can generally be regarded as two parts: the first part is used to extract process dynamic information, and the second part of the fully connected network regression layer is used to fit features. Although the deep sequence model can extract dynamic features, the generalization ability of the regression layer constructed by the fully connected neural network is insufficient in the case of a small number of training samples, and the accuracy of the feature fitting will be limited. The ensemble modeling method in machine learning can improve the accuracy and stability of the model by combining multiple sub-regressors together and has strong generalization ability [30].

When ensemble modeling addresses the problem of process dynamics, the original sequence sample is typically expanded in one dimension and used as model input. However, this approach only uses the model’s ability to fit the input samples and does not leverage the dynamic information between sequences. Compared with the sequential model, the performance of the ensemble model is also restricted by its insufficient ability to capture dynamic features. Therefore, the deep sequence neural network is used to extract complex dynamic features, and then, the ensemble model is used as the regressor. This way, the dynamic features can be transferred to the ensemble model to compensate for the dynamic feature capture ability deficiency and obtain a better regression effect. The structure of the dynamic feature extraction-based soft sensor model is shown as Figure 2.

4.2. Dynamic Feature Extractor and Feature Transfer

The autoencoder (AE) proposed in [31] is used to learn the main rules of the data to extract features. Based on the unsupervised AE, a series of supervised and semi-supervised stacked auto-encoder models have been developed, such as discussed in [12,32]. The structure of this type of soft sensor model is still the ordinary fully connected neural network, and the semi-supervised static models such as SS-SAE are only capable of dealing with missing samples and do not account for the process dynamics. However, the mismatch problem between process dynamics and the number of process quality variables often coexists, so it is hoped that the soft sensor model can fully use unlabeled samples and extract process dynamic information.

Based on AE, considering the dynamic characteristics of the industrial process, a dynamic feature extractor [32] can be designed. Different from the traditional autoencoder, the input of the dynamic feature extractor is the sequence data in a period-of-time window, and the goal of the decoder is to try to restore the output sequence to the input sequence so as to maximize the learning of the dynamic information in the industrial process.

SCINet can effectively extract the relationship features between samples. Limited by the ordinary fully connected network of regressors, SCINet has insufficient generalization ability, and the model is easy to overfit when the number of training samples is small. Based on the feature capture capability of SCINet, considering the generalization performance of the model and the existing problem of mismatch in the number of process quality variables, the unsupervised dynamic feature extractor was designed, as shown in Figure 3.

For the encoder, the input data are expressed as a K time sequence,

X_{s e q} = [X^{1}, X^{2}, \dots, X^{K}]

. The decoder output sequence can be expressed as

{\hat{X}}_{s e q} = [{\hat{X}}^{1}, {\hat{X}}^{2}, \dots, {\hat{X}}^{K}]

, whose dimension is consistent with the input sequence. The training loss function of the dynamic feature extractor is the reconstruction error of the input sequence and the output sequence, defined as follows:

L o s s = \frac{1}{K} \sum_{k = 1}^{K} ‖ {\hat{X}}^{k} - X^{k} ‖

(6)

where K is the number of sequences.

The encoding part of SCINet is used as the encoder of the dynamic feature extractor, and the sequence processed by the encoder is represented by a vector (hv). Combining the SCINet loss function (3) and the loss function of the dynamic feature extractor (6), the optimization objective of the feature extractor is designed as follows:

J = \frac{1}{K} ‖ g_{θ} (d_{θ} (X_{s e q})) - X_{s e q} ‖

(7)

where

d_{θ}

and

g_{θ}

, respectively, feature the extractor encoder and the decoder.

After the training of the dynamic feature extractor, the sequence corresponding to the labeled samples is input to the encoder of the dynamic feature extractor, and the output hidden layer vector is the dynamic feature corresponding to the sequence hv. The dynamic features are transferred to XGBoost to construct the SCI-XGBoost soft sensor model. Since the unsupervised training is not affected by the quality labels, the extracted dynamic features can be provided to XGBoost with stronger generalization ability for quality variable prediction.

4.3. Semi-Supervised Soft Sensor SSCI-XGBoost

Considering the mismatch of the quantitative relationship between process variables and quality variables caused by the inconsistent sampling rate in industrial processes, if only labeled samples are used for modeling, the dynamic information contained in many process variables will be ignored.

In order to fully capture the potential dynamic information in the industrial process and use unlabeled samples to improve the performance of the model, this paper proposes a semi-supervised soft sensor model SSCI-XGBoost based on dynamic feature extraction. The full data are used to train the SCINet feature extractor. Then, the labeled sample and its dynamic features are used to train the XGBoost regressor. The modeling flow chart is shown in Figure 4. The modeling process is divided into two stages: offline training and online testing. In the offline stage, historical data are organized to determine the optimal parameters of the soft sensor model, and then, the model is trained. In the testing phase, the online data and the trained model are loaded to realize the online measurement.

5. Experiment

5.1. Evaluation Index

To verify the effectiveness of the proposed model, this paper selects mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R²), three metrics that are used to evaluate the performance of soft sensor models quantitatively. The formulas of these metrics are as follows:

M A E = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - {\hat{y}}_{i} |

(8)

R M S E = \sqrt{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}} / N

(9)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}

(10)

To eliminate the influence of different dimensions of the data, in the data preprocessing stage, all datasets used for experimental validation are preprocessed using standardization for each process variable and quality variable. The hardware platforms and software versions used in this experiment are as follows: CPU: Intel(R) Core(TM) i7-9700 (3.00 GHz); memory: 16 GB; operating system: Windows 11 (64-bit). All code is implemented in Python (3.7.13), and the main Python libraries used are PyTorch (1.11.0) and XGBoost (1.5.0).

5.2. Case 1. Debutane Column Process

In this section, experiments are carried out on the debutane column’s industrial process to prove the effectiveness of the feature transfer. In this process, there is a complex coupling relationship between samples, which has strong process dynamics. Figure 5 shows the schematic diagram of the debutanizer. The description and units of primary process and quality variables are listed in Table 1. Process variables (u₁, u₂, …, u₇) and quality variable (butane concentration, y) are collected every 15 min. Butane concentration must be analyzed using a gas chromatograph, which takes 30 min. Therefore, the acquisition time of the quality variable lags the process variable by 45 min. This lag prevents the production process from obtaining real-time feedback information in a timely manner, affecting the subsequent process’s control.

The acquisition of the quality variable time lags behind the process variables by four time steps, and a soft sensor model is needed to measure butane concentration in real time. Fortuna developed a nonlinear autoregressive moving average (NARMA) model as shown in (11) [28]. At the i moment, only the butane concentrations at the i-4 moment and the moment before are available from the database. Butane concentrations at i-3 to i are obtained by iteration of the model prediction, which leads to larger errors.

\begin{matrix} \hat{y} (i) = f [u_{1} (i), u_{2} (i), u_{3} (i), u_{4} (k), u_{5} (k), u_{5} (i - 1), \\ u_{5} (i - 2), u_{5} (i - 3), (u_{6} (i) + u_{7} (i)) / 2, \\ \hat{y} (i - 1), \hat{y} (i - 2), \hat{y} (i - 3), y (i - 4)] \end{matrix}

(11)

where

\hat{y}

represents the predicted value of the soft sensor model. To avoid this accumulation of iterative errors, SCI-XGBoost predicts butane concentration at time i using only historical data, including time i and its predecessor, as follows.

\begin{matrix} \hat{y} (i) = f [u_{1} (i), u_{2} (i), u_{3} (i), u_{4} (k), u_{5} (k), u_{5} (i - 1), \\ u_{5} (i - 2), u_{5} (i - 3), (u_{6} (i) + u_{7} (i)) / 2, y (i - 4)] \end{matrix}

(12)

The debutane column dataset has a total of L = 2394 samples. A sliding window is used to construct a sequence dataset on the debutane column dataset, and the data are sliced by sliding from front to back in the time dimension. It is known that the step length of quality variable lag P = 4, and sliding window size T = 16; then, the number of samples L and the number of the sequence dataset N satisfy the relationship N = L − T – P + 1, so there is a total of N = 2375 sequence samples. The sequence dataset is divided according to the rule of training set; validation set/test set = 7:1:2, and the number of samples in each part is 1662:238:475.

Regarding the hyperparameters of the SCI-XGBoost model, the optimal values of some parameters are determined by a grid search algorithm and multiple attempts. Parameters such as the size of the convolution kernel in the convolutional layer of the feature extractor, the multiple of the extended dimension of the hidden layer, and the maximum depth of downsampling are used to determine the optimal parameters through several trials. An adaptive learning rate adjustment strategy improves the neuron weights during training. The learning rate will be reduced slightly if the training loss does not decrease after five consecutive iterations. Additionally, training can be halted to prevent overfitting when the model’s loss stabilizes and shows little to no decrease, even if the maximum number of iterations has not yet been reached. The grid search method is also used to determine the critical parameters of the parameter setting of the XGBoost regressor. After many trial experiments, the model trained in Table 2 performs best on the validation set.

In order to verify the effectiveness of the SCI-XGBoost model, ANN, XGBoost, LSTM, SLSTM, and SCINet were selected as comparison models. For the non-sequential models, ANN and XGBoost, the model’s input is the data after one-dimensional expansion of the sequence data to increase their dynamic learning ability. The network structure of ANN is [128-64-16-1], and the activation function is Relu. For both LSTM and SLSTM, the number of hidden layer neurons is set as 100, the input sequence length is the same as the input window length T of SCI-XGBoost, the learning rate is 0.01, and the number of iterations is 100.

Under the above hyperparameter settings, the prediction curves of various models on the test set of the debutane column are shown in Figure 6, and the evaluation metrics of the models are shown in Table 3. When only the sequence data of k-4 and its previous time are used to predict the quality variable at time k, the predicted value of SLSTM has a large deviation from the true value at some peak moments, and the prediction performance is not as good as LSTM and SCINet. LSTM has worse prediction performance than SCINet at some peak moments. SCI-XGBoost has better performance than SCINet, which indicates that even if the dynamic features are extracted through SCINet, it is difficult to achieve the regression accuracy of the ensemble model only through the fully connected layer, which proves the strong generalization ability of XGBoost. Compared with XGBoost, the predicted value of XGBoost is significantly deviated from the true value at the peak time, indicating that XGBoost has insufficient ability to capture the dynamic characteristics of the process. Therefore, SCI-XGBoost has strong dynamic feature capture ability and better generalization performance, so the prediction effect is better than that of other models.

For further comparison, the boxplot comparisons of absolute prediction errors on the testing dataset are shown in Figure 7 for these six models. Compared with other models, SCI-XGBoost has the narrowest box, and the bottom is also the closest to zero.

For the XGBoost model, when an input feature has a higher proportion of the number of splits in the CART tree-splitting process, the feature is more important to the model. Figure 8 shows the bar chart of the proportion of splits in the XGBoost model using dynamic features. Compared with the XGBoost model, which does not use dynamic features, some dynamic features are important, similar to the original features. The importance of other dynamic features is much higher than that of the original features, indicating that the added dynamic features are effective in improving the performance of XGBoost.

5.3. Case 2. Sulfur Recovery Unit

To verify the effectiveness of the semi-supervised model SSCI-XGBoost and prove the ability of the unsupervised dynamic feature extractor to extract the dynamic information of unlabeled samples, the sulfur recovery industrial process dataset was selected for experimental verification.

The sulfur recovery unit is an important device for dealing with industrial waste gas, which removes gases such as SO₂ and H₂S in acidic airflow. Its simplified industrial process flow is shown in Figure 9. The exhaust gas treated by the sulfur recovery unit still has residual SO₂ and H₂S. To avoid pollution, the concentrations of SO₂ and H₂S must be monitored before the exhaust gas is discharged into the atmosphere. The strong corrosion of acid gas requires the sensor to be removed and replaced frequently, so using a soft sensor to predict the gas concentration can reduce the monitoring cost. The sulfur recovery dataset [29] is described in Table 4. The dataset consists of five process variables (u₁, u₂, …, u₅) and two mass variables SO₂ (y₁) and H₂S (y₂).

The sulfur recovery dataset has a total of L = 10,081 samples. A sliding window is used to construct a sequence dataset on the dataset, and the sliding window size is T = 16. The number of samples L and the number of the sequence dataset N satisfy the relationship N = L − T + 1, so there is a total of N = 10,066 sequence samples. The sequence dataset is divided according to the rule of training set; validation set/test set = 7:1:2, and the number of samples in each part is 7046:1026:1994.

In order to verify the effectiveness of the semi-supervised model, it is assumed that some quality variables are missing, and the quality variables of some samples in the training set are randomly erased. With some quality variables missing, the training samples of the sulfur recovery dataset are organized as shown in Figure 10 (T = 3 is assumed to simplify the representation, whereas T = 16 during the actual training process). The training set is divided into a labeled sample training set and an unlabeled sample training set, both of which can be used for training the unsupervised feature extractor, and the labeled sample training set is used for training the XGBoost regressor.

Regarding the parameter setting of SSCI-XGBoost, referring to the parameter setting of SCI-XGBoost on the debutane column dataset and combined with the grid search algorithm and repeated experiments on this dataset, the final optimal hyperparameter table of SSCI-XGBoost on the sulfur recovery dataset is shown in Table 5.

SS-SAE, XGBoost, and SCI-XGBoost were selected as comparison models. SS-SAE is a semi-supervised model, while XGBoost and SCI-XGBoost only use labeled samples for model training. On the sulfur recovery dataset, the input of the SS-SAE method refers to the paper [29], which provides the sulfur recovery dataset; the hyperparameter settings refer to the paper [20], which proposed the SS-SAE method and carried out the hyperparameter settings. To overcome the process dynamics, the network structure of SS- SAE is set as [20-16-12-6-1], the pre-training learning rate is 0.1, the number of pre-training iterations is 10, the fine-tuning learning rate is 0.01, and the number of fine-tuning iterations is 60.

For the training set, it is assumed that there are three cases for the proportion of labeled samples: 33%, 50%, and 67%. Under the above hyperparameter settings and different proportions of labeled samples, SSCI-XGBoost and other models are trained, respectively, and the experimental results of each model on the test set are shown in Table 6.

From Table 6, it can be seen that the performance of SCI-XGBoost is better than XGBoost in the case of three different proportions of labeled samples, indicating that the proposed model can fully extract process dynamic information. This result is consistent with the experimental results in the previous section to verify the effectiveness of feature extraction using the debutane tower dataset, which once again verifies the effectiveness of feature transfer. SSCI-XGBoost performs better than SCI-XGBoost trained only with labeled samples under different proportions of labeled samples, which indicates that the unsupervised dynamic feature extractor can capture the hidden dynamic information in unlabeled samples as a supplement. In the case of 33% and 50% labeled samples, the performance of XGBoost is better than SS-SAE, indicating that the ensemble model has stronger generalization ability when the number of labeled samples is small. The performance of the semi-supervised model SS-SAE is not as good as SSCI-XGBoost, which indicates that the dynamic feature extractor is more capable of capturing the process dynamic information.

In order to more intuitively represent the performance improvement brought by SSCI-XGBoost using unlabeled samples, the RMSE broken lines of the three models XGBoost, SCI-XGBoost, and SSCI-XGBoost are shown in Figure 11. In the case of three different proportions of labeled samples, the RMSE values of SCI-XGBoost are smaller than those of XGBoost, which indicates that the feature transfer is effective. The performance of SSCI-XGBoost is better than SCI-XGBoost, which proves that the feature extractor captures the hidden information of the unlabeled samples. Combining the dynamic feature extraction ability of a sequential model and the generalization performance of an ensemble model, the semi-supervised soft sensor model SSCI-XGBoost, based on unsupervised dynamic feature extraction, can fully capture the hidden dynamic information in unlabeled samples to maximize the model performance.

6. Conclusions

In this paper, an unsupervised dynamic feature extractor is designed based on SCINet and AE. The hidden layer features encoded by the dynamic feature extractor are transferred to the XGBoost ensemble model with stronger generalization performance, and a semi-supervised soft measurement model is established. Aiming at the problem of the mismatch in the quantitative relationship between process variables and quality variables caused by the inconsistent sampling rate of sensors, to fully capture the potential dynamic information in industrial processes, and to effectively utilize unlabeled samples and labeled samples, a semi-supervised soft measurement modeling method based on dynamic feature extraction is proposed.

The dynamic feature extractor designed in this paper does not adopt the stacked model. In future work, when the industrial process data dimension is higher, the data volume is larger, the dynamic feature extractor can be extended to a stacked model for processing. Meanwhile, due to the design of residual connections in the unsupervised dynamic feature extractor, the dimensions of the encoded features are the same as the input ones. Too large feature dimensions may produce disturbing information and degrade the performance of the regressor. Therefore, adding an attention layer to the extractor to reduce the dimension of the features may achieve better modeling results.

Author Contributions

Conceptualization, J.W. and C.Q.; methodology, X.L.; software, S.D.; validation, writing—review and editing, Q.L.; visualization and project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Jun Wang and Changjian Qi was employed by the company CRRC Nanjing Puzhen Vehicle Co., Ltd.. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Cao, Y.; Liao, Y.; Chen, L.; Wang, C.; Hu, Y. A Wireless Power Feedback Based Battery Equalizer with Multiple-Receiver. IEEE Trans. Transp. Electrif. 2025, 26, 2025. [Google Scholar] [CrossRef]
Wang, C.; Cao, M.; Cao, Y. Battery aging estimation algorithm with active balancing control in battery system. J. Energy Storage 2025, 108, 115055. [Google Scholar] [CrossRef]
Yuan, X.; Ou, C.; Wang, Y.; Yang, C.; Gui, W. A novel semi-supervised pre-training strategy for deep networks and its application for quality variable prediction in industrial processes. Chem. Eng. Sci. 2020, 217, 115509. [Google Scholar] [CrossRef]
Shao, W.; Tian, X. Semi-supervised selective ensemble learning based on distance to model for nonlinear soft sensor development. Neurocomputing 2017, 222, 91–104. [Google Scholar] [CrossRef]
Zheng, J.; Song, Z.; Ge, Z. Probabilistic learning of partial least squares regression model: Theory and industrial applications. Chemom. Intell. Lab. Syst. 2016, 158, 80–90. [Google Scholar] [CrossRef]
Cai, Z.; Zhu, W. Feature selection for multi- label classification using neighborhood preservation. IEEE/CAA J. Autom. Sin. 2017, 5, 320–330. [Google Scholar] [CrossRef]
Sun, Q.; Ge, Z. A survey on deep learning for data-driven soft sensors. IEEE Trans. Ind. Inform. 2021, 17, 5853–5866. [Google Scholar] [CrossRef]
Lu, Z.; Liu, H.; Chen, F.; Li, H.; Xue, X. BOF steelmaking endpoint carbon content and temperature soft sensor based on supervised dual-branch DBN. Meas. Sci. Technol. 2024, 35, 035119. [Google Scholar] [CrossRef]
Yuan, X.; Gu, Y.; Wang, Y.; Yang, C.; Gui, W. A deep supervised learning framework for data-driven soft sensor modeling of industrial processes. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 4737–4746. [Google Scholar] [CrossRef]
Guesbaya, M.; García-Mañas, F.; Rodríguez, F.; Megherbi, H. A Soft Sensor to Estimate the Opening of Greenhouse Vents Based on an LSTM-RNN Neural Network. Sensors 2023, 23, 1250. [Google Scholar] [CrossRef]
Yu, F.; Xiong, Q.; Cao, L.; Yang, F. Stable soft sensor modeling based on causality analysis. Control. Eng. Pract. 2022, 122, 105109. [Google Scholar] [CrossRef]
Chang, S.; Chen, X.; Zhao, C. Flexible Clockwork Recurrent Neural Network for multirate industrial soft sensor. J. Process Control 2022, 119, 86–100. [Google Scholar] [CrossRef]
He, Y.; Lv, S.; Zhu, Q.; Lu, S. Novel multiattribute space-based LSTM for industrial soft sensor applications. IEEE Trans. Ind. Inform. 2023, 20, 4745–4752. [Google Scholar] [CrossRef]
Xie, R.; Hao, K.; Huang, B.; Chen, L.; Cai, X. Data-driven modeling based on two-stream λ gated recurrent unit network with soft sensor application. IEEE Trans. Ind. Electron. 2020, 67, 7034–7043. [Google Scholar] [CrossRef]
Duchanoy, C.; Moreno-Armendáriz, M.; Urbina, L.; Cruz- Villar, C.; Calvo, H.; Rubio, J. A novel recurrent neural network soft sensor via a differential evolution training algorithm for the tire contact patch. Neurocomputing 2017, 235, 71–82. [Google Scholar] [CrossRef]
Yuan, X.; Li, L.; Wang, Y. Nonlinear dynamic soft sensor modeling with supervised long short-term memory network. IEEE Trans. Ind. Inform. 2019, 16, 3168–3176. [Google Scholar] [CrossRef]
Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. Scinet: Time series modeling and forecasting with sample convolution and interaction. Adv. Neural Inf. Process. Syst. 2022, 35, 5816–5828. [Google Scholar]
Fang, H.; Ding, L.; Wang, X.; Chang, Y.; Yan, L.; Liu, L.; Fang, J. SCINet: Spatial and contrast interactive super-resolution assisted infrared UAV target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5006722. [Google Scholar] [CrossRef]
Liu, M.; Li, Y.; Hu, J.; Wu, X.; Deng, S.; Li, H. A New Hybrid Model Based on SCINet and LSTM for Short-Term Power Load Forecasting. Energies 2024, 17, 95. [Google Scholar] [CrossRef]
Zhang, X.; Ge, Z. Automatic deep extraction of robust dynamic features for industrial big data modeling and soft sensor application. IEEE Trans. Ind. Inform. 2019, 16, 4456–4467. [Google Scholar] [CrossRef]
Ren, L.; Wang, T.; Lai, L.; Zhang, L. A data-driven self-supervised lstm-deepfm model for industrial soft sensor. IEEE Trans. Ind. Inform. 2021, 18, 5859–5869. [Google Scholar] [CrossRef]
Lian, P.; Liu, H.; Wang, X.; Guo, R. Soft sensor based on dbn-ipso-svr approach for rotor thermal deformation prediction of rotary air-preheater. Measurement 2020, 165, 108109. [Google Scholar] [CrossRef]
Wang, X.; Hu, W.; Li, K.; Song, L.; Song, L. Modeling of soft sensor based on dbn-elm and its application in measurement of nutrient solution composition for soilless culture. In Proceedings of the IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 10–12 December 2018; pp. 93–97. [Google Scholar]
Fan, W.; Si, F.; Ren, S.; Yu, C.; Cui, Y.; Wang, P. Integration of continuous restricted boltzmann machine and svr in nox emissions prediction of a tangential firing boiler. Chemom. Intell. Lab. Syst. 2019, 195, 103870. [Google Scholar] [CrossRef]
Li, Z.; Jin, H.; Dong, S.; Qian, B.; Yang, B.; Chen, X. Semi-supervised ensemble support vector regression based soft sensor for key quality variable estimation of nonlinear industrial processes with limited labeled data. Chem. Eng. Res. Des. 2022, 179, 510–526. [Google Scholar] [CrossRef]
Tang, Y.; Wang, Y.; Liu, C.; Yuan, X.; Wang, K.; Yang, C. Semi-supervised lstm with historical feature fusion attention for temporal sequence dynamic model- ing in industrial processes. Eng. Appl. Artif. Intell. 2023, 117, 105547. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
Fortuna, L.; Graziani, S.; Xibilia, M. Soft sensors for product quality monitoring in debutanizer distillation columns. Control. Eng. Pract. 2005, 13, 499–508. [Google Scholar] [CrossRef]
Fortuna, L.; Rizzo, A.; Sinatra, M.; Xibilia, M. Soft analyzers for a sulfur recovery unit. Control. Eng. Pract. 2003, 11, 1491–15003. [Google Scholar] [CrossRef]
Gao, S.; Xu, J.; Ma, Z.; Tian, R.; Dang, X.; Dong, X. Research on modeling of industrial soft sensor based on ensemble learning. IEEE Sens. J. 2024, 24, 14380–14391. [Google Scholar] [CrossRef]
Rumelhart, D.; Hinton, G.; Williams, R.J. Learning Internal Representations by Error Propagation; Cambridge MIT Press: Cambridge, MA, USA, 1986. [Google Scholar]
Yuan, X.; Wang, Y.; Yang, C.; Gui, W. Stacked isomorphic autoencoder based soft analyzer and its application to sulfur recovery unit. Inf. Sci. 2020, 534, 72–84. [Google Scholar] [CrossRef]

Figure 1. The network structure of SCINet, (a) SCINet and (b) SCI-Block.

Figure 2. The structure of the dynamic feature extraction-based soft sensor model.

Figure 3. Dynamic Feature Extraction and Transfer to XGBoost.

Figure 4. SSCI-XGBoost modeling flow chart.

Figure 5. The schematic diagram of the debutanizer.

Figure 6. Prediction curve of the model on the test set, (a) ANN, (b) SLSTM, (c) LSTM, (d) SCINet, (e) XGBoost, and (f) SCI-XGBoost.

Figure 7. Box of absolute prediction errors for different Models.

Figure 8. Features importance of XGBoost.

Figure 9. The flowchart of the sulfur recovery unit.

Figure 10. Training set organization when missing partial quality samples.

Figure 11. RMSE of the test set for the models under different proportion of labeled samples.

Table 1. Descriptions of the variables in the debutanizer column.

Variables	Variable Description	Units
u₁	Top temperature	°C
u₂	Top pressure	kg/cm²
u₃	Reflux flow	m³/h
u₄	Flow to next process	m³/h
u₅	6th tray temperature	°C
u₆	Bottom temperature A	°C
u₇	Bottom temperature B	°C
y	Butane concentration	%

Table 2. Hyperparameters of the SCI-XGBoost model.

	Hyperparameter	Value
Dynamic Feature Extractor	Length of sliding window	16
	Maximum depth of subsample	3
	Rxpansion of hidden layer	3
	Dize of the convolution kernel	3
	Initial learning rate	0.01
	Maximum number of iterations	100
	Optimizer	Adam
	Sctivation function	Tanh
XGBoost	Number of ensemble trees	100
	Maximum depth of the tree	2
	Learning rate	0.08

Table 3. Evaluation metric of the model on the test set.

Model	MAE	RMSE	R²
ANN	0.0652	0.0815	0.8381
SLSTM	0.0891	0.1179	0.6619
LSTM	0.0409	0.0474	0.9451
XGBoost	0.0335	0.0413	0.9583
SCINet	0.0315	0.0393	0.9623
SCI-XGBoost	0.0295	0.0367	0.9672

Table 4. Variable description of the sulfur recovery unit.

Variables	Variable Description	Units
u₁	Gas flow MEA__GAS	m³/s
u₂	Air flow AIR__MEA	m³/s
u₃	Secondary air flow AIR__MEA__2	m³/s
u₄	Gas flow in SWS zone	m³/s
u₅	Air flow in SWS zone	m³/s
y₁	Concentration of H₂S	mol/m³
y₂	Concentration of SO₂	mol/m³

Table 5. The hyperparameters of the SSCI-XGBoost model.

	Hyperparameter	Value
Dynamic Feature Extractor	Length of sliding window	16
	Maximum depth of subsample	3
	Expansion of hidden layer	3
	Size of the convolution kernel	3
XGBoost	Number of ensemble trees	200
	Maximum depth of the tree	3
	Learning rate	0.15

Table 6. Performance metrics of the models on the sulfur recovery test set.

Proportion	Model	MAE	RMSE	R²
33%	XGBoost	0.0142	0.0306	0.6983
	SCI-XGBoost	0.0141	0.0291	0.7283
	SS-SAE	0.0188	0.0306	0.6722
	SSCI-XGBoost	0.0137	0.0281	0.7447
50%	XGBoost	0.0135	0.0290	0.7287
	SCI-XGBoost	0.0133	0.0282	0.7428
	SS-SAE	0.0180	0.0289	0.7088
	SSCI-XGBoost	0.0132	0.0274	0.7582
67%	XGBoost	0.0124	0.0231	0.8271
	SCI-XGBoost	0.0123	0.0221	0.8420
	SS-SAE	0.0137	0.0206	0.8521
	SSCI-XGBoost	0.0118	0.0195	0.8774

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, J.; Qi, C.; Luo, X.; Deng, S.; Lei, Q. Dynamic Feature Extraction and Semi-Supervised Soft Sensor Model Based on SCINet for Industrial and Transportation Processes. Appl. Syst. Innov. 2025, 8, 73. https://doi.org/10.3390/asi8030073

AMA Style

Wang J, Qi C, Luo X, Deng S, Lei Q. Dynamic Feature Extraction and Semi-Supervised Soft Sensor Model Based on SCINet for Industrial and Transportation Processes. Applied System Innovation. 2025; 8(3):73. https://doi.org/10.3390/asi8030073

Chicago/Turabian Style

Wang, Jun, Changjian Qi, Xing Luo, Shihao Deng, and Qi Lei. 2025. "Dynamic Feature Extraction and Semi-Supervised Soft Sensor Model Based on SCINet for Industrial and Transportation Processes" Applied System Innovation 8, no. 3: 73. https://doi.org/10.3390/asi8030073

APA Style

Wang, J., Qi, C., Luo, X., Deng, S., & Lei, Q. (2025). Dynamic Feature Extraction and Semi-Supervised Soft Sensor Model Based on SCINet for Industrial and Transportation Processes. Applied System Innovation, 8(3), 73. https://doi.org/10.3390/asi8030073

Article Menu

Dynamic Feature Extraction and Semi-Supervised Soft Sensor Model Based on SCINet for Industrial and Transportation Processes

Abstract

1. Introduction

2. SCINet

3. XGBoost Model Analysis

4. Dynamic Feature Based XGBoost Model

4.1. The Structure

4.2. Dynamic Feature Extractor and Feature Transfer

4.3. Semi-Supervised Soft Sensor SSCI-XGBoost

5. Experiment

5.1. Evaluation Index

5.2. Case 1. Debutane Column Process

5.3. Case 2. Sulfur Recovery Unit

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI