Enhanced Soft Sensor with Qualified Augmented Samples for Quality Prediction of the Polyethylene Process

Data-driven soft sensors have increasingly been applied for the quality measurement of industrial polymerization processes in recent years. However, owing to the costly assay process, the limited labeled data available still pose significant obstacles to the construction of accurate models. In this study, a novel soft sensor named the selective Wasserstein generative adversarial network with gradient penalty-based support vector regression (SWGAN-SVR) is proposed to enhance quality prediction with limited training samples. Specifically, the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) is employed to capture the distribution of the available limited labeled data and to generate virtual candidates. Subsequently, an effective data-selection strategy is developed to alleviate the problem of varied-quality samples caused by the unstable training of the WGAN-GP. The selection strategy includes two parts: the centroid metric criterion and the statistical characteristic criterion. An SVR model is constructed based on the qualified augmented training data to evaluate the prediction performance. The superiority of SWGAN-SVR is demonstrated using a numerical example and an industrial polyethylene process.

Reliable training data are key to the efficient development of data-driven soft sensors [22,23]. However, only limited labeled samples are obtained in many polyethylene processes, a phenomenon that has received less attention than it deserves. For example, in the case of frequent changes in operating conditions, manual operations result in large measurement intervals and long settling times. Consequently, the acquisition of sufficient training samples is intractable [24][25][26]. With limited available training data, it is difficult to capture the process characteristics and model the relationship between product quality and operating conditions. Hence, the development of a soft sensor model with insufficient data requires further investigation.
The virtual sample generation (VSG) technique is effective for building soft sensors when only limited training data are available [27][28][29][30]. Several [...] during the early stages of new working conditions [5]. GANs, which are unsupervised generative models, have been adopted to generate virtual samples for data augmentation. The connection and main differences between traditional supervised soft sensors and data-augmentation-based candidates are illustrated in Figure 1.
Unfortunately, owing to the unstable training of GANs [40][41][42][43], unsuitable samples are inevitably generated. It is worth noting that the distributions of unsuitable samples do not properly match the distribution of the real data. If these unsuitable samples are merged with the original data for establishing the model, the prediction of the soft sensor may be degraded. In this study, a data selection strategy is proposed to improve the quality of the generated samples. It is expected that a more reliable soft sensor model can be obtained by introducing these newly qualified virtual samples into the training data.

WGAN-GP Data Augmentation Approach
GANs have recently attracted significant attention owing to their good distribution-learning capabilities. The vanilla GAN uses the Jensen-Shannon (JS) divergence to measure the distance between the generated and original data. However, this often causes problems, such as mode collapse and vanishing gradients [44]. To address these problems, Arjovsky et al. proposed the WGAN [44], which uses the Earth-Mover distance rather than JS divergence as a distance measurement. In the WGAN, to enforce the Lipschitz constraint, the weights of the discriminator are clipped to lie within a compact space [−r, r], where r is a constant. The discriminator attempts to distinguish between the real and generated samples and concentrates its parameter distribution on the two extremes of the maximum and minimum; that is, r and −r. Consequently, a WGAN often becomes stuck in a poor regime and fails to learn.
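To solve the problem caused by the weight clipping of the WGAN, Gulrajani et al. proposed a WGAN with a gradient penalty (WGAN-GP) [45]. Specifically, a penalty constraint is imposed on the gradient norm of the discriminator. The weight of the discriminator is reduced to an extremely small range using the gradient penalty strategy, which accelerates the model convergence and solves the gradient explosion problem. The objective function of the WGAN-GP is as follows:
$$L = \mathbb{E}_{z_G \sim P_G}\left[D(z_G)\right] - \mathbb{E}_{z_O \sim P_O}\left[D(z_O)\right] + \lambda\, \mathbb{E}_{\hat{z} \sim P_{\hat{z}}}\left[\left(\left\|\nabla_{\hat{z}} D(\hat{z})\right\|_2 - 1\right)^2\right] \quad (1)$$
where $D(\cdot)$ is the discriminator, $P_O$ and $P_G$ are the distributions of the original and generated data, respectively, $\lambda$ is the penalty coefficient, and $\hat{z}$ is sampled through random interpolation on the connecting line of the original data $z_O$ and generated data $z_G$; that is, $\hat{z} = \theta z_O + (1 - \theta) z_G$, where $\theta$ is a random number in [0, 1].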
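The interpolation and penalty term described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the critic is any scalar Python function, its gradient is approximated by central differences instead of automatic differentiation, and the names `numerical_gradient`, `wgan_gp_critic_loss`, and the default `lam=10.0` are assumptions made for illustration.

```python
import math
import random


def numerical_gradient(critic, z, h=1e-5):
    """Central-difference gradient of a scalar critic D at the point z."""
    grad = []
    for k in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[k] += h
        z_minus[k] -= h
        grad.append((critic(z_plus) - critic(z_minus)) / (2.0 * h))
    return grad


def wgan_gp_critic_loss(critic, real_batch, fake_batch, lam=10.0, rng=None):
    """E[D(z_G)] - E[D(z_O)] + lam * E[(||grad D(z_hat)||_2 - 1)^2].

    real_batch and fake_batch are paired lists of equal length (assumption).
    """
    rng = rng or random.Random(0)
    pairs = list(zip(real_batch, fake_batch))
    mean_fake = sum(critic(z_g) for _, z_g in pairs) / len(pairs)
    mean_real = sum(critic(z_o) for z_o, _ in pairs) / len(pairs)
    penalty = 0.0
    for z_o, z_g in pairs:
        theta = rng.random()  # random number in [0, 1)
        # z_hat lies on the line segment between z_O and z_G
        z_hat = [theta * a + (1.0 - theta) * b for a, b in zip(z_o, z_g)]
        grad = numerical_gradient(critic, z_hat)
        grad_norm = math.sqrt(sum(g * g for g in grad))
        penalty += (grad_norm - 1.0) ** 2
    penalty /= len(pairs)
    return mean_fake - mean_real + lam * penalty
```

For a linear critic whose weight vector has unit norm, the penalty term vanishes, so the loss reduces to the difference of the two critic means. In practice the gradient would come from an automatic-differentiation framework rather than finite differences.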

Virtual Sample Selection Strategy
Owing to the unstable training of the WGAN-GP, the quality of the generated virtual samples varies significantly. A data selection strategy is proposed for sample filtering, to eliminate the negative effects of unqualified virtual samples that are generated by the WGAN-GP for model construction. The selection strategy includes a centroid metric criterion, which is denoted as S1, and a statistical characteristic criterion, which is denoted as S2. The distribution scatters of the original and rough virtual samples are plotted in Figure 2. The distribution of most virtual samples conforms to the real data distribution. However, the WGAN-GP also generates samples that are located in regions A and B, which are far from the distribution of the original data. If the generated samples in regions A and B, which are regarded as unqualified, are added to the original set, the prediction performance of the model may deteriorate. A detailed description of the proposed selection strategy is provided below.
First, the S1 criterion is developed to filter the virtual samples in region A. The samples in region A are too close to the centroid $z_C$ of the original samples. Furthermore, the distribution of these samples is not uniform compared to that of the original samples. Thus, the virtual samples around the centroid are considered to be information-poor and unqualified. The centroid $z_C$ is defined as the closest point in space to the original data, as follows:
$$z_C = (\mu_{OX}, \mu_{OY}) \quad (2)$$
where $\mu_{OX}$ and $\mu_{OY}$ are the process-variable and target-variable components of $z_C$, respectively (the sample means of the original data, because the mean minimizes the sum of squared Euclidean distances). The Euclidean distance is commonly used to measure the distance between two samples. A large distance indicates that the samples are far from one another. The sum of the squared distances between $z_C$ and the $M$ original samples is formulated as follows:
$$d_C = \sum_{i=1}^{M} \left\|z_{Oi} - z_C\right\|^2 = \min_{z_r} \sum_{i=1}^{M} \left\|z_{Oi} - z_r\right\|^2 \quad (3)$$
where $z_{Oi} = (x_{Oi}, y_{Oi})$ is the $i$-th original sample and $z_r$ is any point in space.
Similarly, the sum of the squared distances between the $j$-th generated sample and the original samples is calculated as follows:
$$d_j = \sum_{i=1}^{M} \left\|z_{Oi} - z_{Gj}\right\|^2 \quad (4)$$
where $z_{Gj} = (x_{Gj}, y_{Gj})$ is the $j$-th generated sample.
According to the definitions of $d_j$ and $d_C$, $d_j \ge d_C$. A smaller $d_j$ means that $z_{Gj}$ is closer to $z_C$, indicating a more dissimilar distribution of $z_{Gj}$ to the original samples. A sample in region A satisfies $d_j < \rho d_C$, where $\rho \ge 1$ is a parameter. Therefore, the qualified samples, based on the S1 criterion, are defined as:
$$\left\{ z_{Gj} \mid d_j \ge \rho d_C \right\} \quad (5)$$
Subsequently, the S2 criterion is adopted to filter the unsuitable virtual samples in region B. The samples in region B are far away from the distribution of the original data and tend to be outliers. The samples can be screened according to the statistical characteristics of the original samples. Based on the probability density function $p(x)$ for each normal operating data point of the initial samples, the $100\beta\%$ confidence bound can be defined as the likelihood threshold $h$ that satisfies the following formula:
$$\int_{x:\, p(x) > h} p(x)\, dx = \beta \quad (6)$$
where $p(x)$ is a multivariate Gaussian distribution; the above confidence bounds can be found in a previous paper [46]. In particular, a generated sample $x_{Gj}$ is considered an outlier when it satisfies the following formula:
$$(x_{Gj} - \mu_{OX})^{T} C_{OX}^{-1} (x_{Gj} - \mu_{OX}) > \chi_q^2(\beta) \quad (7)$$
where $C_{OX}$ is the covariance matrix of the input data of the original samples, $C_{OX}^{-1}$ is its inverse, and $\chi_q^2(\beta)$ is the $\beta$-fractile of the Chi-square distribution with $q$ degrees of freedom.
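The S1 centroid criterion described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: each sample is a list holding the joint vector (x, y), the function names are hypothetical, and the default `rho=1.2` is an arbitrary value chosen for the example, not one reported in the study.

```python
def squared_distance_sum(point, samples):
    """Sum over all samples of the squared Euclidean distance to `point`."""
    return sum(
        sum((s_k - p_k) ** 2 for s_k, p_k in zip(sample, point))
        for sample in samples
    )


def centroid(samples):
    """Componentwise mean of the samples, i.e. the centroid z_C."""
    m = len(samples)
    dim = len(samples[0])
    return [sum(sample[k] for sample in samples) / m for k in range(dim)]


def s1_filter(original, generated, rho=1.2):
    """Keep generated samples with d_j >= rho * d_C (S1-qualified samples)."""
    z_c = centroid(original)
    d_c = squared_distance_sum(z_c, original)
    return [
        z_g for z_g in generated
        if squared_distance_sum(z_g, original) >= rho * d_c
    ]
```

With four original samples at the corners of a square, a generated sample at the centre has $d_j = d_C$ and is rejected, while one at a corner is kept. A larger `rho` widens the rejected neighbourhood around the centroid.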
In summary, according to the aforementioned two-stage data selection strategy, $k$ qualified samples are selected from the rough generated data and are denoted as $\{x_{Sj}, y_{Sj}\}_{j=1,\dots,k}$. This data selection strategy makes the selected virtual samples more homogeneous, in agreement with the original data distribution.
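The S2 statistical criterion used in the second stage can be sketched for the special case of two input variables (q = 2), where the Chi-square quantile has the closed form $-2\ln(1-\beta)$. The function names and the default `beta=0.95` are assumptions for illustration; for a general number of inputs, a library routine such as `scipy.stats.chi2.ppf` and a general matrix inverse would be used instead.

```python
import math


def mean_and_cov_2d(xs):
    """Sample mean and unbiased 2x2 covariance matrix of two-dimensional inputs."""
    m = len(xs)
    mu = [sum(x[0] for x in xs) / m, sum(x[1] for x in xs) / m]
    c00 = sum((x[0] - mu[0]) ** 2 for x in xs) / (m - 1)
    c11 = sum((x[1] - mu[1]) ** 2 for x in xs) / (m - 1)
    c01 = sum((x[0] - mu[0]) * (x[1] - mu[1]) for x in xs) / (m - 1)
    return mu, [[c00, c01], [c01, c11]]


def mahalanobis_sq_2d(x, mu, cov):
    """(x - mu)^T C^{-1} (x - mu), using the explicit inverse of a 2x2 matrix."""
    det = cov[0][0] * cov[1][1] - cov[0][1] ** 2
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    return (cov[1][1] * d0 * d0 - 2.0 * cov[0][1] * d0 * d1
            + cov[0][0] * d1 * d1) / det


def chi2_quantile_2dof(beta):
    """Beta-quantile of the Chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - beta)


def s2_filter(original_x, generated_x, beta=0.95):
    """Keep generated inputs whose Mahalanobis distance is within the bound."""
    mu, cov = mean_and_cov_2d(original_x)
    limit = chi2_quantile_2dof(beta)
    return [x for x in generated_x if mahalanobis_sq_2d(x, mu, cov) <= limit]
```

Applying an S1-style filter first and `s2_filter` second would give a two-stage selection in the spirit of the strategy summarized above.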

SWGAN-SVR Soft Sensor Model
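In this study, SVR is adopted as the base soft-sensor model for nonlinear processes. SVR is a statistical learning method that uses the structural risk-minimization criterion instead of the empirical risk-minimization criterion for model construction [15]. The target function of the SVR is as follows [15]:
$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2}\|w\|^2 + \gamma \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i \\ w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n \end{cases} \quad (8)$$
where $b$ is the bias, $w$ is the weight vector, $\xi_i$ and $\xi_i^*$ are slack variables, $\gamma$ is a regularization parameter that controls the penalty for samples exceeding the fitting error, $\phi$ is a nonlinear mapping function, $\varepsilon$ is an insensitivity coefficient, and $n$ is the number of samples for the SVR model.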
The constrained optimization can be solved using the Lagrange function by introducing Lagrange multipliers. Subsequently, Equation (8) is converted into a dual problem, as follows:
$$\max_{\alpha, \alpha^*} \ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) \quad \text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le \gamma \quad (9)$$
where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers and $K(\cdot, \cdot)$ represents a kernel function. In this study, the radial basis function (RBF) is adopted:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\psi^2}\right) \quad (10)$$
where $\psi > 0$ is a controlling parameter for the RBF kernel width. Therefore, the SVR model can be described as
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b \quad (11)$$
A flowchart of the SWGAN-SVR model is presented in Figure 3. It is difficult to develop a reliable SVR soft sensor for the initial limited training data, $\{x_{Oi}, y_{Oi}\}_{i=1,\dots,M}$. In such a situation, the WGAN-GP is adopted for data augmentation and $N$ virtual samples are generated, which are denoted as $\{x_{Gj}, y_{Gj}\}_{j=1,\dots,N}$. Furthermore, considering the unstable training process of the WGAN-GP, unsuitable virtual samples are generated, which need to be screened out from the group of rough virtual samples. Consequently, after employing the proposed two-stage data selection strategy, that is, the centroid metric criterion S1 and the statistical characteristic criterion S2, $k$ qualified samples $\{x_{Sj}, y_{Sj}\}_{j=1,\dots,k}$ are obtained. By combining the qualified virtual samples with the initial limited training samples, a new augmented training sample set is obtained, which can be denoted as $\{x_{Oi}, y_{Oi}\}_{i=1,\dots,M} \cup \{x_{Sj}, y_{Sj}\}_{j=1,\dots,k}$. Subsequently, an SVR soft sensor is constructed for quality prediction. Note that other supervised soft-sensor modeling methods, such as partial least squares regression and Gaussian process regression, can also replace SVR in this framework.
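Once the dual problem has been solved, evaluating the SVR model is a weighted sum of kernel values plus the bias. The sketch below shows only this prediction step; it assumes the common Gaussian parameterization $\exp(-\|x_i - x_j\|^2 / (2\psi^2))$ for the RBF kernel, and the names `rbf_kernel`, `svr_predict`, and `dual_coef` (holding the differences $\alpha_i - \alpha_i^*$) are hypothetical. Solving for the multipliers is left to an SVR library.

```python
import math


def rbf_kernel(x_a, x_b, psi=1.0):
    """RBF kernel with width parameter psi (assumed Gaussian parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_a, x_b))
    return math.exp(-sq_dist / (2.0 * psi ** 2))


def svr_predict(x, support_x, dual_coef, bias, psi=1.0):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coef[i] holds the difference alpha_i - alpha_i* for support_x[i].
    """
    return sum(
        coef * rbf_kernel(x_i, x, psi)
        for x_i, coef in zip(support_x, dual_coef)
    ) + bias
```

For example, with two support vectors at 0 and 1, coefficients 2 and -1, and bias 0.5, the prediction at x = 0 is 2.5 minus exp(-0.5).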

Results and Discussion
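A numerical example and an industrial polyethylene process were adopted to validate the effectiveness of the proposed SWGAN-SVR modeling method. The commonly used root-mean-square error (RMSE), coefficient of determination ($R^2$), and mean absolute error (MAE) indices were used for the performance evaluation and are expressed as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{t=1}^{m} (y_t - \hat{y}_t)^2}$$
$$R^2 = 1 - \frac{\sum_{t=1}^{m} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{m} (y_t - \bar{y})^2}$$
$$\mathrm{MAE} = \frac{1}{m}\sum_{t=1}^{m} \left|y_t - \hat{y}_t\right|$$
where $y_t$ and $\hat{y}_t$ are the quality measurement and prediction values of the $t$-th observation, respectively, $\bar{y}$ is the mean of the quality measurements, and $m$ is the sample size.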
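The three indices can be computed directly from paired measurements and predictions. The following is a minimal sketch using only the standard library; the function names are chosen here for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    m = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / m)


def mae(y_true, y_pred):
    """Mean absolute error."""
    m = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / m


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

Smaller RMSE and MAE values and a larger $R^2$ value indicate better prediction accuracy.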

Numerical Example
A numerical example with a two-dimensional input and one-dimensional output was constructed to simulate the process of insufficient initial training samples:
$$x_1 = 3u^2 + 4u, \quad x_2 = 8u + 2\cos(\pi u/3), \quad u = -10, -9.8, -9.6, \dots, 10$$
where $x_1$ and $x_2$ are two state variables that are constructed using the variable $u$, $y$ is an output variable, and $e$ is Gaussian noise with a zero mean and a variance of 0.01.
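The two state variables of this example can be generated on the stated grid of u as sketched below. The sketch reads the first expression as $3u^2 + 4u$ (an assumption about the extracted text), uses a hypothetical function name, and covers the inputs only, since the output relation is not reproduced here.

```python
import math


def generate_inputs(start=-10.0, stop=10.0, step=0.2):
    """Return (u, x1, x2) tuples on the grid u = start, start + step, ..., stop."""
    n_points = int(round((stop - start) / step)) + 1
    rows = []
    for k in range(n_points):
        u = start + k * step
        x1 = 3.0 * u ** 2 + 4.0 * u  # assumed reading: 3u^2 + 4u
        x2 = 8.0 * u + 2.0 * math.cos(math.pi * u / 3.0)
        rows.append((u, x1, x2))
    return rows
```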
In this study, 100 samples were collected. To build the soft-sensor model, 50 samples were randomly selected as the training data and 50 samples were used for testing. In such a situation, using only limited training samples to train an SVR soft sensor may be insufficient. Therefore, it is essential to generate virtual samples to increase the data capacity and enrich the data diversity.
First, we investigated the number of generated samples that were sufficient for this example, using a 10-fold cross-validation algorithm. Specifically, a new training set containing both the original samples and generated virtual samples was divided into 10 non-overlapping subsets. Subsequently, an SVR model was constructed with the $i$-th subset regarded as a temporary test set and the remaining nine subsets regarded as a temporary training set. Each subset was used as a temporary test set in turn. Consequently, the total prediction result for a certain number of generated samples was obtained across 10 trials. The RMSE results for different numbers of generated samples are depicted in Figure 4. As the number of virtual samples increased, the RMSE value first decreased and then increased. The initial decrease occurred mainly because the generated virtual samples filled the information gap in the initial training stage, which improved the model prediction accuracy. When the number of virtual samples became sufficiently large, the influence of the initial samples was weakened, and more significant differences occurred between the initial and virtual samples. Therefore, as illustrated in Figure 4, the appropriate number of virtual samples for this example was 450.
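The cross-validation loop described above can be sketched as follows. This is an illustrative outline, not the authors' code: the errors of all folds are pooled into one RMSE (one possible reading of "total prediction result"), `fit_predict` is a hypothetical callable that trains on one part and predicts the other, and a nearest-neighbour predictor stands in for the SVR so that the example needs no external library.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(xs, ys, fit_predict, k=10, seed=0):
    """Pooled RMSE over k folds; each fold serves once as the temporary test set."""
    squared_error = 0.0
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        preds = fit_predict(
            [xs[i] for i in train_idx],
            [ys[i] for i in train_idx],
            [xs[i] for i in test_idx],
        )
        squared_error += sum((ys[i] - p) ** 2 for i, p in zip(test_idx, preds))
    return math.sqrt(squared_error / len(xs))


def nearest_neighbour(train_x, train_y, test_x):
    """Stand-in model (NOT the SVR): predict the target of the closest sample."""
    def sq_dist(a, b):
        return sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b))

    preds = []
    for x in test_x:
        best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
        preds.append(train_y[best])
    return preds
```

Repeating `cross_validated_rmse` for augmented sets built with different numbers of virtual samples, and picking the number with the smallest value, mirrors the selection procedure described in the text.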
The scatter distributions of the original samples and the 450 generated samples are presented in Figure 5a. Several unsuitable samples did not conform to the initial data distribution. After applying the proposed S1 and S2 data selection criteria, the scatter distributions of the qualified virtual samples, rough virtual samples, and initial limited samples are shown in Figure 5b. Unsuitable samples that were too close to the centroid and distant outliers were filtered. Consequently, the qualified virtual samples matched the distribution of the original samples. When combined with the original training data, the qualified virtual samples served as complements to the initial samples. The SWGAN-SVR model was built based on the qualified augmented training samples; the prediction results for the test set are listed in Table 1. For comparison, the prediction results of SVR, WGAN-SVR, WGAN-SVR using only the S1 criterion (denoted as WGAN-SVR(S1)), and WGAN-SVR using only the S2 criterion (denoted as WGAN-SVR(S2)) are also listed in Table 1.
WGAN-SVR, WGAN-SVR(S1), WGAN-SVR(S2), and SWGAN-SVR outperformed the SVR method, with smaller RMSE and MAE values and larger $R^2$ values. This is mainly because the generated samples increased the diversity of the training samples. The prediction performances of WGAN-SVR(S1) and WGAN-SVR(S2) were further enhanced, compared to the results of WGAN-SVR. By adopting only one data selection criterion, unsuitable virtual samples around the centroid or far-away outliers were screened out, which improved the quality of the augmented samples. This also demonstrates that unsuitable virtual samples hinder the construction of reliable soft sensors. Furthermore, after simultaneously adopting the S1 and S2 criteria, SWGAN-SVR achieved the best prediction performance among the five methods.
This indicates that a two-stage data selection strategy is beneficial for selecting qualified augmented samples and improving the performance of the base SVR soft sensor.
Polymers 2022, 14
For a better illustration, the detailed prediction results and relative prediction errors of the five soft sensors on the test set are presented in Figure 6a,b, respectively. As shown in Figure 6a, SWGAN-SVR tracked the real trajectory better than the other four soft sensors, and the prediction curve of SWGAN-SVR was the one that was most consistent with the real curve. As illustrated in Figure 6b, the prediction errors of the proposed SWGAN-SVR were much smaller for the entire test set, and the errors were mostly around zero.
A boxplot of the absolute prediction error values for the five methods is shown in Figure 7. SWGAN-SVR had a narrower error range, which was closer to zero, than the other four methods. Furthermore, as demonstrated through a comparison of the red lines in the boxes, the median value of the absolute error was smaller than that of the other four methods, indicating a better prediction performance for SWGAN-SVR.

Industrial Polyethylene Process
An industrial polyethylene process [4] was utilized to verify the necessity and superiority of the proposed method for practical applications. The product of the polyethylene manufacturing process was sampled once daily from the laboratory. Hence, in the initial stage of a new product grade, the collected quality variables (that is, the melt index (MI)) are insufficient for the development of a reliable soft sensor. After using a simple 3-sigma criterion to remove outliers, 60 samples were investigated. The dataset was partitioned into two parts: 30 randomly selected samples were used as the training data and the remaining 30 samples were used for testing.
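The 3-sigma screening mentioned above can be sketched for a single measured variable as follows. Which variables the criterion was applied to is not stated here, so applying it to one list of values (for example, the MI measurements) is an assumption, as is the function name.

```python
import statistics


def three_sigma_filter(values):
    """Keep values lying within three sample standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mu) <= 3.0 * sigma]
```

Note that with very few samples no point can lie more than three sample standard deviations from the mean, so the criterion only becomes effective once enough measurements are available.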
Using a 10-fold cross-validation method, a suitable number of virtual samples was first determined for this example. The complete RMSE indices for different numbers of virtual samples are presented in Figure 8. The RMSE value was smallest when the number of generated samples was 150. Hence, 150 virtual samples were generated as an appropriate supplement to the initial limited samples. The proposed data selection strategy was adopted to improve the quality of the generated virtual samples. Subsequently, the proposed SWGAN-SVR model was built, based on the qualified augmented samples. Furthermore, SVR, WGAN-SVR, WGAN-SVR(S1), and WGAN-SVR(S2) were built to predict the MI value. The details of the prediction performance of the five methods on the test set are listed in Table 2. According to the prediction results, the SVR method achieved the largest RMSE value and smallest $R^2$ value, indicating the worst prediction accuracy among the five methods. This occurred because the initial training data were insufficient for the construction of reliable soft sensors. With the data augmentation strategy, the WGAN-SVR, WGAN-SVR(S1), WGAN-SVR(S2), and SWGAN-SVR methods improved the prediction accuracy, compared to the SVR approach. The generated virtual samples fill the information gap in the initial data and increase the sample capacity. Moreover, by adopting the two-stage data selection criteria, the SWGAN-SVR method achieved the best prediction performance among the five methods. The SWGAN-SVR method attempts to select the qualified virtual samples and, subsequently, to improve the quantity and quality of the initial training data. Note that owing to the strong nonlinearity of this example, the $R^2$ index was smaller than that of the numerical example described in Section 4.1.
The scatter distributions of the rough generated samples and the selected unsuitable samples are presented in Figure 9. Virtual samples close to the centroid and the distant outliers were filtered. The remaining qualified samples matched well with the distribution of the original samples. Moreover, the diversity of the original samples increased with the incorporation of the qualified samples.
The detailed prediction results of the five soft sensors on the test set are depicted in Figure 10. The proposed SWGAN-SVR method was superior to the other four methods in terms of tracking the real trend of the output variable. The prediction of SWGAN-SVR was in good agreement with the actual trajectory of the MI value and, thus, exhibited a much smaller deviation. The relative prediction errors of the five methods are shown in Figure 11. The SWGAN-SVR method achieved the best prediction performance and yielded the smallest prediction error at most sampling points. Consequently, the obtained results indicate that the proposed SWGAN-SVR soft sensor can enhance prediction performance when dealing with insufficient training samples.
Figure 11. Relative prediction errors for the polyethylene process.

Conclusions
In this study, a reliable soft sensor framework is developed to enhance prediction performance by introducing augmented data. Because limited training data are insufficient for establishing a reliable soft sensor, rough virtual samples are generated using the WGAN-GP method to enrich the sample information. Subsequently, based on a two-stage data selection strategy, qualified augmented samples are selected to eliminate the negative effects of unsuitable samples on the prediction performance. Based on the qualified augmented training samples, the SWGAN-SVR method is designed to capture the process characteristics, which is beneficial for regression. The prediction results for the two examples demonstrate the advantages of the proposed approach. Further investigations will aim to enhance the quality of the samples generated by GANs. Additionally, incorporating process characteristics to generate more informative samples for practical applications is an interesting topic.