4.1. Experimental Settings and BDA Parameters Configuration
To verify the proposed method and facilitate comparison, two experimental settings were defined following [20].
- Setting 1: Batch 1 (source domain) was used as a fixed training set, and testing was performed on Batch K, K = 2, …, 10 (target domains);
- Setting 2: the training set (source domain) was changed dynamically to Batch K − 1, and testing was performed on Batch K (target domain), K = 2, …, 10.
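Both protocols can be written as lists of (source batch, target batch) pairs. The sketch below is illustrative only: the function name `experiment_pairs` is hypothetical, and batch indices follow the 1-based numbering used in the text.

```python
def experiment_pairs(setting, num_batches=10):
    """Return (source_batch, target_batch) pairs for Setting 1 or Setting 2."""
    if setting == 1:
        # Setting 1: Batch 1 is always the source; Batches 2..10 are the targets.
        return [(1, k) for k in range(2, num_batches + 1)]
    if setting == 2:
        # Setting 2: the previous batch K-1 is the source for target batch K.
        return [(k - 1, k) for k in range(2, num_batches + 1)]
    raise ValueError("setting must be 1 or 2")

print(experiment_pairs(1)[:3])  # [(1, 2), (1, 3), (1, 4)]
print(experiment_pairs(2)[:3])  # [(1, 2), (2, 3), (3, 4)]
```

Both settings yield nine source/target pairs, one per target batch.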
The number of measurements and the classes of test data contained in each batch are summarized in Table 1.
In this study, the classification accuracy on the target-domain samples was used as the basic criterion for evaluating the algorithm, calculated as follows:
$$\text{Accuracy} = \frac{\left|\{x : x \in \mathcal{D}_t \wedge y(x) = f(x)\}\right|}{\left|\{x : x \in \mathcal{D}_t\}\right|}$$
where f(x) is the true label of the test sample x, y(x) is the predicted label of x, and $\mathcal{D}_t$ denotes the set of target-domain test samples.
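The accuracy criterion is the fraction of target samples whose predicted label y(x) matches the true label f(x). A minimal sketch, assuming the labels are stored in two equal-length arrays; the name `target_accuracy` is hypothetical.

```python
import numpy as np

def target_accuracy(y_true, y_pred):
    """Fraction of target-domain samples x with predicted label y(x) == true label f(x)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape or y_true.size == 0:
        raise ValueError("label arrays must be non-empty and have the same shape")
    return float(np.mean(y_true == y_pred))

print(target_accuracy([1, 2, 3, 3], [1, 2, 1, 3]))  # 0.75
```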
The optimization model of the drift compensation algorithm in this study includes five parameters: the weight balance factor μ, the regularization parameter λ, the number of subspace bases d, the number of iterations T, and the kernel parameter γ.
λ is a regularization parameter that ensures the optimization problem is well-defined. In theory, λ controls the complexity of the classification model: when λ → 0, the model degenerates and severe overfitting occurs; when λ → ∞, the model becomes too simple to fit the discriminative structure of the data.
The number of subspace bases d determines the size of the transformation matrix A. d is usually chosen as a dimension that makes the subspace sufficiently accurate, and its value cannot exceed the number of features.
The number of iterations T controls how many times the BDA method updates the pseudo-labels of the target domain; by iteratively refining these pseudo-labels, BDA progressively improves classification accuracy.
The parameter γ controls the width of the RBF kernel, k(x, x′) = exp(−γ‖x − x′‖²); it is inversely related to the kernel width σ (γ = 1/(2σ²) in the common parameterization). For nonlinear problems, a kernel mapping and the corresponding kernel matrix can be used. Two kernel functions, the linear kernel and the RBF kernel, were used in this study.
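The two kernels can be sketched as follows. This assumes rows are samples and uses the exp(−γ‖x − x′‖²) parameterization of the RBF kernel (the convention of, e.g., scikit-learn's `rbf_kernel`); the text does not state which parameterization the original implementation used. The function names are hypothetical.

```python
import numpy as np

def linear_kernel(X, Y):
    # k(x, y) = x . y for every pair of rows of X and Y
    return X @ Y.T

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2); a larger gamma gives a narrower kernel
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-gamma * np.maximum(sq, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X, gamma=1.0)  # off-diagonal entries equal exp(-1)
```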
The parameters d, λ, and γ were set to their best values through deviation optimization, and the number of iterations T was set following [22]. The final parameter settings were d = 100, λ = 1, γ = 1, and T = 10.
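To show where μ, λ, d, T, and γ enter the method, the sketch below follows the standard BDA formulation: a balanced MMD matrix M = (1 − μ)M0 + μ Σc Mc, and the generalized eigenproblem (DMDᵀ + λI)A = DHDᵀAΦ, keeping the eigenvectors of the d smallest eigenvalues. It is a minimal sketch, not the authors' implementation. Assumptions: samples are scaled to unit norm, M is scaled by its Frobenius norm, a 1-NN classifier produces the initial and updated pseudo-labels, and the eigenproblem is solved through an equivalent Cholesky reformulation using NumPy only. The function `bda_predict` and its arguments are hypothetical.

```python
import numpy as np

def _rbf(X, Y, gamma):
    # k(x, y) = exp(-gamma * ||x - y||^2); rows are samples
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def _one_nn(Ztr, ytr, Zte):
    # 1-nearest-neighbour prediction with squared Euclidean distance
    sq = (Zte ** 2).sum(1)[:, None] + (Ztr ** 2).sum(1)[None, :] - 2.0 * Zte @ Ztr.T
    return ytr[np.argmin(sq, axis=1)]

def bda_predict(Xs, ys, Xt, mu=0.5, lam=1.0, d=100, T=10, gamma=1.0, kernel="rbf"):
    """Sketch of BDA (assumed formulation); returns pseudo-labels for Xt."""
    ys = np.asarray(ys)
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt]).astype(float)
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)  # unit-norm samples
    if kernel == "rbf":
        D = _rbf(X, X, gamma)              # n x n kernel matrix, columns = samples
    elif kernel == "linear":
        D = X @ X.T
    else:                                  # "primal": work directly in feature space
        D = X.T                            # m x n, columns = samples
    r = D.shape[0]
    d = min(d, r)                          # d is capped by features (primal) or samples (kernel)
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    e0 = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    M0 = np.outer(e0, e0)                  # marginal-distribution MMD matrix
    yt = _one_nn(X[:ns], ys, X[ns:])       # initial pseudo-labels from raw features
    for _ in range(T):
        Mc = np.zeros((n, n))              # sum of class-conditional MMD matrices
        for c in np.unique(ys):
            src, tgt = np.where(ys == c)[0], np.where(yt == c)[0]
            if len(tgt) == 0:
                continue                   # class absent from the current pseudo-labels
            e = np.zeros(n)
            e[src] = 1.0 / len(src)
            e[ns + tgt] = -1.0 / len(tgt)
            Mc += np.outer(e, e)
        M = (1.0 - mu) * M0 + mu * Mc      # balanced MMD matrix weighted by mu
        M /= max(np.linalg.norm(M, "fro"), 1e-12)
        a = D @ M @ D.T + lam * np.eye(r)  # positive definite because lam > 0
        b = D @ H @ D.T
        # Smallest phi in a v = phi b v  <=>  largest psi in L^-1 b L^-T u = psi u, with a = L L^T
        L = np.linalg.cholesky(a)
        C = np.linalg.solve(L, np.linalg.solve(L, b).T)
        _, U = np.linalg.eigh((C + C.T) / 2.0)           # eigenvalues in ascending order
        A = np.linalg.solve(L.T, U[:, -d:])              # r x d transformation matrix
        A /= np.linalg.norm(A, axis=0, keepdims=True)    # unit-length projection directions
        Z = A.T @ D                                      # d x n projected data
        Z /= np.maximum(np.linalg.norm(Z, axis=0, keepdims=True), 1e-12)
        yt = _one_nn(Z[:, :ns].T, ys, Z[:, ns:].T)       # refresh target pseudo-labels
    return yt
```

With the settings reported above, a call would look like `bda_predict(Xs, ys, Xt, mu=mu, lam=1.0, d=100, T=10, gamma=1.0, kernel="rbf")`, where `Xs`, `ys`, and `Xt` are hypothetical arrays holding the source features, source labels, and target features.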
To analyze the impact of μ on the performance of the BDA method, μ was first treated simply as a parameter of the transfer process. μ was varied over the interval [0, 1] in steps of 0.1, giving the set {0, 0.1, …, 1.0}, and the RBF kernel was applied to obtain an overall picture of how the weight balance factor μ affects the results. The experimental results are shown in Table 2 and Figure 6.
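The grid scan over μ can be sketched as below. `evaluate` is a hypothetical callable standing in for one full experiment (for instance, running BDA with the RBF kernel at the given μ and returning target accuracy); rounding the grid to 10 decimals is an implementation choice to avoid floating-point keys such as 0.30000000000000004.

```python
import numpy as np

def scan_mu(evaluate, step=0.1):
    """Evaluate accuracy on the grid mu = 0, 0.1, ..., 1.0; return (best_mu, results)."""
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    results = {float(mu): float(evaluate(float(mu))) for mu in grid}
    best_mu = max(results, key=results.get)  # the first maximum wins on ties
    return best_mu, results

# Toy stand-in for an accuracy curve that peaks at mu = 0.7
best, _ = scan_mu(lambda m: 1 - (m - 0.7) ** 2)
print(best)  # 0.7
```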
Clearly, the optimal value of the weight balance factor μ varied across data batches, which shows the importance of weighting the marginal and conditional distribution differences between domains appropriately. Across the test batches, the μ value giving the best accuracy followed no obvious pattern: in some batches the conditional distribution dominated (e.g., Batch 4), while others favored a particular ratio between the two distributions (e.g., Batch 8).
To obtain the best results, the particle swarm optimization (PSO) algorithm was used to search a wide range for the best μ for each set of data. The PSO parameters were set as follows: population size 20; maximum number of iterations 10; search range of the weight balance factor 0 to 1, with the two boundary values 0 and 1 evaluated in advance; velocity limit −0.3 to 0.3; and inertia weight 0.8.
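A basic one-dimensional PSO with the stated settings (20 particles, 10 iterations, μ in [0, 1], velocity limit ±0.3, inertia weight 0.8) might look as follows. Assumptions not given in the text: the acceleration coefficients c1 = c2 = 2.0, clipping particle positions to [0, 1], and reading "0 and 1 were considered in advance" as evaluating both boundaries before the swarm starts. `evaluate` is a hypothetical callable that returns target accuracy for a given μ.

```python
import numpy as np

def pso_mu(evaluate, n_particles=20, n_iter=10, v_max=0.3, w=0.8, c1=2.0, c2=2.0, seed=0):
    """Maximise evaluate(mu) for mu in [0, 1] with a basic 1-D particle swarm."""
    rng = np.random.default_rng(seed)
    # Boundary values 0 and 1 are evaluated before the swarm starts
    best_mu, best_val = max(((m, evaluate(m)) for m in (0.0, 1.0)), key=lambda t: t[1])
    pos = rng.uniform(0.0, 1.0, n_particles)
    vel = rng.uniform(-v_max, v_max, n_particles)
    pbest = pos.copy()
    pbest_val = np.array([evaluate(p) for p in pos])
    if pbest_val.max() > best_val:
        best_mu, best_val = float(pbest[pbest_val.argmax()]), float(pbest_val.max())
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # Inertia + pull towards each particle's best + pull towards the global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (best_mu - pos)
        vel = np.clip(vel, -v_max, v_max)      # velocity limit [-0.3, 0.3]
        pos = np.clip(pos + vel, 0.0, 1.0)     # search range [0, 1]
        vals = np.array([evaluate(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        if pbest_val.max() > best_val:
            best_mu, best_val = float(pbest[pbest_val.argmax()]), float(pbest_val.max())
    return best_mu, best_val
```

Reporting the result with three decimal places, as in Tables 3 and 4, would then be `round(best_mu, 3)`.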
4.2. Performance Verification
The experiments followed Setting 1 and Setting 2 in turn. We implemented the proposed BDA drift compensation method in three variants: the primal BDA method, BDA with the RBF kernel, and BDA with the linear kernel.
Table 3 and Table 4 show the best μ value and the corresponding best accuracy for each batch after the PSO process. The factor μ was rounded to three decimal places, and the results were compared with those of the JDA algorithm using the same kernel. In addition, a non-domain-adaptation baseline using NN as the standard classifier was included. Results of the machine-learning-based drift compensation methods above on the same dataset are reported in [20].
Table 5 shows the accuracy of the different methods on the 9 target batches under Setting 1, and Figure 7 visualizes the recognition accuracy of all methods.
First, the overall recognition accuracy of the BDA method was higher than that of the comparison methods, and the RBF-kernel BDA optimized by the PSO process achieved the highest average recognition accuracy, 68.92%. Second, compared with JDA, the best comparison method, the recognition accuracy increased by 4.54%. JDA can only adapt the marginal and conditional distributions with equal weights (μ = 0.5), whereas BDA can adjust the weight balance factor μ to suit different situations, which significantly improves accuracy. Last, because of the large distribution gap between the drift datasets, the non-transfer learning method NN achieved an average recognition accuracy of only 56.69%. This shows that the domain adaptation methods outperformed the non-domain-adaptation method, confirming the effectiveness of transfer learning, with BDA performing best among the three.
According to the results, the BDA method had its lowest gas recognition accuracy on Batches 8 and 10, especially Batch 8, where accuracy was only 38.10%. The comparison methods showed the same pattern, performing much worse on these batches than on the others. After long-term operation of the MOS sensors, degradation of the MOS-sensitive materials, contamination of the MOS gas sensor units, and deterioration of the interface electrical contacts may have seriously degraded the sensor data of Batches 8, 9, and 10. To analyze the validity of the data, we applied principal component analysis (PCA) to the dataset and projected the data onto a 2D subspace spanned by the first two principal components (PCs). As shown in Figure 8, the spatial distribution of Batch 8 changed significantly compared with the other batches. Zhang et al. [14] attributed such changes to drift over time. However, the spatial distributions of Batches 9 and 10 were similar to those of the other batches. Therefore, we believe that the changes in Batch 8 may not have been caused entirely by drift. Another possible reason is that Batch 8 contained less data, only 294 measurement samples. The combined influence of these factors resulted in a very different distribution for Batch 8.
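The 2D projection used for this inspection can be sketched with an SVD-based PCA. Assumptions: PCA is fitted once on the pooled samples of all batches so that every batch shares the same PC1/PC2 axes, and features are only mean-centered (whether the study also standardized them is not stated). The function name `pca_2d` is hypothetical.

```python
import numpy as np

def pca_2d(X):
    """Project samples (rows of X) onto the first two principal components."""
    Xc = X - X.mean(axis=0)                        # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                         # n x 2 coordinates in the PC1-PC2 plane
    explained = s[:2] ** 2 / np.sum(s ** 2)        # variance ratio captured by PC1 and PC2
    return scores, explained
```

Given a hypothetical array `batch_ids` aligned with the rows of `X`, the points of Batch k are `scores[batch_ids == k]`, which can then be plotted per batch to compare their distributions.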
As shown in Figure 9, all methods performed better under Setting 2 than under Setting 1. A possible reason is that the drift between adjacent batches in Setting 2 was smaller than the drift between Batch 1 and the later batches in Setting 1, which made the classification and recognition tasks easier. The RBF-kernel BDA method again performed best, with an average recognition accuracy of 81.06%. Notably, the optimal μ of the RBF-kernel BDA was close to 1 in several batches, indicating that the conditional distribution was dominant there, possibly because the drift data of adjacent batches are more similar. For Batch 2 → 3, Batch 4 → 5, and Batch 8 → 9, the recognition accuracy reached 97.95%, 96.36%, and 98.28%, respectively. In addition, as in Setting 1, the transfer learning methods still achieved much higher recognition accuracy than the non-transfer learning NN method. However, accuracy dropped sharply on the last batch. We believe this is related to how the MOS sensor array data were collected: the last batch was collected five months after the previous one, and the sensors were kept powered off during this period. Without their normal operating temperature, the sensitive material layers of the sensors may have been contaminated by external substances, a process that is usually irreversible. This substantially enlarged the difference in data distribution between Batch 9 and Batch 10. Although the data in Batch 10 were severely disturbed for these reasons, the proposed BDA method still outperformed the comparison methods under Setting 2. The detailed recognition accuracy of each batch under Setting 2 is given in Table 6.
Moreover, the high accuracy achieved under both Setting 1 and Setting 2 on the drift data from the 10 batches of the MOS gas sensor array demonstrates the robustness of the BDA drift compensation method.