Active Learning on Dynamic Clustering for Drift Compensation in an Electronic Nose System

Drift correction is an important concern for electronic noses (E-noses) in maintaining stable performance during continuous operation. A large number of reports have dealt with E-nose drift through machine-learning approaches in the laboratory. In this study, we aim to counter the drift effect in more challenging situations in which the category information (labels) of the drifted samples is difficult or expensive to obtain, so that only a few of the drifted samples can be used for label querying. To solve this problem, we propose a methodology based on Active Learning (AL) that selectively queries sample labels for drift correction. Moreover, we utilize a dynamic clustering process to balance the sample categories for label querying. In the experimental section, we set up two E-nose drift scenarios, a long-term and a short-term one, to evaluate the performance of the proposed methodology. The results indicate that the proposed methodology is superior to the other state-of-the-art methods presented. Furthermore, the parameter sensitivity and the accuracy-increasing process of the methodology are analyzed. In addition, the Label Efficiency Index (LEI) is adopted to measure the efficiency and labelling cost of the AL methods. The LEI values indicate that our proposed methodology exhibits better performance than the other presented AL methods in the online drift correction of E-noses.


Introduction
Olfactory perception based on electronic noses (E-noses) [1] has attracted much attention in research communities in recent years. Although the feasibility of E-noses has been demonstrated with various solutions in a considerable number of cases, the drift problem still hampers their further development. Drift, normally caused by environmental and physicochemical factors, disturbs the compatibility between the gas-sensor responses and the Artificial Intelligence (AI) algorithms in E-nose systems. Gas-sensor drift irreversibly misleads the AI models over time; in other words, drift significantly degrades the performance of E-nose systems that make decisions based on AI algorithms.
There are several ways to mitigate the negative effects of gas-sensor drift. The primary option is gas-sensor enhancement, which improves a sensor's repeatability and stability through an advanced structure or film composition. Aiming for more universal and cost-effective methodologies, algorithm-based methods are becoming popular. Among these, a number of methods decompose the drift component from the responses of the gas-sensor array; what remains after drift correction are the corrected responses. In practice, the drift component can be decomposed based on statistical characteristics by means of Principal Component Analysis (PCA), Common PCA (CPCA) [2], PCA-based Component Correction (PCA-CC) [3], Independent Component Analysis (ICA) [4,5], and wavelets [6,7]. Additionally, it is possible to identify components unrelated to the drift.

In the experimental section, we evaluate the proposed methodology under both long-term and short-term drift scenarios. In Section 4, we present and discuss the results achieved using the database. Finally, we summarize our conclusions in Section 5.

The framework of the AL process is presented in Figure 1. It is a closed-loop structure that retrains the "learner" iteratively with the "selected instance" and its "label". The "selected instance" is chosen from the "data pool" full of drifted instances, while the "label" of the "selected instance" is queried from the "experts". The "instance selection strategy" determines, by a certain rule, the instance near the classification boundary for learner retraining. The "experts" perform manual labelling to provide a label for the selected instance. Finally, the "learner" is renewed with the selected instance and the label for the next round of instance selection.

Basic Active Learning Framework
For online drift correction, we often face the problem whereby continuous gas instances are easy to acquire, but their labels are rare. Thus, "selecting a limited number of gas instances and querying the associated labels from experts" becomes an effective approach for forming samples for online drift correction. AL is a feasible method for guiding optimal instance selection: it aims to achieve higher learner performance with fewer human annotations.

Basic Instance-Selection Strategy
AL regards manual labelling as the most time-consuming and labor-intensive part. Therefore, instance-selection strategies that explore the most useful instance for learner correction are the key part of the AL paradigm. Considering the benefits of pool-based instance selection, we would like to introduce three pool-based instance-selection strategies (US, QBC, and ER) in the following subsections.


Uncertainty Sampling
The US strategy explores the instance that makes the greatest contribution to classifier updating according to the outputs of the recognition component. Various metrics can be adopted to indicate the uncertainty of an instance based on the recognition outputs. In this study, a popular metric named margin [32] is chosen, as follows:

margin_i = f_E(ŷ_c1 | p_i) − f_E(ŷ_c2 | p_i)    (1)

where margin_i denotes the margin value of the i-th instance p_i, and f_E(ŷ_c1 | p_i) and f_E(ŷ_c2 | p_i) are, respectively, the maximum and second-maximum posterior probabilities of p_i calculated by a classifier. According to this criterion, a smaller margin_i means greater uncertainty of p_i. Thus, the margin-based US strategy is prone to selecting the instance with the minimum margin from a batch of instances.
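As a minimal sketch, the margin criterion of Equation (1) can be evaluated over a matrix of class posteriors; the function name and the toy probabilities below are illustrative, not part of the original system:

```python
import numpy as np

def margin_us_select(posteriors):
    """Margin-based uncertainty sampling.

    posteriors: (n_instances, n_classes) array of class posterior
    probabilities for each pooled instance p_i.
    Returns the index of the instance with the smallest margin
    (largest uncertainty), per Equation (1).
    """
    sorted_p = np.sort(posteriors, axis=1)       # ascending per row
    margins = sorted_p[:, -1] - sorted_p[:, -2]  # max minus second max
    return int(np.argmin(margins))               # smallest margin wins

# toy pool: instance 1 is the most ambiguous (0.50 vs 0.45)
probs = np.array([[0.90, 0.05, 0.05],
                  [0.50, 0.45, 0.05],
                  [0.70, 0.20, 0.10]])
print(margin_us_select(probs))  # -> 1
```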

Query by Committee
In contrast to US, QBC uses multiple classifiers to perform instance recognition and evaluation. Typically, one classifier (the chair) handles incoming-instance recognition, while another two classifiers (the members) calculate the value of each instance for human annotation. QBC identifies the instance most in need of annotation based on the disagreement between the member outputs. Conventionally, the Kullback-Leibler distance (KLD) [27] is the metric of disagreement, as follows:

KL_i = (1/K) Σ_{k=1}^{K} Σ_m P_k(c_m | p_i) log( P_k(c_m | p_i) / P_C(c_m | p_i) )    (2)

P_C(c_m | p_i) = (1/K) Σ_{k=1}^{K} P_k(c_m | p_i)    (3)

where KL_i is the KLD of the i-th sample p_i, K denotes the total number of committee members, P_k(c_m | p_i) represents the probability that sample p_i belongs to Class c_m according to Member k, and P_C(c_m | p_i) is the consensus probability averaged over the committee. A larger KL_i indicates stronger disagreement among the members about p_i.
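The committee disagreement of Equations (2) and (3) can be sketched as follows; the function name and toy member distributions are assumptions made for illustration:

```python
import numpy as np

def qbc_kl(member_probs, eps=1e-12):
    """Average KL divergence of each committee member from the
    consensus distribution, per Equations (2)-(3).

    member_probs: (K, n_classes) array, row k holding P_k(c_m | p_i).
    """
    consensus = member_probs.mean(axis=0)                 # P_C(c_m | p_i)
    ratios = np.log((member_probs + eps) / (consensus + eps))
    return float(np.mean(np.sum(member_probs * ratios, axis=1)))

# members that agree give KL near 0; disagreement gives a larger KL
agree = np.array([[0.8, 0.2], [0.8, 0.2]])
differ = np.array([[0.9, 0.1], [0.1, 0.9]])
print(qbc_kl(agree) < qbc_kl(differ))  # -> True
```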

Error Reduction Sampling
ER sampling prefers to select the instances that most reduce the generalization error of the learner. The process of ER sampling is as follows: (1) choose a loss function E_{P̂_D*} to estimate the generalization error; (2) evaluate each instance with the loss function; (3) select for labelling the instance that yields the largest reduction of the loss-function value. In ER sampling, the form of the loss function has a great impact on performance. Both the logarithmic loss and the 0/1 loss are common criteria, as follows:

E_{P̂_D*} = − Σ_{x∈P} Σ_{y∈Y} P̂_D*(y | x) log P̂_D*(y | x)    (4)

E_{P̂_D*} = Σ_{x∈P} ( 1 − max_{y∈Y} P̂_D*(y | x) )    (5)

where P̂_D*(y | x) is the posterior probability of Class y given a sample x, P denotes the renewed data pool, and Y is the category label set. In the following sections, we set the default loss function of ER to the logarithmic loss.
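The two loss criteria in Equations (4) and (5) can be sketched directly from a posterior matrix; the function names below are ours, and the toy posteriors are illustrative:

```python
import numpy as np

def log_loss_pool(posteriors):
    """Expected log loss over the renewed pool, Equation (4):
    the total predictive entropy -sum_x sum_y P(y|x) log P(y|x)."""
    p = np.clip(posteriors, 1e-12, 1.0)   # guard against log(0)
    return float(-np.sum(p * np.log(p)))

def zero_one_loss_pool(posteriors):
    """Expected 0/1 loss over the pool, Equation (5):
    sum_x (1 - max_y P(y|x))."""
    return float(np.sum(1.0 - posteriors.max(axis=1)))

confident = np.array([[0.99, 0.01], [0.98, 0.02]])
uncertain = np.array([[0.55, 0.45], [0.52, 0.48]])
print(log_loss_pool(confident) < log_loss_pool(uncertain))            # -> True
print(zero_one_loss_pool(confident) < zero_one_loss_pool(uncertain))  # -> True
```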

Adaptive Confidence Rule
The Adaptive Confidence Rule (ACR) is a pool-based approach designed specifically for online drift correction of E-noses [27]. This method assumes that the entire data distribution of the drifted responses/features is continuously moving over time. Consequently, the instances near both classification planes and the category center are equally important for collection of drift information.
In the ACR procedure, instance evaluation is performed according to Equations (2) and (3), as in QBC. The difference between ACR and other AL paradigms lies in the rule of instance selection. ACR tries to record the outputs of the chair before and after classifier updating in each instance-selection process. If the outputs are different, it means the classification plane of the learner is not well trained. Therefore, the instance with the greatest KLD should be selected for learner retraining to distinguish the classification boundary of the drifted instances. In contrast, low KLD means the corresponding instance is near the center area of a certain category. Thus, ACR prefers to choose the instance with the lowest KLD when the two outputs from the chair are equal. This behavior would supply complete drift information in all domains for drift correction. Finally, ACR improves the representativeness of the selected instances under drifted data distribution.
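The ACR selection rule described above can be summarized in a small sketch; this is our reading of the rule in [27], with illustrative function and variable names:

```python
def acr_select(kl_values, chair_before, chair_after):
    """Adaptive Confidence Rule (sketch of the selection rule in [27]).

    kl_values: KLD scores (Eq. (2)) for the pooled instances.
    chair_before / chair_after: chair outputs for the previously
    selected instance before and after the last classifier update.
    Returns the index of the next instance to label.
    """
    if chair_before != chair_after:
        # boundary not settled: pick the instance with the greatest KLD
        return max(range(len(kl_values)), key=lambda i: kl_values[i])
    # outputs agree: sample near a category centre (lowest KLD) instead
    return min(range(len(kl_values)), key=lambda i: kl_values[i])

kls = [0.9, 0.1, 0.5]
print(acr_select(kls, chair_before=2, chair_after=0))  # boundary case -> 0
print(acr_select(kls, chair_before=2, chair_after=2))  # centre case   -> 1
```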

Proposed Active Learning Strategy
Traditional pool-based AL methods assess instances based on the instance distribution of the data pool. Accordingly, the selected instances may be concentrated in a local area (e.g., the classification boundary of a certain category). In other words, drift information from other domains (e.g., drifted instances of other categories) can barely be collected in the AL process. Figure 2a demonstrates the typical selected-instance distribution of such AL methods. All the selected instances are concentrated on the overlap between Classes 1 and 3, and no instance emerges in the area of Class 2, because the boundary uncertainty of Class 2 is lower than those of Classes 1 and 3. Consequently, Class 2 cannot be corrected when the learner is retrained, which decreases the recognition accuracy of Class 2 dramatically. As shown in Figure 2b, our proposed methodology picks instances considering both category and uncertainty. The proposed methodology, AL-DC, has three stages: initialization, clustering and instance selection.

Active Learning on Dynamic Clustering

Initialization
Suppose a learner has initially been established on a training set containing samples of all categories. Then, AL-DC computes the initial mean of the training samples of each category:

m_k^s = (1/N_k) Σ_{i=1}^{N_k} x_i^s,  x_i^s ∈ c_k    (6)

where x_i^s represents the i-th training sample belonging to Category c_k, and N_k and m_k^s are the sample number and mean value of Category c_k, respectively.

Clustering
We define the clustering mean m_k^t and initialize it as m_k^t = m_k^s. Then, we perform the clustering process as follows:

c_i^t = argmin_k ‖ x_i^t − m_k^t ‖    (7)

where x_i^t and c_i^t represent the i-th instance of the data pool and its associated cluster, respectively. Afterwards, we update m_k^t by:

m_k^t = (1/N_k) Σ_{c_i^t = k} x_i^t    (8)

where N_k is the number of instances belonging to Cluster k in the data pool. We then apply (7) and (8) iteratively until every m_k^t has converged. Through these steps, several sample clusters are obtained. We assume the samples in the same cluster have a similar location and category. Meanwhile, the number of clusters is deliberately set equal to the number of categories, which keeps the distribution of the clusters close to that of the categories.
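The iteration of Equations (7) and (8), seeded with the class means of Equation (6), can be sketched as a small k-means-style loop; the function name and toy data below are illustrative assumptions:

```python
import numpy as np

def dynamic_clustering(pool, class_means, n_iter=50, tol=1e-6):
    """Cluster the unlabeled pool by iterating Equations (7) and (8),
    starting each cluster mean m_k^t from the training-set class mean
    m_k^s (Equation (6)). Returns (assignments, converged means)."""
    means = class_means.copy()
    for _ in range(n_iter):
        # Eq. (7): assign each pooled instance to its nearest mean
        dists = np.linalg.norm(pool[:, None, :] - means[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # Eq. (8): recompute each mean from its assigned instances
        new_means = means.copy()
        for k in range(means.shape[0]):
            members = pool[assign == k]
            if len(members):
                new_means[k] = members.mean(axis=0)
        if np.linalg.norm(new_means - means) < tol:  # convergence check
            means = new_means
            break
        means = new_means
    return assign, means

# two drifted blobs whose class means have shifted slightly
pool = np.array([[0.0, 0.1], [0.2, 0.0], [5.1, 5.0], [4.9, 5.2]])
init = np.array([[0.5, 0.5], [4.0, 4.0]])  # m_k^s from the training set
assign, _ = dynamic_clustering(pool, init)
print(assign)  # -> [0 0 1 1]
```

Seeding with the class means rather than random centroids is what ties each cluster to a category, which the instance-selection stage relies on.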

Instance Selection
We define a binary flag vector f = (f_1, f_2, ..., f_K) and set all elements of f to 1, where K denotes the number of clusters. Then, we select the most valuable cluster by:

k* = argmax_{k : f_k = 1} max_{x ∈ C_k} I(x)    (9)

where I(·) > 0 is the function that calculates the uncertainty value of a sample, C_k is the set of pooled instances in Cluster k, and k* is the serial number of the most valuable cluster. We set f_{k*} = 0 to avoid successive selection from the same cluster. Then, we pick the finest instance x* in the most valuable cluster according to:

x* = argmax_{x ∈ C_{k*}} I(x)    (10)

Next, x* is moved from the data pool to the training set and marked with the label queried from the experts. During the repetition of Equations (9) and (10), we reset all the elements of f to 1 once every element has been set to 0. In other words, the instances are selected from each cluster alternately to maintain the category balance of the training set. Moreover, Equation (10) ensures that more informative (higher-uncertainty) instances are preferred in the selection, which helps the AL-DC methodology converge faster in accuracy for the same number of labellings. Finally, the whole methodology stops when enough instances have been chosen or a certain precision has been reached. The details of AL-DC are summarized in Algorithm 1.
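One selection round with the flag vector f (Equations (9) and (10)) can be sketched as follows; the function name and toy uncertainties are assumptions for illustration:

```python
import numpy as np

def select_instance(uncertainty, assign, flags):
    """One round of AL-DC instance selection (Equations (9)-(10)).

    uncertainty: I(x) for each pooled instance; assign: its cluster;
    flags: binary vector f, reset to all ones once every cluster has
    been visited, so categories are sampled alternately.
    Returns (cluster k*, instance index x*); updates flags in place.
    """
    if not flags.any():
        flags[:] = 1                      # reset f when all clusters used
    best_k, best_i = -1, -1
    for k in np.flatnonzero(flags):       # Eq. (9): eligible clusters only
        members = np.flatnonzero(assign == k)
        if len(members) == 0:
            continue
        i = members[np.argmax(uncertainty[members])]    # Eq. (10)
        if best_i < 0 or uncertainty[i] > uncertainty[best_i]:
            best_k, best_i = k, i
    if best_k >= 0:
        flags[best_k] = 0                 # avoid re-selecting cluster k*
    return int(best_k), int(best_i)

unc = np.array([0.2, 0.9, 0.8, 0.1])
assign = np.array([0, 0, 1, 1])
f = np.ones(2, dtype=int)
print(select_instance(unc, assign, f))  # -> (0, 1): most uncertain overall
print(select_instance(unc, assign, f))  # -> (1, 2): cluster 0 now flagged off
```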

Algorithm 1. Active learning on dynamic clustering.

Input: The learner; the training set; the data pool.
Output: The renewed learner.
Procedure:
1. Compute the mean value m_k^s of each class based on (6);
2. Cluster the instances in the data pool according to m_k^s by applying (7) and (8) iteratively;
3. Select the most valuable cluster k* and instance x* by (9) and (10), and query the label of x* from the experts;
4. Delete x* from the data pool and add it to the training set;
5. Renew the learner with the updated training set;
6. Reset f when all its elements are 0; end if certain conditions are reached, otherwise, return to step 1.

System and Dataset
The drifted data were generated from an E-nose system and are available from the machine learning repository at UC Irvine [16]. For gas sensing, four commercial gas-sensor models (TGS2600, TGS2602, TGS2610, and TGS2620) were selected to form a 16-sensor array; that is, four sensors of each model were employed. During each experiment, the gas-sensor array operated at a constant temperature of 400 °C. The flow rates of the injected gases could be adjusted via three Mass Flow Controllers (MFCs), and the total flow rate through the gas chamber was kept at 200 mL/min. Six volatile compounds (ammonia, acetaldehyde, acetone, ethylene, ethanol, and toluene) were used in the experiments, which spanned 36 months. A series of experiments was performed, and 13,910 samples were recorded to form a drift dataset. For ease of access, as shown in Table 1, all of the recorded data were arranged into 10 batches based on recording time. For each gas sensor, eight features were extracted from the original response of each experiment. Two of them, namely ∆R and its normalized version, indicated steady-state characteristics. The other six features were dynamic ones, called exponential moving averages (ema_a), extracted from both the rising and decaying stages of the raw sensor responses. The ema_a is calculated as follows:

y[k] = (1 − a)·y[k − 1] + a·(r[k] − r[k − 1]),  y[0] = 0    (11)

where k is a natural number indexing the discrete time during which the chemical vapor is present, and r[k] is the time profile of the sensor resistance. Three different values are set for a (a = 0.1, a = 0.01 and a = 0.001). Thus, three of the six dynamic features were taken from the rising stage, and the other three from the decaying stage. Considering that 16 gas sensors are mounted on the gas-sensor array, each sample can finally be mapped into a 128-dimensional feature vector (8 features × 16 sensors).
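The ema_a recurrence of Equation (11) can be sketched as below; the function name and the synthetic response curve are our illustrative assumptions:

```python
import numpy as np

def ema_feature(r, a):
    """Exponential moving average of the response increments,
    Equation (11): y[k] = (1 - a) * y[k-1] + a * (r[k] - r[k-1]),
    with y[0] = 0. The scalar feature is the extreme value of y over
    the rising (max) or decaying (min) stage of the response."""
    y = 0.0
    trace = [y]
    for k in range(1, len(r)):
        y = (1.0 - a) * y + a * (r[k] - r[k - 1])
        trace.append(y)
    return np.array(trace)

# synthetic rising transient: resistance climbing toward a plateau
r = 1.0 - np.exp(-0.05 * np.arange(200))
for a in (0.1, 0.01, 0.001):   # the three smoothing scales used
    print(round(float(ema_feature(r, a).max()), 4))
```

Smaller values of a average over longer windows, so the three scales capture transient dynamics at three different speeds.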
Considering the high dimensionality of the raw feature vector, we used Principal Component Analysis (PCA) plots to visualize the distributions of the drifted samples by batch in Figure 3a-i. Gas samples with the same chemical composition are distributed differently from Batch 1 to Batch 10, which confirms the drift effect on the samples over time. This dataset is often used to evaluate drift-correction methods that recognize gas samples by compound. In this paper, we use these 10 batches of data as continuous online drifted samples.

Experimental Setup
In the experiments, we arranged two E-nose drift scenarios with different settings: "Setting 1" defines a long-term drift compensation scenario, which appoints Batch 1 as the initial training set and all the following batches are devoted to instance selection and testing. Each instance-selection process would be triggered when a new batch comes.
"Setting 2" creates short-term scenarios for drift counteraction, which uses two consecutive batches. The former batch is seen as the initial training set, while the latter one is used for instance selection and performance evaluation.
In terms of the learner, two classifier types, including Extreme Learning Machine (ELM) and Support Vector Machine (SVM), are adopted. For ELM, we used the Radial Basis Function (RBF) kernel and set the kernel parameters to 0.005. Considering both recognition rate and computational cost, we chose the linear kernel for SVM instead of the RBF kernel. Additionally, the penalty factor C of SVM was optimized in the range 10 −3 to 10 3 with a variable step size and set to 0.2.
To prove the superiority of the proposed methodology, we conducted the assessments in four stages. First, we compared the proposed AL-DC methodology with other state-of-the-art methods with respect to recognition rate under drift effects; a higher recognition rate indicates better drift correction. Second, we discussed the parameter sensitivity of AL-DC with a varied number of selected instances; the methodology is easier to apply if it has lower parameter sensitivity. Third, we explored the reason AL-DC achieves excellent performance. Finally, the labelling efficiency of AL-DC is analyzed using the index LEI; a higher LEI signifies a faster accuracy increase with the same time and labor costs. For the implementation of the AL methods, we divided the data into initial training samples, an online data pool and a test set. We collected the initial training samples from the first batch for learner building. Each subsequent batch contains an online data pool and a test set. The online data pool was a set of drifted instances with no labels, which stored the candidates for labelling. After the learner was updated by the selected instances with labels, the remaining data were used for testing. Therefore, the learner can be retrained each time a new batch arrives. In total, 9 AL processes are triggered for drift correction with either Setting 1 or 2. To be consistent with the settings of the comparison methods, the scope of the data pool covered the whole batch in the first discussion stage. In the other discussion stages, the ratio of data pool to test set was about 1:2. The detailed instance numbers of the data pool and test sets are recorded in Table 2. For each batch, learner updating with AL was executed before learner testing.

Computational Environment
The drift dataset was stored in 10 .txt files. We imported the dataset and implemented the proposed methodology in MATLAB (2014a). The computation was executed on a desktop computer with the following configuration: System: 64-bit Windows 10. Processor: Intel i5-8500. RAM: 16 GB. Hard disk: 128 GB solid-state disk.

Accuracy Comparison
In this subsection, we employ several state-of-the-art drift-counteraction methods for comparison. Four drift-correction types are presented: Component Correction (CC), Instance Correction (IC), Label-Free Correction (LFC) and AL. For CC, we chose two different approaches: CC-LDA [11] and CC-OSC [26] use LDA and OSC, respectively, to decompose the drift component from the features and adopt SVMs as classifiers. TCTL+SEMI [26] and DAELM [24] are the representatives of the IC type, which renew the recognition models with drifted samples periodically, without targeted instance selection. Moreover, the LFC type, which includes SVM-comgfk [23], ML-comgfk [23], the Multi-Feature Kernel Semi-supervised joint learning model (MFKS) [22], and Domain Regularized Component Analysis (DRCA) [11], assists with classifier updating in a label-free way; that is, the drift correction is performed on the drifted instances only. The last type is AL, including AL-ACR [27] and our proposed AL-DC. Since AL-DC is an open framework with respect to the instance-selection strategy, we apply US, QBC and ER to AL-DC, denoted AL-DC-US, AL-DC-QBC and AL-DC-ER, respectively. For the methods which need labels, we add a number in brackets to indicate the sample size used for drift compensation. Table 3 demonstrates the accuracy of the methods under the long-term scenario.

Long-Term Drift
We compute the accuracy A_c by:

A_c = N_correct / N_batch × 100%    (12)

where N_correct and N_batch denote the number of correctly identified samples and the total sample number of the current batch, respectively. We also use the accuracy defined in Formula (12) in Section 4.2.2. In Sections 4.3-4.5, we redefine the accuracy A_c considering both the data pool and the test set, as follows:

A_c = (N¹_correct + N²_correct) / (N_pool + N_test) × 100%    (13)

where N¹_correct and N²_correct are the numbers of correctly recognized samples in the data pool and the test set, respectively, and N_pool and N_test are the corresponding total sample numbers. Note: the bold value in each column of the table denotes the highest recognition accuracy for a certain batch; the same applies below.
We collect the accuracies from Batches 2 to 10 and provide the average values in the last column. In terms of the average accuracy, CC-LDA and CC-OSC have poor performance compared with other methodologies. The reason for this is that the decomposed drift-like component cannot keep up with the trend of actual drift. For the other methods, periodical correction is required, and we discover that the majority of AL methods have an excellent average accuracy (> 85%). In particular, AL-DC-ER (20) obtains the best results among all of the presented methodologies, with an accuracy of 93.06%. In terms of the IC type, TCTL+SEMI and DAELM-T achieved satisfactory rates of 87.60% and 85.62%, respectively. We infer that the occasional correction and sample label are the main reasons for these impressive outcomes. LFC-type methods maintain accuracies below 80% due to the lack of reliable labels. We also notice that the selective labelling in AL shows more flexibility than supervised learning does. For AL type methods, we can see that the AL-DC type methods (AL-DC-US and AL-DC-ER) provide more favorable results than AL-ACR under the same conditions. With respect to accuracies by batches, the AL-DC methods won 7 out of 9 comparisons. Furthermore, the recognition rates of Batch 10 decrease dramatically in all methods, which may be caused by the 5-month break between Batches 9 and 10.

Short-Term Drift
According to Setting 2, we summarize the recognition results of anti-drift methods in Table 4. With regard to the average accuracy, the CC approach generally lags behind the other types. DAELM-T has an accuracy of 85.10%, which is obviously higher than that of the CC and LFC methods. We attribute this to periodical drift correction with sample labels. The shortage of sample labels makes the LFC methods poorer than DAELM-T. The average accuracies of the AL methods are commonly higher than those of the other types. It is noticeable that the methods belonging to the AL-DC type again show greater recognition performance than AL-ACR. In particular, AL-DC-ER (20) achieves the highest average rate, at 93.74%, among all the methods. For the short-term drift scenarios presented, the AL-DC-type methods achieve the highest results in 8 out of 9 cases, and AL-ACR won the remaining one. Therefore, the AL-DC-type methods demonstrate excellent performance for short-term drift compensation.
Based on the results shown in Tables 3 and 4, we can draw the following two conclusions: (1) AL-based drift correction is superior to any other type with respect to accuracy; and (2) AL-DC methods, especially AL-DC-ER, outweigh AL-ACR with respect to drift counteraction.

Parameter Sensitivity
In this section, we show the performance variation of the proposed AL-DC paradigm with the number of selected instances N. As demonstrated in Figure 4a-f, we use the lines with black asterisks, red crosses, black triangles, and red squares to denote the accuracy of AL-ELM, AL-SVM, AL-DC-ELM, and AL-DC-SVM, respectively. Here, ELM and SVM indicate the classifier type used in recognition and instance selection. The subplots in the first row represent the accuracy variations of the long-term scheme, while those in the second row are related to the short-term scheme. The US, QBC, and ER strategies underlie the subplots in the first, second and third columns, respectively. We find that the AL methods on SVMs achieve better accuracy than those on ELMs. Moreover, the AL-DC methods are clearly superior to the traditional AL methods when using the same classifier. When the number of selected instances N increases, the accuracy curves of the AL-DC methods rise faster than the reference ones, which confirms the excellent performance of the AL-DC-type methods. Furthermore, the AL-DC methods often begin to increase their accuracies after N = 6. The reason is that the number of categories in the drift dataset is 6: for AL-DC, a training set covering all categories is easier to form when N ≥ the category number. After a rapid expansion around N = 6, the AL-DC curves enter a stable growth phase, in which the overall trend is slightly upward with increasing N. Considering that a large N leads to extra computational and labor costs, we suggest that the most favorable value of N is just above the category number. Of course, a larger N is preferred if there are sufficient computational power, human resources and storage space.


Accuracy Increasing Process
To explore the reason that the AL-DC methodologies obtain excellent results, we show the accuracy-increasing process for both the AL and AL-DC methodologies in this subsection. We choose a situation that exists in both Settings 1 and 2: Batch 1 is used as the initial training set, while Batch 2 is treated as online drifted data. This means that the data pool and the testing data for the AL methods are all generated from Batch 2. As shown in Figure 5a-f, the heights of the red and black bars denote the accuracies of the various AL and AL-DC methodologies, respectively. We set the maximum number of selected instances to N = 20 for one AL process; n (1 < n ≤ N) denotes the increasing number of selected instances during the AL process. Each subfigure describes the accuracy fluctuation with increasing n for a certain AL framework. The first to third columns in Figure 5 represent the results of the US, QBC, and ER strategies, respectively. The first row refers to the methods utilizing ELM as the classifier, while the second row is based on SVM. We discovered two interesting findings: one is that the AL-DC-based methodologies achieve higher accuracy than the traditional AL methods at the end of the AL process (n = N) in most cases. The other is that the AL-DC-based methodologies have a faster convergence speed in all scenarios. In other words, they can obtain near-optimal results with smaller n (lower labelling cost) than the reference methods. This is a favorable characteristic for online drift correction considering the labelling cost. For the AL-DC methodologies, no matter which classifier or instance-selection strategy is used, once n exceeds 6 the recognition rates are almost the same as the final value. Coincidentally, the category number of the dataset is 6 as well. We attribute this phenomenon to the DC strategy, which can select samples of all categories once n ≥ the category number.
A full-category training set would obviously promote the performance of the classifiers.
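The category-balancing effect of the DC stage can be illustrated with a small routine. This is a sketch under the assumption that selection simply cycles over the clusters, taking the most uncertain not-yet-selected sample from each in turn; the cluster assignments and uncertainty scores are assumed to come from the clustering step and the base classifier, respectively.

```python
def cluster_balanced_select(uncertainty, cluster_ids, n):
    """Pick n sample indices, cycling over clusters and taking the
    most-uncertain remaining sample from each cluster per round.
    If the clusters track the gas categories, every category is
    covered as soon as n reaches the number of clusters."""
    clusters = {}
    for idx, c in enumerate(cluster_ids):
        clusters.setdefault(c, []).append(idx)
    # within each cluster, order members by descending uncertainty
    for c in clusters:
        clusters[c].sort(key=lambda i: -uncertainty[i])
    selected = []
    while len(selected) < n:
        progressed = False
        for c in sorted(clusters):
            if clusters[c]:
                selected.append(clusters[c].pop(0))
                progressed = True
                if len(selected) == n:
                    break
        if not progressed:      # pool exhausted before reaching n
            break
    return selected
```

With three clusters, the first three queries land in three different clusters, which mirrors the observation that performance saturates once n reaches the category number.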
In terms of instance-selection strategy, the US strategy is more effective than the QBC and ER strategies under the AL-DC framework, although all three strategies exhibit fast accuracy convergence on AL-DC. When the US strategy is adopted, the highest accuracy reaches approximately 99%. As for the classifiers, although ELM and SVM converge at different speeds in some cases, they eventually achieve almost the same recognition rate at the end of the AL process.
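Putting the pieces together, the Batch-1/Batch-2 protocol can be mimicked on deterministic one-dimensional toy data. The sketch below is purely illustrative: a nearest-centroid rule stands in for the ELM/SVM classifiers, a mean-split stands in for the clustering step, and recentring each class on its single queried sample stands in for retraining.

```python
import numpy as np

def nearest(x, centroids):
    """Index of the closest centroid (nearest-centroid classification)."""
    return int(np.argmin([abs(x - c) for c in centroids]))

# Batch 1 (clean) trains the initial model; Batch 2 is uniformly drifted.
batch1 = {0: [-0.2, 0.0, 0.2], 1: [1.8, 2.0, 2.2]}
drift = 1.2
batch2_x = [x + drift for xs in batch1.values() for x in xs]
batch2_y = [c for c, xs in batch1.items() for _ in xs]

centroids = [float(np.mean(xs)) for xs in batch1.values()]   # [0.0, 2.0]
acc_before = np.mean([nearest(x, centroids) == y
                      for x, y in zip(batch2_x, batch2_y)])

# "DC" step: split Batch 2 into two clusters, query ONE label per cluster,
# and recentre the corresponding class on the queried sample.
lo = [x for x in batch2_x if x < np.mean(batch2_x)]
hi = [x for x in batch2_x if x >= np.mean(batch2_x)]
for cluster in (lo, hi):
    query_x = cluster[len(cluster) // 2]           # middle sample as pick
    query_y = batch2_y[batch2_x.index(query_x)]    # oracle provides label
    centroids[query_y] = query_x

acc_after = np.mean([nearest(x, centroids) == y
                     for x, y in zip(batch2_x, batch2_y)])
```

On this toy data the initial model misclassifies part of the drifted batch, while two queried labels, one per cluster, are enough to restore perfect accuracy, which is the behaviour the AL-DC results above display at a larger scale.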

Execution Efficiency
When continuously drifted instances arrive, the drift-correction process switches to an online mode, and shorter time, less computation, and less labelling become crucial. To evaluate the efficiency of the AL methods, we use the LEI, which is defined in terms of the following quantities: A is the accuracy before the last update; ∆A is the accuracy increment after updating with the most recently labelled instance; and a is an adjustment parameter in [0, 1], which we set to 0.5 in this discussion. A higher LEI means that the instances picked out by a given AL method are more beneficial to classification. Table 5 shows the LEIs of AL and AL-DC with the three strategies in the long-term scenario for N ∈ [11, 20]. The LEIs of AL-DC-US, AL-DC-QBC, and AL-DC-ER are higher than those of AL-US, AL-QBC, and AL-ER, respectively. From each row of the table, we observe that the value does not always increase with N; in other words, a peak may occur when N lies in a certain range. In the long-term scenario, the optimal N values for the AL-US, AL-DC-US, AL-QBC, AL-DC-QBC, AL-ER, and AL-DC-ER methods are 18, 20, 19, 14, 20, and 16, respectively. For the short-term scenario, most of the LEIs in Table 6 are larger than those at the corresponding locations in Table 5, because the timescale of the short-term drift data is smaller than that of the long-term drift data. According to the relationship between N and LEI in Table 6, the optimal N values for the AL-US, AL-DC-US, AL-QBC, AL-DC-QBC, AL-ER, and AL-DC-ER methods in the short-term scenario are 19, 18, 18, 11, 20, and 16, respectively. From Tables 5 and 6, we conclude that the AL-DC-type methodologies have larger LEI values than the traditional AL methods, which confirms that AL-DC is an efficient approach with respect to both labelling cost and the speed of accuracy improvement.
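For concreteness, the LEI bookkeeping during an online run can be sketched in a few lines. Note that the exact combining formula is an assumption made for illustration only: we use a convex mix of the absolute gain ∆A and the relative gain ∆A/A weighted by a, which matches the listed symbols but is not necessarily the paper's definition.

```python
def lei(A, dA, a=0.5):
    """Label Efficiency Index of one labelled instance.
    A  -- accuracy before the last update
    dA -- accuracy increment produced by the new label
    a  -- adjustment weight in [0, 1]
    ASSUMED illustrative form: a convex combination of the absolute
    gain dA and the relative gain dA / A; the paper's exact equation
    may differ."""
    return a * dA + (1.0 - a) * dA / A
```

Under this stand-in form, the same absolute gain yields a higher LEI when the base accuracy is lower, i.e., a label that helps a weak model counts as more efficient.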

Conclusions
To solve the drift problem of E-noses in an online setting, we proposed the AL-DC methodology, which selects instances using a specific strategy under drift effects. Initialization, clustering, and instance selection are the three main stages of AL-DC, and they are compatible with common AL strategies, including US, QBC, and ER. We introduced a drift benchmark for E-noses to evaluate the proposed method against other state-of-the-art drift-compensation approaches. The experimental results prove that AL-DC outmatches the other presented methods in both long-term and short-term drift scenarios. Moreover, AL-DC is superior to the other AL methods given the same number of selected instances, and converges faster with respect to recognition rate. Additionally, the LEI value was discussed to assess the efficiency of the AL methods; the results indicate that AL-DC is a low-cost approach when labelling time is limited.