Gas-Sensor Drift Counteraction with Adaptive Active Learning for an Electronic Nose

Gas sensors are the key components of an electronic nose (E-nose) for volatile odour analysis. Gas-sensor drift is a physical change on the sensor surface that occurs while an E-nose is operating. The perturbation of gas-sensor responses caused by drift degrades the performance of the E-nose system over time. In this study, we explore a suitable approach to dealing with the drift effect in an online situation. Because conventional drift calibration is difficult to implement online, we use active learning (AL) to provide reliable labels for online instances. Common AL methods tend to select and label instances with low confidence or high information content. Although this clarifies the ambiguity near the classification boundary, it is inadequate under gas-sensor drift: samples far from the classification boundary are also needed to represent drift variations across the entire data space. Thus, we propose a novel drift counteraction method, AL on an adaptive confidence rule (AL-ACR), to handle online drift data dynamically. In contrast to conventional AL methods, which select instances near the classification boundary of a particular category, AL-ACR collects instances distributed evenly across categories, using a selection rule that adapts to the outputs of the classifiers. We evaluate the proposed method against reference methods on two E-nose drift databases. The experimental results indicate that AL-ACR achieves higher accuracy than the reference methods on both databases. Furthermore, we discuss the impact of the labelling number to show the performance trend of the AL-type methods. Additionally, we define a labelling efficiency index (LEI) to quantify the contribution of a given amount of labelling.
Based on the LEI results, we believe that AL-ACR achieves the best effect at the lowest labelling cost among the AL-type methods in this work.


Introduction
An electronic nose (E-nose) is a promising approach to odour analysis. A typical E-nose system consists of two parts: a gas-sensor array and a pattern-recognition unit. The former generates odour fingerprints from gas-sensor responses with high cross-sensitivity, while the latter analyses the data using computational algorithms. With this structure, E-noses can identify complicated gas components with low-cost gas-sensor arrays. Owing to their advantages in cost and operation, E-noses have been adopted in environmental monitoring [1], the food industry [2], agriculture [3] and medicine [4]. Modern E-nose systems, as artificial-intelligence machines, have existed since the 1990s [5]. Several scientific fields, such as artificial intelligence, system integration and sensor technology, support the development of E-noses.
Gas-sensor drift in E-nose systems is a kind of concept drift, frequently caused by surface aging, environmental disturbance and sensor poisoning. This phenomenon gradually erodes the compatibility between gas-sensor responses and algorithm models and eventually degrades E-nose performance. In other words, without drift calibration the algorithm models become meaningless over time. Thus, anti-drift methods are necessary throughout the working life of an E-nose system. Possible solutions for reducing the drift effect have focused on sensor improvement [6] and algorithm modification [7]. In current studies, algorithm-modification approaches have received increasing attention owing to advances in artificial intelligence and machine learning. Signal pre-processing and classifier updating are the two main streams of algorithm modification. The former extracts drift-like components from the original gas-sensor responses and then reconstructs drift-free responses. Huang et al. adopted principal component analysis (PCA) and common PCA (CPCA) at different gas concentrations to compensate drift signals [8]. Ziyatdinov et al. used CPCA to find a drift direction common to all gases in feature space [9]. Besides PCA, orthogonal signal correction (OSC) [10] and independent component analysis (ICA) [11] are alternatives for drift counteraction in the time domain. Furthermore, wavelet analysis can remove drift components in the time-frequency domain [12]. For classifier updating, the classifier ensemble is a popular strategy for long-term drift mitigation [13][14][15]; it provides a weighted result from several sub-classifiers trained on different samples. Moreover, some researchers have used adaptive classifiers, including self-organizing maps (SOM) [16,17], adaptive resonance theory (ART) [18,19] and immune algorithms [20], to adapt dynamically to the drifting distribution.
Recently, algorithm modification has been extended to the drift problem in semi-supervised learning settings. Liu et al. applied a transfer learning paradigm in a semi-supervised setting for drift elimination [21]. All of the above methods share a common premise: a calibration set covering all categories with well-labelled samples must be prepared for drift correction. However, this requirement can rarely be met in online scenarios because there is insufficient time and labour to obtain such a calibration set, for example, in online toxic-gas monitoring. In this application, toxic gases appear rarely while non-hazardous gases are the majority. Therefore, the collected calibration set is imbalanced, which degrades the performance of the E-nose in toxic-gas recognition. Additionally, if the calibration samples are labelled by the classifiers themselves, the labels are unreliable, and we expect toxic-gas recognition to fail.
In this study, we use active learning (AL) to solve the calibration-set problem. AL methods select a certain number of instances from recently received samples, label the selected instances and update the classifiers reliably. We believe this process adapts well to online scenarios because AL methods can select relatively optimal instances from undetermined calibration samples to upgrade the classifiers. However, conventional AL instance-selection strategies select instances with low confidence, which cannot fully represent the current distribution of drifting samples. Consequently, we propose an AL method on an adaptive confidence rule (AL-ACR) that balances the categories of the selected instances, which helps make the drift information in the calibration set more uniform. We use two E-nose drift databases as benchmarks to evaluate the drift-counteraction performance of the proposed method: one dataset is public and the other was collected by the authors. According to the experimental results, we infer that AL-ACR is suitable for online drift calibration: it shows adequate adaptability and feasibility in online scenarios and achieves higher overall accuracy than the reference methods. Because manual labelling is time-consuming and laborious, it is crucial to obtain higher accuracy with fewer labels. We therefore define the labelling efficiency index (LEI) to assess the efficiency of AL methods. The LEI results show that AL-ACR has clear advantages in most cases.
The rest of this paper is organized as follows. Section 2 reviews the development of AL. Section 3 describes the proposed AL method in detail. Section 4 presents and discusses the comparative results of the proposed and reference methods on two databases. Finally, Section 5 summarizes the conclusions drawn from the experimental results.

Related Work
The AL paradigm generally consists of two modules: a selecting engine and a learning engine [22]. The selecting engine identifies high-quality unlabelled samples according to the classifiers constructed by the learning engine; the selected samples are labelled manually and the labelled instances are added to the training set. The learning engine renews the classifiers using both the training set and the manually labelled instances, so that classifier performance improves continuously. The "selecting-learning" cycle stops once a termination condition is reached. Clearly, how to select high-quality instances from unlabelled samples on the basis of feature information is a key part of an AL method and plays an important role in its classification performance. Therefore, studies on instance-selection strategies have gradually increased.
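The selecting-learning cycle described above can be sketched as a generic loop. This is a minimal illustration under stated assumptions, not the authors' code: `fit`, `select` and `oracle` are hypothetical callbacks standing in for the learning engine, the selecting engine and the human expert, and the termination condition is assumed to be a fixed label budget.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]


def active_learning_loop(
    train: List[Tuple[Sample, int]],
    pool: List[Sample],
    fit: Callable[[List[Tuple[Sample, int]]], object],  # hypothetical learning engine
    select: Callable[[object, List[Sample]], int],      # hypothetical selecting engine
    oracle: Callable[[Sample], int],                    # hypothetical stand-in for the expert
    n_queries: int,
):
    """Generic pool-based 'selecting-learning' cycle (a sketch)."""
    train, pool = list(train), list(pool)   # work on copies
    model = fit(train)                      # learning engine: initial classifier
    for _ in range(n_queries):              # termination condition: fixed label budget
        if not pool:                        # ...or the pool is exhausted
            break
        idx = select(model, pool)           # selecting engine: choose an instance
        x = pool.pop(idx)
        train.append((x, oracle(x)))        # the expert supplies the label
        model = fit(train)                  # learning engine: renew the classifier
    return model, train, pool
```

In practice `oracle` is a person, so each iteration is expensive; that cost is why the choice of `select` matters.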
Membership query synthesis (MQS), stream-based and pool-based methods are the typical approaches to selecting unlabelled samples. MQS is the earliest form of AL and learns by querying [23]: the hypothetical learning system asks experts questions, that is, MQS determines the labels of certain instances by query. In terms of instance selection, MQS hands all unlabelled samples to the expert for labelling, regardless of labour cost and data distribution. To avoid unnecessary labelling, stream-based AL was proposed; it selects instances according to either information entropy or distribution similarity and labels them one by one [24]. Although stream-based AL overcomes the drawbacks of MQS to some extent, it still requires a fixed threshold to measure the information contained in each input sample. Thus, unsteady labelling density and rigid parameter adaptation limit the performance of stream-based AL. Consequently, Lewis et al. proposed pool-based AL, which forms a "pool" of candidates for expert labelling [25]. Pool-based AL selectively labels instances in a dataset to enhance the performance of classifiers. In fact, it becomes equivalent to stream-based AL when the pool contains just one unlabelled sample; in other words, stream-based AL is a special case of pool-based AL. For pool-based AL, the crux of instance selection is choosing the most representative candidates to label. Uncertainty sampling (US) and query-by-committee (QBC) are the main instance-selection strategies. The former selects samples according to the posterior class-probability distribution of a classifier, while the latter selects instances according to the disagreement among the outputs of several classifiers.
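To make the stream-based versus pool-based contrast concrete, the sketch below compares a rule that decides instance by instance against a fixed informativeness threshold with a rule that ranks the whole pool and queries a fixed number k. The informativeness scores are assumed to be precomputed (for example by an entropy or similarity measure), and both function names are hypothetical.

```python
from typing import List


def stream_select(scores: List[float], threshold: float) -> List[int]:
    """Stream-based AL: each incoming instance is queried if its score passes a fixed threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def pool_select(scores: List[float], k: int) -> List[int]:
    """Pool-based AL: rank the whole pool and query the k most informative instances."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

With a fixed threshold, the number of labels depends on how the scores happen to be distributed (the unsteady labelling density mentioned above), whereas the pool-based rule always returns exactly k indices.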
Recently, in E-nose studies, Jiang et al. proposed an enhanced QBC (EQBC) strategy combined with a radial basis function neural network (RBFNN) [26] to improve recognition under the QBC strategy. However, there are hardly any public reports on drift counteraction for E-noses with active learning. To the best of our knowledge, this study is the first on drift compensation of E-noses with AL methods.

Baseline Methods
Three AL methods, US, QBC and EQBC, are used as baselines for comparison. Because each has several variants, we use the US variant proposed in Ref. [27] and the QBC and EQBC variants in Ref. [26].
The US strategy uses the classifier to identify the unlabelled samples with the greatest uncertainty for subsequent labelling. It assumes that the most uncertain sample contributes most to improving classification, and this uncertainty can be obtained from the posterior probabilities. We compute the margin of instance $p_i$ as follows:

$$\mathrm{margin}_i = P(\hat{y}_{c_1} \mid p_i) - P(\hat{y}_{c_2} \mid p_i) \qquad (1)$$

where $\hat{y}_{c_1}$ and $\hat{y}_{c_2}$ are the classes with the largest and second-largest posterior probabilities, respectively. The margin-based metric selects the instance with the minimum $\mathrm{margin}_i$ between the posterior probabilities of the two most likely class labels.
QBC is a widely used AL method based on version-space reduction, proposed by Seung et al. [28] and Freund et al. [29]. It sets up a committee of classifiers and selects the sample on which the members disagree most for expert labelling. We choose the Kullback-Leibler (KL) divergence [30] as the disagreement metric for QBC. It is calculated by Equations (2) and (3):

$$e_i^{KL\text{-}d} = \frac{1}{K}\sum_{k=1}^{K}\sum_{m} P_k(c_m \mid p_i)\,\log\frac{P_k(c_m \mid p_i)}{P_C(c_m \mid p_i)} \qquad (2)$$

$$P_C(c_m \mid p_i) = \frac{1}{K}\sum_{k=1}^{K} P_k(c_m \mid p_i) \qquad (3)$$

where $P_k(c_m \mid p_i)$ is the probability that sample $p_i$ is labelled as class $c_m$ by member k, $P_C(c_m \mid p_i)$ is the consensus probability of the committee and K is the total number of committee members. The EQBC method uses a weighted combination of the KL divergence $e_i^{KL\text{-}d}$ and the vote entropy (VE) $e_i^{VE}$ of the members' votes, as shown in Equation (4):

$$e_i = \omega_1 e_i^{KL\text{-}d} + \omega_2 e_i^{VE} \qquad (4)$$

where $\omega_1$ and $\omega_2$ are two adjustable weighting parameters.
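The three baseline criteria above (the US margin, the KL divergence to the committee consensus, and the EQBC weighted combination with vote entropy) can be sketched in NumPy. This is a minimal illustration, assuming classifiers that output class-probability arrays: `probs` has shape (n, C) and `member_probs` has shape (K, n, C). The small constant `EPS` and the default weights w1 = w2 = 0.5 are hypothetical choices, not values from the paper.

```python
import numpy as np

EPS = 1e-12  # assumption: guards log(0) for zero probabilities


def margin(probs):
    """US margin per row: P(most likely class) - P(second most likely class)."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def kl_to_mean(member_probs):
    """QBC disagreement: mean KL divergence of each member from the committee consensus."""
    consensus = member_probs.mean(axis=0)                        # consensus P_C, shape (n, C)
    log_ratio = np.log((member_probs + EPS) / (consensus + EPS))
    return (member_probs * log_ratio).sum(axis=2).mean(axis=0)   # sum over classes, average over K


def vote_entropy(member_probs):
    """Entropy of the members' hard votes for each instance."""
    K, n, C = member_probs.shape
    votes = member_probs.argmax(axis=2)                          # (K, n) hard votes
    counts = np.stack([(votes == c).sum(axis=0) for c in range(C)], axis=1)
    frac = counts / K                                            # vote fractions, (n, C)
    return -(frac * np.log(frac + EPS)).sum(axis=1)


def eqbc_score(member_probs, w1=0.5, w2=0.5):
    """EQBC: weighted sum of KL divergence and vote entropy (weights are hypothetical)."""
    return w1 * kl_to_mean(member_probs) + w2 * vote_entropy(member_probs)
```

US would query `np.argmin(margin(probs))`, while QBC and EQBC would query the argmax of `kl_to_mean` or `eqbc_score`, respectively.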

Adaptive Instance Selection
Baseline AL methods usually select useful instances with a single criterion, either high information entropy or low similarity. This suits static data, whose distribution is constant: as more instances are selected, the classification boundary becomes sharper, and instance expiration is not a concern. In online mode, however, instances inevitably expire under the influence of drift. As a result, AL methods based on a single criterion may fail to reflect the current drift trend of each category in time, so that over a period the performance on one class may rise while that on others declines. We therefore propose a pool-based AL method, AL-ACR, which switches between instance-selection criteria adaptively. Two criteria are used to find instances distributed evenly across categories. We adopt information entropy as the index of a sample's need for labelling: a high information-entropy value indicates low confidence, and vice versa.
As described in Figure 1, AL-ACR performs instance selection and classifier promotion iteratively. F is the min/max-confidence index, with a default value of 0: F = 0 means that the minimum-confidence instance should be selected, while F = 1 means that the maximum-confidence instance should be selected. The chairman classifier C_s0 is trained on the whole training set, and the member classifiers C_s1 and C_s2 are trained on randomly drawn parts of the training set. In each instance-selection step, we first compare the chairman's outputs for the current instance before and after the classifiers are updated. If the outputs differ, the chairman's classification boundary still needs to be refined, so the sample with maximum information entropy (minimum confidence) is selected in the next round to sharpen the boundary. If the outputs are the same, we assume that the category of the current instance is well classified, so an instance belonging to another category is selected with minimum information entropy (maximum confidence), which promotes label diversity and a balanced distribution. For classifier promotion, the label of each newly selected instance is assigned by an expert. The new instance and its label are then added to the training sets of the committee members, and each member is retrained on its own training set.
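The adaptive confidence rule can be sketched as follows. This is a minimal reading of the description above, not the authors' implementation. It assumes that the chairman C_s0 returns an (n, C) array of class probabilities for the pool, and that "another category" means a pool instance whose chairman prediction differs from the class of the current instance. The function names, the entropy clipping and the fallback used when no such instance exists are our assumptions.

```python
import numpy as np


def information_entropy(probs):
    """Shannon entropy (nats) of each row of an (n, C) probability array."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(probs * np.log(p)).sum(axis=1)


def update_flag(output_before, output_after):
    """F = 0 (minimum confidence) if the chairman changed its output, otherwise F = 1."""
    return 0 if output_before != output_after else 1


def select_instance(chair_probs, F, current_class):
    """Pick the next pool index under the adaptive confidence rule (sketch)."""
    entropy = information_entropy(chair_probs)
    if F == 0:
        # Boundary still moving: query the least confident instance.
        return int(np.argmax(entropy))
    preds = chair_probs.argmax(axis=1)
    others = np.flatnonzero(preds != current_class)   # instances predicted as other categories
    if others.size == 0:                               # assumption: fall back to the whole pool
        others = np.arange(len(chair_probs))
    # Current category looks settled: query the most confident instance of another category.
    return int(others[np.argmin(entropy[others])])
```

Alternating between the two branches is what spreads the queries across categories instead of concentrating them near one boundary.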

Initialize the training set, pool set P and test set.
Compute the KL divergence e_i for each sample p_i in P as in Equations (2) and (3).
Initialize the selected-instance amount N, the matrix of pool samples P with P' = P, the index of min/max confidence F, the number of selected instances i = 1 and the classifier set {C_s0, C_s1, C_s2}.
Label the currently selected instance v_s as the recognition output o_vs of chairman C_s0.
Delete v_s from P and add it to the training set.
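The committee set-up in the listing above (a chairman trained on the whole training set, two members trained on random parts of it, and members retrained after each expert label) can be sketched as follows. A nearest-centroid classifier stands in for k-NN, SVM or RBFNN purely to keep the example self-contained, and the member fraction of 0.7 is a hypothetical choice; the text does not state how large the random parts are. Whether the chairman is also retrained after each label is not spelled out, so this sketch leaves it unchanged.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in classifier (an assumption; the paper uses k-NN, SVM or RBFNN)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def build_committee(X, y, member_fraction=0.7, n_members=2, seed=0):
    """Chairman C_s0 on all data; members C_s1, C_s2 on random subsets."""
    rng = np.random.default_rng(seed)
    chairman = NearestCentroid().fit(X, y)
    members, subsets = [], []
    size = max(1, int(member_fraction * len(X)))
    for _ in range(n_members):
        idx = rng.choice(len(X), size=size, replace=False)
        members.append(NearestCentroid().fit(X[idx], y[idx]))
        subsets.append((X[idx], y[idx]))
    return chairman, members, subsets


def add_labelled(subsets, x, label):
    """Append an expert-labelled instance to every member's set and retrain each member."""
    new_members, new_subsets = [], []
    for Xs, ys in subsets:
        Xs, ys = np.vstack([Xs, x]), np.append(ys, label)
        new_subsets.append((Xs, ys))
        new_members.append(NearestCentroid().fit(Xs, ys))
    return new_members, new_subsets
```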

Public Dataset
The public dataset (http://archive.ics.uci.edu/ml/machine-learning-databases/00224/) comes from the machine learning repository of the University of California, Irvine [31]. In the following discussion, we refer to it as Dataset A.
There are 13,910 samples in total, collected by an E-nose over three years (36 months). The gas-sensor array consists of four devices each of TGS2600, TGS2602, TGS2610 and TGS2620, giving 16 gas sensors in the E-nose system. The dataset includes samples of six gases dosed at different concentrations. The entire response of a gas sensor in an experiment can be divided into three phases: injection, measurement and sweeping. Accordingly, 8 features (2 steady-state features and 6 transient features) were extracted from these three phases for each sensor. Because the sensor array comprises 16 gas sensors, one experiment yields 128 (8 × 16) features, so the total size of Dataset A is 13,910 × 128. All 10 batches are arranged in time order. To reduce the imbalance in batch sizes, we merged Batches 4 and 5 into Batch 4&5, and Batches 8 and 9 into Batch 8&9, giving 8 bunches of drifting samples for evaluation. The working temperature of the gas sensors was maintained at 400 °C and the sampling rate was set to 100 Hz. Figure 2a-h shows the distributions of the 8 bunches in principal component analysis (PCA) plots. The drift effect is clearly visible in all batches: the data distribution changes continuously over time, especially between batches. We therefore expect that classifiers without drift counteraction would quickly become invalid.
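Preparing Dataset A as described (128 features per measurement, with Batches 4&5 and 8&9 merged into 8 bunches) can be sketched as below. The line layout `class;concentration index:value ...` is our assumption about the repository's libsvm-style batch files and should be checked against the dataset description. `parse_line`, `merge_batches` and `MERGE_PLAN` are hypothetical names.

```python
from typing import Dict, List, Tuple

N_FEATURES = 128  # 16 sensors x 8 features per sensor


def parse_line(line: str) -> Tuple[int, float, List[float]]:
    """Parse one measurement assumed to look like 'class;concentration idx:value ...'."""
    head, *pairs = line.split()
    label_str, conc_str = head.split(";")
    features = [0.0] * N_FEATURES
    for pair in pairs:
        idx, value = pair.split(":")
        features[int(idx) - 1] = float(value)   # feature indices are 1-based
    return int(label_str), float(conc_str), features


# Batches 4 and 5, and 8 and 9, are merged, leaving 8 bunches.
MERGE_PLAN = [[1], [2], [3], [4, 5], [6], [7], [8, 9], [10]]


def merge_batches(batches: Dict[int, list]) -> List[list]:
    """batches maps batch number (1-10) to its list of samples."""
    return [[s for b in group for s in batches[b]] for group in MERGE_PLAN]
```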

Collected Dataset
Besides Dataset A, the other dataset was collected with an E-nose system that we designed. This E-nose system consists of three parts: a gas-sensor array, a control unit and a host computer. The gas-sensor array includes 32 gas sensors of solid-electrolyte, electrochemical and metal-oxide types. The control unit transfers the fingerprints from the gas-sensor array to the host computer for further processing. We gathered 63, 189 and 189 samples in the first, third and fourth months, respectively. Accordingly, we divided the 4-month dataset into three batches in chronological order, and each batch contains equal numbers of samples of every kind. For each sample, we used the steady-state value of each gas-sensor response as a feature, so one gas sensor corresponds to one feature and one sample is a 32-dimensional vector. We refer to this dataset as Dataset B in the following discussion.

Experimental Setup
Our evaluation contains four parts. First, we evaluated the accuracy of the proposed AL-ACR against other state-of-the-art methods (AL-US, AL-QBC, AL-EQBC, RBF-SVM, GFK-SVM, Comgfk-SVM, Comgfk-ML and RBF-ML [21]) in long-term online drift counteraction. Second, we defined 4 different long-term scenarios to compare the AL methods in terms of labelled-instance number, category balance and labelling efficiency. The defined Scenarios 1-4 are as follows:
For each scenario, data pre-processing was performed to normalize the sample features to [0, 1] by:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where x represents a sample vector (128-dimensional for Dataset A), x' is the output after pre-processing, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum feature values in x, respectively. We adopt three types of classifiers: k-nearest neighbour (k-NN), support vector machine (SVM) and radial basis function neural network (RBFNN). The parameter k of k-NN is set to 3. For SVM, we tuned the kernel function (radial basis or linear), the kernel parameter σ (radial basis kernel only) and the penalty factor C. Both σ and C were searched over the range 10⁻³ to 10³ with a variable step. Considering computational complexity and overall performance, we finally chose the linear kernel and set C = 0.2 (Dataset A) or C = 2 (Dataset B). For RBFNN, we set the number of hidden neurons to 50, the training-error goal to 10⁻⁵ and the variance of the radial basis kernel to 1.
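The min-max normalization above can be sketched in NumPy. This follows the per-sample reading of the definition (x_max and x_min are taken within each sample x); many pipelines normalize per feature instead, so treat this as one interpretation. Mapping a constant sample to all zeros is our assumption, since the formula is undefined when x_max = x_min.

```python
import numpy as np


def minmax_normalize(X):
    """Scale each sample (row) to [0, 1] using its own min and max feature values."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    lo = X.min(axis=1, keepdims=True)
    span = X.max(axis=1, keepdims=True) - lo
    span[span == 0] = 1.0   # assumption: a constant sample maps to all zeros
    return (X - lo) / span
```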
For the AL methods, we chose roughly 30% of the samples in each batch as pool samples, and the remainder were used for testing only. Because instances cannot be labelled at arbitrary times in practice, labelling is concentrated within one period per batch. For a fair comparison with the non-AL methods, the AL accuracies in Table 1 are obtained on whole batches, including both pool and test data; the other results, which compare AL methods only, are obtained on the test data. Table 1 shows the accuracies of the different paradigms in Scenario 1. The supervised methods RBF-SVM, GFK-SVM and Comgfk-SVM are SVMs with a radial basis kernel, a geodesic flow kernel and a combined geodesic flow kernel, respectively. The semi-supervised paradigm includes Comgfk-ML and RBF-ML. For the AL paradigm, AL-US, AL-QBC, AL-EQBC and the proposed AL-ACR are compared. The results of RBF-SVM, GFK-SVM, Comgfk-SVM, Comgfk-ML and RBF-ML are taken directly from Ref. [21]. In this subsection, we compute the results of the AL methods in Scenario 1 using SVM only, because Ref. [21] adopted the same setting.
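The per-batch pool/test split described above can be sketched as follows. The text does not say how the 30% is drawn, so a uniform random permutation with a fixed seed is an assumption here, as is the function name.

```python
import numpy as np


def split_pool_test(n_samples, pool_fraction=0.3, seed=0):
    """Randomly assign about pool_fraction of a batch to the AL pool, the rest to testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_pool = int(round(pool_fraction * n_samples))
    return np.sort(order[:n_pool]), np.sort(order[n_pool:])
```

A stratified split per gas class would keep the pool category-balanced from the start; the uniform draw shown here does not guarantee that.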

Performance Evaluation of Paradigms
We set the number of selected instances N to 6 for the AL methods. Because this number is comparatively small, the drift is described only coarsely, and the performance of the AL methods is expected to be limited at this N. Nevertheless, among all methods across the paradigms, AL-ACR achieves the best recognition performance, which indicates that the proposed method is suitable for online drift suppression in most cases. Regarding the paradigms, we believe that supervised learning cannot compensate for drift well over a long time, and that semi-supervised methods lack robustness because online labels are unreliable.
Table 1. Accuracy of methods in Scenario 1 (%).

Effect of Instance Number
The number of labelled instances N for each batch is an important parameter of AL methods. We investigate how the recognition rate of the AL models changes as N increases from small to large values. To avoid class imbalance caused by the number of labelled instances, we set the step of N equal to the number of categories in the dataset. Thus, the step of N is 6 for Scenarios 1-3 and 7 for Scenario 4.
In Figures 4a-d, 5a-d and 6a-d, the green line with crosses, indigo line with diamonds, blue line with stars and red line with stars represent the accuracies of AL-US, AL-QBC, AL-EQBC and AL-ACR, respectively. In general, a larger N yields a higher recognition rate, which we attribute to the drift information being described more abundantly and accurately as N increases. Compared with the other three methods, AL-ACR achieves the best accuracy in most cases. This again confirms the effectiveness of AL-ACR for online drift suppression in E-noses, with the reference methods lagging behind. In other words, the results in Figures 4a-d, 5a-d and 6a-d show that AL-ACR is the best choice for almost all values of N.
For both Datasets A and B, the performance of the algorithms plateaus or even declines somewhat once N becomes large. This is reasonable: redundant drift information arises from the mismatch between N and the time scale of the drift. Such redundant information adds nothing to the AL models and may even have negative effects.
Additionally, we use three classifiers, k-NN, SVM and RBFNN, to test whether the performance of the proposed method depends on the choice of classifier. All classifiers show almost the same accuracy trend. On the one hand, SVM often shows stronger recognition ability than k-NN and RBFNN, possibly because SVM searches for a globally optimal solution, whereas k-NN relies on local discrimination and RBFNN may suffer from over-fitting. On the other hand, the performance differences among the four methods appear smaller with SVM than with k-NN or RBFNN. We infer that the strong recognition ability of SVM may have offset the drawbacks of AL-US, AL-QBC and AL-EQBC.

Distribution of Labelled Instances
As previously described, AL-ACR tries to select instances evenly across categories. This helps the classifiers capture the full drift details both near and away from the classification boundaries during online drift compensation.
Figure 7 and Table 2 present the true labels of the selected instances in all scenarios, with a different colour for each label. The black, ochre, blue and red dotted boxes denote the bars of AL-US, AL-QBC, AL-EQBC and AL-ACR, respectively, and the height of a bar denotes the number of instances of a given category, so each scenario contains four bars. The total number of labelled instances was set to 210, 216, 210 and 70 for Scenarios 1-4, respectively. The results show that AL-ACR has the most balanced category distribution, which allows it to capture the complete drift trend in online mode, whereas the other three methods are slightly less balanced. The class ratios in Table 2 likewise show the most balanced category distribution for AL-ACR, which helps explain why AL-US, AL-QBC and AL-EQBC achieve lower accuracy.
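Category balance of the labelled instances can be quantified as sketched below. The per-class ratios correspond to what Table 2 reports; the normalized-entropy "evenness" score (1 for perfectly balanced labels, 0 when every label is one class) is an extra summary we assume for illustration and is not defined in the paper. At least two classes are assumed.

```python
import math
from collections import Counter


def class_ratios(labels, classes):
    """Fraction of the labelled instances that belong to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts.get(c, 0) / total for c in classes}


def evenness(labels, classes):
    """Entropy of the class ratios divided by its maximum, log(number of classes)."""
    ratios = class_ratios(labels, classes)
    h = -sum(p * math.log(p) for p in ratios.values() if p > 0)
    return h / math.log(len(classes))
```

For example, a selector that queried 9 instances of one gas and none of the other two would score 0, while equal counts across all classes score 1.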

Values of Labelling Efficiency Index (LEI)
In general, we want AL methods not only to achieve high accuracy but also to reduce the number of labels, saving time and labour. In other words, raising the accuracy gain per labelling is crucial in AL. We therefore define the LEI to quantify the cost of an AL method's recognition improvement. The LEI is a hybrid index built from three quantities: Acc, the accuracy with N selected instances; ∆Acc, the increase in accuracy since the last labelling; and α, an adjustable parameter in [0, 1]. It is the sum of two terms that represent the current performance and its increment, respectively. A qualified AL method should combine excellent classification ability with an economical cost of performance enhancement. We set α to 0.2 for the LEI computation. Tables 3-8 list the LEI values in all scenarios for the three classifiers with different N. In Scenario 1, whichever classifier is used (SVM, k-NN or RBFNN), AL-ACR reaches the highest LEI score throughout. In Scenario 2, AL-ACR is still the best in all cases with k-NN and RBFNN, and in 7 out of 10 cases with SVM. In Scenarios 3 and 4, although AL-ACR sometimes falls behind the reference methods, it still has the highest LEI values in most cases. Overall, AL-ACR has higher labelling efficiency than AL-US, AL-QBC and AL-EQBC in almost all scenarios. We therefore consider AL-ACR an economical and practical approach for online drift counteraction in E-noses.

Conclusions
To solve the online drift problem, we propose a novel AL method, AL-ACR, which overcomes the imbalance of drift information by performing instance selection with adaptive rules. The experimental results show that the proposed method outperforms the reference methods in online operation. Furthermore, AL-ACR shows clear advantages in recognition, parameter sensitivity, instance balance and labelling efficiency, making it a favourable choice for online drift counteraction in E-noses. Future work will focus on an online pool-refreshing mechanism and on online AL implementation with limited storage resources.