One-Class Drift Compensation for an Electronic Nose

Drift compensation is an important issue in electronic noses (E-noses) because drift undermines model robustness and recognition stability. Model-based drift compensation is a typical and popular countermeasure to the drift problem. However, traditional model-based drift compensation methods face a “label dilemma” owing to the high cost of preparing multiple kinds of drift-calibration samples. In this study, we propose a calibration model for classification that uses a single category of drift correction samples, making the operation more convenient and feasible. We constructed a multi-task learning model that yields a calibrated classifier satisfying several demands, and we present an associated solution process that gives a closed-form representation of the classifier. Two E-nose drift datasets were used for method evaluation. In the experiments, the proposed methodology reaches the highest recognition rate in most cases and shows excellent, steady performance over a wide range of adjustable parameters. Overall, the proposed method performs drift compensation with limited one-class calibration samples and achieves the top accuracy among all the reference methods presented. It offers E-noses a new option for counteracting drift under cost-sensitive conditions.


Introduction
Over the past three decades, the electronic nose (E-nose), a bionic olfactory system, has been applied to sense and identify volatile organic compounds using a customized gas sensor array and associated intelligent algorithm models [1][2][3][4]. Throughout the development of E-noses, gas sensor drift caused by the inherent characteristics of metal-oxide-semiconductor sensors has degraded the reproducibility of E-noses during long-term detection [5]. From the viewpoint of intelligent algorithm models, gas sensor drift causes slow, random fluctuations of the input signals, which can be seen as a shift of the data distribution in a multi-dimensional vector space. To maintain recognition effectiveness, drift compensation, which adjusts the models to adapt to time-varying data, has become an important issue.
A number of researchers have worked on the drift problem of E-noses. In addition to direct improvements in gas sensor materials, structures, and fabrication [6][7][8], algorithmic approaches are a popular way to counteract the negative effect of drift. These approaches can be divided into two categories according to how they use the category information of drift correction samples. The first is the supervised manner, which uses both drift correction samples and their class information (labels) for drift compensation. The supervised manner provides complete drift information but requires an independent collection process to obtain sufficient, full-category drift correction samples [9][10][11][12], which makes drift compensation costly, laborious, and time-consuming. To overcome this issue, researchers have turned to the second manner, a flexible process that allows drift correction with fragmentary category information. Accordingly, semi-supervised learning [13,14] and active learning [15,16] methods have been introduced to use a relatively small set of full-category drift correction samples selected from massive unlabeled drift data. When the number of labeled drift correction samples drops to zero, that is, when all drift correction samples are unlabeled, dimension reduction methods can be used as long as the drift disturbance is regarded as an abnormal component [17][18][19][20]. Moreover, domain adaptation has been used to project the drift data and the initial training samples into a space where the distance between them is shorter. Following this approach, Zhang et al. enhanced the distribution consistency between drift and initial training samples in a learned subspace to adapt to drifted sensor responses [21]. Yi et al. built a further mathematical model that uses the label information of the source data to distinguish different sample classes [22]. Recently, Liu et al. obtained an optimized data space for drift compensation with maximum label-feature correlation and minimum feature redundancy [23]. Although the second manner lowers the cost of drift compensation by removing the independent collection of labeled drift correction samples, it ignores the shift in the relative distributions of different categories. Therefore, a new drift compensation approach is needed that overcomes the drawbacks of both manners.
In this study, we chose to perform one-class (one-category) drift compensation rather than using full-category drift correction samples or none at all. One-class drift compensation not only provides definite label information for determining changes in the relative distributions but also reduces the category requirements for drift correction samples. Accordingly, we established a multi-task learning model [24,25] to obtain a class-label predictor for drift data, comprehensively considering the data and class-label distributions of both the initial training samples and the one-class correction samples. Specifically, we drew on domain adaptation to mine the unlabeled sample information and on a linear predictor model to mine the labeled sample information. Furthermore, we present a complete solution process that yields a closed-form solution of the proposed multi-task model. Two long-term experimental datasets from two E-nose systems were used as test benchmarks, on which the proposed method showed clear superiority over other state-of-the-art drift compensation methods.
The objectives of this study were to: (1) simplify sample preparation by using a single class of drift correction samples, (2) establish a specific mathematical model for one-class drift compensation, and (3) provide a fast solution process for the mathematical model.
The rest of this article is organized as follows: Section 2 introduces the drift datasets used, the E-nose systems, and the details of the proposed methodology. Section 3 provides the experimental settings, results, and discussion. Finally, the last section summarizes this study.

Experimental Data
In this study, we employed two E-nose drift datasets from previous studies as drift observations. Dataset A is a public benchmark from [26], while Dataset B was collected from an E-nose system we designed [15].

Dataset A
This dataset was generated by an E-nose system consisting of 16 sensors of four types (TGS2600, TGS2602, TGS2610, and TGS2620, four sensors of each type) and was designed to distinguish several simple volatile organic compounds over a long period. Eight geometric features were extracted from each gas sensor response curve: two steady-state features, three transient features from the rising phase, and three transient features from the declining phase. Thus, each experiment is represented as a 128-dimensional sample vector (16 sensors × 8 features). In total, 13,910 samples were collected over 36 months. The test analytes fall into six categories, namely, ethanol, ethylene, ammonia, acetaldehyde, acetone, and toluene. According to the time of the experiments, the samples were divided into 10 batches (shown in Table 1). The sample distributions of the 10 batches are visualized by two-dimensional principal component analysis (PCA) plots in Figure 1.
Chemosensors 2021, 9, 208

Dataset B
Dataset B was obtained from a self-designed E-nose system composed of 32 gas sensors [15]. The gas sensor array information is listed in Table 2. We used this E-nose system to analyze complex aroma compounds from different beverages. Each experiment consisted of three phases: baseline, testing, and cleaning. The baseline and testing phases each lasted 3 min at a flow rate of 100 mL/min. The cleaning phase lasted 10 min at 3 L/min, the maximum flow rate of the E-nose system. Clean air was injected during the baseline and cleaning phases, while the headspace vapors of the beverages were sampled during the testing phase. We extracted one feature $s$ from each sensor response curve in an experiment as follows: where $R_s$ and $R_0$ denote the stable response and the baseline value of a testing object, respectively. Hence, the data of one experiment were reduced to a 32-dimensional sample vector, one feature per sensor in the array. We sampled the headspace volatile compounds of seven beverages: beer, liquor, wine, pu'erh tea, oolong tea, green tea, and black tea. For each type of tea, 2 g of solid tea leaves was soaked in 200 mL of distilled water for 5 min, and the original tea solution was obtained by filtering the liquid; the original solutions of beer, liquor, and wine were bought directly from the manufacturers. We then prepared samples at different concentrations by mixing the original solution with distilled water at around 25 °C. Low-, medium-, and high-concentration samples were prepared for each beverage with original-solution ratios of 14%, 25%, and 100%, respectively. Dataset B covers a 4-month experimental period, with 63, 189, and 189 samples collected in Months 1, 3, and 4, respectively. In each of these months, we tested the seven beverages at the three concentrations (14%, 25%, and 100%) created by different dilution rates.
The experiments at each concentration were repeated one, three, and three times in Months 1, 3, and 4, respectively. Accordingly, 441 samples were recorded in Dataset B, and we grouped these samples into Batches S1-S3 by month. Figure 2 shows the sample distributions of Batches S1-S3 in two-dimensional PCA plots.
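Figures 1 and 2 visualize the batches with two-dimensional PCA. The text does not state whether the features were standardized or whether one projection was fitted to all batches jointly, so the following minimal NumPy sketch simply centres whatever matrix it is given and projects it onto the two leading principal axes obtained by singular value decomposition. The function name `pca_2d` is hypothetical, and the random matrix in the usage lines is a stand-in for one batch of Dataset A, not real data.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)  # centre every feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T  # N x 2 scores, ready for a scatter plot

# Hypothetical stand-in for one batch of Dataset A: 50 samples x 128 features.
rng = np.random.default_rng(0)
scores = pca_2d(rng.normal(size=(50, 128)))
```

Plotting each batch's `scores` in a different colour reproduces the kind of view used in the figures; a drifting batch shows up as a cloud displaced from the initial one.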

Notations for Methods
We first define the notation used in the following models and methods. The initial training samples and the subsequent drift samples can be regarded as data from two domains with different but correlated distributions. Domain adaptation is a transfer learning paradigm that seeks a common data space in which the data of the two domains are identically distributed.
In this paper, we denote the initial training samples $X_S \in \mathbb{R}^{N_S \times D}$ and the drift samples $X_T \in \mathbb{R}^{N_T \times D}$ as the source-domain and target-domain data, respectively, where $D$ is the data dimension and $N_S$ and $N_T$ are the numbers of source-domain and target-domain samples. $Y_S = [\mathbf{y}_1^s, \mathbf{y}_2^s, \cdots, \mathbf{y}_{N_S}^s]^T \in \mathbb{R}^{N_S \times C}$ is the label matrix containing the class-label vectors of all source-domain samples, where each label vector uses one-hot coding (a label with $C$ possible states is encoded with $C$ bits, exactly one of which is set), and $C$ is the number of classes. To reduce task complexity, time, and material expenditure, we minimize both the number of calibration samples and the number of their categories; here, the category size of the calibration samples is set to one, the minimum possible value. We define $T_S \in \mathbb{R}^{n \times D}$ ($T_S \subset X_S$) and $T_T \in \mathbb{R}^{n \times D}$ ($T_T \subset X_T$) as the calibration samples with a single class label in the source and target domains, respectively, where $n$ is a preset number of drift correction samples (transfer samples). Moreover, we adopt a first-order linear decision function for classification because of its simple structure and low computational load. $P_S, P_T \in \mathbb{R}^{D \times C}$ are the two weight matrices to be solved in the decision functions for the source and target domains, respectively. Additionally, $(\cdot)^T$, $\|\cdot\|_F$, and $\|\cdot\|_*$ denote the transpose operator, the Frobenius norm, and the nuclear norm, respectively.
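The notation above can be illustrated with a short NumPy sketch of one-hot label coding and the first-order linear decision function $\hat{Y} = XP$, where the predicted class is the column with the largest score. The helper names `one_hot` and `predict` are hypothetical, and the toy matrices are illustrative only.

```python
import numpy as np

def one_hot(labels, C):
    """Encode integer labels in 0..C-1 as the rows of an N x C one-hot matrix."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, C))
    Y[np.arange(labels.size), labels] = 1.0
    return Y

def predict(X, P):
    """First-order linear decision: class scores X @ P (N x C), label = argmax."""
    return np.argmax(X @ P, axis=1)

Y_S = one_hot([0, 2, 1, 2], C=3)   # toy source-domain labels
T_labels = one_hot([3] * 5, C=6)   # five transfer samples, all from one class
```

Because every row of `T_labels` is the same one-hot vector, the label matrix of one-class transfer samples has rank 1, which is the property the model later exploits.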

Transfer-Sample-Based Coupled Task Learning
Transfer-sample-based coupled task learning (TCTL) [27] learns a prediction model for E-nose drift samples from a small number of transfer samples (drift correction samples). It is a typical cost-saving drift compensation method, and its objective function can be written as the following loss function: where $\beta_S, \beta_T \in \mathbb{R}^D$ are the source-domain and target-domain prediction models, respectively, $w_j$ is the deviation of the $j$-th sample between the source and target domains, and $\lambda$, $\lambda_1$, and $\lambda_2$ are term coefficients. In Formula (2), the first term guarantees the correctness of $\beta_S$; the second and third terms maintain the similarity between the two domains in terms of recognition results and prediction models, respectively; and the last term is a Tikhonov regularization term, which transfers the predictor information of the source domain to that of the target domain. As a result, $\beta_T$ can be solved from Formula (2) as a linear predictor for drift data.

Transfer-Sample-Based Multiple Task Learning
Zhang et al. [28] proposed an improved model, transfer-sample-based multiple task learning (TMTL), by relaxing the third term of the TCTL objective function. The objective function of TMTL can then be represented as follows: where $N_T$ and $n$ are the numbers of transfer samples and source-domain samples, respectively. A standard analytical solution process then yields a closed-form expression of $\beta_T$ as a calibrated classifier.

Proposed Methodology
Both TCTL and TMTL require multiple categories of drift correction samples, which increases the cost of experimental materials and labor. Therefore, we attempt to update the E-nose predictor using drift correction samples from a single category.

Loss Function Formulation
We aim to build a comprehensive loss function through multi-task learning to obtain optimal $P_S$ and $P_T$ that project the initial training samples and the drift samples, respectively, into the label space. Several essential demands are considered in modeling with one-class correction samples.
Demand 1: empirical prediction error. The class labels of the source-domain samples can be predicted by the first-order linear model $\hat{Y}_S = X_S P_S$, where $\hat{Y}_S$ is the estimate of $Y_S$. Accordingly, we minimize the empirical prediction error between $\hat{Y}_S$ and $Y_S$ (Formula (4)).
Demand 2: rank of the one-class transfer samples' labels. The predicted labels of the transfer samples in the source and target domains are $\hat{Y}_{T_S} = T_S P_S$ and $\hat{Y}_{T_T} = T_T P_T$, respectively, where $\hat{Y}_{T_S}, \hat{Y}_{T_T} \in \mathbb{R}^{n \times C}$. Both $\hat{Y}_{T_S}$ and $\hat{Y}_{T_T}$ should be low-rank matrices, because all the transfer samples belong to a single class and one-hot encoding is used for the classification outputs; a lower rank indicates a purer category assignment of the transfer samples. To maintain this class uniformity, we impose the low-rank formulation of Formula (5).
Demand 3: prediction error of the one-class transfer samples between the source and target domains. $P_S$ and $P_T$ must predict the one-class transfer samples correctly in both domains; ideally, the predicted labels of the transfer samples should be identical in the two domains. To this end, we minimize the prediction error of the transfer samples between the two domains (Formula (6)).
Demand 4: dependency between samples and their class labels. Samples with identical label distributions should occupy similar locations in feature space; that is, the sample distribution is correlated with the associated class labels. Therefore, we introduce the dependence maximization criterion (MDDM) [29] to maximize the dependency between the one-class transfer samples ($T_S$ and $T_T$) and their class labels ($L_S$ and $L_T$) (Formula (7)), where $H = I - \frac{1}{N}\mathbf{e}\mathbf{e}^T$ and $\mathbf{e}$ is an all-ones column vector.
Demand 5: correlation between the decision functions of different domains. $P_S$ and $P_T$ both recognize the same classes from discrepant but correlated source and target data, so they are bound to be similar. Thus, a certain degree of similarity between $P_S$ and $P_T$ must be preserved (Formula (8)).
Total loss function: combining Demands 1-5 (Formulas (4)-(8)), we obtain the loss function of the one-class drift compensation model (ODCM), $\min_{P_S, P_T} \mathrm{Loss}(P_S, P_T)$ (Formula (9)), where $\lambda, \lambda_1, \lambda_2, \lambda_3, \lambda_4 > 0$ are adjustable coefficients for the terms of the ODCM model, and $\|P_S\|_F$ and $\|P_T\|_F$ are two regularization terms used to prevent overfitting. Both $P_S$ and $P_T$ are determined by minimizing Formula (9). Finally, $P_T$ is the decision function of interest, which classifies the drift samples $X_T$ by $X_T P_T$.
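Because the bodies of Formulas (4)-(8) are not reproduced in this text, the following NumPy sketch evaluates the ODCM terms under assumed forms: squared Frobenius norms for Demands 1, 3, and 5, nuclear norms of the predicted transfer-sample labels for Demand 2, and $\|P_S\|_F + \|P_T\|_F$ for the regularizer. The Demand 4 (MDDM) term is omitted because its exact form is not given here, and the function name `odcm_terms` is hypothetical. The sketch only shows how each demand can be expressed as a matrix quantity; it is not the authors' implementation.

```python
import numpy as np

def odcm_terms(X_S, Y_S, T_S, T_T, P_S, P_T):
    """Evaluate ODCM-style loss terms under ASSUMED forms (not Formulas (4)-(8))."""
    Yhat_TS = T_S @ P_S  # predicted labels of source-domain transfer samples
    Yhat_TT = T_T @ P_T  # predicted labels of target-domain transfer samples
    return {
        # Demand 1: empirical prediction error on the source domain
        "demand1": np.linalg.norm(Y_S - X_S @ P_S, "fro") ** 2,
        # Demand 2: nuclear norm as a convex surrogate for low rank
        "demand2": np.linalg.norm(Yhat_TS, "nuc") + np.linalg.norm(Yhat_TT, "nuc"),
        # Demand 3: disagreement of transfer-sample predictions across domains
        "demand3": np.linalg.norm(Yhat_TS - Yhat_TT, "fro") ** 2,
        # Demand 5: similarity of the two decision matrices
        "demand5": np.linalg.norm(P_S - P_T, "fro") ** 2,
        # Regularizer against overfitting
        "reg": np.linalg.norm(P_S, "fro") + np.linalg.norm(P_T, "fro"),
    }

# Demand 2 intuition: identical one-hot rows give a rank-1 label matrix.
one_class = np.tile([0.0, 1.0, 0.0], (4, 1))
rank = np.linalg.matrix_rank(one_class)
```

How these terms are weighted by $\lambda, \lambda_1, \ldots, \lambda_4$ is fixed by Formula (9) and is left to the caller in this sketch.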

Solution
To obtain a closed-form solution for $P_T$, we first rewrite Formula (9) in the following form: Then, we take the partial derivatives of Formula (10) with respect to $P_S$ and $P_T$ and set them to zero, which gives: Formulas (12) and (13) can therefore be treated as a pair of simultaneous equations in $P_S$ and $P_T$, and the closed-form solution of the ODCM model is obtained by
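Formulas (10)-(14) are not reproduced here. To illustrate how zero-gradient conditions in $P_S$ and $P_T$ form a pair of linear equations with a closed-form solution, the following sketch solves a simplified, fully quadratic variant of the ODCM loss, $\|Y_S - X_S P_S\|_F^2 + a\|T_S P_S - T_T P_T\|_F^2 + b\|P_S - P_T\|_F^2 + c(\|P_S\|_F^2 + \|P_T\|_F^2)$, which keeps only the Demand 1, 3, and 5 terms and uses squared Frobenius regularization; the nuclear-norm and MDDM terms are dropped. The coefficients `a`, `b`, `c`, the function name, and the toy data are hypothetical, and the result is not the paper's Formula (14).

```python
import numpy as np

def solve_coupled_quadratic(X_S, Y_S, T_S, T_T, a=1.0, b=1.0, c=1e-2):
    """Closed-form minimiser of a SIMPLIFIED quadratic ODCM-like loss.

    Setting the gradients w.r.t. P_S and P_T to zero gives two coupled
    linear equations, which are stacked into one block system and solved.
    """
    D = X_S.shape[1]
    I = np.eye(D)
    A11 = X_S.T @ X_S + a * T_S.T @ T_S + (b + c) * I
    A12 = -(a * T_S.T @ T_T + b * I)
    A21 = A12.T                      # equals -(a T_T^T T_S + b I)
    A22 = a * T_T.T @ T_T + (b + c) * I
    A = np.block([[A11, A12], [A21, A22]])
    B = X_S.T @ Y_S
    rhs = np.vstack([B, np.zeros_like(B)])
    P = np.linalg.solve(A, rhs)      # (2D) x C stacked [P_S; P_T]
    return P[:D], P[D:]

# Hypothetical toy data: 60 source samples, 8 features, 3 classes.
rng = np.random.default_rng(0)
X_S = rng.normal(size=(60, 8))
Y_S = np.eye(3)[rng.integers(0, 3, size=60)]
T_S = X_S[:5]        # stand-in for the one-class transfer samples (source)
T_T = T_S + 0.3      # the same samples after a hypothetical drift offset
P_S, P_T = solve_coupled_quadratic(X_S, Y_S, T_S, T_T)
drift_pred = np.argmax(T_T @ P_T, axis=1)
```

Solving both conditions as one block system avoids iterating between $P_S$ and $P_T$. For `c > 0` the block matrix is symmetric positive definite, because it is (half) the Hessian of a strictly convex quadratic, so a direct solve is well posed.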

Data Arrangement
We defined two settings (shown in Table 3) to restructure the drift datasets for different validation scenarios. In Table 3, $K$ is the total number of batches. Setting 1 represents a short-term drift scenario, in which the initial training samples vary and the drift samples follow within a relatively short period, while Setting 2 simulates a long-term scenario with fixed initial training samples and continuing drift samples. We use "X-Y" to denote a scenario, where X and Y are the batch serial numbers of the initial training and drift samples, respectively.
Table 3. Scenario setting.
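The contents of Table 3 are not reproduced here. Judging from the scenario names used later ("9-10" under Setting 1, "1-10" under Setting 2, and "S1-S3" for Dataset B), the two settings appear to pair batches as in the following sketch, which assumes that Setting 1 trains on each batch and tests on the next one and that Setting 2 always trains on the first batch. The function names are hypothetical.

```python
def setting1(batches):
    """Short-term drift (assumed): train on batch k, test on batch k+1."""
    return [f"{a}-{b}" for a, b in zip(batches, batches[1:])]

def setting2(batches):
    """Long-term drift (assumed): always train on the first batch."""
    return [f"{batches[0]}-{b}" for b in batches[1:]]

dataset_a = [str(k) for k in range(1, 11)]   # Batches 1..10
dataset_b = ["S1", "S2", "S3"]
```

Under these assumptions, each setting yields $K - 1$ scenarios per dataset.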

Parameter Optimization
Before validation, several pre-settable parameters must be optimized. For the proposed ODCM methodology, three types of parameters must be preset: the adjustable coefficients, and the number and category of the one-class transfer samples. We used grid search to optimize the adjustable coefficients in the range $[10^{-4}, 10^{4}]$. The grid is flexible, with values selected from $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^{2}, 10^{3}\}$ according to the parameter scales. We chose ethylene (the fourth category) as the one-class transfer samples for Dataset A because it appears in all batches in relatively small quantities. For Dataset B, since all categories are equal in quantity, we arbitrarily chose pu'erh tea (the fourth category) as the one-class transfer samples. In Dataset A, Batch 6 contains 29 transfer samples (the fewest among all batches), which limits the transfer sample size to at most 29 in the following validation. Because more transfer samples provide more accurate drift information, we set $n = 29$ for Dataset A. For Dataset B, we set $n = 9$ because nine transfer samples exist in every batch.
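A minimal exhaustive grid search over the listed values can be sketched as follows. `score_fn` is a hypothetical callable that trains the model with a given coefficient setting and returns its recognition rate. With five coefficients and eight grid values, this amounts to $8^5 = 32{,}768$ evaluations, so a coarser or coordinate-wise search may be preferable in practice.

```python
import itertools

GRID = [10.0 ** k for k in range(-4, 4)]  # 1e-4 ... 1e3, as listed in the text

def grid_search(score_fn, names, grid=GRID):
    """Try every combination of grid values; keep the highest-scoring one.

    score_fn(params: dict) -> recognition rate (higher is better).
    """
    best_params, best_score = None, float("-inf")
    for values in itertools.product(grid, repeat=len(names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For ODCM, `names` would hold the five coefficients $\lambda, \lambda_1, \ldots, \lambda_4$.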

Reference Methods
We employed three representative drift compensation methods as references: common component PCA (CCPCA) [30], TCTL, and TMTL. According to their principles, all three methods can be run with one-class transfer samples, which ensures the fairness of the following evaluation. CCPCA is a classic method for extracting signals from the drift background without any labeled drift correction samples. Because CCPCA is a preprocessing method, we paired it with a popular classification model, the support vector machine (SVM), for recognition. We used a linear kernel for the SVM because of its speed and satisfactory performance, and the penalty coefficient of the SVM was set to $10^{-4}$ after grid optimization. TCTL and TMTL, in turn, are two state-of-the-art algorithmic approaches based on transfer samples. Traditionally, CCPCA, TCTL, and TMTL use multi-class transfer samples during drift compensation. To meet the one-class requirement, we restricted the transfer samples of CCPCA, TCTL, and TMTL to one class with settings (category and quantity) identical to those of the proposed methodology, and we name these variants CCPCA+, TCTL+, and TMTL+. All the algorithm parameters of the one-class variants were optimized as described in Section 3.1.2. All methods were implemented in MATLAB 2018.
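The exact CCPCA variant of [30] is not specified in this text. The following is a minimal sketch of the general component-correction idea behind PCA-based drift removal, assuming that the drift direction is the leading principal axis of a set of reference samples (for example, the one-class transfer samples collected over time) and that removing it leaves the gas-specific signal. The function name and the parameter `k` are hypothetical.

```python
import numpy as np

def component_correction(X, R, k=1):
    """Remove the top-k principal directions of reference samples R from X.

    R holds samples of one reference class measured over time; their main
    variance direction(s) are treated as drift (assumption of this sketch).
    """
    Rc = R - R.mean(axis=0)
    # Right singular vectors of the centred reference data = PCA loadings.
    _, _, Vt = np.linalg.svd(Rc, full_matrices=False)
    V = Vt[:k].T                  # D x k drift basis with orthonormal columns
    return X - (X @ V) @ V.T      # project the drift subspace out of X
```

The corrected samples would then be passed to the linear SVM described above.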

Recognition Results and Analysis
We assess the drift compensation performance of ODCM and the reference methods by the recognition rate on drift samples; a higher recognition rate indicates a stronger drift compensation effect. All recognition rates under the different scenario settings and datasets are collected in Tables 4-6. ODCM achieves the highest average recognition rate under both scenario settings on Dataset A, which suggests that ODCM is more robust than all the reference methods. As Table 4 shows, the recognition rate of ODCM reaches 90.99% in Scenario "9-10", 12.57% higher than that of the runner-up method, TCTL. Likewise, in Scenario "1-10", ODCM achieves a recognition rate of 91.33%, 11.19% higher than the second-best method (Table 5). For the reference methods, the results show little difference between the multi-class and one-class versions, which is reasonable because these reference methods are designed for general use.
Table 6 shows the recognition rate of each method on Dataset B. Similarly, ODCM achieves the highest recognition rate under both scenario settings, with average recognition rates 7.10% and 15.42% higher than those of the second-place methods. These results confirm that the model learned by the proposed ODCM reduces the negative effect of drift on recognition results.

Parameter Sensitivity Analysis
We assess the suitability and robustness of the proposed methodology through a sensitivity analysis of its settable parameters. For the ODCM model, the adjustable coefficients $\{\lambda, \lambda_1, \lambda_2, \lambda_3, \lambda_4\}$ and the number of transfer samples $n$ are the variable parameters. To observe their impact on performance, we varied them over the ranges $\lambda, \lambda_1, \lambda_2, \lambda_3, \lambda_4 \in \{10^k \mid k = -4, -3, \ldots, 3, 4\}$, $n \in \{0, 2, 4, \ldots, 20\}$ (Dataset A), and $n \in \{0, 1, 2, \ldots, 9\}$ (Dataset B). When one parameter varied, the others were fixed at their optimal values. We selected two representative scenarios, "3-4" of Dataset A and "S1-S3" of Dataset B, to observe how performance changes with $\{\lambda, \lambda_1, \lambda_2, \lambda_3, \lambda_4\}$ and $n$. The influence of the adjustable coefficients on the recognition rate is shown in Figure 3. Performance remains stable over a wide range of $\lambda$ and $\lambda_1$-$\lambda_3$. For $\lambda_4$, however, the recognition accuracy fluctuates drastically, which shows that the corresponding regularization term ($\|P_S\|_F + \|P_T\|_F$) plays a vital role in the model. In addition, $\lambda$ has the least impact on recognition accuracy. Figure 4 shows the average recognition accuracy versus the number of transfer samples $n$. The proposed ODCM methodology has the highest average accuracy across the settable range of $n$. As $n$ increases, the average accuracy improves, and once $n$ reaches a certain value, the recognition rate changes only slightly. Considering both computational cost and recognition performance, 8 and 4 were therefore the most suitable numbers of transfer samples for Datasets A and B, respectively.
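The one-at-a-time procedure described above (vary one coefficient, hold the others at their optimal values) can be sketched as follows. `score_fn` is a hypothetical callable returning the recognition rate for a coefficient setting, and the values in `best` are placeholders, not the optima found in this study.

```python
GRID = [10.0 ** k for k in range(-4, 5)]  # 10^-4 ... 10^4, as in the text

def one_at_a_time(score_fn, best, grid=GRID):
    """Sensitivity curves: vary one coefficient over `grid`, keep the rest at `best`."""
    return {name: [score_fn({**best, name: v}) for v in grid] for name in best}

# Placeholder optimum; the real values come from the grid search.
best = {"lam": 1.0, "lam1": 1.0, "lam2": 1.0, "lam3": 1.0, "lam4": 1e-2}
```

Each returned list traces one curve of the kind plotted in Figure 3; a flat curve indicates low sensitivity to that coefficient.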

Time Complexity Analysis
Implementation efficiency is an important factor to evaluate. We first compare the theoretical time complexity of the proposed ODCM and the reference methods. For CCPCA, both PCA and classifier training must be performed, so its computational complexity is
$$O_{\mathrm{CCPCA}} = O_{\mathrm{PCA}}\left(d^2 n + d^3\right) + O_{\mathrm{SVM}}\left(n_{sv}^3 + n \cdot n_{sv}^2 + d \cdot n \cdot n_{sv}\right) \quad (15)$$
where $d$, $n$, and $n_{sv}$ are the sample dimension, the number of samples, and the number of support vectors, respectively. By the principle of ODCM, its computational complexity is equivalent to those of TCTL and TMTL, which gives the following complexity relation: To validate the theoretical analysis of Formulas (15) and (16), we recorded the computational time per sample on Dataset B in Table 7. All methods were run on a computer with the following configuration: CPU, Intel Core i5-8400; RAM, 8 GB; storage, 256 GB solid-state drive; operating system, Windows 10.
Table 7 shows that the execution time of CCPCA is much longer than that of the other methods, because CCPCA requires an SVM during both training and testing, which is time-consuming. The other three methods produce the predictor directly, so their execution time is greatly reduced. The average times show that TCTL, TMTL, and ODCM have time complexity of the same order, consistent with the deduction from Formulas (15) and (16).
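Per-sample computational time of the kind reported in Table 7 can be measured with a small timing helper such as the one below. The text does not say whether training time is included in the per-sample figure; this sketch assumes it is, timing a hypothetical `fit_predict(X_train, Y_train, X_test)` callable and averaging over several runs.

```python
import time

def time_per_sample(fit_predict, X_train, Y_train, X_test, repeats=5):
    """Mean wall-clock seconds per test sample for one fit-and-predict run.

    fit_predict is a hypothetical callable: fit_predict(X_train, Y_train, X_test).
    Including training time in the per-sample figure is an assumption here.
    """
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fit_predict(X_train, Y_train, X_test)
        total += time.perf_counter() - start
    return total / repeats / len(X_test)
```

Running every method through the same helper on the same machine keeps the comparison consistent, even though absolute times depend on the hardware listed above.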

Conclusions
In this study, a novel drift compensation manner named one-class calibration has been presented to simplify the category requirements for drift correction samples. Based on the one-category assumption, we proposed a specific machine learning model to learn a calibrated classifier, together with a closed-form solution that avoids time-consuming iterative calculation. Two drift datasets were used to validate the proposed methodology, which achieved the highest average recognition rate with one-class drift correction samples. The parameter sensitivity and time complexity analyses demonstrated its satisfactory suitability and computational efficiency, respectively.