An Integrated Handheld Electronic Nose for Identifying Liquid Volatile Chemicals Using Improved Gradient-Boosting Decision Tree Methods

Abstract: The main ingredients of various odorous products are liquid volatile chemicals (LVC). In human society, identifying the type of LVC is the inner logic of many applications, such as exposing counterfeit products, grading food quality, diagnosing interior environments, and so on. The electronic nose (EN) can serve as a cost-effective, time-efficient, and safe solution to LVC identification. In this paper, we present the design and evaluation of an integrated handheld EN, namely SMUENOSEv2, which employs the NVIDIA Jetson Nano module for running the LVC identification method. All components of SMUENOSEv2 are enclosed in a handheld case. This all-in-one structure makes it convenient to use SMUENOSEv2 for quick on-site LVC identification. To evaluate the performance of SMUENOSEv2, two common odorous products, i.e., perfumes and liquors, were used as the samples to be identified. After sampling data preprocessing and feature generation, two improved gradient-boosting decision tree (GBDT) methods were used for feature classification. Extensive experimental results show that SMUENOSEv2 is capable of identifying LVC with considerably high accuracies. With previously trained GBDT models, the time spent for identifying the LVC type is less than 1 s.


Introduction
Liquid volatile chemicals (LVC) are chemicals that can be volatilized from the original liquid form to gaseous form by themselves or by external blowing. In daily life, the main ingredients of multiple beverages, flavorings, and cosmetics are LVC. For example, we can smell the fragrance when getting close to a woman wearing perfume, since the fragrant constituents of the perfume have been volatilized into fragrant gas and carried into our nose. As the electronic counterpart of the human nose, many types of chemical sensors have been developed, including metal oxide semiconductor (MOS) [1], quartz crystal microbalance [2], electrochemical [3], optical [4], catalytic combustion [5], gravimetric [6], and carbon nanotube [7] sensors. All these types of chemical sensors can be used to detect the existence of LVC by measuring the chemical concentration. However, it is hard to discriminate between different types of LVC, or to identify the LVC type, solely using an individual chemical sensor. As defined in [1], a single chemical sensor is an element of a basic electronic circuit that senses the chemical concentration fluctuations and outputs valid values for further processing.
Identifying the LVC type can help in distinguishing counterfeit products [8], protecting environments [9], monitoring food quality [10], and so on. Electronic techniques for LVC identification include gas chromatography (GC) [11], GC combined with mass spectrometry (GC-MS) [12,13], and EN [14]. GC comprises the sample inlet, chromatographic column, thermostat, and detector. The sample inlet and detector are mounted at the start and end of the chromatographic column, respectively. Driven by the standard carrier gas, the injected sample passes through the chromatographic column, where its constituents are separated and then measured by the detector.

Compared with other widely used pattern recognition methods, such as component analysis [33], support vector machine [34], and artificial neural network [35], the merits of the improved GBDT methods are higher identification accuracies in terms of mean and variance. For feature generation, a novel statistics combination is proposed. Extensive real experiments were conducted to verify the applicability of SMUENOSEv2.
Compared with our previous divide-body EN used in the work [14], which did not detail the EN design, the novelties of SMUENOSEv2 mainly comprise the integrated EN design scheme and the new gas route. The MOS sensors in SMUENOSEv2 remain the same as those used in the previous version. However, these sensors are readily replaceable with other voltage-output MOS chemical sensors. The reason for selecting MOS sensors is that, according to the authors' experimental experience, MOS sensors have the merit of fast responsiveness. Moreover, compared with other sensor types, more cost-effective MOS sensors are available in commercial markets. To our knowledge, SMUENOSEv2 is the first integrated EN using the NVIDIA Jetson Nano as the computing kernel, which demonstrates the originality of this paper. The main contribution of this paper is three-fold. First, the presented integrated EN design can provide important guidance for on-site quick LVC identification, e.g., unannounced inspection of counterfeit perfumes at cosmetics counters in shopping arcades. Second, the experimental quantitative comparison results of the two improved GBDT methods are valuable references for further related investigation. Third, the presented statistics combination for feature generation can be considered a time-efficient solution to EN-based LVC identification.
The rest of this paper is organized as follows: Section 2 details the design of our new EN system and introduces the LVC identification methods and experimental setups. Section 3 presents the experimental results and discussions. Section 4 concludes the whole paper.


Integrated Handheld EN
The operation scene of using SMUENOSEv2 for LVC identification is shown in Figure 1. All components of SMUENOSEv2, such as the volatilization, sampling, powering, and computing units, are enclosed in an 18.5 cm × 12.0 cm × 11.0 cm cuboid case. The net weight of SMUENOSEv2 is about 1.0 kg. Thus, it is convenient to hold SMUENOSEv2 by hand and conduct the LVC identification operation. The structure block diagram of SMUENOSEv2 is shown in Figure 2. The physical blocks of SMUENOSEv2 comprise the gas transportation and electronic hardware blocks, which are denoted in Figure 2 as dashed and solid boxes, respectively. The gas route components compose the gas routes needed during the PI process, while the electronic hardware blocks are responsible for all electronic functions. Moreover, apart from the physical blocks, the normal operation of our EN also relies on software components. Sections 2.1.1-2.1.3 detail the design of the gas routes, electronic hardware, and software components, respectively.

Gas Route Design
Figure 3 shows the gas route of SMUENOSEv2. At the beginning of LVC identification, the LVC sample is injected into the volatilization pot. Then, a fast air flow is generated using the air pump to accelerate the volatilization of the LVC sample in the volatilization pot. The volatilized gaseous LVC sample is then carried through the three-way valve and the gas chamber, which are linked by silicone rubber tubes. By controlling the three-way valve, the LVC flow can be switched between gas route branches 1 and 2, which in turn switches the EN between its two working modes: the sampling mode and the washing mode. It is readily seen that SMUENOSEv2 employs only one valve. The gas routes with respect to the two modes can be described as follows:

•
Gas route in the sampling mode: As preparation for sampling, a fixed volume of the LVC sample is dripped into the volatilization pot in advance. The fast airflow accelerates the LVC volatilization process and drives the gaseous LVC towards the three-way valve. Through gas route branch 1, the gaseous LVC comes into contact with the gas sensors mounted in the gas chamber and stimulates the sensor responses.

•
Gas route in the washing mode: After each sampling run, some LVC residuals still cling to the gas route components, including the volatilization pot, three-way valve, gas chamber, and silicone tubes. To reduce the cross-influence between different types of LVC samples, the EN should be thoroughly washed between successive sampling spans. The washing process comprises two steps: First, by switching to gas route branch 2, the EN enters the washing mode, and the volatilization pot, three-way valve, and upstream silicone tubes are washed by a clean air flow. Second, by switching back to gas route branch 1, the upstream clean air is driven to wash the gas chamber and the downstream silicone tubes.
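As a minimal illustration of the single-valve design described above, the two working modes and the two-step washing sequence can be modeled as a small state machine. The following Python sketch is purely illustrative; the class and method names are our own assumptions, not part of the SMUENOSEv2 firmware:

```python
from enum import Enum

class Mode(Enum):
    SAMPLING = 1  # gas route branch 1: gaseous LVC flows into the gas chamber
    WASHING = 2   # gas route branch 2: clean air washes the upstream components

class GasRouteController:
    """Hypothetical controller for the single three-way valve of SMUENOSEv2."""

    def __init__(self):
        self.mode = Mode.WASHING

    def set_mode(self, mode: Mode) -> int:
        # Switching the valve selects branch 1 (sampling) or branch 2 (washing).
        self.mode = mode
        return 1 if mode is Mode.SAMPLING else 2

    def wash(self):
        # The two-step washing sequence described above: branch 2 first, then
        # back to branch 1 so clean air also washes the gas chamber and the
        # downstream tubes.  Returns the branch numbers used, in order.
        return [self.set_mode(Mode.WASHING), self.set_mode(Mode.SAMPLING)]
```

The point of the sketch is that a single three-way valve suffices: every mode change is one valve actuation, and the full wash is just two actuations in sequence.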


Electronic Hardware Design
The electronic hardware structure of SMUENOSEv2 is shown in Figure 4. In terms of functionality, the components in Figure 3 can be categorized into sensing, sampling, computing, displaying, and powering components. The design of these components is detailed as follows. The eight gas sensors mounted in SMUENOSEv2 are the MiCS-6814, MiCS-5914, and MiCS-5521 from SGX SENSORTECH Corporation, as well as the TGS-2620, TGS-2602, TGS-2600, TGS-2611, and TGS-8100 from Figaro Engineering Inc. All these sensors are MOS sensors, which enables fast responses upon contact with the gaseous LVC. The criteria for selecting these sensors are two-fold: reactivity to more target gases, and selective sensitivities to different gases. According to the sensors' datasheets, typical constituents of common volatile chemicals, such as ethanol, carbon monoxide, and iso-butane, can be detected by the selected sensors. In addition, the sensitivities of the selected sensors to an individual target gas differ. For example, the sensitivity coefficients of six selected sensors to the target gases are shown in Figure 5. It is readily seen that the vertexes of each sensor's radar plot differ significantly. The largest sensitivity coefficient of each sensor corresponds to a different gas, which means the most sensitive gas differs from sensor to sensor. It is expected that the sensing voltages of the selected sensors provide sufficient distinguishing information for the subsequent identification sub-processes.
The voltage sampling front-end circuit shown in Figure 6 is used in SMUENOSEv2. As shown in Figure 6, a load resistor R_L is connected in series with the sensor's sensing resistor R_s. All the power supply voltage values V_sup are 5 V. By measuring the voltage u_L across the load resistor, the sensing resistance value can be calculated as follows:

R_s = (V_sup − u_L) R_L / u_L.

Figure 6. The front-end circuit for sampling the i-th sensor's sensing voltage. IOA is an integrated operation amplifier. V_sup is the power supply voltage of the sensors.
Thus, the gas concentration measurement problem becomes the problem of sampling the voltage u_L of the load resistor. To amplify the voltage variation, improve the load capacity, and filter the measurement noise, an integrated operation amplifier is used to form a low-pass filtering amplifier circuit. As shown in Figure 6, the voltage u_L is connected to a typical RC low-pass filter [36] before being passed to the in-phase end of the integrated operation amplifier A. The cut-off frequency of the low-pass filter composed of R_2 and C is

f_0 = 1 / (2π R_2 C),

which means the input voltage's constituents with frequencies higher than f_0 will be significantly suppressed. In SMUENOSEv2, the values of C and R_2 are 2.2 µF and 16 kΩ, respectively. Afterwards, the filtered voltage u_L is fed into an in-phase proportional amplifying circuit composed of A, R_3, and R_4. The amplified voltage u_o can be calculated as

u_o = (1 + R_4 / R_3) u_L.

Apart from the gas sensors, the parameters of the other components in the circuit shown in Figure 6 were empirically determined to keep the value of u_o within the valid input range of the AD7606. As an 8-channel analog-to-digital data acquisition chip, a single AD7606 was used to simultaneously sample the outputs of the eight sensors' front-end circuits. The sampling data were sent through the parallel interface to the STM32F407VGT6 microcontroller, which was also used to control the three-way valve, the air pump, and the timer chip DS1338. The timer chip provides accurate time information for controlling the sampling duration.
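The three front-end formulas above can be checked numerically. In the following sketch, the values of R_L, R_3, and R_4 are illustrative assumptions; the text only specifies V_sup, R_2, and C:

```python
import math

V_SUP = 5.0          # sensor supply voltage (V), from the text
R_L = 10e3           # load resistance (ohm); illustrative value, not from the paper
R2 = 16e3            # low-pass filter resistance (ohm), from the text
C = 2.2e-6           # low-pass filter capacitance (F), from the text
R3, R4 = 10e3, 10e3  # amplifier resistances; illustrative values

def sensing_resistance(u_L: float) -> float:
    # Voltage divider: u_L = V_SUP * R_L / (R_s + R_L)
    # =>  R_s = (V_SUP - u_L) * R_L / u_L
    return (V_SUP - u_L) * R_L / u_L

def cutoff_frequency() -> float:
    # First-order RC low-pass filter: f_0 = 1 / (2 * pi * R_2 * C)
    return 1.0 / (2.0 * math.pi * R2 * C)

def amplified_voltage(u_L: float) -> float:
    # Non-inverting amplifier gain: u_o = (1 + R_4 / R_3) * u_L
    return (1.0 + R4 / R3) * u_L
```

With the stated R_2 and C, the cut-off frequency comes out to roughly 4.5 Hz, i.e., the filter keeps only the slow sensor-response signal and suppresses higher-frequency noise before ADC sampling.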
The sampling data were then sent to the NVIDIA Jetson Nano. A touch-sensitive monitor is used to interact with the Jetson Nano: the identification results are displayed on the monitor, and user commands can be sent to the Jetson Nano by touching the display screen. Last but not least, a commercial off-the-shelf voltage converter is incorporated to convert the 12 V battery voltage to the stable 5 V and 3.3 V supplies required by the above-mentioned components.

Software Design
The software of SMUENOSEv2 comprises three components: the perception, pattern recognition, and human-machine interface (HMI) components. The perception component is written in the C programming language and runs on the STM32, while the other two components are written in the Python programming language and run on the NVIDIA Jetson Nano board.
Figure 7 shows the flow charts of the perception and pattern recognition components, which interact with each other through the serial port. The SAMPLING command is originally sent from the HMI component to the pattern recognition component, and then forwarded to the perception component. Once the SAMPLING command is received, the ADC chip AD7606 begins to sample the sensing voltages of the eight sensors at a sampling frequency of 150 Hz. A single sampling cycle lasts for a predefined time span. The sampling duration is determined by periodically reading the current time output by the DS1338. After an individual sampling cycle, the received sampling data are saved in a matrix. If the sampling data are valid, a SAVING command can be sent from the HMI component, which triggers the storage of the data matrix in an Excel file. In real applications, to identify a certain LVC sample, the sampling subprocess is followed by an identification subprocess. The data obtained in an individual sampling cycle are used as the test dataset. For model learning, the training subprocess loads multiple previously saved Excel data files to form the training dataset. The details of the training and identification subprocesses depend on the specific ML methods employed.
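The per-cycle data collection described above can be sketched as follows. Here `read_voltages` is a hypothetical stand-in for the serial read from the STM32, not the actual interface; the sampling rate and channel count are taken from the text:

```python
import numpy as np

SAMPLING_RATE_HZ = 150  # ADC sampling frequency (from the text)
NUM_SENSORS = 8         # eight MOS gas sensors

def collect_cycle(read_voltages, duration_s: float) -> np.ndarray:
    """Assemble one sampling cycle into a (samples x sensors) data matrix.

    `read_voltages` must return one 8-element row of sensing voltages per
    call; in the real system this row would arrive over the serial port.
    """
    n_rows = int(duration_s * SAMPLING_RATE_HZ)
    data = np.empty((n_rows, NUM_SENSORS))
    for i in range(n_rows):
        data[i, :] = read_voltages()
    return data
```

A 2 s cycle thus yields a 300 × 8 matrix, which is exactly the per-cycle matrix that the SAVING command would persist to an Excel file.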

2.
Sector II: the sampling operation sector. By pushing the "Sampling" and "Save" buttons in sector II, the SAMPLING and SAVING commands can be sent to the pattern recognition component, respectively. Since the saved samples form the training set, the sample type must be known in advance. For example, to form the training sets in real applications, multiple samples with known types should be collected using certified LVC in advance. This prerequisite can be satisfied by selecting the right type in the ComboBox next to the "Save" button. Moreover, since SMUENOSEv2 was designed to be capable of identifying different liquid or gaseous volatile organic compounds, it is important to set the sample class (e.g., perfume, wine, herb) in the left ComboBox before sending the SAVING command.

3.
Sector III: the identification and training operation sector. Empirically, the identification and training operations are mutually exclusive and, thus, are placed in two different tabs sharing sector III. The operation processes for training and identification are detailed as follows:

•

By pushing the "Training" button in the bottom-right corner of Figure 8b, an individual training process can be activated. The training dataset and ML method should be selected before pushing the "Training" button. In the central area of this sector, a table lists all previously saved sensing voltage samples. By clicking the table items, multiple samples can be selected to form the training dataset. The ML method used for training can be selected in the top-right ComboBox. Moreover, the hyper-parameters of the employed method can be set by pasting the specially formatted parameter values into the TextBox above the "Training" button. Finally, the resulting trained ML model, which can be used for identification, is saved in the external memory of the NVIDIA Jetson Nano.

•
By pushing the "Identification" button in the bottom-right corner of Figure 8a, an individual identification process can be activated. As mentioned in the previous paragraph, the identification process follows the sampling process and uses the newly collected samples as input. Moreover, as the identification tool, a previously trained and saved ML model should be selected from the model list before pushing the "Identification" button. During the identification process, the process status is shown in the blank space to the left of the "Identification" button. Finally, the identification result is displayed in the blank space above the model list.
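The save-then-reload workflow behind the "Training" and "Identification" buttons can be sketched as follows. The toy `NearestMeanModel` and the use of `pickle` are illustrative assumptions; the real EN stores trained GBDT models:

```python
import os
import pickle
import tempfile

class NearestMeanModel:
    """Toy stand-in for a trained classifier (the real EN uses GBDT models)."""

    def __init__(self, class_means):
        self.class_means = class_means  # {label: mean feature vector}

    def predict(self, X):
        # Assign each feature vector to the class with the closest mean.
        def dist(v, m):
            return sum((a - b) ** 2 for a, b in zip(v, m))
        return [min(self.class_means, key=lambda c: dist(x, self.class_means[c]))
                for x in X]

def save_model(model, path):
    # "Training" tab: persist the trained model to external memory.
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_and_identify(path, features):
    # "Identification" tab: reload a saved model and classify one sample.
    with open(path, "rb") as f:
        model = pickle.load(f)
    return model.predict([features])[0]
```

The design point is the decoupling: training may happen offline on previously saved data files, while identification only needs a small saved model file plus one freshly collected sample.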

Improved GBDT
To evaluate the performance of SMUENOSEv2, two improved GBDT methods, i.e., XGBoost [31] and LightGBM [32], were implemented in SMUENOSEv2 to realize LVC identification. For coherence and clarity, XGBoost and LightGBM are introduced in turn. Since the two improved GBDT methods were developed based on GBDT, GBDT itself is also sketched as follows.

GBDT
GBDT [30] is a framework originally designed for function regression. It begins with a dataset {(x_i, y_i)} and an unsatisfactory model f(x), which cannot accurately fit the relationship underlying the data. In the k-th iteration of GBDT, an additional model h_k(x) in the form of a regression tree is constructed by fitting the residuals y_i − f_{k−1}(x_i). Inspired by the fact that the true value equals the sum of the residual and the current model output, the new model fitting the residuals is expected to compensate for the inaccuracy of the current model. However, the new model h_k(x) can only approximate the theoretical residuals, which means the new additive model f_{k−1}(x) + h_k(x) is still not satisfactory. Thus, subsequent iterations are sequentially conducted to further improve the accuracy of the additive model. Moreover, in order to avoid overfitting, a learning rate η, 0 < η < 1, is incorporated in the additive model to prevent full optimization in each step. The additive model in GBDT turns into

f_k(x) = f_{k−1}(x) + η h_k(x).

The above residual-fitting process is correlated with gradients by defining the square loss

L(y_i, f(x_i)) = (1/2) (y_i − f(x_i))^2.

Then, the problem transfers to the minimization of J = ∑_i L(y_i, f(x_i)) by adjusting f(x_1), f(x_2), . . ., f(x_n). Taking the f(x_i) as parameters, the gradient can be represented as

∂J/∂f(x_i) = f(x_i) − y_i.

According to the above equation, the residuals defined in GBDT can be interpreted as negative gradients. Thus, the model in GBDT is actually updated using the gradient descent method. In general cases, the residuals and the square loss are replaced with negative gradients and any differentiable loss function, respectively.
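The residual-fitting loop described above can be sketched with depth-1 regression trees (stumps). This is an illustrative toy implementation for 1-D inputs, not the GBDT implementation used in SMUENOSEv2:

```python
import numpy as np

def fit_stump(x, residuals):
    """Fit a one-split regression tree (stump) to the current residuals."""
    best = None
    for t in np.unique(x):
        left, right = residuals[x <= t], residuals[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred_l, pred_r = left.mean(), right.mean()
        err = ((left - pred_l) ** 2).sum() + ((right - pred_r) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, pred_l, pred_r)
    _, t, pl, pr = best
    return lambda z: np.where(z <= t, pl, pr)

def gbdt_fit(x, y, n_rounds=50, eta=0.3):
    """f_k(x) = f_{k-1}(x) + eta * h_k(x), each h_k fit to y - f_{k-1}(x)."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_rounds):
        # For the square loss, the residuals are the negative gradients.
        h = fit_stump(x, y - pred)
        pred = pred + eta * h(x)
    return pred
```

Because each round shrinks the remaining residual by the factor (1 − η), the training error decays geometrically on data a stump can represent, which is the compensation mechanism the paragraph above describes.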
For solving multi-class classification problems using GBDT, the label information is turned into a true probability distribution. For example, if the label of the i-th instance is C, then the probability distribution Y_L(x_i) can be represented as follows:

Y_L(x_i) = [0, ..., 0, 1, 0, ..., 0], (5)

where the single 1 occupies the position corresponding to class C. Afterwards, the problem turns into the regression of the true probability distribution function. The KL-divergence between the true and predicted probability distributions is considered as the loss function in GBDT.

XGBoost
In XGBoost [31], the implementation of GBDT is mainly improved in three aspects: incorporating a regularization term in the objective function, using second-order gradient statistics in the loss function approximation, and providing an approximation algorithm for split finding in the decision tree construction. The three aspects are sketched as follows: 1. Incorporating a regularization term in the objective function.
The incorporated regularization term serves as a measure of the model complexity. Instead of directly minimizing the loss function, XGBoost utilizes the sum of the loss function and the regularization term as the objective function for minimization. As a measure of model complexity, minimizing the regularization term helps avoid the over-fitting problem, in which the learned model fits well on the training data but badly on the test data. Empirically, simpler models tend to demonstrate smaller performance variance between the training and test data sets. The regularization term with the l2 norm in XGBoost can be represented as follows:

Ω = γT + (1/2)λ‖w‖^2, (6)

where γ and λ are two constant coefficients, and w and T are the leaf scores and the number of tree leaves, respectively. 2. Usage of second-order gradient statistics in the loss function approximation.
Inspired by the Taylor expansion, the first-order and second-order gradients are used to construct a simplified objective function as follows:

Obj ≈ ∑_i [ g_i h_k(x_i) + (1/2) h_i h_k(x_i)^2 ] + Ω(h_k), (7)

where g_i and h_i are the first-order and second-order partial derivatives of the loss function with respect to the current model output f_{k−1}(x_i). Compared with only using the negative first-order gradient in GBDT, the second-order gradients used in XGBoost contain more information about the loss function and, thus, enable a more accurate approximation of the loss function.
3. The approximation algorithm for split finding in the decision tree construction.
The traditional exact greedy algorithm used for split finding in the decision tree construction has to enumerate all the possible splits, which is time-consuming. To adapt to large training datasets and resource-constrained applications, XGBoost provides an approximation algorithm for split finding. The main steps of the approximation algorithm can be summarized as follows: (1) For an individual feature, all the possible split points are mapped into buckets according to the percentiles of the feature distribution. (2) The cumulative first-order and second-order gradients are calculated for each bucket.
(3) The bucket with the largest cumulative gradient statistics is considered as the optimal bucket. Then, the split point with the largest gradient statistics in the optimal bucket is selected as the final split point.
Apart from the above three aspects, XGBoost additionally utilizes the column subsampling technique from Random Forests to further prevent over-fitting and accelerate the computation.
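Under the second-order objective, the optimal leaf score and split gain have closed forms, and candidate splits can be proposed at feature percentiles. The following sketch illustrates these formulas for the square loss; the function names are ours, and XGBoost's actual implementation differs in many details:

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf score under the second-order objective: w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, mask, lam=1.0, gamma=0.0):
    """Reduction of the regularized objective when a node is split into
    left (mask) and right (~mask) children; gamma penalizes the extra leaf."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (score(g[mask], h[mask]) + score(g[~mask], h[~mask]) - score(g, h)) - gamma

def candidate_splits(feature, n_buckets=4):
    """Approximate split finding: propose candidate thresholds at feature
    percentiles instead of enumerating every possible split."""
    qs = np.linspace(0, 100, n_buckets + 1)[1:-1]
    return np.percentile(feature, qs)

# Square loss example: g_i = f(x_i) - y_i, h_i = 1
y = np.array([1.0, 1.2, 3.0, 3.2])
f = np.zeros(4)
g, h = f - y, np.ones(4)
mask = np.array([True, True, False, False])
```

Splitting the two low-valued samples from the two high-valued ones yields a positive gain, so this split would be accepted.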

LightGBM
Aiming at increasing the efficiency of GBDT, LightGBM [32] reduces the volume of the employed training sample data with two innovative techniques: gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). In GOSS, the under-trained samples receive more attention, on the premise of only slightly changing the original sample data distribution. First, the sample data are sorted in descending order of the absolute gradients. Then, the sample dataset is divided into two parts: the a × n samples, 0 < a < 1, with the largest absolute gradients, and the remaining samples. All samples in the first part, and b × 100%, 0 < b < 1, of the remaining samples are used in the training of the next decision tree.
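A GOSS round can be sketched as follows, where the retained small-gradient samples are re-weighted by (1 − a)/b to keep the gradient statistics approximately unbiased (the re-weighting constant follows [32]; the function name is ours):

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Gradient-based one-side sampling: keep the top-a fraction by |gradient|,
    plus a random b fraction of the rest, upweighted by (1 - a) / b."""
    rng = np.random.default_rng(rng)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))       # descending |gradient|
    n_top = int(a * n)
    top = order[:n_top]                          # large-gradient (under-trained) samples
    rest = order[n_top:]
    rand = rng.choice(rest, size=int(b * n), replace=False)
    idx = np.concatenate([top, rand])
    weights = np.ones(len(idx))
    weights[n_top:] = (1 - a) / b                # compensate the down-sampled part
    return idx, weights

g = np.linspace(-1, 1, 100)
idx, w = goss_sample(g, a=0.2, b=0.1, rng=0)
```

With a = 0.2 and b = 0.1, only 30% of the samples enter the next tree, yet the weighted gradient sums remain close to those of the full dataset.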
EFB reduces the training data volume by decreasing the number of features engaged in the training process. This is realized by bundling the exclusive features, which seldom take non-zero values at the same time, into a single feature. The recognition of exclusive features is modeled as a reduced graph-coloring problem, which is then solved using a greedy algorithm. To merge the exclusive features in the same bundle, offsets are added to the original feature values so that the values of different exclusive features are placed in different bins.
Moreover, LightGBM utilizes a histogram-based algorithm to improve the efficiency in both running speed and storage consumption. For split finding in the decision tree construction, the histogram-based algorithm transfers continuous feature values into discrete values and places them in different bins. Then, the bins are used to construct the feature histograms in the training process. In LightGBM, the combination of GOSS and EFB with the histogram-based split finding serves as an efficient solution that ignores features with zero values.
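The two ideas can be illustrated in a few lines: quantile-edge binning for the histogram-based algorithm, and offset-based merging of two mutually exclusive sparse features for EFB. This is an idealized sketch with names of our own; LightGBM's real bundling additionally handles conflicts and many-feature bundles:

```python
import numpy as np

def to_bins(feature, n_bins=16):
    """Histogram-based algorithm: map continuous feature values to discrete
    bins whose edges follow the feature's quantiles."""
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, feature)

def bundle_exclusive(feat_a, feat_b, offset):
    """EFB: merge two (near-)mutually exclusive sparse features into one by
    offsetting the non-zero values of the second feature into separate bins."""
    out = feat_a.astype(float).copy()
    nz = feat_b != 0
    out[nz] = feat_b[nz] + offset
    return out

a = np.array([1, 0, 2, 0])
b = np.array([0, 3, 0, 4])
merged = bundle_exclusive(a, b, offset=10)
```

After bundling, one histogram over `merged` carries the information of both sparse features, since their non-zero values never collide.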

Experimental Setup
In our experiments, we tested two common sets of LVC: perfumes and liquors. Six different perfumes of the same brand "Scent Library" and four liquors of different brands were used as samples in the perfume and liquor series of experiments, respectively. The model names, abbreviations, and main compositions of these samples are listed in Table 1. The two series of experiments were both conducted on our newly designed E-nose platform SMUENOSEv2; thus, the sampling and identification processes for them are the same. To obtain the raw samples, each specific perfume and liquor sample was sampled for 50 cycles, which means a total of 500 sampling cycles were conducted. In a single sampling cycle, 1 µL of the LVC material was injected into the E-nose's volatilization pot. Then, the air pump was activated to accelerate the volatilization of the LVC material and carry the volatilized gases through the gas chamber. Stimulated by the gaseous LVC, the fast-fluctuating sensing voltages of the eight sensors were sampled and recorded. The time span of each sampling cycle was set to 100 s and 60 s for the perfume and liquor experiments, respectively. The sampling cycle for perfume was longer because we found that the sensors' recovery time in the perfume experiments was longer than that in the liquor experiments. This phenomenon is probably because the perfume samples can stick to the gas route for a longer time.
The obtained raw measurements were then preprocessed. Afterwards, multiple features were generated based on the preprocessed measurements. Finally, the generated features were used as the training and test data to evaluate the LVC identification performance of SMUENOSEv2 via the two ensemble learning methods. Specifically, 10-fold cross-validation experiments were conducted: the feature set was randomly divided into 10 equal parts, and each part was in turn used as the test set while the rest of the parts were used as the training set. Thus, in each group of 10-fold cross-validation experiments, we conducted 10 experiments with different training and test data sets. With respect to each training set, the optimal combination of model parameters was selected during the model training process. The preprocessing, feature generation, and parameter selection processes are detailed as follows.
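The fold construction used in our 10-fold cross-validation can be sketched as follows (a generic illustration; the function name is ours):

```python
import numpy as np

def kfold_indices(n, k=10, rng=None):
    """Randomly divide n sample indices into k equal parts; each part is in
    turn the test set while the remaining parts form the training set."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n)
    folds = np.array_split(perm, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# 500 sampling cycles split into 10 folds of 50 cycles each
folds = list(kfold_indices(500, 10, rng=0))
```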

Preprocessing
First, the outliers in the raw measurements, which were mainly caused by transmission errors in the high-speed serial communication, were removed based on the mean and standard deviation. If the difference between a voltage measurement u and the mean voltage was greater than five times the standard deviation, then all the values obtained at the same time as u were deleted.
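The outlier removal rule can be sketched as follows, assuming the measurements of one cycle are arranged as an (n_samples × n_sensors) array; the 100 × 2 toy array is purely illustrative:

```python
import numpy as np

def remove_outliers(u, k=5.0):
    """Drop time steps where any channel deviates from its mean by more than
    k standard deviations; all channels at that time step are removed."""
    mean = u.mean(axis=0)
    std = u.std(axis=0)
    keep = (np.abs(u - mean) <= k * std).all(axis=1)
    return u[keep]

cycle = np.ones((100, 2))
cycle[50, 0] = 1000.0          # a transmission glitch on sensor 0
clean = remove_outliers(cycle)
```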
Second, the high-frequency measurement noises, which were mainly introduced by the circuits, were removed using a low-pass filtering algorithm. As an important supplement to the hardware low-pass filter shown in Figure 6, a second-order Butterworth low-pass software filter with a cut-off frequency of 0.016 half-cycles/sample was constructed.
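A software filter matching this description can be constructed with SciPy, whose normalized cut-off convention is also half-cycles/sample; the 30 Hz test tone below is an illustrative stand-in for circuit noise:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Second-order Butterworth low-pass; Wn = 0.016 half-cycles/sample
# (i.e., normalized to the Nyquist frequency, SciPy's default convention).
b, a = butter(2, 0.016)

fs = 150.0                                     # AD7606 sampling frequency (Hz)
t = np.arange(1500) / fs
u = 1.0 + 0.2 * np.sin(2 * np.pi * 30.0 * t)   # baseline plus 30 Hz noise
u_filt = filtfilt(b, a, u)                     # zero-phase filtering
```

Zero-phase filtering via `filtfilt` avoids shifting the peak times that are later used as features.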
Third, the baseline voltages, i.e., the voltages obtained right before the sensors came into contact with the gaseous LVC material, were subtracted from all the 15,000 voltage values in the corresponding sampling cycle. The baseline removal operation helps reduce the negative influence of background gas residuals.
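The baseline removal step can be sketched as follows; the length of the pre-contact window (here 1 s at 150 Hz) is an assumption, as the paper does not state it:

```python
import numpy as np

def subtract_baseline(u, n_base=150):
    """Subtract each sensor's baseline voltage, estimated as the mean over the
    samples recorded just before gas contact (n_base is an assumed window)."""
    base = u[:n_base].mean(axis=0)
    return u - base

cycle = np.vstack([np.full((150, 8), 0.5),    # pre-contact baseline
                   np.full((1350, 8), 2.0)])  # response after gas contact
corrected = subtract_baseline(cycle)
```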

Feature Generation
As mentioned in Section 2.1, the sampling frequency of the AD7606 in SMUENOSEv2 was set to 150 Hz. Therefore, in an individual sampling cycle, 15,000 voltage values were obtained for each sensor, which means a total of 120,000 sensing voltage values were collected by the eight sensors in our E-nose. Due to the curse of dimensionality, it is not feasible to directly train the ML models with the raw sampling data, especially on the resource-limited NVIDIA Jetson Nano in SMUENOSEv2. Therefore, it is necessary to reduce the dimension of the training data. For the sake of efficiency, we chose to combine multiple statistics of the sampling data as the features, rather than relying on feature extraction algorithms, e.g., principal component analysis, to determine the features used for training.
As listed in Table 2, the statistics used in our experiments are the characteristics of the voltage curves in three different periods: peak, rising, and falling. The statistics were calculated based on the voltage u_L(t), since it can be mapped to the sensing resistance in a one-to-one manner according to Equation (1). The characteristics in Table 2 were empirically selected, since they are capable of characterizing the voltage dynamics in a single sampling cycle. Based on the sampling data of a single sensor in each cycle, five feature values can be calculated according to the equations in Table 2. Forty features were thus obtained from the measurements in each sampling cycle, since there are eight different gas sensors in our E-nose platform. The forty feature values with respect to all eight sensors were combined and indexed in an end-to-end manner to form the feature vector, which means the feature dimension in our experiments is forty.
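The assembly of the 40-dimensional feature vector can be sketched as follows. The five per-sensor statistics below are illustrative stand-ins for the Table 2 characteristics, which are not reproduced here:

```python
import numpy as np

def cycle_features(u, fs=150.0):
    """Build a 40-D feature vector for one sampling cycle.
    u: (n_samples, 8) preprocessed voltages; five illustrative statistics per
    sensor are concatenated end-to-end (the paper's exact Table 2 set may differ)."""
    feats = []
    t = np.arange(u.shape[0]) / fs
    for s in range(u.shape[1]):
        v = u[:, s]
        dv = np.gradient(v, t)
        feats += [
            v.max(),                  # voltage peak
            t[v.argmax()] / t[-1],    # normalized peak time
            dv.max(),                 # maximum first-order gradient (rising speed)
            t[dv.argmax()] / t[-1],   # normalized time of the maximum gradient
            dv.min(),                 # steepest falling speed
        ]
    return np.array(feats)

rng = np.random.default_rng(0)
fv = cycle_features(rng.random((1500, 8)))
```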
Based on the data obtained in the 500 sampling cycles, we conducted five groups of ten-fold cross-validation experiments for each combination of LVC type and identification method. Since we tested two LVC types (i.e., perfume and liquor) and two identification methods (i.e., XGBoost and LightGBM), we conducted a total of 200 individual experiments with different training and test data sets.
To demonstrate the validity of using SMUENOSEv2 for sampling the transient voltage values stimulated by the volatilized LVC, Figure 9 shows the values and first-order gradients of the preprocessed u_L. All curves in Figure 9a demonstrate fast-rising and slow-falling dynamics. This phenomenon coincides with the fast-reaction and slow-recovery characteristic of MOS sensors. The peak values and times, falling speeds, and maximum first-order gradients are distinctive for different sensors. Thus, it is expected that the statistics in Table 2 contain specific "fingerprint" information about the corresponding LVC sample.
The statistical results obtained in the perfume and liquor experiments are presented as follows.

Experiments in the Perfume Group
Figure 10a shows the boxplot of the identification accuracies in the perfume experiments. The identification accuracy is defined as the proportion of correct identification instances. As mentioned in Section 2.1.3, in an identification instance, a previously trained model takes a sample with the true type t as input and outputs a predicted type t' as the identification result. If the predicted type t' equals the true type t, the identification instance is considered correct; otherwise, it is considered incorrect. As shown in Figure 10a, apart from the mean values, the other boxplot elements are similar for the two tested methods. The mean identification accuracies of XGBoost and LightGBM are 94.8% and 95.5%, respectively. All obtained identification accuracies are higher than 90%. These high identification accuracies verify that it is feasible to conduct perfume identification using our newly designed E-nose platform.
Figure 10b,c show the boxplots of the time spent for training and identification, respectively. The considerably short training and identification times also verify the feasibility of using our resource-constrained integrated E-nose for perfume identification. The training times spent in all trials are shorter than 45 s. Compared with the identification times, which are shorter than 1 s, the training times are much longer. Fortunately, for on-site usage in real applications, the training component is less frequently used than the identification component: a well-trained model can be repeatedly used for identification. The much longer training time can be attributed to the time spent on parameter selection, which was reckoned into the training time since the parameters are also part of the trained model. During the training process, we found that the parameter selection time dominates the training time. As mentioned in Section 2.3.3, the parameter selection was realized by solving function optimization problems, which involve a large number of model trainings with different parameter combinations. Moreover, the time spans spent by XGBoost are generally longer, but more concentrated, than those spent by LightGBM. The generally shorter but more divergent training and identification times of LightGBM are mainly attributed to its GOSS and EFB mechanisms, which reduce the number of engaged features and incorporate more random operations.

To thoroughly compare the identification accuracy with respect to different perfume types, the confusion matrices obtained in the perfume experiments are shown in Figure 11. The elements of the confusion matrices are numbers of test instances. As mentioned, to test each method on the perfume samples, five groups of ten-fold cross-validation experiments were conducted, which means a total of 50 individual perfume experiments for each of XGBoost and LightGBM. Moreover, as mentioned in Section 2.3, each perfume type was sampled for 50 cycles. The 50 samples were divided into 10 equal parts, and an individual part with five samples was used as the test set in each perfume experiment. Thus, for each method, each perfume type was tested 250 times, which coincides with the fact that the elements in each row of the confusion matrices sum up to 250. The auxiliary diagonal elements stand for the numbers of correctly identified instances. In both confusion matrices, the LBK perfume samples were all successfully identified. Compared with the other perfume types, the MR perfume samples were incorrectly identified the most times. The differences among the identification accuracies of different perfume types mainly come from the different chemical constituents of the perfume samples, which are not the main concern of this paper. The numbers in both confusion matrices demonstrate the best and worst performance of using our E-nose platform with the two ensemble learning methods for perfume identification. Even in the worst case, an identification accuracy of 88.4% was obtained in the perfume experiments, which further validates the reliability of our E-nose platform SMUENOSEv2.
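The tallies in Figure 11 follow the usual confusion-matrix construction, which can be sketched as follows (toy labels for illustration):

```python
import numpy as np

def confusion_matrix(true, pred, n_classes):
    """Tally identification instances: rows are true types, columns predicted types."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        cm[t, p] += 1
    return cm

true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(true, pred, 3)
accuracy = np.trace(cm) / cm.sum()   # proportion of correct identifications
```

Each row sums to the number of test instances of that type, and the trace divided by the total recovers the overall identification accuracy.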
Figure 12 shows the feature importance obtained in the perfume experiments. The feature importance is defined as the proportion of times a feature was used in a model, which can be represented as follows:

I_i = e_i / ∑_{j=1}^{40} e_j, i = 1, 2, ..., 40, (10)

where e_i denotes the number of times the i-th feature was used in a model. As mentioned in Section 2.3.2, the feature dimension of our problem is forty. The more times a feature was used to construct the decision trees, the greater the effect the feature has taken in the ensemble learning method. Thus, the feature importance measures the contribution degree of each feature. According to Figure 12 and the feature indexes listed in Table 2, at least two conclusions can be drawn:
1. For both methods, the normalized voltage peaks were generally more important than the other characteristics. On the contrary, the normalized times of the maximum first-order gradient were slightly less important than the other characteristics, although the importance of feature 38 in Figure 12b exists as an exception. It can be concluded that the peak values contain more distinguishing information about the different perfume constituents. The maximum first-order gradient, which characterizes the time of the maximum rising velocity, is less correlated with the stimulant perfumes.
2. XGBoost obviously focused more on features 1, 8, and 18 than on the other features. In comparison, LightGBM concentrated more on almost half of the features and less on the others. The decentralized attention of LightGBM can be attributed to its EFB mechanism.
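Equation (10) simply normalizes the per-feature usage counts reported by the boosting library; a minimal sketch (the function name is ours):

```python
import numpy as np

def feature_importance(counts):
    """Importance I_i = e_i / sum_j e_j, where e_i counts how many times
    feature i was used to construct the decision trees."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

imp = feature_importance([10, 30, 20, 40])
```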

Experiments in the Liquor Group
With respect to the liquor experiments, Figure 13 shows that the identification accuracies and time efficiencies are also considerably high, which validates the feasibility of using our E-nose for liquor identification. Correspondingly, Figures 14 and 15 show the confusion matrices and the feature importance obtained in the liquor experiments, respectively. Moreover, compared with the perfume experiments, the identification accuracies obtained in the liquor experiments are more divergent. The ranges with 1.5 × IQR in Figure 13a are 15%, which are 5% higher than those in Figure 10a. This phenomenon coincides with the relatively smaller elements of the confusion matrices in Figure 14. The relatively lower identification accuracies obtained in the liquor experiments are mainly because the constituents of different liquor types are closer to each other than those of different perfume types. The increased difficulty of liquor identification poses more challenges for the E-nose platform and the identification methods. It is noteworthy that the mean identification accuracies in the liquor experiments are 92.3% and 94.6%, which are still considerably high.

Conclusions and Future Works
In this paper, an integrated handheld EN was designed for LVC identification. By LVC, we mean chemicals that can be volatilized from their original liquid form to gaseous form by themselves or by external blowing. The computing core of our newly designed EN is an NVIDIA Jetson Nano module. Owing to its miniature volume, the NVIDIA Jetson Nano module can be mounted together with the other EN components in a handheld case. The newly designed EN consists of gas transportation and electronic hardware components. A small air pump was used to accelerate the volatilization of the LVC samples and transport their gaseous form towards the sensor array. In the meantime, an STM32 processor was used to acquire the sensing voltages of the sensor array and transmit the acquired data to the NVIDIA Jetson Nano, on which two improved GBDT methods (i.e., XGBoost and LightGBM) were separately employed for LVC identification. Compared with common divide-body EN designs, our integrated EN is more suitable for on-site quick LVC identification. With GBDT models previously trained by both methods, our EN can realize highly accurate identification of perfumes and liquors in less than one second. In comparison, LightGBM generally spent less time on the model training and identification processes.
Despite its considerably high performance on LVC identification, our newly designed EN could be further improved in the following aspects: (1) Adapting the EN for identifying originally gaseous chemicals. In the current EN design, it is supposed that an LVC in its original liquid form is dripped into the EN and then volatilized to the gas form. The scheme for feeding originally gaseous chemicals into the EN can be further investigated. (2) Adding a temperature and humidity control module to the EN. Controlling the temperature and humidity around the sensor array at fixed values was not considered in the current EN design. Sensing data acquired at significantly different temperature and humidity values could influence the identification results. Future research work should cover adding a temperature and humidity control module to our EN.

Figure 1.
Figure 1. The scene of using SMUENOSEv2 for LVC identification.

Figure 2 .
Figure 2. The structure block diagram of SMUENOSEv2.

Figure 3 .
Figure 3.The gas route of SMUENOSEv2.Red bold numbers 1 and 2 indicate the two gas route branches in the downstream side of the three-way valve.



Figure 4 .
Figure 4. Structure of the electronic hardware in SMUENOSEv2.The blocks inside the dashed rectangle belong to the sensing voltage sampling board.

Figure 5.
Figure 5. Radar plot of the sensitivity coefficients looked up from the sensors' datasheets.


Figure 6 .
Figure 6. The front-end circuit for sampling the i-th sensor's sensing voltage. IOA is an integrated operational amplifier. Vsup is the power supply voltage of the sensors.

Figure 7 .
Figure 7. Flow charts of the perception and pattern recognition components. The dashed lines stand for the interaction between the two components.


Figure 8a,b show the HMI surfaces of identifying an individual liquor sample and the model training process, respectively. The graphical user interface of the HMI component comprises three sectors: 1. Sector I: the real-time sensing voltage displaying sector. During the sampling process, the real-time sensing voltages received from the sampling component are plotted in this sector. The curves in different colors correspond to the data of different sensors.

3. Sector III: the identification and training operation sector. Empirically, the identification and training operations are mutually exclusive and, thus, are placed in two different tabs sharing sector III. The operation processes for training and identification are detailed as follows: • By pushing the "Training" button in the bottom-right corner of Figure 8b, an individual training process can be activated. The training dataset and ML method should be selected before pushing the "Training" button. In the main area of this sector, a table lists all previously saved sensing voltage samples. By clicking the table items, multiple samples can be selected to form the training dataset. The ML method used for training can be selected in the top-right ComboBox. Moreover, the hyper-parameters of the employed method can be set by pasting the specially formatted parameter values in the TextBox above the "Training" button. Finally, the resulting trained ML model, which can be used for identification, is saved in the external memory of the NVIDIA Jetson Nano.

Figure 8.
Figure 8. The human-machine interface (HMI) of SMUENOSEv2. (a) HMI for the sampling and identification process; (b) HMI for the model-training process.


Figure 9.
Figure 9. Typical values obtained in an individual cycle of liquor sampling. (a) The value of the preprocessed u_L; (b) The first-order gradient of the preprocessed u_L in the first 10 s of a liquor sampling cycle.

Figure 10.
Figure 10. Statistical results obtained in perfume experiments. (a) Boxplot of the identification accuracies. (b) Boxplot of the time spent for training. (c) Boxplot of the time spent for identification.



Figure 11.
Figure 11. Confusion matrices in perfume experiments. (a) The confusion matrix obtained by XGBoost; (b) The confusion matrix obtained by LightGBM.

Figure 12 .
Figure 12.The boxplot of feature importance in perfume experiments.(a) The feature importance obtained by XGBoost; (b) The feature importance obtained by LightGBM.


Figure 13 .
Figure 13. Statistical results obtained in liquor experiments. (a) Boxplot of the identification accuracies. (b) Boxplot of the time spent for training. (c) Boxplot of the time spent for prediction.

Figure 14.
Figure 14. Confusion matrices in liquor experiments.

Figure 15 .
Figure 15. The boxplot of feature importance in liquor experiments. (a) The feature importance obtained by XGBoost; (b) The feature importance obtained by LightGBM.

Table 1 .
Ranges of main tunable hyper-parameters of the tested methods.