Design and Validation of a Portable Machine Learning-Based Electronic Nose

Volatile organic compounds (VOCs) are chemicals emitted by a variety of sources, such as foods, bacteria, and plants. Although specific pathways and biological features are closely associated with these VOCs, they are detected mostly by human odor testing or by high-end methods such as gas chromatography–mass spectrometry, which analyzes the gaseous components. Odor characterization can nevertheless be quite helpful for rapidly classifying samples that are present in sufficient concentrations, and lower-cost metal-oxide gas sensors have the potential to allow the same type of detection with less training required. Here, we report a portable, battery-powered electronic nose system that uses multiple metal-oxide gas sensors and machine learning algorithms to detect and classify VOCs. An in-house circuit was designed with ten metal-oxide sensors and voltage dividers; an STM32 microcontroller was used for data acquisition with 12-bit analog-to-digital conversion. Supervised machine learning algorithms, such as the support vector machine (SVM), were applied to classify the VOCs based on the measurement results. The coefficient of variation (standard deviation divided by mean) of 8 of the 10 sensors stayed below 10%, indicating the excellent repeatability of these sensors. As a proof of concept, four different types of wine samples and three different oil samples were classified, and the trained models reported 100% and 98% accuracy, respectively, based on the confusion matrix analysis. When the trained model was challenged with new sets of data, the SVM classifier achieved a sensitivity and specificity of 98.5% and 98.6% for the wine test and 96.3% and 93.3% for the oil test. These results suggest that metal-oxide sensors are suitable for food authentication applications.


Introduction
Among the human senses, olfaction is a sensitive method involving the detection of small numbers of molecules in the environment. There are many applications of olfaction in everyday life; however, in many of these areas, such as food analysis, the use of odors has been limited since humans have several limitations when discriminating between different smells. For example, a high amount of experience and training is required, sensory results can be subjective rather than objective, and the nose is subject to sensory fatigue with time and has a limited ability to distinguish between very similar odors. Until relatively recently, there have been few electronic sensors sensitive or selective enough to replace operators trained to discriminate between different odors. Electronic noses have the ability to offer improvements in all of these aspects.
There are several types of sensors and instruments for sensing odors. Gas chromatography (GC) passes the sample through a small tube filled with a stationary phase in order to separate components by retention time [1]. This method can easily separate some compounds, while others produce overlapping peaks. Mass spectrometry (MS) also achieves high selectivity and, when combined with gas chromatography, is able to separate and [...] components and contains a fraction of the variance found in the original dataset. While principal component analysis (PCA) is useful for reducing the dimensionality of the data, it cannot be used to classify data on its own. For this task, support vector machines or neural networks, similar to biological systems, are used [7]. An SVM attempts to find a hyperplane that best separates the data [16]. K-nearest neighbors (KNN) can also be used to classify odors.
Recognizing the growing interest in portable electronic noses and previously reported systems, this paper presents a portable metal-oxide sensor-based electronic nose that consists of ten different metal-oxide sensors. The battery-powered system has a fan to control the air flow, and the overall size fits within an adult's palm, making it suitable for field-deployable applications. For validation of the proposed device, four wine and three oil samples were tested and the classification performance was compared among 24 types of machine learning classifiers.

Sensor Selection
Since the goal was to build a low-cost, portable electronic nose, metal-oxide sensors were chosen. These sensors have small footprints and low power consumption while still providing high sensitivity to certain VOCs. Since VOCs are compounds that indicate the presence of certain types of biochemical activity, a matrix of candidate sensors and their sensitivities was compiled to assess the commercially available sensors (Figure 1). These data were obtained from the datasheets for each of the 10 sensors. Each sensor's data were normalized to that sensor's maximum sensitivity, which was the slope of the concentration versus resistance change. The datasheets also showed the cross-sensitivities and response graphs of each sensor, allowing more detailed responses to be predicted for each sensor. From this information, a set of sensors was selected to detect the maximum number of VOCs while limiting overlap between the sensors. Based on Figure 1, the final set of sensors used was MQ-2 for flammable gases, MQ-5 for natural gas, MICS-5524 for carbon monoxide and natural gas, SGAS-707 for VOCs, MQ-3 for alcohol, MQ-4 for natural gas, MQ-6 for flammable gases, TGS-822 for VOCs, and TGS-2602 for air contaminants. Table S1 in the Supplementary Materials gives detailed information on the full list of surveyed sensors, with the selected sensors shown in boldface rows.

Circuit Design
The main controller chosen was the STM32F031 (STMicroelectronics, Geneva, Switzerland). This microcontroller uses an ARM Cortex-M0 core running at 48 MHz and operates on a 3.3 V input. It also provides a 12-bit analog-to-digital converter (ADC) with an input range of 0–3.6 V and several standard communication interfaces, such as universal asynchronous receiver transmitter (UART), serial peripheral interface (SPI), and I2C. The sensors chosen (see Section 2.1) were arranged in voltage dividers between the voltage common collector (VCC) of 5 V and ground. Metal-oxide sensors change their respective resistances, $R_S$, as target VOCs adsorb to the sensor surface, and connecting a precision resistor, $R_0$, in series generates a voltage division as

$$V_{out} = V_{CC}\,\frac{R_0}{R_S + R_0}$$

where $V_{out}$ is the voltage read by the controller ADC channel. With this arrangement, an increase in gas concentration decreases $R_S$, which in turn increases the voltage at the output, so that a positive correlation between voltage and concentration is formed. In addition, the baseline voltage output for each sensor can be adjusted. Rather than designing an individual PCB for each sensor, the sensor packages were mounted in a 3D-printed case with custom-designed sockets for the sensors, and the breakout boards were soldered together. Potentiometers of various values were exposed at the top of the electronic nose to allow the baseline value for each sensor to be adjusted. The programming and serial ports of the microcontroller were exposed to allow for easy programming and data transfer. A fan (MC20080V1-000U-A99, DigiKey Electronics, Thief River Falls, MN, USA) was also included to draw the odors into the electronic nose.
Two 2600 mAh, 3.7 V battery cells (18650 size, Sparkfun Electronics, Niwot, CO, USA) were connected in series. A functional diagram of the overall system is shown in Figure 2, with the electronic wiring diagram provided in Figure 3 and Figure S1. The portable instrument developed was named electronic nose modules with machine learning algorithm (EMLA).
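The voltage-divider readout can be illustrated with a short Python sketch. It assumes the output is taken across the series resistor R0, which is consistent with the output voltage rising as RS falls. The resistor values, the 3.3 V ADC reference, and the function names are hypothetical and chosen only for illustration; they are not taken from the EMLA schematic.

```python
def divider_vout(r_s: float, r_0: float, vcc: float = 5.0) -> float:
    """Voltage across the fixed resistor R0 in series with the sensor RS."""
    return vcc * r_0 / (r_s + r_0)


def adc_counts(v_out: float, v_ref: float = 3.3, bits: int = 12) -> int:
    """Ideal ADC code for v_out, clamped to the converter's input range.

    v_ref = 3.3 V is an assumption, not a value stated for the device.
    """
    full_scale = (1 << bits) - 1
    clamped = min(max(v_out, 0.0), v_ref)
    return round(clamped / v_ref * full_scale)


if __name__ == "__main__":
    r_0 = 10_000.0  # hypothetical series resistance in ohms
    # RS falls as the gas concentration rises, so Vout and the ADC code rise.
    for r_s in (40_000.0, 20_000.0, 10_000.0):
        v = divider_vout(r_s, r_0)
        print(f"RS={r_s:>8.0f} ohm  Vout={v:.3f} V  ADC={adc_counts(v)}")
```

Because the divider is supplied from 5 V, the sketch clamps the result to the reference voltage; in hardware, a potentiometer-adjusted baseline would serve to keep the reading within range.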

Data Acquisition
The sensors responded slowly, on the order of tens of seconds, to changes in the environment. Since the ADC of the microcontroller was capable of one megasample per second, oversampling was used to increase the effective resolution of the sensor readings. When a sensor is oversampled n times, assuming white noise, the resolution increases and the noise decreases by a factor of √n [14]. Since the steady-state readings of the sensors were used for classification, only a low sampling rate was required. One sample per second was chosen as the sample rate. With 10 sensors, this allows for an oversampling rate of 5556×. With a 12-bit ADC, this gives approximately 18 bits of resolution from the STM32F0 series. The simple software running on the microcontroller performed the oversampling and sent the data over the logic-level serial port.
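The resolution figure quoted above follows from the square-root rule: averaging n samples with white noise is worth 0.5·log2(n) extra bits. The sketch below reproduces that arithmetic; the function names and the `read_adc` callable are hypothetical and do not represent the firmware actually used.

```python
import math


def effective_bits(adc_bits: int, oversampling_ratio: int) -> float:
    """Effective resolution when averaging N samples with white noise.

    Noise falls by sqrt(N), which is worth 0.5 * log2(N) extra bits.
    """
    return adc_bits + 0.5 * math.log2(oversampling_ratio)


def oversample(read_adc, n: int) -> float:
    """Average n raw conversions from read_adc(), a caller-supplied function."""
    return sum(read_adc() for _ in range(n)) / n


if __name__ == "__main__":
    # 12-bit ADC oversampled 5556 times, as described in the text.
    print(f"{effective_bits(12, 5556):.1f} bits")
```

For a 12-bit converter and a ratio of 5556, this gives roughly 18.2 bits, in line with the "approximately 18 bits" stated in the text.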

Calibration
To characterize sensor selectivity and obtain a quantitative response curve, calibration was performed with isopropyl alcohol. A 70% stock solution was diluted to 50, 35, and 15%, and approximately 100 µL of each solution was deposited on the 3D-printed sample tray. EMLA was turned on for 3 min, the fan speed was set to the 16/255 level, and the heater was set to the maximum level (255/255). Once the baseline response was confirmed, the 3D-printed sample tray was inserted under the fan. Data were collected for five minutes, recording one reading per second for all 10 sensors. Each sample challenge was repeated three times for statistical calculation. To plot the calibration data, the mean and standard deviation of the sensor responses for the three replicates were calculated, and the peak ADC output at each concentration was plotted for the sensors that responded to isopropyl alcohol in order to check the response linearity.
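The replicate statistics can be sketched in a few lines of Python. The coefficient of variation is taken as the standard deviation divided by the mean, as defined in the abstract; the use of the sample (n − 1) standard deviation and the triplicate values shown are assumptions for illustration, not measured data.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Hypothetical peak ADC counts from three replicate runs of one sensor.
    peak_counts = [2010.0, 2100.0, 1950.0]
    mean = statistics.mean(peak_counts)
    sd = statistics.stdev(peak_counts)
    cv = coefficient_of_variation(peak_counts)
    print(f"mean={mean:.1f}  sd={sd:.1f}  CV={cv:.1%}")
```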

Each wine sample was kept in a refrigerator (4 °C), and a volume of approximately 5 mL was pipetted for evaluation. A terminal was opened for the serial port connection and the ASCII data stream was recorded on the computer. The same experiment was repeated four times on different days. Three different oil samples were used to challenge the e-nose module. Approximately 2 mL of oil was poured into a paper cup and heated to 50 °C on a hotplate. After approximately 30 s, data collection began for all ten sensor modules and continued for the next 300 s, with data acquired every second. The terminal was opened for the serial port connection and the ASCII data stream was recorded on the computer. The same experiment was repeated four times on different days.

Data Analysis
Analysis was conducted in two steps: training and testing. For training the model, 160 sensor readings from the ADC for each oil (a total of 480 data points) were imported into the Matlab® classifier learner app. All 24 classifiers were trained simultaneously, and accuracy was reported as an output. The top five classifiers (fine tree, quadratic discriminant, quadratic support vector machine, cubic support vector machine, and k-nearest neighbor) were selected, and their individual performance was checked using the confusion matrix and the receiver operating characteristic (ROC) curve. Once validated, each model was exported as an executable command in Matlab and challenged with a new testing dataset (360 data points; 120 each). Each classifier reported the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) results. These values were used to report the statistical parameters as follows:

$$\text{Positive predictive value} = \frac{TP}{TP + FP}, \qquad \text{Negative predictive value} = \frac{TN}{TN + FN} \tag{4}$$
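As an illustration of how these parameters can be derived from a multi-class confusion matrix, the following Python sketch treats each class in a one-vs-rest fashion. The one-vs-rest reduction, the function name, and the example matrix are assumptions for illustration; sensitivity and specificity use their standard definitions, TP/(TP + FN) and TN/(TN + FP).

```python
def one_vs_rest_metrics(cm, k):
    """Return sensitivity, specificity, PPV and NPV for class k.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # class k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp   # other classes predicted as class k
    tn = total - tp - fn - fp

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
    cm = [[5, 1, 0],
          [0, 4, 2],
          [1, 0, 7]]
    for k in range(len(cm)):
        print(k, one_vs_rest_metrics(cm, k))
```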

E-Nose System
The final prototype is shown in Figure 4. Output from the sensor voltage dividers was fed directly into the microcontroller, greatly reducing the complexity and part count of this electronic nose. A fan was used to circulate the odors through the electronic nose, an arrangement much less complex than the system of pumps and valves used by most others. The compact size and light weight of this system gave it excellent portability. Because the data were sent over a standard USB serial adapter, there was less restriction regarding the type of data collection module needed.

Calibration Experiment
The time-dependent sensor response to isopropyl alcohol concentration is shown in Figure 5. Among the 10 sensors, only four showed a significant response to the calibration sample (MQ4, MQ5, MQ6, and TGS2602). Figure 5A shows ADC counts increasing steeply for the first minute or so after sample insertion, except for TGS2602, whose response gradually increased for 170 s. After reaching peak values, all sensor outputs gradually decreased, since the amount of diffused VOCs was limited by the droplet volume. Lines represent the average and shaded areas show the standard deviation for the triplicate experimental samples. Figure 5B shows the correlation between the isopropyl alcohol concentration and ADC count; a good linear relationship was achieved, with R² values of 0.9999 (MQ6), 0.9722 (MQ5), 0.9999 (MQ4), and 0.8945 (TGS2602).
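A linearity check of this kind can be sketched with an ordinary least-squares fit and its coefficient of determination. The code below is a minimal pure-Python illustration; the peak ADC counts are invented for the example and are not the values behind Figure 5B.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    concentration = [15.0, 35.0, 50.0, 70.0]        # % isopropyl alcohol
    peak_counts = [820.0, 1390.0, 1830.0, 2410.0]   # hypothetical peak ADC counts
    slope, intercept, r2 = linear_fit(concentration, peak_counts)
    print(f"slope={slope:.2f} counts/%  intercept={intercept:.1f}  R^2={r2:.4f}")
```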

Wine Experiment
Figure 6A displays the experimental setup. The EMLA unit was positioned on top of the temperature-controlled plate heater. Figure 6B shows the dynamic sensor response when a cabernet sauvignon sample was moved in and out of the interrogation position. While sample insertion generated a quick response, removal displayed a typical first-order time response, so each subsequent sample interrogation was conducted after an interval of at least 10 min to allow each sensor module to re-initialize. Figure 6C,D display the schematic diagram of the EMLA unit, the 3D-printed platform, and the sample holders. Figure S2 displays the survey of the classifiers conducted with the Matlab classifier learner, while Figure S3 shows an example of a training model from the linear SVM. Using the test-set data, all 24 classifiers were challenged, and their cross-validation results were tabulated. Accuracy was plotted for all classifiers, and the six best performers were selected for the testing phase. The performance of the trained model is shown in Figure 7 via the cross-validation matrix and ROC curve.
Table 1 shows the testing results for classifiers challenged with a new dataset of wine samples; statistical results were reported using Equations (1)-(5). Among the models tested, the linear and quadratic SVM classifiers provided the best performance in terms of sensitivity and specificity. Figure S3 shows the representative scatter plot of sensor readings for one sensor combination (TGS822 and MQ2). As expected, some sensors showed overlapping sensitivities to secondary chemical species, so these results were displayed to illustrate the group separation by wine type.

Oil Experiment
The EMLA unit was positioned on top of the plate heater and the temperature was set to 50 °C to increase the amount of volatile organic compounds available to the sensors. As with the wine samples, training sets were captured, and several models were trained (Figure S2). The top five classifier models (fine tree, quadratic discriminant, quadratic SVM, cubic SVM, and fine KNN) among the 24 tested were challenged with the testing set. The performance on the training set is shown in Figure S4. The overall results for the testing set varied widely. Except for the quadratic SVM classifier, the methods showed a sensitivity or specificity of less than 50% for an oil sample. Overall results are summarized in Table 2.

Discussion
In the field of portable and field-deployable detection systems, smartphone-based systems have recently been highlighted, for the most part focusing on optical transduction [27][28][29]. However, for odor detection, metal-oxide sensors provide better options in terms of sensor availability and simple transduction of volatile organic compounds into voltage signals. The proposed e-nose design was implemented based on the limited choice of available metal-oxide chemical sensors. While cost reduction and miniaturization were possible by employing this type of sensor, there were a few limitations as well. First, as shown in Table S1, many sensors provide a primary response to certain chemical species along with a large number of secondary (weak) responses that overlap among various sensor types and vendors. This makes a deterministic approach less effective, since EMLA will not generate a unique response when a single chemical species is present. In addition, the relative sensor response was not directly comparable among different vendors, so a statistical learning method was employed to learn the characteristic sensor response of each sample under investigation.
Although the calibration experiment was performed with a single chemical species, it allowed the sensitivity and some of the sensor dynamics to be explored. Based on the calibration experiment with isopropyl alcohol, the sensors MQ6 (R² = 0.9999), MQ5 (R² = 0.9722), MQ4 (R² = 0.9999), and TGS2602 (R² = 0.8945) showed a linear relationship between concentration and ADC count. This means that the EMLA unit has potential for quantitatively measuring the concentration of compounds and mixtures in foods. A second characteristic observed in using the metal-oxide sensor modules was that the sensors have different response times and cooling times. Based on Figure 6, all the sensors responded with a typical first-order system response, having time constants of 30-40 s, depending on the sensor type. However, one sensor module (SGAS 707) showed an extremely slow response and cool down; for future system development, this module will be replaced by another with similar characteristics and a faster response time. As noted in other reviews, the reactions of metal-oxide-type sensors can be affected by local temperature and humidity levels [30,31]. One of the limitations of the proposed device is its passive control of the environment. Therefore, signal fluctuation could arise from the passive nature of the temperature and humidity control.
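To see how the 30-40 s time constants relate to the 10 min interval between samples, an ideal first-order decay can be evaluated directly. The sketch below assumes a pure exponential recovery, exp(−t/τ), which is an idealization; the function names are hypothetical, and the slow SGAS 707 module would not follow this model well.

```python
import math


def residual_fraction(t: float, tau: float) -> float:
    """Fraction of a step response still remaining t seconds after removal."""
    return math.exp(-t / tau)


def settling_time(tau: float, fraction: float) -> float:
    """Time for an ideal first-order response to decay to the given fraction."""
    return -tau * math.log(fraction)


if __name__ == "__main__":
    for tau in (30.0, 40.0):
        left = residual_fraction(600.0, tau)   # 10 min wait between samples
        t_1pct = settling_time(tau, 0.01)      # time to fall to 1% of the step
        print(f"tau={tau:.0f} s  residual after 600 s={left:.1e}  1% at {t_1pct:.0f} s")
```

Under this idealization, a sensor with a 40 s time constant falls to 1% of its step response in roughly three minutes, so a 10 min interval leaves a wide margin.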
For the circuit readout, oversampling was used to improve the effective resolution of the ADC. This allows for the internal ADC to be used, reducing the part count of the e-nose. However, oversampling came at the cost of some extra processing and a reliance on uncorrelated noise in order for oversampling to work properly.
The use of a fan instead of pumps and valves allowed for a simpler system compared to other electronic noses. However, this means that the electronic nose cannot precisely control the sampling conditions. Despite this drawback, the electronic nose was able to successfully distinguish between different brands and types of wine and oil samples. Matlab's classifier learner app provided a quick survey across a broad spectrum of classification models. A total of 24 classifiers can be directly implemented, with the additional help of principal component analysis to reduce the dimensionality. The current EMLA system relies on Matlab-based offline analysis of the captured data. Final implementation of EMLA as a field-deployable unit will require porting the trained model to the microprocessor unit so that on-board analysis can be conducted.

Conclusions
A metal-oxide sensor-based electronic nose system called EMLA was presented. This system utilizes ten metal-oxide sensors that were selected based on their availability and the scope of their response to chemical species. A portable unit was designed to be battery-operated and a fan was used to control the flow rate of the sample inlet. A Matlab-based machine learning algorithm was implemented. The best results overall were achieved by a quadratic SVM classifier, with a minimum classification accuracy of 97% for four wine samples and 93% for three oil samples.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/s21113923/s1. Table S1: List of candidate sensors. Boldface sensors were selected to be used in the EMLA unit (10 different sensors). They were selected based on the wide variety of primary responses to the chemical species. Figure S1: Full electronic CAD diagram for the EMLA unit. The four major components were the microcontroller unit, sensors unit, heater and fan control unit, and power switch unit. Figure S2: Survey of accuracy for 24 different classification methods. Tree (6 types), discriminant analysis (2), Bayesian (2), SVM (6), and KNN (7). (A) Wine sample (B) Oil sample. Figure S3: Scatterplot of four different wine samples depicted by two sensors: TGS822 and MQ2. Figure S4: