MOS Sensors Array for the Discrimination of Lung Cancer and At-Risk Subjects with Exhaled Breath Analysis

Abstract: Lung cancer is characterized by a tremendously high mortality rate and a low 5-year survival rate when diagnosed at a late stage. Early diagnosis of lung cancer drastically reduces its mortality rate and improves survival. Exhaled breath analysis could offer clinicians a tool to improve the ability to detect lung cancer at an early stage, thus leading to a reduction in the associated mortality rate. In this paper, we present an electronic nose for the automatic analysis of exhaled breath. A total of five a-specific gas sensors were embedded in the electronic nose, making it sensitive to different volatile organic compounds (VOCs) contained in exhaled breath. Nine features were extracted from each gas sensor response to exhaled breath, identifying the subject breathprint. We tested the electronic nose on a cohort of 80 subjects, equally split between lung cancer and at-risk control subjects. Including gas sensor features and clinical features in a classification model, recall, precision, and accuracy of 78%, 80%, and 77% were reached using a fourfold cross-validation approach. The addition of other a-specific gas sensors, or of sensors specific to certain compounds, could improve the classification accuracy, therefore allowing for the development of a clinical tool to be integrated in the clinical pipeline for exhaled breath analysis and lung cancer early diagnosis.


Introduction
Lung cancer survival rates are greatly dependent on the stage of the disease at the time of diagnosis. If diagnosed at an early stage, lung cancer has a 70-90% five-year survival rate, which drastically drops to 12% if diagnosed at a late stage [1]. Every year, a total of 1.6 million people die because of lung cancer, and 1.8 million people are diagnosed with this disease [2]. Therefore, lung cancer poses worldwide issues of high mortality and economic burden. The need for lung cancer early diagnosis becomes more important every year and still has to be properly addressed. Imaging techniques can be useful in screening purposes. Low dose computed tomography (LDCT) represents the elective imaging technique for screening purposes, as demonstrated by the National Lung Cancer Screening Trial [3,4]. Nonetheless, for large scale lung cancer clinical screenings, the use of LDCT is not considered appropriate, due to the increasing risk of radiation exposure, the associated economic burden, and the high number of detected false positives [5,6]. Therefore, researchers are taking into consideration alternative techniques which may help in improving lung cancer early diagnosis. Exhaled breath, blood serum, and urine are all examples of mediums from which lung cancer specific biomarkers could be identified and used for early diagnosis [7]. Starting from [8], in the last few years, exhaled breath has been extensively studied and analyzed with the aim of diagnosing several diseases [9,10]. It is now known that exhaled breath could contain more than 3000 different VOCs [11]. In the unfortunate presence of cancer in the human body, the affected cells are characterized by a different metabolism compared to a healthy condition, that could cause an altered VOCs production by cancerous cells and their subsequent release in exhaled breath through the blood-air barrier. Therefore, exhaled breath VOCs could be used as lung cancer biomarkers, useful for early diagnosis. 
Even though many researchers have focused their efforts on the identification of compounds in healthy and diseased conditions, there is still no consensus on which VOCs are specific to lung cancer, and no compound has been found only in lung cancer subjects [12]. In fact, evidence exists suggesting that a combination of compounds, rather than single compounds, could be specific to lung cancer [13,14]. In order for a compound to be considered as a valid clinical biomarker for disease diagnosis, a proper pipeline must be followed. Pepe et al. suggest a five-step flow for biomarker identification: pre-clinical exploratory phase, clinical validation, retrospective analysis, prospective screening, and cancer control [15]. To date, no compound found in exhaled breath has reached such a level of validation for lung cancer diagnosis. As a consequence of this uncertainty, no clinical tool is currently available to clinicians for lung cancer early diagnosis based on exhaled breath. There are several techniques that could potentially be used for VOCs analysis, but to date, none of them can be considered suitable for clinical practice [9]. Great promises are offered by electronic noses. These devices are based on gas sensor arrays, with each sensor potentially designed with different sensing principles. Examples are colorimetric gas sensors [16,17], conductive polymer gas sensors [18], metal oxide gas sensors [19,20], and also type-different sensor arrays [21]. Even though studies involving the use of electronic noses for exhaled breath analysis have been extensively published in the last few years, the application of this technique in clinical practice as screening tool still remains a hypothesis [22]. As a matter of fact, some issues of such a technique can be pinpointed and need to be properly addressed by researchers before proceeding with application in standard clinical pipelines. 
First, the most common medium used for the temporary storage of the breath sample, the Tedlar bag, does not allow for long-term storage of the sample under analysis without causing VOC leakage [23]. To date, no consensus has been reached on the best strategy and gases to be used to properly clean Tedlar bags before breath sampling [23-25]. Furthermore, sensor washout and cleaning should be carried out in order to maintain a stable baseline when analyzing multiple samples, without being affected by environmental changes and sensor drift. Even though extensive studies have been published in recent years, no off-the-shelf device is available to clinicians to replace standard screening methods for early disease diagnosis with exhaled breath analysis. Little progress has been made toward clinical application, as we are aware of only one registered clinical trial that is currently underway. This clinical trial (NCT02612532) is sponsored by Owlstone Ltd. (Cambridge, UK) and has the main aim of defining the diagnostic accuracy of the ReCIVa breath sampler coupled to a gas analyzer based on spectrometer techniques. This clinical trial started in 2015 and involved more than 26 clinical sites for exhaled breath collection. The estimated enrollment was 520 subjects, but no results have been published so far.
In the present paper, we propose an electronic nose, embedding an array of five commercial MOS gas sensors, for the analysis of exhaled breath coupled to machine learning techniques for the analysis of the collected gas sensor responses. The proposed electronic nose was designed with the aim of being composed by commercial components, embedding few gas sensors, and being portable. Two main motivations were behind these design choices. The first one was the development of a device to be used by general practitioners as a large population screening tool, and as such requiring a portable instrument and less expensive equipment compared to standard gas chromatography-mass spectrometry (GC-MS) laboratory tools. The second one was the easier interpretation and readability of the achieved results, as few sensors were embedded in the electronic nose, thus reducing the related number of features extracted from the gas sensor responses. A feasibility study of the proposed device is described in this manuscript. The device was evaluated in a prospective study on a cohort of 80 subjects, equally split between lung cancer and at-risk subjects.
For each subject involved in the experiment, a breathprint was computed through the extraction of features from the gas sensors' responses to exhaled breath. The subject-specific breathprints, together with clinical features, were used as inputs for machine learning algorithms, with the aim of discriminating lung cancer from at-risk subjects. Furthermore, we performed an analysis on the achieved results based on the time difference between breath collection and the subsequent analysis with the electronic nose. This was done with the aim of assessing whether classification accuracy was dependent on this time difference when using Tedlar bags as temporary storage for exhaled breath analysis. The manuscript is organized as follows. Section 2 describes the metal oxide developed electronic nose used in this study, the features extracted from the gas sensors' responses, and the machine learning techniques. Section 3 presents the results obtained from the analysis of the exhaled breath of a cohort of 80 subjects, equally split between lung cancer and control subjects. Section 4 analyzes the obtained results, and Section 5 draws the appropriate conclusions.

Gas Sensors
The gas sensors used in the proposed electronic nose for exhaled breath analysis are commercial metal oxide (MOS) sensors. The basic structure of an MOS sensor is a ceramic support tube coated with SnO2 [26]. The sensitivity of the sensor can be controlled by acting on the properties of the surface material [27]. For instance, the level of porosity of the sensitive material and its working temperature affect the sensitivity of metal oxide gas sensors [28]. A conductivity change is observed when specific gas molecules, present in the air and to which the sensor is sensitive, interact with its surface [28]. A measuring circuit, which can be as simple as a voltage divider able to detect resistance changes in the sensor, can be used for measuring conductivity changes. Some MOS sensors offer the possibility to precisely control their working temperature, thus allowing for additional tailoring to specific substance detection [28]. Table 1 describes the commercial gas sensors embedded in the electronic nose described in this study. A total of five gas sensors (Figaro USA Inc., Arlington Heights, IL, USA) sensitive to different compounds were chosen. Previous studies showed the capability of such sensors in discriminating healthy controls from subjects affected by lung cancer, diabetes, or other diseases [19,21,29]. All the sensors contain a heater that can be configured to set the working temperature and to maximize sensor compound detection. In the present work, gas sensors were operated at constant power, as the total current through the heater resistance was kept constant. Figure 1 shows the typical gas sensor response curve when exposed to a sample of exhaled breath. In the first phase, namely the cleaning phase, the sensor has a constant value which is considered as its baseline value.
When the exposure to the sample of interest starts in the measuring phase, the conductivity of the sensor increases, and as a consequence, resistance decreases until it reaches a plateau. In the last phase, named recovery, the sensor is exposed to environmental air and recovers its initial baseline value.

Table 1. Commercial gas sensors embedded in the electronic nose for exhaled breath analysis and the associated sensitive organic compounds. In addition to the reported organic compounds, inorganic compounds also have an effect on the gas sensors' response.

Sensor    Sensitive Organic Compounds

Figure 1. Left: Raw curve collected from an MOS gas sensor when exposed to exhaled breath. Three main phases can be identified, namely cleaning, measuring, and recovery. In the cleaning phase, the sensor maintains a constant value at its baseline. In the measuring phase, the sensor shows a change in conductivity based on the presence of compounds in the sample of air. A decrease in resistance is noticed if the analyzed sample contains a compound the sensor is sensitive to. R_0, R_min, and ∆T are values computed from the gas sensor response to be used for static feature extraction. Right: Gas sensor response in the phase-space. A total of five features were extracted from the phase-space: four are reported in the plot, with the last one being the relative integral in the measuring (adsorption) phase. Data shown in the figure were collected from the electronic nose presented in this study.

Electronic Nose
Gas sensors alone are unable to perform exhaled breath analysis, as they need to be integrated into a more complex device. Such a device has two main purposes. The first one is controlling gas flow into and out of an analysis chamber where the gas sensors are located, and the second one is measuring the gas sensor resistance changes. These kinds of devices are commonly referred to as electronic noses [26]. A schematic diagram showing the typical components of an electronic nose is shown in Figure 2. The analysis chamber employed in the proposed electronic nose was entirely made of die-cast aluminum with a volume of approximately 500 mL. Two airtight holes, positioned on two opposite sides of the chamber, allowed for gas inflow and outflow. The five gas sensors were all embedded in the analysis chamber. In addition to the gas sensors, a temperature and relative humidity sensor (model SHT75, Sensirion AG, Switzerland) was used to monitor temperature and relative humidity inside the chamber. A 12 V air pump (model H085-11, Parker Hannifin, Cleveland, OH, USA) was used to pump air into and out of the analysis chamber. The air flow was controlled by means of three solenoid valves (model L172, Sirai, Italy): two valves controlled air flow from the Tedlar bag and the environment, while an additional valve controlled air flow out of the analysis chamber. The core controller of the electronic nose was an ARM Cortex-M3 microcontroller (Infineon Technologies, Neubiberg, Germany). The controller took care of all the tasks required for exhaled breath analysis, among them analog-to-digital conversion of the gas sensor resistance, control of the air pump and solenoid valves, and data streaming towards a host device. The sampling frequency of the gas sensor resistance values was kept constant at 10 Hz, and the sampling resolution was set to 16 bits. The schematic of the circuit handling gas sensor control and resistance measurement is shown in Figure 3.
Software running on a host computer and communicating with the electronic nose over Bluetooth was developed using Python and the Kivy framework. This software allowed for the control of the device, visualization of the collected measurements, and storage of the gas sensor response data.
Figure 3. Schematic of gas sensor resistance measurement and heater control (block labels: Heater Control, To Analog Mux). For the sake of clarity, only the schematic of a single gas sensor is shown. The red rectangle shows the components contained inside a gas sensor: R_S is the variable sensor resistance, while R_H is the heater resistance.
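As an illustration of how a sampled voltage can be mapped back to the sensor resistance R_S, the following minimal sketch assumes the simplest voltage-divider arrangement, in which R_S is in series with a fixed load resistor and the ADC samples the voltage across that load. The circuit voltage, ADC reference voltage, and load resistance used here are hypothetical placeholders, not values taken from the device; only the 16-bit resolution comes from the text.

```python
def adc_to_voltage(counts: int, v_ref: float = 5.0, bits: int = 16) -> float:
    """Convert a raw ADC reading to volts.

    v_ref is an assumed reference voltage; 16 bits matches the stated resolution.
    """
    full_scale = (1 << bits) - 1
    if not 0 <= counts <= full_scale:
        raise ValueError("ADC reading out of range")
    return v_ref * counts / full_scale


def sensor_resistance(v_out: float, v_cc: float = 5.0, r_load: float = 10_000.0) -> float:
    """Solve the divider equation V_out = V_cc * R_L / (R_S + R_L) for R_S.

    Assumes V_out is measured across the load resistor R_L, which is in series
    with the sensor resistance R_S. v_cc and r_load are hypothetical values.
    """
    if not 0.0 < v_out < v_cc:
        raise ValueError("v_out must lie strictly between 0 and v_cc")
    return r_load * (v_cc - v_out) / v_out


# Example: a mid-scale reading means R_S is approximately equal to R_L.
r_s = sensor_resistance(adc_to_voltage(32768))
```

With this arrangement a drop in R_S during the measuring phase appears as a rise in the sampled voltage, so the conversion also restores the sign convention used in Figure 1.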

Data Pre-Processing and Feature Extraction
The aim of the feature extraction process was the computation of relevant characteristics of the time series curves, which could be later used as input to a classification algorithm. Prior to performing feature extraction, the raw gas sensors' response curves were filtered with a 4th order low-pass Butterworth filter with a cut-off frequency of 1 Hz to remove noise and high-frequency oscillations affecting the sensors' data. Furthermore, the sampled voltage values were converted to resistance for subsequent analysis. An example of filtered and converted data is shown in Figure 4.
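The filtering step can be sketched as follows. This is a dependency-free illustration of a fourth-order low-pass Butterworth filter with a 1 Hz cut-off at the 10 Hz sampling rate, built as a cascade of two second-order sections through the bilinear transform. It is not the authors' implementation: whether the original processing used a single causal pass or zero-phase (forward-backward) filtering is not stated, and the function names are hypothetical. In practice a library routine such as SciPy's `scipy.signal.butter` would normally be used.

```python
import math


def butterworth4_sections(cutoff_hz: float, fs_hz: float):
    """Return two normalized biquads (b0, b1, b2, a1, a2) whose cascade is a
    4th-order Butterworth low-pass (bilinear transform, pre-warped at cutoff)."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    cos_w0, sin_w0 = math.cos(w0), math.sin(w0)
    sections = []
    # The two conjugate pole pairs of a 4th-order Butterworth prototype
    # correspond to quality factors 1/(2 cos(pi/8)) and 1/(2 cos(3 pi/8)).
    for angle in (math.pi / 8.0, 3.0 * math.pi / 8.0):
        q = 1.0 / (2.0 * math.cos(angle))
        alpha = sin_w0 / (2.0 * q)
        a0 = 1.0 + alpha
        b0 = (1.0 - cos_w0) / (2.0 * a0)
        b1 = (1.0 - cos_w0) / a0
        a1 = -2.0 * cos_w0 / a0
        a2 = (1.0 - alpha) / a0
        sections.append((b0, b1, b0, a1, a2))
    return sections


def lowpass_filter(samples, cutoff_hz: float = 1.0, fs_hz: float = 10.0):
    """Single causal pass through both sections, starting from zero state."""
    out = list(samples)
    for b0, b1, b2, a1, a2 in butterworth4_sections(cutoff_hz, fs_hz):
        x1 = x2 = y1 = y2 = 0.0
        filtered = []
        for x in out:
            y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, x
            y2, y1 = y1, y
            filtered.append(y)
        out = filtered
    return out
```

Because the filter starts from a zero internal state, the first few seconds of output contain a start-up transient; a real pipeline would typically initialize the state from the first sample or discard that segment.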
Feature extraction from raw gas sensor data was carried out in order to extract subject-specific breathprints and later use them for classification purposes. Starting from features described in preceding studies and from our own experience, we extracted a total of 9 features from each sensor curve [30]. Given that 5 gas sensors were embedded in the proposed electronic nose, the total number of features for each subject was equal to 45. The extracted features were divided into static and dynamic features. Referring to Figure 1, the static features described in Table 2 were extracted.

Figure 4. Example of filtered resistance values collected from all gas sensors when exposed to a sample of breath from a healthy subject.

Table 2. Static sensor features extracted from the gas sensor response curves. Refer to Figure 1 for the values used in the feature extraction equations.

Feature    Value
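The quantities named in Figure 1 (R_0, R_min, and ∆T) can be obtained from a filtered resistance curve along the lines of the sketch below. The definitions used here are assumptions made for illustration only: R_0 is taken as the mean resistance over the cleaning phase, R_min as the minimum over the measuring phase, ∆T as the time from the start of the measuring phase to R_min, ∆R as R_0 - R_min, and ratio as R_min / R_0. The exact equations of Table 2 may differ.

```python
from statistics import mean


def static_features(resistance, fs_hz, measure_start, measure_end):
    """Compute illustrative static features from one sensor curve.

    resistance    : list of filtered resistance samples
    fs_hz         : sampling frequency in Hz (10 Hz in the described device)
    measure_start : index of the first sample of the measuring phase
    measure_end   : index one past the last sample of the measuring phase
    All feature definitions below are assumed, not taken from Table 2.
    """
    baseline = resistance[:measure_start]
    measuring = resistance[measure_start:measure_end]
    if not baseline or not measuring:
        raise ValueError("both phases must contain at least one sample")
    r0 = mean(baseline)                  # assumed: baseline = mean of cleaning phase
    r_min = min(measuring)               # minimum resistance while exposed to breath
    delta_t = measuring.index(r_min) / fs_hz  # seconds from exposure start to R_min
    return {
        "R0": r0,
        "Rmin": r_min,
        "delta_T": delta_t,
        "delta_R": r0 - r_min,           # assumed definition of the ∆R feature
        "ratio": r_min / r0,             # assumed definition of the ratio feature
    }
```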
Dynamic features, instead, were extracted in the phase-space of the gas sensor data [31,32]. Firstly, starting from the gas sensor response, named S(t), we computed its derivative dS(t)/dt. Secondly, the derivative was plotted with respect to the raw gas sensors' response. A visual explanation of this procedure is reported in Figure 1. The extracted dynamic features are depicted in Figure 1, with only one feature, AreaPS, not reported. This feature is computed as the area enclosed by the curve in the phase-space plot during the measurement phase.
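The phase-space construction described above can be sketched as follows: the derivative dS(t)/dt is approximated with finite differences, and the area enclosed by the trajectory (S, dS/dt) is estimated with the shoelace formula. Treating the sampled trajectory as a closed polygon, with the last point joined back to the first, is an assumption of this sketch, as are the function names; the original computation of AreaPS may differ in detail.

```python
def derivative(signal, fs_hz):
    """Approximate dS/dt with central differences (one-sided at the two ends)."""
    n = len(signal)
    if n < 2:
        raise ValueError("at least two samples are required")
    dt = 1.0 / fs_hz
    d = [0.0] * n
    d[0] = (signal[1] - signal[0]) / dt
    d[-1] = (signal[-1] - signal[-2]) / dt
    for i in range(1, n - 1):
        d[i] = (signal[i + 1] - signal[i - 1]) / (2.0 * dt)
    return d


def phase_space_area(signal, fs_hz):
    """Area enclosed by the (S, dS/dt) trajectory, via the shoelace formula.

    The trajectory is closed by connecting the last sample to the first,
    which is an assumption of this illustration.
    """
    ds = derivative(signal, fs_hz)
    n = len(signal)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n
        twice_area += signal[i] * ds[j] - signal[j] * ds[i]
    return abs(twice_area) / 2.0
```

As a sanity check, a signal S(t) = cos(t) sampled over one full period traces a unit circle in the phase-space, so the returned area should be close to π.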
The presence of correlated and redundant features could influence the subsequent step of data classification, making it difficult to complete a proper training of the algorithm, potentially leading to data overfitting. Therefore, principal component analysis (PCA) and sequential forward feature selection (SFFS) were used as feature reduction and selection methods with the aim of diminishing the number of features while still keeping important information embedded in the data. This step allowed reducing the dimensions of the input data used for the classification algorithm aimed at discriminating between lung cancer and control subjects. PCA aims at reducing the total number of features by projecting the features into a lower dimensional space, and it is typically used in gas sensor applications [21,33]. SFFS sequentially adds features from the feature subset in a greedy way, and it was applied with gas sensor data in other literature studies [29,34,35].
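The greedy behaviour of sequential forward selection can be illustrated with the minimal sketch below. The scoring function is left to the caller; using a cross-validated classification metric as the score, and stopping as soon as no candidate improves it, are assumptions of this sketch rather than details stated in the text.

```python
def sequential_forward_selection(n_features, score_fn, max_features=None):
    """Greedily add the feature that most improves score_fn.

    n_features   : total number of candidate features (indexed 0..n_features-1)
    score_fn     : callable taking a list of feature indices, returning a score
                   to maximize (e.g. a cross-validated accuracy; assumed here)
    max_features : optional cap on the size of the selected subset
    Returns (selected_indices, best_score).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        round_best, round_feature = float("-inf"), None
        for f in remaining:
            s = score_fn(selected + [f])
            if s > round_best:          # ties keep the lowest feature index
                round_best, round_feature = s, f
        if round_best <= best_score:    # no candidate improves the score: stop
            break
        selected.append(round_feature)
        remaining.remove(round_feature)
        best_score = round_best
    return selected, best_score
```

Because each round only extends the current subset, the procedure evaluates at most on the order of n² subsets instead of all 2ⁿ, at the cost of possibly missing feature combinations that are only useful together.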
In addition to sensor features, clinical features were also considered in the classification process. For each subject participating in the experiments, a set of clinical features was provided by clinicians after analyzing the medical history of each subject. Table 3 summarizes the included clinical features. Smoking level was assessed in terms of both smoking status and pack years. Three main different groups of clinical features were identified: numerical (age, pack years, and BMI); binary (gender and all comorbidities); and categorical (smoking). As PCA requires numerical features to be properly applied, it was used as a pre-processing step for feature reduction only when sensor features were employed in the classification model. When clinical features were embedded in the classification model, only SFFS was used as a feature selection technique, without any feature reduction procedure, as clinical data contained binary and categorical variables.

Classification Algorithms and Metrics
After feature extraction and feature reduction, the next step was the classification of the collected breathprints for the discrimination between lung cancer and control subjects. In this manuscript, two classification methods have been tested. The following algorithms were applied to the collected gas sensor data: support vector machine (SVM) and AdaBoost. In the literature, such algorithms have already been tested for the classification of gas sensor data [18,21,36,37]. A 4-fold cross-validation (CV) scheme was used to assess the algorithm performance, and a grid search allowed for the optimization of the hyperparameters of the models.
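The evaluation scheme can be sketched as follows: the subjects are divided into four folds, each fold is used once for testing, and every hyperparameter combination in the grid is scored by its mean performance over the folds. Stratifying the folds by class, the fixed random seed, and the use of the sample standard deviation are assumptions of this sketch; the classifier itself is abstracted as a callable, so no specific SVM or AdaBoost implementation is implied.

```python
import random
from statistics import mean, stdev


def stratified_k_fold(labels, k=4, seed=0):
    """Split sample indices into k folds with balanced class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return [sorted(f) for f in folds]


def cross_validate(labels, k, evaluate, seed=0):
    """Call evaluate(train_idx, test_idx) once per fold; return (mean, std)."""
    scores = []
    for test_idx in stratified_k_fold(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores.append(evaluate(train_idx, test_idx))
    return mean(scores), stdev(scores)


def grid_search(labels, k, param_grid, make_evaluate):
    """Return (params, mean, std) for the best-scoring hyperparameter setting.

    make_evaluate(params) must return a function evaluate(train_idx, test_idx)
    that trains a model with those parameters and returns a test score.
    """
    best = None
    for params in param_grid:
        mu, sigma = cross_validate(labels, k, make_evaluate(params))
        if best is None or mu > best[1]:
            best = (params, mu, sigma)
    return best
```

With 80 subjects equally split between the two groups, each of the four folds holds 20 subjects, 10 per class under this stratification.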
Given the confusion matrix reported in Table 4, accuracy, recall, and precision metrics can be computed [38]. These metrics were used to assess the ability of the classification algorithms in discriminating subjects in lung cancer and control groups:
• Accuracy: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
• Recall: $\mathrm{Recall} = \frac{TP}{TP + FN}$
• Precision: $\mathrm{Precision} = \frac{TP}{TP + FP}$
As the algorithms were tested in a CV scheme, the results will be reported as mean ± standard deviation (µ ± σ) on the 4 folds used for testing purposes.

Table 4. Confusion matrix.

                              Predicted Positive    Predicted Negative
Real Results    Positive      TP                    FN
                Negative      FP                    TN
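The three metrics follow directly from the four confusion-matrix counts, as in the short sketch below, which also summarizes them as mean ± standard deviation over the test folds. The fold counts in the example are made up for illustration and are not results from this study; the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall, and precision from confusion-matrix counts."""
    total = tp + fn + fp + tn
    if total == 0 or tp + fn == 0 or tp + fp == 0:
        raise ValueError("metric undefined for these counts")
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


def summarize_folds(fold_counts):
    """Return {metric: (mean, std)} over a list of (tp, fn, fp, tn) tuples."""
    per_fold = [classification_metrics(*counts) for counts in fold_counts]
    summary = {}
    for name in ("accuracy", "recall", "precision"):
        values = [m[name] for m in per_fold]
        summary[name] = (mean(values), stdev(values))
    return summary


# Illustrative counts only (four folds of 20 subjects each).
example = summarize_folds([(8, 2, 2, 8), (7, 3, 2, 8), (8, 2, 3, 7), (9, 1, 2, 8)])
```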

Exhaled Breath Collection
The assessment of the performance of the developed electronic nose in discriminating lung cancer subjects from at-risk subjects required the collection of exhaled breath from human subjects. This study was approved by the European Institute of Oncology (IEO) ethical committee, with approval number R1004/19-IEO 1056. The study started in July 2019 and lasted until December 2020. All the individuals who participated in the experiments were provided with all the required information and instructions regarding the exhaled breath collection procedure. Prior to exhaled breath collection, informed consent was signed by the participants. Each study participant was requested to abide by simple guidelines before the breath sampling procedure. These guidelines were previously described in the literature by our group [39]. Lung cancer subjects were recruited among subjects waiting to undergo lung cancer removal surgery at IEO, while control subjects were recruited among subjects undergoing standard screening procedures as they were considered at-risk of developing lung cancer. At the moment of breath collection, which was carried out at IEO, the participants were asked to perform an inspiration followed by a full expiration. While a subject was performing the expiration, the operator in charge of breath collection controlled a three way valve in order to consecutively fill two Tedlar bags. The first bag, characterized by a maximum volume of 0.5 L, was devoted to dead space collection. The second bag, instead, with a volume of 3 L, allowed for collection of alveolar space breath. Exhaled breath samples were stored in the Tedlar bags and analyzed within the same day of collection with the electronic nose described in this study. For the purpose of this study, only Tedlar bags with alveolar breath were analyzed.

Exhaled Breath Analysis
After the breath collection procedure was completed, Tedlar bags were transferred from the site of collection (IEO) to the analysis laboratory and analyzed within the same day. Before performing any exhaled breath analysis with the electronic nose, the device was powered on so that the gas sensors were kept heated. No measurement was started until the gas sensors showed a stable response, which was checked by computing the standard deviation of the gas sensor responses. Referring to Figure 2, the procedure for the breath bags' analysis was carried out as follows:
1. The Tedlar breath bag was connected to the electronic nose, and an analysis session was started from the host PC;
2. Environmental air (see Figure 2) was brought into the analysis chamber by opening the appropriate valve, while keeping the breath bag valve closed and the valve positioned at the outlet of the analysis chamber open (cleaning phase in Figure 1);
3. During the cleaning phase, the gas sensors stabilized on a baseline value. A check on the standard deviation of the gas sensors' response was performed, allowing for a maximum duration of this phase of 120 s;
4. Upon completion of the cleaning phase, the environmental air valve was closed, and the Tedlar breath bag valve was opened, simultaneously with the closing of the valve positioned at the outlet of the analysis chamber. Breath started to flow inside the chamber at a rate of 1 L/min. Once the breath bag was completely empty, all valves were closed;
5. The gas sensors were exposed to the breath sample for a total of 180 s (measuring phase);
6. After the measuring phase concluded, gas sensors were again exposed to environmental air until they approached the baseline value (recovery phase). The duration of this phase was set to a maximum of 10 min.
Considering all the described phases, the analysis of a breath bag lasted for a maximum of 14 min.
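The baseline stability check used in the cleaning phase can be sketched as a sliding-window test on the standard deviation of the sensor response, bounded by the 120 s maximum duration. The window length and the standard deviation threshold below are hypothetical, as the values actually used are not reported; only the 10 Hz sampling rate and the 120 s limit come from the text.

```python
from statistics import pstdev


def is_stable(window, max_std):
    """True when the window's standard deviation does not exceed max_std."""
    return len(window) >= 2 and pstdev(window) <= max_std


def cleaning_phase_length(samples, fs_hz=10.0, window_s=5.0, max_std=1.0,
                          timeout_s=120.0):
    """Return how many samples the cleaning phase lasts.

    The phase ends at the first sample where the trailing window is stable,
    or at the timeout if stability is never reached. window_s and max_std are
    assumed values chosen only for illustration.
    """
    window_len = int(window_s * fs_hz)
    limit = min(len(samples), int(timeout_s * fs_hz))
    for end in range(window_len, limit + 1):
        if is_stable(samples[end - window_len:end], max_std):
            return end
    return limit
```

For a multi-sensor device, the same check would presumably be applied to each sensor, with the phase ending once all of them are stable; this is again an assumption.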

Study Participants
A total of 80 subjects participated in this study. Table 5 summarizes the main demographic and clinical characteristics of the subjects involved in the experimental study. The control group was characterized by a greater percentage of male subjects with respect to the lung cancer group (62.5% and 52.5%, respectively). A Mann-Whitney test showed a statistically significant difference when comparing age (p = 0.01) across the two groups, while no difference was found when comparing the other clinical features (p > 0.06). The number of non-smoking subjects was equal for both groups (n = 8). The current smokers were present in a higher number than the ex-smokers in both groups (21 and 18 vs. 11 and 14, respectively). Statistical analysis on pack-years showed no significant difference between the two groups (p = 0.11). Hypertension was the most dominant comorbidity, with 10 control and 23 lung cancer subjects reporting it, followed by chronic obstructive pulmonary disease (COPD) and hypercholesterolemia (HCL). As the number of subjects reporting comorbidities other than hypertension was low compared to the dataset dimensions, only hypertension was included in the dataset for classification purposes. Regarding staging of the disease, half of the lung cancer subjects had stage I cancer, followed by stage II (n = 11), and stage III (n = 4). Five lung cancer subjects had benign cancer; thus, no pTNM staging was reported for them.  Figure 5 shows boxplots of all the sensor features for both lung cancer and control subjects, considering all the five sensors embedded in the proposed electronic nose. In order to assess if any significant difference was found for certain features when comparing lung cancer and control groups, statistical analysis was carried out with a Mann-Whitney test. 
Significant differences were found for the following features with a 1% significance level: ∆R for sensor TGS2620; ratio for sensor TGS822; d for sensors TGS2600, TGS2602, and TGS2620; d for sensor TGS2602; and d for sensor TGS2620.
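For reference, the Mann-Whitney test used for these group comparisons can be sketched as below. The sketch computes the U statistics from tie-averaged ranks and a two-sided p-value from the normal approximation with tie and continuity corrections. Whether the original analysis used an exact or an asymptotic p-value is not stated, so this choice is an assumption, and for small samples the approximation is only rough.

```python
import math


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p-value) for two independent samples."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0       # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t               # tie correction accumulator
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u_y = n1 * n2 - u_x
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0.0:
        return u_x, u_y, 1.0
    z = max((abs(u_x - mu) - 0.5) / math.sqrt(var), 0.0)  # continuity correction
    p = math.erfc(z / math.sqrt(2.0))                     # two-sided normal tail
    return u_x, u_y, p
```

A feature would then be flagged as significant at the 1% level when the returned p-value falls below 0.01.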

Lung Cancer Classification
In this section, we report the results obtained with classification algorithms in discriminating lung cancer from control subjects. As described earlier in Section 2.4, algorithm hyperparameters were optimized with a grid-search approach on a fourfold CV. As the total number of subjects did not allow for a proper training-test set split, the CV folds were also used to assess the generalization capabilities of the algorithms. Results are reported in terms of mean and standard deviation of recall, precision, and accuracy metrics on the four CV test folds. Figure 6 reports the PCA plots for the first three principal components extracted from static and dynamic sensor features. Table 6 reports the results obtained when training the classification algorithms only on sensor data. Instead, Table 7 reports the results when using both sensors and clinical features when training the algorithms. The best classification result using sensor data was found when using AdaBoost as a classification algorithm, both static and dynamic sensor features and PCA with five components selected as the feature reduction method. A mean recall, precision, and accuracy of 67%, 64%, and 66% were reported. SVM was found to be the optimal classification algorithm when integrating clinical features in the model, with dynamic and clinical features as inputs to the model without any feature reduction method, as Table 7 reports. The clinical features that were considered in the classification model are the ones reported in Table 3, with only hypertension considered as a comorbidity due to the fact that diabetes, HCL, COPD, and obesity were reported in few subjects of the overall cohort, as highlighted in Table 5. Such a combination resulted in a mean recall, precision, and accuracy of 78%, 80%, and 77%, respectively. 
The ensemble technique used in this manuscript for classification achieved similar performance when considering the mean precision metric (0.79 ± 0.19 compared to 0.80 ± 0.12) but resulted in worse results for recall and accuracy. The AdaBoost ensemble algorithm employed static and clinical features as inputs for the model, with no feature reduction applied.

Lung Cancer Staging Recall Analysis
An additional analysis was carried out on the classification results on lung cancer and control subjects. Classification recall to lung cancer stage was analyzed with the classifier which achieved the best results on the CV folds used for testing purposes. As reported in Table 5, 20 lung cancer subjects were diagnosed with stage I lung cancer, 11 with stage II, and 4 with stage III. As the total number of subjects diagnosed with stage III was lower in comparison to stage I and stage II, two groups of subjects were considered for the lung cancer recall analysis: stage I and stage II-III. Table 8 reports the results for the best classification algorithms on the four CV folds used in the algorithm performance assessment. The best SVM classifier achieved recall of 78% on lung cancer stage I subjects, with a reduction to 71% for lung cancer stage II and stage III subjects.

Time Dependency Analysis
As a final step in our analysis, we wanted to determine if the elapsed time between breath collection and analysis had an effect on the classification results. Figure 7 shows the time difference (in hours) between breath collection at IEO and breath analysis at the laboratory where the electronic nose was located. All breath bags were analyzed within the same day, with a minimum time difference of 2 h and a maximum time difference of 9 h 20 min. For subjects belonging to the lung cancer group, the time difference mean and standard deviation were 4 h 41 min ± 1 h 8 min, while for the control group, they were 4 h 33 min ± 1 h 41 min. A Mann-Whitney U-test showed no significant difference across the two groups at the 5% significance level (p = 0.1). A value of 4.5 h was chosen as a threshold to split the dataset into two groups. One group consisted of subjects whose breath was analyzed within 4.5 h of collection, and the other group consisted of subjects whose breath was analyzed after the 4.5 h threshold. This resulted in equally split datasets, each one composed of 40 subjects, with different proportions of lung cancer and control subjects, as reported in Table 9. An analysis of the cross-validation predictions was carried out to determine if significant differences were found when comparing the classification results for subjects who were analyzed before the 4.5 h threshold and for those who were analyzed after it. The results are reported in Table 9. Classification results for the two models that were tested did not show significant differences in terms of mean recall and accuracy for both lung cancer and control subjects across the four folds used for cross-validation when comparing the results across the two datasets.

Table 9. Dataset split based on the time difference between breath collection and breath analysis. The threshold chosen for splitting the dataset was set to 4.5 h. Classification results are reported as mean recall (Re) and accuracy (Acc) over the 4 test folds used for the CV scheme. The 95% confidence interval is also reported.
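The threshold-based comparison can be sketched as follows: the cross-validation predictions are grouped by the elapsed time between collection and analysis, and recall is computed separately in each group. The record layout, the label encoding, and the choice of treating a delay of exactly 4.5 h as belonging to the early group are assumptions of this sketch; the example records are made up and are not study data.

```python
def recall_by_time_group(records, threshold_h=4.5, positive=1):
    """Recall for samples analyzed within vs. after threshold_h hours.

    records : iterable of (elapsed_hours, true_label, predicted_label)
    Returns {"early": recall_or_None, "late": recall_or_None}; None means
    the group contains no positive (lung cancer) subjects.
    """
    counts = {"early": [0, 0], "late": [0, 0]}  # [true positives, positives]
    for elapsed_h, y_true, y_pred in records:
        if y_true != positive:
            continue
        key = "early" if elapsed_h <= threshold_h else "late"
        counts[key][1] += 1
        if y_pred == positive:
            counts[key][0] += 1
    return {k: (tp / pos if pos else None) for k, (tp, pos) in counts.items()}


# Made-up records: (hours between collection and analysis, truth, prediction).
demo = recall_by_time_group([
    (2.0, 1, 1), (3.0, 1, 0), (4.5, 1, 1),
    (5.0, 1, 1), (6.0, 0, 1), (9.3, 1, 0),
])
```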

Discussion
In this manuscript, two main topics related to exhaled breath analysis for disease diagnosis were addressed. The first one was the development of an electronic nose with five commercial gas sensors embedded for exhaled breath analysis, and the second one was the implementation of machine learning techniques for the analysis of gas sensor responses. This setup, composed of a custom device for breath analysis and of data analysis techniques, was tested in a prospective study with a cohort of 80 subjects, equally split between lung cancer and control subjects, with the aim of discriminating the two groups of subjects based on the exhaled breath analysis. Exhaled breath, together with blood and urine, is one of the three main biological fluids in which researchers are looking for lung cancer biomarkers. The identification of biomarkers able to determine the presence of lung cancer could help in reducing lung cancer's high mortality rate, allowing for an early diagnosis and an improvement in lung cancer survival rates. Exhaled breath has been extensively studied since [8] first reported the content of exhaled breath in terms of chemical compounds [8]. Several researchers have later analyzed the chemical content of exhaled breath in terms of VOCs concentration [24,40]. Still, to date there is no consensus on the VOCs which are significant for lung cancer, i.e., those compounds that are present only in exhaled breath of subjects affected by lung cancer [12]. Therefore, rather than analyzing the concentration of specific compounds in exhaled breath, an analysis on the exhaled breath mixture is typically carried out [21,37]. Such analysis can be performed with so-called electronic noses, which are devices with embedded sensors sensitive to compounds in the air. In this study, a custom electronic nose with five commercial gas sensors embedded was described. 
The electronic nose was used to analyze the exhaled breath of 80 subjects, equally split between lung cancer and control groups. The best performance in terms of discrimination between the two groups was achieved when integrating both dynamic sensor features and clinical features into an SVM model. Such a model resulted in a mean recall, precision, and accuracy of 78%, 80%, and 77% across the four CV folds, respectively.
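For reference, the three metrics can be computed from the confusion counts of each test fold and then averaged, as in the minimal Python sketch below. The confusion counts are hypothetical and only mimic four folds of 20 subjects (10 lung cancer, 10 controls); they are not the study data, and the function name `fold_metrics` is an illustrative choice.

```python
def fold_metrics(tp, fp, tn, fn):
    """Recall, precision, and accuracy for one test fold.

    tp/fn count lung cancer subjects classified correctly/incorrectly,
    tn/fp count control subjects classified correctly/incorrectly.
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return recall, precision, accuracy


# Hypothetical (tp, fp, tn, fn) counts for four folds; not the study data.
folds = [(8, 2, 8, 2), (7, 2, 8, 3), (8, 3, 7, 2), (8, 1, 9, 2)]

per_fold = [fold_metrics(*counts) for counts in folds]

# Average each metric across the folds.
mean_recall, mean_precision, mean_accuracy = (
    sum(values) / len(values) for values in zip(*per_fold)
)
print(f"recall={mean_recall:.3f} precision={mean_precision:.3f} "
      f"accuracy={mean_accuracy:.3f}")
```

Averaging per-fold metrics, rather than pooling all predictions into a single confusion matrix, matches the "mean across the four CV folds" wording used here; the two approaches can give slightly different values.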
The results obtained in the study described in this manuscript are lower than those reported in some of the listed studies. Table 10 reports a comparison between the results achieved with the proposed device and several research studies. From the reported data, we can highlight that the sample size differs across the studies, making a proper comparison difficult. Furthermore, several approaches regarding sensor technologies were used in the last few years, ranging from custom electronic noses based on quartz microbalance sensor arrays, MOS sensors, or type-different sensor arrays to the use of commercial devices such as the Cyranose 320. Our recall and accuracy are comparable with those of studies that reported values in the range of 70-80%, but they are still markedly lower than those of some studies that reported very high recall and accuracy values in lung cancer classification, even greater than 95%. A proper comparison across studies, characterized by different cohorts and sensing technologies, is a challenging task. The same cohort of subjects should be recruited, and the collected exhaled breath should be analyzed with the different proposed methods to determine whether the differences in classification performance are due to the different cohorts and sampling methods or to the sensing technologies. Liu et al. achieved an accuracy, recall, and specificity of 95.75%, 94.78%, and 96.96% on a cohort of 214 subjects with a sensor system composed of 11 sensors and an ensemble learning framework for classification [37]. Tirzïte et al. reached between 95.8% and 96.2% recall when discriminating 252 lung cancer patients from 223 healthy volunteers, with logistic regression analysis as the classification algorithm and the Cyranose 320 electronic nose as the breath analysis method [18]. Chen et al. designed an electronic nose with custom gas sensors, based on a metal ion induced assembly of graphene oxide, and reached a recall of 95.8% and a specificity of 96.0% when testing it on a cohort of 106 subjects (with 48 affected by lung cancer) [41]. We identified several issues that could cause lower performance with respect to the results reported in the literature, summarized in Table 11 and described hereafter.

Table 11. Confounding factors and issues related to exhaled breath analysis for disease diagnosis, with an assessment of whether each can be interpreted and solved for possible clinical use.

| Issue | Interpretable | Solvable |
|---|---|---|
| Tedlar Bags Cleaning | Bag cleaning should be carried out before breath sampling to remove unwanted VOCs [23]. | Yes, but consensus must be reached across researchers to determine a common strategy for bag cleaning. |
| Sensor Washout | MOS Sensors should always have the same baseline value when analyzing different samples. | Not easy. Environmental conditions and sensor drift could cause sensor baseline changes over time. |
| Sensor A-Specificity | Commercial MOS Sensors are aspecific and targeted to generic VOC mixture detection. Difficult to determine the presence of a specific substance in the sample under analysis. | Not easy. One possibility is offered by the integration of commercial MOS sensors with custom sensors targeted to specific substances in the same electronic nose [43]. |
| VOCs Concentration | VOCs in exhaled breath could have concentration as low as in the parts per billion range. | Yes. Pre-concentration techniques with sample absorption/desorption could help in increasing detection capabilities [19]. |
Three main issues related to the use of Tedlar bags as temporary storage of exhaled breath were identified. First, when the exhaled breath analysis is not performed immediately after sampling, Tedlar bags could leak VOCs, with VOC concentration decreasing with bag storage time [23]. Our time analysis, with results reported in Table 9, did not show significant differences in classification performance when splitting the subjects based on the time elapsed between breath collection and analysis. Nonetheless, this time difference, as depicted in Figure 7, can be considered a potential limitation of the presented study, and further analysis should be carried out minimizing the time between sampling and analysis. Second, Tedlar bags require the subjects to fill the bag with a single total lung capacity expiration maneuver. Not only could this procedure cause distress to the subject, but it could also result in an incompletely filled bag. Third, in the presented study, no cleaning procedure was carried out on the Tedlar bags prior to breath sampling, unlike what other authors reported [23-25,40]. As far as the cleaning procedure is concerned, no consensus has been reached among researchers on the best strategy for properly cleaning the bags. In fact, some studies report the use of nitrogen as cleaning gas, while others report the use of argon [24,25,40]. In addition to the uncertainty on the cleaning gas, there is still no common technique for the heating process to follow during the cleaning procedure [9]. Taking into consideration the issues related to Tedlar bags, the use of other techniques for temporary breath storage could help in concentrating the VOCs found in exhaled breath, which can have concentrations as low as the parts per billion range.
The great advantage offered by a gas pre-concentration process, for example with gas desorption tubes, is the potential increase in the detection ability of the device used for the subsequent breath analysis (e.g., an electronic nose) [44,45]. In addition to the issues described above, a proper comparison across studies must consider the sensors embedded in the proposed electronic nose, for which three potential issues can be pinpointed. First, we identified the a-specificity of the employed sensors. As described in Table 1, the embedded sensors were not sensitive to specific substances, but rather to mixtures of compounds in the air. As a consequence, the response of the sensors to a mixture of compounds may not be sufficiently compound-specific to achieve proper discrimination between lung cancer and control subjects. Second, the studies proposed in the literature often involve a larger number of sensors in the electronic nose array, ranging up to a maximum of 32 sensors in the Cyranose 320 [18,36]. In our electronic nose, a smaller array composed of only five sensors was designed with the aim of developing a smaller and portable device. Third, the sensor washout and cleaning process has to be seriously taken into consideration. With MOS sensors, baseline changes over time can be observed due to either environmental changes or sensor drift, making it difficult to properly interpret the results because the sensor baseline value differs across samples. All these factors together could have caused the lower performance of the sensor array in the discrimination between the two groups of subjects and must be properly addressed, should a clinical application of exhaled breath analysis be carried out in the future.
In the present work, we also analyzed the ability of our device to discriminate lung cancer subjects with different lung cancer stages according to the pTNM staging method. We found a recall value of 78% for stage I lung cancer subjects and of 71% for stage II-III subjects. A similar trend was reported in [39], where the authors found a recall value of 92% for stage I subjects and of 58% for stage II/III/IV subjects. Other authors have reported results on the discrimination of subjects with different lung cancer stages or subtypes. Mazzone et al. reported the ability to discriminate lung cancer stages [46]. Kort et al. were able to diagnose subtypes of lung cancer in a multi-center prospective study [47]. As we previously described, the identification of lung cancer specific VOCs is already a challenging task, and attempting to determine which VOCs are specific to each lung cancer stage can be even more challenging. Linking the progression of lung cancer or different disease sub-types to the compounds detected by an electronic nose is not straightforward, and future studies are required to assess whether such a diagnosis methodology is feasible [39].
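The per-stage recall values discussed above amount to computing recall separately within each subgroup of lung cancer subjects. The Python sketch below shows one way to do this; the records are hypothetical and do not reproduce the study data, and the function name `recall_by_group` is an illustrative choice.

```python
from collections import defaultdict


def recall_by_group(records):
    """Recall per subgroup.

    records: iterable of (group, predicted_positive) pairs, one per subject
    who truly has lung cancer; predicted_positive is True when the
    classifier labelled that subject as lung cancer.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted_positive in records:
        totals[group] += 1
        hits[group] += int(predicted_positive)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical lung cancer subjects with their stage and classifier output.
records = [
    ("I", True), ("I", True), ("I", True), ("I", False),
    ("II-III", True), ("II-III", False), ("II-III", True),
]
stage_recall = recall_by_group(records)
print(stage_recall)
```

Because each stage contains only a fraction of the lung cancer group, per-stage recall rests on few subjects and a single misclassification can shift it by several percentage points.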
As far as the application of exhaled breath analysis in clinical practice is concerned, we are aware of only one clinical trial currently underway on exhaled breath analysis for lung cancer early diagnosis. This clinical trial (NCT02612532), sponsored by Owlstone Ltd (UK), aims to define the lung cancer diagnostic accuracy of the ReCIVa breath sampler coupled to a gas analyzer based on spectrometer techniques. The trial started in 2015 and had an estimated enrollment of 520 subjects. No results have been published so far. The publication of results on exhaled breath analysis in such a large population could help researchers move in the right direction for lung cancer early diagnosis based on exhaled breath analysis. In fact, even when studies in the literature focusing on exhaled breath analysis for disease diagnosis enrolled large cohorts, the subjects were affected by several different diseases, with subjects diagnosed only with lung cancer making up a small percentage of the overall cohort [29,48]. Therefore, clinical trials involving a large number of subjects are necessary to understand whether exhaled breath analysis has the potential to be used as a valid clinical tool for lung cancer early diagnosis. This study is not a clinical trial either, as it was designed only as a prospective study aimed at assessing whether the proposed electronic nose provides results satisfactory enough for use in large scale clinical trials. Furthermore, a database with exhaled breath data collected in several studies and clinical trials should be made public and released to researchers, allowing them to carry out data analysis and validation of the collected data, potentially integrating data from several sources in a single database.
We believe that the integration of exhaled breath with other biological fluids, such as blood serum, urine, and saliva, could potentially improve the ability to diagnose diseases early, in particular lung cancer [22,49]. In fact, in the last few years, researchers have focused on the analysis of such biological fluids, looking for biomarkers in addition to those offered by the VOC mixture in exhaled breath [50-52]. Such multi-fluid analysis, designed as a simple test to be administered by general practitioners, could have the potential to improve the early diagnosis of lung cancer, thus decreasing its extremely high mortality rate.

Conclusions
The electronic nose described in this manuscript, composed of an array of five gas sensors targeted to VOC detection, coupled with feature extraction and classification strategies, achieved satisfactory results on a cohort of 80 subjects, equally split into control and lung cancer groups. Mean recall, precision, and accuracy values of 78%, 80%, and 77% were reached using a fourfold cross-validation approach, embedding both features extracted from the gas sensor response curves and clinical features in the classification model. A recall of 78% was found for stage I lung cancer subjects, with a decrease to 71% for stage II-III subjects. An analysis of the effects of the time elapsed between breath collection in Tedlar bags and its subsequent analysis showed no significant difference in the classification results. Two main future activities are planned with the aim of improving the classification capabilities of the device: (1) expansion of the gas sensor array with the addition of commercial gas sensors, thus increasing the number of extracted features and potentially improving classification performance, and (2) integration of custom developed gas sensors, based on molecularly imprinted polymers, in the electronic nose array. Once these improvements are completed, the proposed electronic nose could be considered as a candidate prototype tool for exhaled breath analysis on larger cohorts of subjects.

Conflicts of Interest:
The authors declare no conflict of interest.