Volatile Organic Compounds in Exhaled Breath as Fingerprints of Lung Cancer, Asthma and COPD

Lung cancer, chronic obstructive pulmonary disease (COPD) and asthma are inflammatory diseases whose incidence has risen worldwide. They pose a major public health issue, encompassing not only physical and psychological morbidity and mortality, but also significant societal costs. Lung cancer is the leading cause of cancer death worldwide, in large part because the disease is often not detected until a late stage. Although COPD and asthma have considerably lower mortality, they are extremely distressing to patients and involve high healthcare costs. Moreover, the diagnostic methods for these diseases are not only costly but also invasive, thereby adding to patients’ stress. It has been appreciated for many decades that the analysis of trace volatile organic compounds (VOCs) in exhaled breath could potentially provide cheaper, rapid, and non-invasive screening procedures to diagnose and monitor these diseases of the lung. However, after decades of research on breath biomarker discovery, no breath VOC tests are clinically available. Reasons for this include the lack of consensus as to which breath volatiles (or pattern of volatiles) can be used to discriminate people with lung diseases, and our limited understanding of the biological origin of the identified VOCs. Lung disease diagnosis using breath VOCs is challenging. Nevertheless, the numerous studies of breath volatiles and lung disease provide guidance as to which volatiles need further investigation for use in differential diagnosis, highlight the urgent need for non-invasive clinical breath tests, and illustrate the way forward for future studies aimed at developing non-invasive diagnostic tests for lung disease.
This review provides an overview of these issues by evaluating key studies undertaken in the years 2010–2019, in order to present objective, comprehensive, and up-to-date information on the progress that has been made in this field. The potential of this approach is highlighted, while strengths, weaknesses, opportunities, and threats are discussed. This review will be of interest to chemists, biologists, medical doctors and researchers involved in the development of analytical instruments for breath diagnosis.


Introduction
Respiratory diseases, including lung cancer, chronic obstructive pulmonary disease (COPD) and asthma, are increasing worldwide. The World Health Organization (WHO)

Articles Selection
A single reviewer (IAR) undertook an extensive literature search covering the years 2010-2019 (the search was completed on 12 February 2020), using the keywords "VOCs asthma", "VOCs COPD" and "VOCs lung cancer" in the following databases: Springer, Web of Science, Science Direct and Wiley.
By considering only articles written in English and omitting reviews and book chapters, a total of 2268 papers were identified. Subsequently, by checking the reference list of these selected articles, additional studies were identified and included. Figure 1A schematically shows the method used for the article selection. Figure 1B illustrates the number of articles found for each category of disease as a function of year, 2010-2019. This shows that the numbers of asthma and lung cancer studies are comparable.
However, after applying well-defined selection criteria (see next section), the number of articles reviewed in this paper for lung cancer is considerably higher than that for asthma. Figure 1C presents the number of studies by country.

Criteria for Selection of Articles
To make this review manageable, articles were excluded using the following criteria:
• No investigation of the VOC profile, but of non-volatile markers;
• Targeted diseases caused by exposure to harmful VOCs;
• VOCs related to the effects of therapy;
• Sampling and/or analysis methods only;
• Sensitivity, specificity, or accuracy of existing methods, with no focus on clinical studies;
• Sensor development used for validation standards of previously reported markers of certain diseases;
• Risk assessment and occupational exposure studies;
• Nanomaterials with application in clinical diagnosis;
• Smoking and/or exposure to tobacco products;
• Predictive models constructed using VOC targets collected from the literature;
• Non-clinical, in vitro and animal studies.
These exclusion criteria dramatically reduced the number of clinical studies to sixty.

Data Structuring
For the sixty clinical studies selected, the following information was extracted: study design, investigated diseases, sampling methods, patient and control characteristics, analytical platform, statistical approach, measured outcomes, identification of VOCs and their quantification (where applicable), and diagnostic performance, expressed, e.g., as sensitivity, specificity, accuracy, or area under the curve. Owing to the multitude and heterogeneity of the information, the following three tables have been constructed for convenience:
• Table 1 presents details on the type of sample collected, the number of participants, and the place (hospital, country) where the samples were collected;
• Table 2 summarizes the analytical platforms used, key outputs, statistical approach and diagnostic accuracy;
• Table 3 reports the VOCs that have been identified as being associated with the three respiratory diseases.
Smokers have volatiles in their breath that can confound biomarker identification and, hence, these must be taken into account. Among lung cancer patients, 601 participants were reported to be active smokers, 602 were former smokers and 328 had never smoked. For COPD, 257 people were active smokers, 361 were former smokers and 62 had never smoked. For asthma, 5 were declared to be active smokers, 52 were former smokers and 38 had never smoked. Among the controls, a total of 847 were active smokers, 395 were former smokers and 936 had never smoked. The total number of patients differs from the number with a reported smoking status because the smoking status was not given in all of the papers, and also because a number of the studies (especially those related to asthma) involved children. Of the total number of participants, 421 were children, of whom 229 had asthma and 192 acted as controls.
Of the selected clinical studies, thirty-two reported the use of mixed expired breath [13,17,19,21,33,34,36,38,[41][42][43]45,46,54,[59][60][61][62]64,[66][67][68][69][71][72][73]75,76,78,80,83,85] (normally a mixture of gaseous breath that also includes the volatile components), collected by simple expiration into bags, into tubes with absorbent materials, or directly into the instrument (as in the case of e-noses, for example). Twenty-two reported the use of alveolar breath [14][15][16]18,22,32,35,37,44,48,[56][57][58]65,70,74,77,79,81,82,84,86] (collected at the appropriate time by monitoring CO2 levels as a function of time). Three studies collected exhaled breath condensate [20,39,40] (all for COPD investigations). One study collected both mixed and alveolar breath [55] and two studies examined mixed breath plus sputum [47,63]. The clinical studies included in this review were undertaken in 18 different countries. The information summarized above is presented in more detail in Table 1.
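The counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the dictionary and function names are hypothetical and introduced only for illustration, while the numbers are those given in the text. It confirms that the five sampling categories add up to the sixty reviewed studies, and totals the participants whose smoking status was reported in each group.

```python
# Number of studies per breath sample type, as quoted in the text.
sample_types = {
    "mixed expired breath": 32,
    "alveolar breath": 22,
    "exhaled breath condensate": 3,
    "mixed and alveolar breath": 1,
    "mixed breath plus sputum": 2,
}

# Participants with a reported smoking status, as quoted in the text.
smoking_status = {
    "lung cancer": {"active": 601, "former": 602, "never": 328},
    "COPD": {"active": 257, "former": 361, "never": 62},
    "asthma": {"active": 5, "former": 52, "never": 38},
    "controls": {"active": 847, "former": 395, "never": 936},
}


def total_studies(counts):
    """Sum the number of studies over all sample types."""
    return sum(counts.values())


def group_totals(status):
    """Total participants with a reported smoking status, per group."""
    return {group: sum(c.values()) for group, c in status.items()}


if __name__ == "__main__":
    print("studies:", total_studies(sample_types))
    for group, n in group_totals(smoking_status).items():
        print(f"{group}: {n} participants with reported smoking status")
```

These totals cover only participants whose smoking status was stated, so they are lower bounds on the cohort sizes rather than the cohort sizes themselves.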

Analytical Platforms Used for Investigating Breath Volatiles Associated with Asthma, COPD, and Lung Cancer
Several analytical spectrometric techniques can be used for analyzing volatiles contained in exhaled breath samples. When choosing an analytical method, many aspects need to be considered, including the advantages and disadvantages of a particular analytical technique, and whether offline or online sampling is needed. Below, we describe the key analytical instruments that have been used to investigate breath volatiles and lung diseases.

GC-MS Instrumentation
For offline measurements, GC-MS is the most powerful tool, with high sensitivity (detection limits sometimes below the ppb range) and, more importantly, a high potential for both identification and quantification of unknown components in complex biological matrices [4,[8][9][10]88,89]. Moreover, by using different columns and detectors, great versatility in targeted analyses can be achieved [90,91].
Owing to its size and the length of analysis (tens of minutes to hours), GC-MS cannot be used at clinical points of care, although at the research level it remains the gold standard for VOC analysis in many fields [92][93][94][95]. GC-MS analysis requires the samples to be collected, either in special bags or onto absorbent materials, and then transported to the laboratory, which can result in samples being stored for days or even weeks before analysis. Of the 60 clinical studies reviewed in this paper, 29 used various types of GC-MS systems. Two groups used two-dimensional GC, namely GC×GC-FID [69] and TD-GC×GC-ToF-MS [13], for lung cancer investigations. Caldeira and co-authors [46] used TD-GC×GC-ToF-MS to investigate the exhaled breath metabolomes of patients with allergic asthma.

PTR-MS and SESI-MS Instrumentation
PTR-MS and SESI-MS can be, and have been, used offline to analyse breath samples, but they come into their own for online analysis. However, the advantage of real-time analysis, which allows rapid changes in volatile concentrations to be detected, comes at the expense of identifying the volatiles with a high level of confidence [96][97][98][99]. Nevertheless, near-patient analysis means that samples do not need to be transported and hence storage is not necessary. Consequently, deterioration of the breath samples and storage errors are avoided. As with GC-MS, PTR-MS and SESI-MS require skilled operators. Generally, the cost of a PTR-MS instrument, and particularly a PTR-ToF-MS, at between EUR 200,000 and 500,000, is far higher than that of GC-MS instruments (EUR 60,000-150,000) and, hence, there are fewer PTR-MS studies than GC-MS studies. Although the cost of a SESI-MS is lower than that of a GC-MS, it has only rarely been used. PTR-MS was used in two studies of lung cancer [33,34], and in one study for discriminating COPD and emphysema [32]. SESI-MS was involved in a single study for COPD diagnosis [20]. Another soft chemical ionization mass spectrometric technique that could be used in real time for discovery programmes is selected ion flow tube mass spectrometry (SIFT-MS) but, to our knowledge, no study of breath volatiles and lung disease involving this instrument has been reported.

IMS Based Instrumentation
Another category of analytical instrumentation suitable for VOC analysis in real or near-real time is ion mobility spectrometry (IMS), both as a standalone tool and coupled with GC columns that provide a pre-separation. The instrumentation costs are considerably lower than those of the previously mentioned techniques based on mass spectrometry (ranging between EUR 7,000 and 30,000 for standard IMS, while GC-IMS can range between EUR 50,000 and 60,000). Because no vacuum system is required, the size and power requirements are dramatically reduced. Together with its ease of use and robustness, this makes IMS, and particularly GC-IMS, extremely suitable for use in clinical environments at the point of care [100][101][102][103][104]. The most common types are classical IMS, a-IMS (aspiration IMS), FAIMS (field asymmetric waveform IMS) and DMS (differential mobility spectrometry). For improved analytical dimensionality, GC-IMS and MCC-IMS (multi-capillary column IMS) are also used [91]. Amongst the clinical studies that we review, MCC-IMS has been used to investigate patients with COPD [18,35] and lung cancer [36,74]. One other study used a dual approach, comparing GC-IMS and GC-APCI-MS (atmospheric pressure chemical ionization MS) for investigating breath samples from patients with COPD [37].

Sensors and Electronic Noses
Analytical instrumentation related to online measurements also comprises simple sensors and electronic noses (e-noses). They are usually cheap and easy to operate, and are capable of real-time monitoring based on pattern recognition algorithms. Moreover, they are often equipped with software that compares the emitted VOC profiles of ill patients with those of healthy individuals [15,64,105]. Their main drawbacks are their lack of selectivity, the fact that VOCs are not identified, and the susceptibility of their reproducibility to interferences, which diminishes their reliability and robustness. E-noses have been successfully applied to discriminating the exhaled air of patients with asthma from that of healthy controls, using a commercial system, the Cyranose 320, which consists of an array of 32 organic polymer sensors [106]. The same nanosensor array (Cyranose 320) has been used for discriminating patients with lung cancer and COPD, and it was shown that an electronic nose is able to distinguish the VOC pattern in the exhaled breath of lung cancer patients from that of healthy controls. The authors pointed out, realistically, that although the electronic nose may become a very convenient tool for physicians, it may qualify as either a screening tool or a pre-diagnostic tool that selects patients for further diagnostic and testing procedures [107]. Analysis of exhaled VOCs in order to discriminate COPD phenotypes, using a Bionote electronic nose (comprising an array of seven quartz microbalance (QMB) sensors covered with anthocyanins, which are used as chemically sensitive materials), has been described in several original research papers [108,109].
The application of e-noses and other types of sensors to breath analysis has been addressed by a review focusing on the methodological issues involved. Although e-noses possess strong capabilities for rapidly discriminating samples of exhaled breath (the so-called "breathprint"), they are not currently ready for point-of-care use [110].
Another valuable review summarizes the role electronic noses play in distinguishing different endotypes by using VOCs in exhaled breath; breath sampling and metabolism of VOC biomarkers are also summarized [111].
Of the 60 clinical studies included in this review, nine studies used sensors or e-noses [15,[38][39][40]42,64,66,77,83], while five studies used both sensors or e-noses and an additional GC-MS (or a related) technique as a confirmation method [21,55,58,78,85]. For example, Cyranose 320 (Smiths Detection, Pasadena, CA, USA) e-noses were used to discriminate between asthma and COPD [38,40]; another type of e-nose, Aeonose (The eNose Company, Zutphen, The Netherlands), was used to differentiate between children with asthma and cystic fibrosis [42]. The Cyranose 320 system is a portable device that incorporates 32 chemical sensors, each of which responds differently to various VOC mixtures; these chemiresistor sensors are made from carbon black nanocomposites that change their resistance in response to VOC exposure [39]. Aeonose is an easy-to-use hand-held e-nose, weighing just 650 grams, equipped with three metal-oxide sensors, which behave as semiconductors at higher temperatures [42].
In terms of other sensors, a colorimetric sensor array [64], metal oxide gas sensors [15] and nanosensors based on organically functionalized gold nanoparticles [58] have been investigated for their potential use in cancer diagnosis.
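To make the idea of pattern recognition on sensor-array responses concrete, the sketch below classifies "breathprints" with a nearest-centroid rule and estimates its performance by leave-one-out cross-validation. It is an illustration only: the three-sensor response vectors are synthetic values invented for this example, the function names are hypothetical, and the nearest-centroid rule is a deliberately simple stand-in rather than the algorithm used by any of the commercial e-noses or studies cited above.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equally long feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(train, sample):
    """Return the label whose class centroid is closest to the sample."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {label: centroid(rows) for label, rows in by_label.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))


def leave_one_out_accuracy(data):
    """Hold out each sample in turn, train on the rest, and score the hit rate."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(data)


# Synthetic three-sensor responses (arbitrary units), invented for illustration.
breathprints = [
    ([1.0, 0.9, 1.1], "control"),
    ([1.1, 1.0, 0.9], "control"),
    ([0.9, 1.1, 1.0], "control"),
    ([2.0, 1.9, 2.1], "patient"),
    ([2.1, 2.0, 1.8], "patient"),
    ([1.9, 2.2, 2.0], "patient"),
]

loo_accuracy = leave_one_out_accuracy(breathprints)

if __name__ == "__main__":
    print(f"leave-one-out accuracy: {loo_accuracy:.2f}")
```

Because the synthetic groups are well separated, every held-out sample is classified correctly here; real breath data overlap far more, which is why cross-validated figures in clinical studies are typically lower.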

Fourier-Transform Ion Cyclotron Resonance Mass Spectrometry
Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) is an analytical technique that can be used for targeted detection and quantification of VOCs. Using "a hybrid linear ion trap Fourier transform (FT) ion cyclotron resonance (ICR) mass spectrometer (MS) equipped with a TriVersaNanoMate ion source with an electrospray chip (nozzle inner diameter 5.5 mm)" researchers claim to have identified specific carbonyl cancer markers (mainly 2-butanone, 3-hydroxy-2-butanone, 2-hydroxyacetaldehyde and 4-hydroxyhexenal) that can differentiate benign pulmonary disease from early-stage lung cancer [67,68,71,73].

Trained Dogs
It is worth mentioning that trained dogs have been used to "sniff" for diseases, with claimed performance apparently comparable to, if not better than, that of various analytical devices. Trained dogs were used in two studies included in the present review [44,83]. A further article, describing a two-step investigation of lung cancer detection in which sniffer dogs were shown to maintain their discriminative capacity over the long term and in different types of environment, appeared after the article collection period had closed [112].

Features and Performance of Analytical Platforms
All the clinical studies reviewed in this paper describe various methods of optimization at different levels (sampling, analysis, data processing and interpretation, etc.) in order to enhance diagnostic capabilities. A summary of sensitivity and specificity obtained by different studies is presented in Figure 2. Multiple statistical approaches have been used to classify the detected VOCs using different models. Details about each clinical study, including analytical platforms, statistical approaches, and outcomes are presented in Table 2.

Asthma
Asthma is a chronic inflammatory condition that produces reversible airway obstruction, often begins in childhood, and is characterized by bronchospasms. The common symptoms include short episodes of chest tightness, wheezing, coughing, and shortness of breath, with these symptoms being more pronounced in some people during the night or following strenuous physical exercise [4,17,113]. Although its causes are only partially known, asthma is considered to be often caused by environmental pollution, irritant agents, allergens (pollen, dust, fur, etc.) or drugs (aspirin and beta blockers) [114]. The diagnosis of both asthma and COPD is based on symptoms, long-term response to therapy, lung capacity tests, and spirometry tests, which include (1) FVC (forced vital capacity), the largest volume of air that can be forcefully exhaled, and (2) FEV (forced expiratory volume), how much air can be exhaled in one second [115]. Gastroesophageal reflux, eosinophilia, neutrophilia, allergic rhinitis, obstructive sleep apnea and atopy are conditions that frequently occur in people with asthma [27,47]. Atopy (the triad of asthma, allergic rhinitis and eczema together) is the predisposition towards developing hypersensitivity reactions and triggering exacerbations. An exacerbation is an acute asthma attack, which may also occur in non-atopic asthmatics. Asthma cannot be cured; prevention includes avoiding allergens and irritant agents and the use of inhaled corticosteroids. In 2015, 358 million people worldwide were registered as diagnosed with asthma, with 397,100 deaths attributed to the disease [2,3].

Diagnosis of Asthma Based on Specific VOCs
The main disadvantage of traditional tests used for diagnosing asthma resides in the fact that they are time consuming and some of them are invasive. Both invasive and non-invasive (spirometry and fractional exhaled nitric oxide) diagnostic techniques are used. However, non-invasive diagnosis based on exhaled VOCs is promising, and hence has recently been gaining increasing attention. In eight studies, asthma diagnosis was tested using GC-MS analysis. For example, Dallinga et al. [17] analyzed the breath samples of 63 asthmatic children and compared them to breath samples from 57 healthy controls (5 to 16 years old). Only eight VOCs were found to be needed to discriminate diseased from healthy children (with a claim of 92% correct classification, a sensitivity of 89% and a specificity of 95%) [17]. A set of eight compounds was used in another study to discriminate between healthy and asthmatic children; however just one of them, 2-octenal, was proposed as a certain marker of asthma, because the authors concluded that the others may have other possible origins [65].
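The performance figures quoted throughout this review (sensitivity, specificity and correct classification rate) all derive from the same four confusion-matrix counts. The sketch below shows the definitions and applies them to the cohort sizes reported by Dallinga et al. (63 asthmatic children, 57 controls). Note that the underlying counts are not given in the text: the numbers of true positives and true negatives used here are assumptions, back-calculated by rounding from the reported percentages, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # diseased subjects correctly flagged
    specificity = tn / (tn + fp)          # healthy subjects correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Cohort sizes from the text; tp and tn are ASSUMED values obtained by
# rounding the reported 89% sensitivity and 95% specificity.
n_patients, n_controls = 63, 57
tp = round(0.89 * n_patients)   # 56 asthmatic children classified as asthmatic
tn = round(0.95 * n_controls)   # 54 controls classified as healthy

sens, spec, acc = diagnostic_metrics(tp, n_patients - tp, tn, n_controls - tn)

if __name__ == "__main__":
    print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}, accuracy = {acc:.1%}")
```

Under these assumed counts, the three values round to the reported 89%, 95% and 92%, which suggests that the published figures are mutually consistent.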
The ability to diagnose allergic asthma (sometimes combined with allergic rhinitis) in children was tested using HS-SPME/GC-qMS and comprehensive two-dimensional GC×GC-ToF-MS [46,61]. Similar statistical tools were used for data processing, and the two-dimensional GC×GC-ToF-MS proved superior to GC-MS. In the first study by Caldeira et al. [61], a set of 28 VOCs was used to discriminate between the asthmatic and control groups, with a classification rate of 88% [61]. In their second study [46], a pattern of just six chemicals, namely nonane, 2,2,4,6,6-pentamethylheptane, decane, 3,6-dimethyldecane, dodecane, and tetradecane, was used, achieving a classification rate of 98%, with 96% sensitivity (meaning that only ∼4% of children with allergic asthma were misclassified as controls) and 95% specificity (meaning that only ∼5% of controls were classified as false positives). All six chemicals were, therefore, proposed as biomarkers of asthma [46]. Exacerbations in atopic asthmatic children were predicted based on emitted VOCs analyzed by GC-MS [45,76]. In the first study, the classification model used seven VOCs and provided a correct classification rate of 91% for those patients who experienced exacerbations (sensitivity of 79% and specificity of 100%). Moreover, the authors demonstrated that FeNO and lung function were not predictive of exacerbations [45]. The classification model used in the second study was based on seven selected VOCs (three aldehydes, one hydrocarbon, one ketone, one aromatic compound, and one unidentified VOC), and achieved a sensitivity of 88% and a specificity of 75%, with an AUC of ROC of 90% [76]. Electronic noses were used for discrimination between asthma and COPD, between asthma and cystic fibrosis, and for asthma diagnosis [38,40,42].
Aeonose, a patient-friendly and easy-to-use e-nose device, was used to test the discrimination and diagnostic accuracy for children with asthma and cystic fibrosis. The reported mean values for discrimination between asthma and cystic fibrosis were as follows: AUC = 0.90 (95% CI), sensitivity 89%, specificity 77%, while for differentiation between healthy controls and cystic fibrosis the mean scores were slightly lower: AUC = 0.87, sensitivity 85% and specificity 77%. However, the authors reported that diagnostic accuracy for discrimination between asthma and healthy controls was lower than in the first two cases (AUC = 0.79, with a sensitivity of 74% and a specificity of 91%) [42]. A Cyranose 320 instrument was also used to discriminate between asthma and COPD in two studies [38,40]. An accuracy of 88% for distinguishing between asthma and COPD was obtained in the first study [38]. In the second study, two groups (asthmatics and COPD patients), both with and without gastro-esophageal reflux disease (GORD), were investigated in an attempt to distinguish patients with GORD from those without. The discrimination between patients with COPD, with and without GORD, achieved an accuracy of 67.6%, while the asthmatic group with GORD was differentiated from asthmatics without GORD with an accuracy of 85%.
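Several of the studies above report the area under the ROC curve (AUC). The AUC can be read as the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The scores are synthetic values invented for illustration, and the function name is hypothetical; none of the reviewed studies is claimed to have computed the AUC in exactly this way.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (patient, control) pairs ranked correctly.

    A pair counts 1 if the patient score is higher, 0.5 if the two are tied.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Synthetic classifier scores, invented for illustration only.
patient_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.3, 0.2, 0.1]

auc = roc_auc(patient_scores, control_scores)

if __name__ == "__main__":
    print(f"AUC = {auc:.4f}")
```

In this toy example 15 of the 16 patient-control pairs are ranked correctly, giving an AUC of 0.9375; an AUC of 0.5 would correspond to chance-level ranking.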

COPD
COPD can coexist with asthma and can actually occur as a complication of chronic asthma. Generally, after the age of 65, most people with asthma will also develop COPD. In this setting, COPD can be differentiated by increased airway neutrophils, abnormally increased wall thickness, and increased smooth muscle in the bronchi [116]. Although it shares most of the common symptoms of asthma, COPD, unlike asthma, is a progressive disease characterized by sputum production and irreversible airway obstruction, which does not improve much with the use of bronchodilators [116]. The most common cause of COPD is tobacco smoking [115,116]. In 2015 alone, COPD affected about 174.5 million people globally and resulted in 3.2 million deaths [2,3].

Diagnosis of COPD Based on Specific VOCs
COPD is diagnosed in much the same way as asthma, and VOC analysis is also possible. COPD was investigated by GC-MS in six studies included in this review [14,19,54,63,79,81]. Phillips and co-authors included 119 patients with COPD and 63 controls in their study. Machine learning approaches were used to generate models automatically, which correctly predicted the diagnosis in 64% of controls and 79% of patients, with an AUC of ROC of 0.82 [14]. Better discrimination was obtained by Van Berkel et al. [54], who used six VOCs that correctly classified 92% of the subjects, with a sensitivity and specificity of 98% and 88%, respectively. Moreover, 14 out of 15 steroid-naïve patients were also correctly classified [54]. Besides discrimination between the COPD group and healthy controls, the identification of various COPD subgroups has also been achieved [63]. Pizzini et al. went into more detail and succeeded in performing a differential diagnosis between patients with COPD and patients with COPD with acute exacerbations, a complication caused by infectious and non-infectious agents [81].
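When a study reports the correct-prediction rates for patients and controls separately, as Phillips and co-authors do, the overall rate of correct classification follows as the average of the two rates weighted by group size. The sketch below illustrates this relationship using the group sizes and rates quoted above. The resulting overall figure is derived here for illustration and is not a value reported by the authors; the function name is hypothetical.

```python
def overall_accuracy(sensitivity, specificity, n_patients, n_controls):
    """Group-size-weighted average of the per-group correct-prediction rates."""
    correct = sensitivity * n_patients + specificity * n_controls
    return correct / (n_patients + n_controls)


# Rates and group sizes as quoted in the text for the study in [14].
sens, spec = 0.79, 0.64           # correct in 79% of patients, 64% of controls
n_patients, n_controls = 119, 63

acc = overall_accuracy(sens, spec, n_patients, n_controls)

if __name__ == "__main__":
    print(f"derived overall accuracy = {acc:.1%}")
```

Because it is a weighted average, the overall accuracy always lies between the two per-group rates, and it is pulled towards the rate of the larger group (here, the patients).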
It is widely acknowledged that smoking contributes to the development of respiratory diseases, including COPD. In this context, Gaida and co-authors conducted a dual-center study to compare the VOCs emitted by smokers and non-smokers, with and without COPD [19]. Their results highlighted that active smokers are clearly discriminated from non-smokers. Furthermore, by characterizing 134 VOCs, they were able to provide evidence for 14 VOCs related to COPD.
Real-time SESI-HRMS (Secondary Electrospray Ionization-High-Resolution Mass Spectrometry) was used as a diagnostic tool for COPD. A total of 1441 different VOCs were identified, but only 43 were used to discriminate between groups, giving an accuracy of 89%, a sensitivity of 93% and a specificity of 86% [20].
PTR-MS was used to explore breath samples of heavy smokers with emphysema [32], patients who are at risk of developing COPD, based on the hypothesis that emphysema is defined by airway inflammation that alters the composition of the exhaled air. Although the authors reported that the proposed method did not provide a valuable diagnostic tool for COPD/emphysema screening, a series of VOC markers associated with this disease is presented [32].
Multi-capillary column IMS (MCC-IMS) was used to diagnose COPD in comparison with COPD plus bronchial carcinoma (BC). The statistical learning methods applied were able to distinguish between the patient groups. Healthy and COPD groups were discriminated with a 94% accuracy, while BC on COPD/no-COPD was classified with a 79% accuracy [35]. Besa et al. also used MCC-IMS to differentiate COPD patients from healthy subjects. A total of 137 spectral peaks differed significantly between the COPD, healthy smoker and nonsmoker groups, while just six VOCs correctly discriminated the COPD patients from healthy controls with a 70% accuracy [18]. Moreover, 15 peaks discriminated between healthy smokers and healthy nonsmokers [18]. A prototype of a compact, closed gas loop GC-IMS was developed and used in an attempt to find correlations between volatiles from COPD patients and controls [37]. In a second approach, the results obtained were compared with those acquired using a modified mass spectrometer with atmospheric pressure chemical ionization and GC pre-separation (GC-APCI-MS). In the case of GC-IMS, three VOCs showed significant differences between the COPD and healthy groups, while in the case of GC-APCI-MS, one distinctive VOC, 2-pentanone, was identified as a COPD-specific marker [37].
Ultrafast gas chromatography equipped with an electronic nose detector (FCG eNose) has been used to discriminate between patients with COPD and healthy controls, using a set of 17 VOCs, which classified the groups with an 82% accuracy, 96% sensitivity and 91% specificity [85]. Hattesohl et al. used a Cyranose 320 e-nose instrument to measure the VOC patterns of patients with COPD with and without alpha 1-antitrypsin (AAT) deficiency [39]. These authors showed that an e-nose system can differentiate the VOC prints of COPD patients with AAT deficiency, obtaining a cross-validation value of 82% (with a sensitivity of 100% and a specificity of 100%) when exhaled breath condensates of the AAT deficiency and COPD groups were compared. In pure exhaled breath, the cross-validation value was lower, at just 58.3% (with a sensitivity of 100% and a specificity of 100%).

Lung Cancer
Lung cancers are malignant tumors formed by uncontrolled cell growth in the tissues of the lungs. The most common symptoms that could indicate the onset of lung cancer are coughing, shortness of breath, chest pain and weight loss. It is considered that about 85% of lung cancers are caused by tobacco smoking, with the remaining 15% or less resulting from exposure to radiation, radon, asbestos, and various forms of air pollution. Other causes include passive smoking and genetic factors [117].

Types of Lung Cancer
Primary lung cancers (LC) are carcinomas that, according to histological type, fall into two main categories: small-cell lung carcinoma (SCLC) and non-small-cell lung carcinoma (NSCLC). SCLC consists of dense cells containing neurosecretory granules in the form of blisters full of neuroendocrine hormones, which is why these kinds of tumors are associated with endocrine or paraneoplastic syndromes. SCLC accounts for about 15% of all lung cancers worldwide [118]. NSCLC accounts for approximately 85% of lung cancers. The most common types of NSCLC are squamous cell carcinoma, non-squamous cell carcinoma (which includes adenocarcinoma), large cell carcinoma, and several other types that occur less frequently. The most frequent is adenocarcinoma, generally located peripherally in the lungs [119]; this form of LC accounts for approximately 40% of all lung cancers [120]. Molecular testing allows possible mutations in adenocarcinomas to be identified; the most frequent mutations are summarized in Figure 3. Squamous cell carcinomas tend to be centrally located in the lungs; they are more common in men than in women, and are mostly associated with smoking [122]. Large cell carcinoma is a malignant neoplasm composed of large tumor cells resulting from transformed epithelial cells in the lungs. It can be differentiated from squamous cell carcinomas and adenocarcinomas by light microscopy [123].

Diagnosis of Lung Cancer Based on Specific VOCs
Lung cancer is often diagnosed by chest radiography or computed tomography; however, the diagnosis needs to be confirmed by biopsy, which is an invasive, time-consuming and expensive method that carries risks. Therefore, many lung cancer studies of breath VOCs have been undertaken in the hope of discovering breath biomarkers of the disease and thereby realising a simple non-invasive test. However, despite intense work, no breath test for lung cancer has been forthcoming to date. A major reason for this is that there has been little consensus between studies, with limited agreement as to which breath volatiles (or pattern of volatiles) can be used to discriminate people with lung cancer from those without. Although many breath volatiles have been proposed to result from lung cancer, not a single study, thus far, has specifically pinpointed the origins of the breath volatiles exclusively to lung cancer nodules and not to oxidative stress in any other organ resulting from cancer or any other disease.
In the research related to the diagnosis of lung cancer, GC-MS has been widely used, accounting for more than 50% of studies. In the 15 studies included in this present review, GC-MS was used for analyses.

Different statistical approaches and machine learning algorithms have been used to classify samples analyzed by GC-MS from patients with lung cancer and from healthy controls [48,64,70,71,74,86]. In an attempt to move closer to a standardization of lung cancer diagnosis, Kischkel et al. applied five different algorithms to process their GC-MS data [48]. They concluded that exhaled VOCs depend on a multitude of factors other than the investigated diseases, such as patients' medical history and environmental conditions [48].
GC-MS profiles of potential markers of lung cancer were investigated in four different studies by a Polish group [22,44,62,84]. They carried out qualitative and quantitative measurements by sampling human breath using solid-phase microextraction (SPME) and gas chromatography-time-of-flight mass spectrometry (GC-TOF/MS), obtaining possible biomarkers (19 to 32 VOCs) at the parts-per-billion level when several subtypes of lung cancer were investigated (SCLC, NSCLC, adenocarcinoma, planoepitheliale, squamous cell carcinoma). Song et al. [60] used GC-MS to investigate two types of lung cancer, adenocarcinoma and squamous cell carcinoma, covering all four stages of the disease, and proposed just two key volatile biomarkers that were found at significantly higher concentrations in the breath of the lung cancer patients than in the controls: 1-butanol and 3-hydroxy-2-butanone (acetoin). For 1-butanol, the obtained AUC was 0.940, with a sensitivity and specificity of 0.953 and 0.854, respectively, while for acetoin the AUC was 0.964, with a sensitivity of 0.930 and a specificity of 0.927. Two other important conclusions were drawn: higher concentrations of both targets were found in adenocarcinoma than in squamous cell carcinoma, and the concentrations of the VOCs could not be correlated with the stage of disease [60]. Adenocarcinoma and squamous cell carcinoma subtypes were also discriminated in a PTR-MS study, whose authors claim that breath volatiles from adenocarcinoma and squamous cell carcinoma patients can help in the identification of cancer subtypes [34].
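To make the reported figures easier to interpret, the following minimal Python sketch shows one common way in which an AUC, a sensitivity and a specificity can be derived from single-marker concentrations. The AUC is computed as the probability that a randomly chosen patient shows a higher concentration than a randomly chosen control. The concentrations, the cutoff and the function names are hypothetical and are not taken from [60]; the sketch does not reproduce the authors' statistical workflow.

```python
def auc_from_concentrations(patients, controls):
    """Probability that a random patient value exceeds a random control value.

    Ties count one half. This pairwise estimate equals the area under the
    ROC curve for a single marker.
    """
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


def sensitivity_specificity(patients, controls, cutoff):
    """Call a sample positive when its concentration is >= cutoff."""
    true_pos = sum(1 for p in patients if p >= cutoff)
    true_neg = sum(1 for c in controls if c < cutoff)
    return true_pos / len(patients), true_neg / len(controls)


# Hypothetical marker concentrations in arbitrary units (illustration only).
patients = [4.1, 5.3, 6.0, 3.2, 7.4]
controls = [2.0, 3.5, 2.8, 1.9, 4.5]

auc = auc_from_concentrations(patients, controls)
sens, spec = sensitivity_specificity(patients, controls, cutoff=4.0)
print(f"AUC={auc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes performance over all possible cutoffs, which is why the two kinds of figures are usually reported together.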
Three histologically proven types of lung cancer (adenocarcinoma, squamous cell carcinoma, and small cell carcinoma) were analyzed using MCC-IMS, and the obtained VOC profiles were compared with those of a healthy control group. In addition, adenocarcinoma samples with and without epidermal growth factor receptor (EGFR) mutation were compared. The decision tree algorithm used was able to discriminate the groups of patients based on the 115 detected VOCs. Moreover, n-dodecane was found to be significantly higher in 14 patients with EGFR mutation than in those negative for EGFR (p = 0.01). The applied decision tree algorithm therefore differentiated the EGFR-positive samples from the negative ones with a sensitivity of 85.7% and a specificity of 78.6% [36]. Similar results were obtained by Shlomi and co-authors, who discriminated patients with EGFR mutation from the other investigated groups with 83% accuracy, while the sensitivity and specificity were 79% and 85%, respectively. For sample analysis, a highly sensitive nanoarray of sensors containing 40 cross-reactive, chemically diverse chemiresistors was used [77].
The interference of benign pulmonary diseases (BPD) in the selection of VOC markers for lung cancer has been reported [82]. SPME and thermal desorption (TD) were used together with GC-MS to classify four groups of samples: patients with lung cancer, patients with BPD, a group with lung diseases (including lung cancer and BPD) and a group of healthy controls. The main aim was to check whether benign lung tumors generate VOCs that interfere with those considered to be associated with lung cancer. The authors concluded that discrimination between the lung cancer group and the control group, and between the BPD group and the control group, is possible with an accuracy of 70-80%. However, no VOCs could discriminate between the lung cancer group and the BPD group [82]. A similar study was conducted by Zou et al. [70], in which breath samples from 171 volunteers divided into three groups (with lung cancer, with BPD and controls) were analyzed by GC-MS. They suggested that five detected volatiles are associated with lung cancer, and reported that they succeeded in discriminating the three preselected groups, avoiding the interference between lung cancer and non-malignant pulmonary diseases [70]. However, only an AUC above 0.80 indicates good diagnostic predictability [10]; consequently, the authors obtained good diagnostic accuracy for only one volatile (AUC = 0.84), and satisfactory to low accuracy for the other four VOCs reported (AUC = 0.67 to 0.78). Moreover, by applying PCA, only a partial discrimination of the lung cancer group from the control and BPD groups was obtained [70]. Good discrimination of lung cancer from benign nodules (with 87% accuracy) was obtained in another study using an electronic nose system consisting of highly sensitive nanoarray sensors [77]. In addition, the feasibility of discriminating between BPD and lung cancer was demonstrated in four other studies [67,68,71,73]. The authors demonstrated good diagnostic prediction for lung cancer, avoiding BPD interference, when FT-ICR-MS was used to analyze breath samples.
Feinberg et al. used PTR-MS to study volatile fingerprints in the exhaled breath of patients with lung cancer before and after an oral glucose tolerance test, to investigate whether tumor cell hyperglycolysis can affect the volatile signatures [33]. The authors concluded that the oral glucose tolerance test has a minimal effect on the VOC profile of the patient group, while the profiles of the control group changed significantly after the induced hyperglycolysis. It was proposed that this is due to a ceiling effect present in cancer patients [33].
Malignant pleural mesothelioma (MPM), which is predominantly caused by asbestos exposure, was investigated using MCC-IMS. Discrimination of MPM patients from control groups was achieved with an overall accuracy of 76%, an area under the ROC curve of 0.81, a sensitivity of 87% and a specificity of 70% [74]. MPM screening using an e-nose was investigated by the same group of researchers, with GC-MS used in parallel [78]. The MPM group was discriminated from the control group with 97% accuracy when the GC-MS analyses were processed, and with only 74% accuracy when the data obtained with the e-nose were interrogated. The sensitivity and specificity were 100% and 91%, respectively, for the GC-MS data, and 82% and 55%, respectively, for the e-nose data.
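Accuracy, sensitivity and specificity are linked through the composition of the cohort: overall accuracy is the average of sensitivity and specificity weighted by the fractions of patients and controls. The short Python sketch below illustrates this relation. The sensitivity and specificity are the e-nose values quoted above, but the patient fraction of 0.70 is a hypothetical value chosen for illustration, since the cohort composition of [78] is not given here; the function name is likewise an assumption.

```python
def accuracy_from_rates(sensitivity, specificity, patient_fraction):
    """Overall accuracy as the cohort-weighted mean of sensitivity and specificity.

    patient_fraction is the share of diseased subjects in the cohort (0 to 1).
    """
    if not 0.0 <= patient_fraction <= 1.0:
        raise ValueError("patient_fraction must lie between 0 and 1")
    return sensitivity * patient_fraction + specificity * (1.0 - patient_fraction)


# Sensitivity and specificity as quoted for the e-nose data;
# the patient fraction (0.70) is hypothetical, not taken from the study.
enose_accuracy = accuracy_from_rates(0.82, 0.55, patient_fraction=0.70)
print(f"accuracy = {enose_accuracy:.3f}")
```

The relation shows why a high accuracy alone can be misleading: in a cohort dominated by patients, a test with poor specificity can still reach a respectable accuracy.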

The Origin of VOCs Related to the Investigated Diseases and a Sum Up of the Diagnostic Prediction Using the Markers Reported in the Reviewed Studies
A breath sample is composed of a mixture of N2, O2, CO2 and H2O vapor, together with a small fraction of VOCs that comprises more than 1000 compounds [24]. In terms of their origin, these VOCs can be endogenous (generated by the organism, as a normal process of metabolism or as a response to diverse pathologies), exogenous (absorbed by the organism from the environment and then eliminated through exhaled breath), or both. Unfortunately, the metabolic pathways for the production of endogenous biomarkers associated with various diseases are mostly unknown, and the metabolic fates of only a limited number of exogenous compounds are well known. The challenge in selecting VOCs from a complex exhaled breath matrix is their correct attribution to a given disease, and this needs to be based on an in-depth knowledge of inflammatory processes. Asthma, COPD and lung cancer are conditions characterized by chronic inflammation and oxidative stress that can be diagnosed through endogenous volatiles. Clinical studies have proven the link between these conditions and inflammatory or peroxidative activity resulting from the reaction of reactive oxygen species (ROS) with lipid membranes [124]. Unfortunately, the inflammatory processes have different sources. For example, sputum inflammatory profiles were able to predict both neutrophilic and eosinophilic asthma [47]. Another method that can be used for asthma phenotyping is sputum cell count [125,126]. However, other interactions of leukocytes, epithelial and stromal cells have also been shown to contribute to inflammatory processes in asthmatic patients [127].
Hydrocarbons are stable end products of lipid peroxidation released in breath in real time (seconds) after formation in tissues [23]. The presence of alkanes (ethane and pentane) in exhaled breath has been shown to be correlated with lipid peroxidation [24]. However, pentane is also a non-specific marker reported in bowel disease [128] and rheumatoid arthritis [129].
Aldehydes are also associated with oxidative stress and inflammatory processes [4]. Hexanal, heptanal and nonanal, which are formed by the peroxidation of ω-3 and ω-6 fatty acids [59], have been reported as markers of asthma, lung cancer and COPD [36,56,75,76,78,79]. Aldehyde concentrations are known to be affected by age (e.g., pentane may indicate higher metabolic demands of young adults) and smoking history [4]. Exogenous compounds occurring in cigarette smoke (such as acetonitrile, furan, 2-methylfuran, 3-methylfuran, 2,5-dimethylfuran, benzene and toluene) are detected in smokers' breath samples, but not in the breath of non- or ex-smokers [22,84]. Toluene present in breath samples can also result from environmental contamination.
In a pilot study, Gahleitner et al. [65] identified VOC markers of childhood asthma in exhaled breath. Partial least squares discriminant analysis was performed and eight compounds (1,7-dimethylnaphthalene; 1-(methylsulfanyl)propane; 2-octenal; octadecyne; 1-isopropyl-3-methylbenzene; ethylbenzene; 1,4-dichlorobenzene and limonene) were found to contribute most to the discrimination between the asthmatic and control groups. The authors concluded that only 2-octenal is an endogenous marker, while the other seven compounds may result from environmental exposure, catabolism/metabolism or asthma treatments, or may even have an etiological significance in relation to asthma pathogenesis [65].
The concentrations of methanol, acetone, propanol and pentane were measured in patients with lung cancer [69]. The detected concentrations were higher in patients with stage IV than in those with stage III disease, and in both cases higher in patients with diabetes than in non-diabetic persons. It was assumed that these findings arose because the predictive power of the markers is proportional to tumor size and because a lack of insulin leads to the accumulation of ketones (especially acetone). Patients with a smoking history presented increased concentrations of all four markers when compared with non-smokers [69]. In comparison, Song et al. stated that they could not correlate the detected markers (1-butanol and acetoin) with the stage of lung cancer [60].
Isoprene (2-methyl-1,3-butadiene) is a controversial endogenous marker of disease. The assumptions that isoprene is related to cholesterol metabolism [130], is a possible indicator of obesity [131], or is a biomarker of lung cancer [16,22,41] and/or COPD [14,54] have been invalidated by researchers. It has been proposed that the variability in isoprene concentration is more closely related to increasing and decreasing heart rates (as a result of washout from muscle tissues [132]), since isoprene concentrations have been shown to increase within a few seconds following physical exercise and then to return to the initial level when the breath rate stabilizes [131,133]. Moreover, isoprene can correlate with age, since people younger than 40 years were shown to exhale significantly less isoprene than older people [48].
Propanal and 1-propanol have both been proposed as markers of lung cancer [16,22,44,59,69]. However, they are used in disinfectants, and hence are found in high concentrations in hospital environments. This is why it has been strongly recommended that they be excluded as biomarkers of lung cancer [48]. Benzaldehyde, reported initially as a marker of COPD [14], was actually found to be a decomposition product [19].
Limonene (4-isopropenyl-1-methylcyclohexene) is a ubiquitous monoterpene found in fruits (especially citrus), drinks, flavor additives, air fresheners, cleaning products, scented candles, toothpastes, and deodorants. Limonene in breath can therefore originate from indoor pollution. Yet limonene has been reported to be an endogenous biomarker of lung cancer [41,78]. This is almost certainly incorrect, and the higher levels in the breath of patients with lung cancer may indicate a higher consumption of citrus fruits or fresh juice [134]. In the case of liver cirrhosis, limonene is a key exogenous biomarker denoting deficient liver metabolism; it accumulates because the liver is unable to convert it into carveol or perillyl metabolites via CYP2C enzymes [135,136].

Limitations, Excluding Criteria and Standardization
The diagnosis of lung disease via breath samples is still not a reality. This is a result of a number of limitations and challenges, including sampling, analysis, confounding factors, correct use of controls, small numbers of volunteers, dietary issues, medications, medical treatments, coexisting conditions, and the lack of reproducibility between studies.
Concerning sampling, it is well documented that subjects breathe spontaneously at different frequencies, while hypo- or hyper-activity during sampling will produce changes in the composition of the expired breath. Whether mixed expired or end-tidal breath is used will also change the concentrations being measured. Concentrations in the exhaled breath increase dramatically in the end-tidal phase, which coincides with the highest concentration of expired carbon dioxide (end-tidal carbon dioxide concentration) [23]. Consequently, standardization should start at this level: the resting period before sampling and the part of the breath to be sampled both need to be defined. In terms of analysis, a combination of GC-MS instrumentation for discovery, followed by fast identification of the discovered targets with rapid techniques such as sensors and e-noses, is highly desirable.
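As an illustration of how the end-tidal fraction could be selected in practice, the Python sketch below marks the points of a single exhalation at which the CO2 reading is close to its peak. It is a simplified sketch and not a description of any commercial sampler; the CO2 trace, the 90% threshold and the function name are hypothetical choices.

```python
def end_tidal_indices(co2_trace, fraction=0.9):
    """Return the indices of readings that reach at least `fraction` of the peak CO2.

    co2_trace holds CO2 readings (e.g., in %) recorded over one exhalation.
    The 0.9 default is an illustrative threshold, not a standardized value.
    """
    if not co2_trace:
        return []
    peak = max(co2_trace)
    return [i for i, value in enumerate(co2_trace) if value >= fraction * peak]


# Hypothetical CO2 readings (%) over one exhalation, illustration only.
co2 = [0.1, 0.5, 2.0, 3.8, 4.6, 5.0, 5.1, 2.0, 0.2]
selected = end_tidal_indices(co2)
print(selected)
```

Only the breath exhaled during the selected interval would be directed to the sorbent tube or analyzer; fixing the threshold in a protocol is one way of making the sampled fraction comparable between studies.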
The control cohorts used in some studies are often younger than the investigated patient groups. For example, in one study, the mean age of the control group was 28 ± 6.08 years, while the mean ages of the two investigated patient groups, one with COPD with acute exacerbation and the other with COPD only, were 66.9 ± 9.05 and 71.4 ± 7.46 years, respectively [81]. Fens et al. [38] included a much wider age range in their study, between 18 and 87 years, in an attempt to discriminate between asthma and COPD patients, while Oguma et al. involved 37 volunteers between the ages of 24 and 64 years as controls, and 116 patients with lung cancer between the ages of 36 and 96 years [80]. Smaller, but still notable, age discrepancies were found in another study that compared two control groups with a cohort diagnosed with COPD [18]. The mean ages of the healthy smoker and non-smoker groups were 38.7 ± 14 years and 42.5 ± 8.4 years, respectively, while the mean age of the patients diagnosed with COPD was 56.2 ± 8.5 years. The authors also reported that the age difference between the two control groups and the group affected by COPD was statistically significant [18]. Conversely, another study included COPD patients with a mean age of 58.6 ± 6.9 years, while the mean age of the healthy controls was 58.1 ± 8.1 years [20]. Nevertheless, it is still questionable how much variables such as age, smoking status, body mass index, and the presence of other diseases affect the VOC profiles emitted in an exhaled breath sample. We do believe that a rigorous quantification of emitted volatiles is almost impossible, owing to differences between patients, mainly related to gender and age. Adult males with a larger chest volume will exhale more breath than females, elderly people or infants. Whether the concentration of volatile markers of interest is influenced by the total exhaled volume remains debatable.
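One simple way to express how well two groups are matched for age is the standardized mean difference, i.e., the difference between the group means divided by a pooled standard deviation. The Python sketch below applies it to the means and standard deviations quoted above for [81] and [20]. Because the group sizes are not given here, the pooling assumes equally sized groups, which is an assumption of this sketch; the function name is also illustrative.

```python
import math


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Absolute difference of means in units of the pooled standard deviation.

    The pooled SD is the root mean square of the two SDs, which assumes
    equally sized groups (an assumption of this sketch).
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd


# Means and SDs (years) as quoted in the text above.
smd_unmatched = standardized_mean_difference(66.9, 9.05, 28.0, 6.08)  # [81]
smd_matched = standardized_mean_difference(58.6, 6.9, 58.1, 8.1)      # [20]
print(f"[81]: {smd_unmatched:.2f}   [20]: {smd_matched:.2f}")
```

Values below about 0.1 are often regarded as indicating negligible imbalance, so under the stated assumption the cohort of [20] would be considered age-matched, whereas in [81] age and disease status cannot be separated.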
The small cohort size involved in many studies is a limitation that needs to be mentioned. Many of the clinical studies included only a few dozen volunteers [13,20,46,57,58,62], and rarely more than two hundred patients [14,22,43,68,84]. In only three cases did the number of included subjects exceed 400 [73,82,86]. This is understandable, because of the unavailability of suitable patients to donate the necessary samples, but also because of the long time required to collect a large number of samples. In our own experience, in a small city with 202,074 inhabitants (reported in 2018), we managed to collect just 30 tissue samples in one year from patients with post-operative bacterial infections and controls [10]. We are confident that other researchers have experienced the same issue. For example, Fens et al. [38] mentioned that although they included 100 patients with an established diagnosis of asthma or COPD, these were recruited over a long period, namely between August 2007 and March 2010. Moreover, patients often come from a limited geographical area. The analysis of this kind of sample may simply provide results that reflect a subtype or phenotype of a respiratory disease, and that are not mirrored in the markers released by the general population affected by the same condition. For example, Gaida et al. [19] recruited 222 subjects from two different sites in Germany, Hannover and Marburg, in an attempt to investigate VOCs related to COPD. Differences in both room air VOCs and breath VOCs were found when the two sites were compared. Geographical variation in exhaled VOCs was also found between two sampling sites in China and Latvia [137].
Dietary issues are another important factor in the analysis of VOCs detected in breath samples. Many studies imposed fasting periods of one hour [40,77,95], two hours [15,38,47,63], three hours [65,81], four hours [14] and six hours [33]. In some studies, volunteers fasted overnight or for 12 hours [21,41,60,70,82]. However, a long fasting period is not easily accepted by volunteers, and is not feasible in a real-life scenario.
The impact of medication applied for respiratory diseases (like inhalative agents, corticosteroids, antibiotics, anesthetics, etc.) together with the effect of concomitant medications (antihypertensive or anti-diabetic therapy), as well as the effect of co-existing disorders on exhaled VOCs still remains unknown.
Owing to a total lack of standardization in this field, different excluding criteria have been applied. For example, Zou et al. [70] excluded all participants younger than 45 years old from a validation cohort, while Phillips et al. [14] excluded all patients with current or previous cancer history, known dementia, heart failure, other known pulmonary, and renal or liver disease when investigating COPD. Rodríguez-Aguilar and colleagues in their COPD study excluded all patients with asthma and all individuals with a history of upper or lower respiratory tract infection during the 4 weeks before their measurements [85]. Van Vliet investigated asthma in children aged between 6 and 17 years old, and applied the following exclusion criteria: technically unsatisfactory performance of lung function measurements; other pulmonary diseases; cardiac abnormalities; mental retardation; congenital abnormalities or existence of a syndrome; active smoking; children that had immunotherapy during the study [76].
Excluding certain categories of volunteers is not always a suitable solution. Furthermore, patients often do not honestly declare whether they are active or ex-smokers. In addition, their medical histories are often confidential. Applying exclusion criteria will decrease the cohort in a biased way. However, not applying such criteria may result in so much interference that the pattern of marker occurrence becomes difficult to follow, which, in turn, will affect the diagnostic accuracy. Perhaps the best decision is not to exclude a key population, but just to subtract some well-known volatiles associated with certain habits (e.g., smoking).
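The suggestion of subtracting habit-related volatiles rather than excluding volunteers can be sketched as a simple filtering step applied to a list of candidate markers. In the Python sketch below, the set of smoking-related compounds is the one listed earlier in this review [22,84]; the candidate list and the function name are hypothetical.

```python
# Compounds from cigarette smoke detected in smokers' breath, as listed
# earlier in this review [22,84].
SMOKING_RELATED = {
    "acetonitrile", "furan", "2-methylfuran", "3-methylfuran",
    "2,5-dimethylfuran", "benzene", "toluene",
}


def drop_confounders(candidates, confounders=SMOKING_RELATED):
    """Keep candidate markers that are not on the confounder list.

    Names are compared in lower case; the input order is preserved.
    """
    return [voc for voc in candidates if voc.lower() not in confounders]


# Hypothetical candidate list, illustration only.
candidates = ["2-butanone", "Benzene", "hexanal", "toluene"]
retained = drop_confounders(candidates)
print(retained)
```

The same function could be reused with other confounder sets, for example compounds traced to disinfectants or to the sampling material, provided that such lists are agreed upon beforehand.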
The chemical composition of a breath sample also depends on the lung area from which it is sampled. Alveolar breath is generally expected to have the highest concentration of VOCs, because it originates from the deepest part of the lungs and is, therefore, the closest to the alveolar capillaries; however, this depends on the solubility of the volatile, which is related to the compound's Henry coefficient. Clearly, the gas exchange process depends on the alveolar membrane thickness and, in the case of respiratory disease, on the ability of patients to take a deep inspiration and provide a profound expiration. The lack of standardized methods for sampling, analysis and data processing, as well as the effects of environmental contaminants, has resulted in a large number of disparate studies.
An important issue to address is where in the breathing cycle a breath sample should be taken from patients suffering from COPD or asthma, because these illnesses are more related to the upper airways than to the alveolar region. Whilst it is true that breath from the lower airways is less important for these diseases, use of the end-tidal region limits dilution and contamination of a breath sample from the mouth and from anatomic or tubing dead-space. Furthermore, there would be no temporal resolution in the breath sample that could be used to differentiate upper from lower airway breath. Therefore, it is always best to collect a breath sample during the end-tidal exhalation phase.

Current Status of VOCs Based Diagnosis
A snapshot of the cancer studies included in the current review, covering quantification or identification, is presented in Figure 5 as a network analysis obtained using RStudio with R console version 3.6.3. It is worth mentioning that 83 biomarkers have been reported across the 21 studies related to lung cancer. Of these, just 31 are common to at least two studies. The best concordance was obtained for 2-butanone, which was common to six studies, followed by different isomers of xylene, detected in five studies. Nonanal, 2-pentanone, 3-hydroxy-2-butanone and hexanal were common markers in four studies.
Figure 5. Network analysis of the lung cancer studies; study numbering follows Table 3; darker diamonds represent the common VOCs; pale diamonds depict the uncommon VOCs.
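The concordance figures above amount to counting, for each VOC, the number of studies that report it. A minimal Python sketch of this bookkeeping is given below; it is not the R workflow used for Figure 5, and the study labels and marker lists are hypothetical placeholders rather than the contents of Table 3.

```python
from collections import Counter


def voc_study_counts(studies):
    """Count in how many studies each VOC is reported.

    `studies` maps a study label to the list of VOCs it reports; a VOC that
    is listed twice within the same study is counted once.
    """
    counts = Counter()
    for vocs in studies.values():
        counts.update(set(vocs))
    return counts


def shared_vocs(studies, min_studies=2):
    """Return, in alphabetical order, the VOCs reported by at least `min_studies` studies."""
    counts = voc_study_counts(studies)
    return sorted(voc for voc, n in counts.items() if n >= min_studies)


# Hypothetical study-to-marker mapping, illustration only.
studies = {
    "S1": ["2-butanone", "hexanal", "nonanal"],
    "S2": ["2-butanone", "xylene"],
    "S3": ["2-butanone", "hexanal", "hexanal"],
}
counts = voc_study_counts(studies)
common = shared_vocs(studies)
print(counts["2-butanone"], common)
```

Pairs of studies and the VOCs they share are the edges of the kind of network shown in Figure 5, so the same mapping could also serve as input to a graph-drawing tool.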
The situation for the other two lung diseases is even less satisfactory than for lung cancer. No common compound was found for asthma, for which only four studies reported biomarker identification. Six studies reported biomarker identification for COPD. Just one compound, hexanal, was common to three studies, while five VOCs were common to only two studies. The distribution of VOCs between the three diseases we have reviewed, as well as between different studies investigating the same condition, is presented in Figure 6. As shown in Figure 6A, the compounds that are common to lung cancer studies and COPD studies are generally common to all three diseases. This indicates that they are not specific markers of a given lung condition, but rather simply indicative of a respiratory disease. Consequently, exhaled VOCs may also depend on a variety of parameters other than the disease under investigation. This is why a standardized approach, simultaneously covering sampling, analysis, data processing, normalization and correcting parameters, is needed to lead to the discovery of well-founded biomarkers that can provide clinically relevant information from breath analysis. Janssens et al. [138] have reviewed VOCs detected in urine, tissue, blood and cell lines of lung cancer patients and found some markers similar to those reported in the present review.
The evidence for the other two lung diseases is even weaker than for lung cancer. No common compound was found for asthma, for which only four studies reported biomarker identification. Six studies reported biomarker identification for COPD. Just one compound, hexanal, was common to three studies, while five VOCs were common to only two studies. The distribution of VOCs between the three diseases we have reviewed, as well as between different studies investigating the same condition, is presented in Figure 6. As shown in Figure 6A, the compounds that are common to lung cancer studies and COPD studies are generally common to all three diseases. This indicates that they are not specific markers for a given lung condition, but rather simply indicative of a respiratory disease. Consequently, exhaled VOCs evidently also depend on a variety of parameters other than the disease under investigation. This is why a standardized approach that simultaneously covers sampling, analysis, data processing, normalization and correction parameters is needed to discover well-founded biomarkers that can provide clinically relevant information from breath analysis. Janssens et al. [138] have reviewed VOCs detected in urine, tissue, blood and cell lines of lung cancer patients and found some markers similar to those reported in the present review.
Figure 6. Network analyses presenting the distribution of VOCs between the three reviewed conditions (part (A)) and highlighting the volatiles shared between studies of lung cancer (part (B)), COPD (part (C)) and asthma (part (D)). The circles labeled S (in parts (B-D)) represent the study numbers, which are assigned as in Table 3, and the diamonds represent the compounds; darker diamonds represent the common VOCs; pale diamonds depict the uncommon VOCs.
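The tally that underlies a comparison such as Figure 6 can be sketched in a few lines of Python. This is only an illustrative sketch: the study labels and compound names (`S1`, `voc_a`, and so on) are hypothetical placeholders and are not taken from Table 3, and the function names are our own.

```python
from collections import Counter

# Hypothetical study-to-compound mapping (placeholders, NOT data from Table 3).
studies = {
    "S1": {"voc_a", "voc_b", "voc_c"},
    "S2": {"voc_a", "voc_d"},
    "S3": {"voc_a", "voc_b", "voc_e"},
    "S4": {"voc_f"},
}


def compound_frequency(study_compounds):
    """Count how many studies report each compound."""
    counts = Counter()
    for compounds in study_compounds.values():
        counts.update(compounds)
    return counts


def shared_compounds(study_compounds, min_studies=2):
    """Return compounds reported by at least `min_studies` studies, sorted by name."""
    counts = compound_frequency(study_compounds)
    return {name: n for name, n in sorted(counts.items()) if n >= min_studies}


shared = shared_compounds(studies, min_studies=2)
print(shared)
```

Raising `min_studies` narrows the result to the compounds with the strongest agreement between studies; applying the same tally per disease would show which compounds recur across conditions and are therefore unlikely to be disease-specific.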

Owlstone Medical (Cambridge, UK) is working to develop a new standardized sampling device. Their ReCIVA (Respiration Collector for In Vitro Analysis) sampler has a dedicated clean air supply, CASPER (Clean Air Supply Pump for ReCIVA). Thermal desorption (TD) tubes containing Tenax/Carbograph-5TD adsorbents are used to collect the breath samples. The ReCIVA device allows specific fractions of exhaled breath to be collected in TD tubes through continuous monitoring of pressure and CO2 levels within the mask, and allows background contaminants to be removed [139]. Using Tenax as an adsorbent material in the sampling process has advantages (such as stability and low desorption temperature), but there are some drawbacks. For example, benzaldehyde is a decomposition product that appears in the chromatograms when Tenax tubes are used. In addition, nonanal and decanal, which have been proposed as markers related to COPD [79], asthma [76] and lung cancer [56,59,78], are difficult to evaluate correctly when Tenax® TA is used as the adsorbent material [19].

Overall Proposed Solutions
• Breath sampling needs highly standardized conditions, including a defined breath fraction, well-defined exclusion criteria, set conditions for preparing volunteers for sample collection, and the volume and duration of sampling;
• In the absence of a "perfect" breath reference material, routine breath control measurements should be performed at set time intervals;
• Operation of instruments according to well-defined protocols and standardized criteria;
• Monitoring of background air that can impact the performance of the methods;
• Calibration of instruments (especially sensors) with standardized samples that mimic breath is highly desirable;
• The data processing workflow should also be standardized, including, for example, peak alignment, normalization, and statistical analyses;
• Utilization of standardized methods for data processing (statistical tools, thresholds used for extraction parameters);
• Creation of databases of markers obtained using standardized methods that can be accessed and added to by researchers.
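To make the background monitoring and normalization points above more concrete, the following Python sketch shows one possible form such a step could take. It is an assumption-laden illustration, not a method prescribed in this review: simple subtraction of room-air peak areas and total-signal normalization are only two of several options, and all compound names and peak areas are hypothetical.

```python
def background_correct(breath, background):
    """Subtract room-air peak areas from breath peak areas, flooring at zero."""
    return {
        voc: max(area - background.get(voc, 0.0), 0.0)
        for voc, area in breath.items()
    }


def normalize_to_total(peaks):
    """Scale peak areas so that they sum to 1 (total-signal normalization)."""
    total = sum(peaks.values())
    if total == 0:
        raise ValueError("cannot normalize: total signal is zero")
    return {voc: area / total for voc, area in peaks.items()}


# Hypothetical peak areas (arbitrary units); placeholders, not measured data.
breath = {"voc_a": 120.0, "voc_b": 30.0, "voc_c": 10.0}
background = {"voc_a": 20.0, "voc_c": 15.0}

corrected = background_correct(breath, background)
profile = normalize_to_total(corrected)
print(profile)
```

Fixing choices like these in a shared protocol, together with the thresholds used, is what would allow profiles from different laboratories to be compared directly.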

Concluding Remarks and Future Perspectives
The currently available tools for the diagnosis of pulmonary diseases based on exhaled VOCs are promising, but are far from being of clinical use. Encouraging findings have been reported, and we have emphasized in this review that both discrimination between the three lung diseases reviewed and diagnosis prediction are relevant. However, multiple constraints, including sampling, analysis, validation and standardization, need to be resolved before analysis of specific VOCs can be widely applied in clinical practice. As a short-term future perspective, we predict that analytical instrumentation will be used in small point-of-care studies to confirm or rule out the possibility of certain respiratory conditions. Based on this first diagnosis, the subjects may then be sent for a more complex and confirmatory diagnosis. As for long-term future perspectives, we consider that online instrumentation, especially portable instrumentation, IMS, GC-IMS, sensors and e-noses, will be convenient devices for physicians to use in the diagnosis and monitoring of respiratory diseases, as well as in monitoring therapy.