Microbiota Biomarkers for Lung Cancer

Non-small cell lung cancer (NSCLC) is the number one cancer killer, and its early detection can reduce mortality. Accumulating evidence suggests an etiopathogenic role of microorganisms in lung tumorigenesis, and certain bacteria have been found to be associated with NSCLC. Herein we evaluated the potential use of the microbiome as a source of biomarkers for the early detection of NSCLC. We used droplet digital PCR to analyze 25 NSCLC-associated bacterial genera in 31 lung tumors and the paired noncancerous lung tissues, and in sputum of 17 NSCLC patients and ten cancer-free smokers. Of the bacterial genera, four had altered abundances in lung tumor tissues, while five were aberrantly abundant in sputum of NSCLC patients compared with their normal counterparts (all p < 0.05). Acidovorax and Veillonella were further developed as a panel of sputum biomarkers that could diagnose lung squamous cell carcinoma (SCC) with 80% sensitivity and 89% specificity. The use of Capnocytophaga as a sputum biomarker identified lung adenocarcinoma (AC) with 72% sensitivity and 85% specificity. The use of Acidovorax as a sputum biomarker had 63% sensitivity and 96% specificity for distinguishing between SCC and AC, the two major types of NSCLC. The diagnostic values of the sputum biomarkers were further validated in a different cohort of 69 NSCLC cases and 79 cancer-free controls. The sputum microbiome might provide noninvasive biomarkers for the early detection and classification of NSCLC.


Introduction
Lung cancer is the leading cause of cancer-related deaths in men and women [1]. Over 85% of lung cancers are non-small cell lung cancer (NSCLC), which mainly consists of squamous cell carcinoma (SCC) and adenocarcinoma (AC). The early detection of NSCLC by low-dose CT (LDCT) followed by effective treatment can reduce mortality [1]. However, many positive LDCT scans are false alarms and result in multiple examinations and invasive biopsies that carry their own morbidity [1]. The development of noninvasive biomarkers that can accurately diagnose early-stage lung cancer remains clinically imperative.
The microbiota is a wide-ranging array of microorganisms, including bacteria, archaea, fungi, and viruses, that inhabit various body sites [2]. The microbiome, defined as the collection of microbiota and their genes, plays an important role in health and disease [2]. Microbial agents could cause approximately 20% of the overall cancer burden [2]. For example, infection with human papillomavirus, Epstein-Barr virus, Helicobacter pylori (H. pylori), Escherichia coli, and Fusobacterium nucleatum can lead to a variety of malignancies [2]. In the respiratory tract, there are more than 500 different species of bacteria [3]. Changes in the airway microbiome are thought to contribute to lung tumorigenesis through different mechanisms, such as damage to the local immune barrier, production of bacterial toxins that alter host genome stability, and release of cancer-promoting microbial metabolites [4]. Furthermore, intratumoral microbes may directly affect the growth and metastatic spread of tumor

Study Population
The study protocol was approved by the Institutional Review Board of the University of Maryland Medical Center (IRB HP-00040666). From a tissue bank, we obtained 31 frozen lung tumors and the matched noncancerous lung tissues of stage I NSCLC patients who had either a lobectomy or a pneumonectomy. Tumor tissues were intraoperatively dissected from the surrounding lung parenchyma. Paired normal lung tissues were obtained from the same patients at an area distant from their tumors. Of the 31 cases, 16 were diagnosed with SCC and 15 with AC of the lungs. We collected sputum samples from participants aged 55-79 years at the point of their referral for suspected lung cancer. A total of 27 subjects, including 17 lung cancer patients and ten cancer-free smokers, were recruited. The 17 lung cancer patients were diagnosed with NSCLC, consisting of five stage I cases, five stage II cases, and seven stage III-IV cases. The NSCLC cases consisted of ten AC and seven SCC of the lungs (Table 1). The ten cancer-free patients were smokers who had either granulomatous inflammation (n = 5), nonspecific inflammatory changes (n = 3), or pulmonary infections (n = 2).
Sputum samples of 69 lung cancer patients and 79 cancer-free smokers were obtained from Dr. Ruth L Katz's laboratory at The University of Texas M.D. Anderson Cancer Center. As shown in Table 2, the 69 NSCLC patients consisted of 22 stage I cases, 24 stage II cases, and 23 stage III-IV cases. Thirty-six cases were AC and 33 were SCC of the lungs. The 79 cancer-free patients were smokers who had either granulomatous inflammation (n = 39), nonspecific inflammatory changes (n = 22), or lung infections (n = 18).

Collection and Preparation of Sputum
Sputum was collected from the participants before they received any treatment, as described in our previous studies [29-38,43-45]. To reduce the percentage of oral epithelial cells in sputum, the participants were asked to blow their nose, rinse their mouth, and swallow water to minimize contamination by squamous cells from postnasal drip and saliva. Sputum samples were then coughed into a sterile container and processed within 2 h. To further minimize oral squamous cell contamination, opaque or dense portions that looked different from saliva under the inverted microscope were selected from the expectorate using blunt forceps. The samples were processed on ice in four volumes of 0.1% dithiothreitol (Sigma-Aldrich, St. Louis, MO) followed by four volumes of phosphate-buffered saline (Sigma-Aldrich). We centrifuged the samples at 1500× g for 15 min and removed the supernatant. The remaining cell pellets were collected and stored at −80 °C until use.

Genomic DNA Isolation
We used the QIAGEN DNeasy Blood & Tissue Kit (QIAGEN, Germantown, MD, USA) to isolate DNA from the cell pellets or tissue specimens according to the manufacturer's instructions [23,46,47]. We determined the purity by measuring the optical density (OD) of each sample at 280 nm for protein concentration and at 260 nm for DNA concentration. The OD260/OD280 ratio was calculated, and DNA samples with a ratio in the range of 1.6-2 were considered pure.
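The purity criterion above can be expressed as a short check. The following is a minimal sketch, assuming the bounds 1.6 and 2 are inclusive (the text does not say); the function names and the example readings are hypothetical and serve only as illustration.

```python
def od_ratio(od260: float, od280: float) -> float:
    """Return the OD260/OD280 absorbance ratio."""
    if od280 <= 0:
        raise ValueError("OD280 must be positive")
    return od260 / od280


def is_pure(od260: float, od280: float, low: float = 1.6, high: float = 2.0) -> bool:
    """True when the ratio lies in the accepted range (bounds assumed inclusive)."""
    return low <= od_ratio(od260, od280) <= high


# Hypothetical absorbance readings, for illustration only.
print(is_pure(0.90, 0.50))  # ratio 1.8, inside the range
print(is_pure(0.60, 0.50))  # ratio 1.2, below the range
```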

Detection and Quantification of Bacterial Abundances Using Droplet Digital PCR (ddPCR)
We performed ddPCR to detect DNA of 25 bacterial genera (Table 3) by using a QX100 Droplet Digital PCR System and 2× ddPCR Supermix (Bio-Rad, California, CA, USA) with a modified version of a protocol developed in our previous studies [23,30,40,46,48-52]. The 25 bacterial genera were suggested to be associated with lung cancer by previous studies (references in Table 3) and were thus tested in this study. To design genus-specific PCR primers for determining bacterial abundances, we first aligned 16S rRNA sequences for the maximum number of species of each genus to identify consensus regions at the genus level. We then used the Primer3 primer design program to design specific primers as previously described [53,54]. Sequences of the PCR primers used to amplify DNA of the bacterial genera are shown in Table 3. To generate the droplets, we loaded 20 µL of PCR reaction and 70 µL of Droplet Generation Oil for Probes (Bio-Rad) into an eight-well cartridge using a QX100 droplet generator (Bio-Rad). We then transferred 40 µL of the generated droplet emulsion into a 96-well PCR plate (Eppendorf, Hamburg, Germany). The amplification reaction was conducted in a T100™ thermal cycler (Bio-Rad) with the following conditions: initial denaturation at 95 °C for 5 min followed by 35 cycles of 15 s at 95.0 °C, 30 s at 55.3 °C, 5 min at 4 °C, and, finally, 5 min at 90 °C for signal stabilization. After thermal cycling, we transferred the plates to a droplet reader (Bio-Rad). We used the software provided with the ddPCR system for data acquisition and to calculate the concentration of target DNA in copies/µL from the fraction of positive reactions using Poisson distribution analyses.
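The Poisson step performed by the instrument software can be illustrated as follows. This is a minimal sketch of the standard calculation, not the vendor's implementation: if a fraction p of droplets is positive, the mean number of target copies per droplet is −ln(1 − p), and dividing by the droplet volume gives copies/µL. The droplet volume used here (about 0.85 nL) is an assumption that is not stated in the text, and the function name is hypothetical.

```python
import math

# Assumed droplet volume in microliters (about 0.85 nL); not given in the text.
DROPLET_VOLUME_UL = 0.00085


def copies_per_ul(positive: int, total: int,
                  droplet_volume_ul: float = DROPLET_VOLUME_UL) -> float:
    """Estimate target concentration (copies/uL) from droplet counts.

    Poisson correction: mean copies per droplet = -ln(1 - positive/total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    fraction_positive = positive / total
    mean_copies_per_droplet = -math.log(1.0 - fraction_positive)
    return mean_copies_per_droplet / droplet_volume_ul


# Hypothetical counts: 1,200 positive droplets out of 12,000 accepted droplets.
concentration = copies_per_ul(1200, 12000)
print(f"{concentration:.1f} copies/uL")
```

The correction matters most at high target loads, where a positive droplet is increasingly likely to contain more than one copy and a simple positive/total ratio would underestimate the concentration.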

Statistical Analysis
We used SAS software version 6.12 (SAS Institute, Cary, NC) and GraphPad Prism version 7 (GraphPad Software, La Jolla, CA) for data analysis; the results were plotted with GraphPad Prism version 7. The Mann-Whitney U test was used to determine whether bacterial abundances were significantly different between lung cancer patients and healthy controls. Furthermore, Pearson's correlation coefficient test was used to determine the associations of bacterial abundances with clinicopathologic and demographic characteristics of the participants. The Spearman correlation test was carried out to analyze the correlation between abundances of bacterial genera. Logistic regression was used to generate prediction models. To evaluate the diagnostic significance of potential biomarkers, we used receiver-operator characteristic (ROC) curve analysis and computed the area under the ROC curve (AUC) by numerical integration of the ROC curve.
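The AUC computation described above can be sketched in a few lines. The example below is an illustrative pure-Python version, not the SAS or GraphPad Prism procedure used in the study: it sweeps a cutoff over the observed scores to build the ROC curve and integrates it with the trapezoidal rule, which is one common form of numerical integration. The function names and the scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    labels are 1 for cases and 0 for controls; a sample is called
    positive when its score is >= the cutoff, swept from high to low.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical abundance scores: three cases and three controls.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc_trapezoid(roc_points(scores, labels)), 3))
```

With tied scores handled as above, the trapezoidal AUC equals the Mann-Whitney U statistic divided by the product of the two group sizes, which links the ROC analysis to the rank test used for group comparisons.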
As shown in Figure 1, Acidovorax was overrepresented in SCC tissues compared with noncancerous lung tissues and AC tissues (p = 0.0051). Capnocytophaga DNA was enriched in AC tissues compared with noncancerous lung tissues and SCC tissues (p = 0.0049) (Figure 1). However, the abundances of Haemophilus and Fusobacterium were lower in AC tissues compared with noncancerous lung tissues and SCC tissues (p = 0.049 and 0.039, respectively) (Figure 1).
Diagnostics 2021, 11, x FOR PEER REVIEW

Bacterial Genera Displayed Different Abundances in Sputum of Lung Cancer Patients vs. Cancer-Free Smokers
All 25 bacterial genera produced more than 10,000 droplets in each reaction and were thus also readily detected in the sputum specimens by ddPCR. Of the bacteria, Acidovorax, Streptococcus, and Veillonella were overrepresented in sputum of lung SCC patients compared with lung AC patients and cancer-free smokers (all p < 0.05) (Figure 2). Helicobacter was underrepresented in sputum of lung SCC patients compared with lung AC patients and cancer-free smokers (p = 0.018) (Figure 2). Capnocytophaga was enriched in sputum of lung AC patients compared with lung SCC patients and cancer-free smokers (p = 0.046) (Figure 2). Furthermore, both Acidovorax and Capnocytophaga displayed significantly different abundances in sputum of lung AC vs. SCC patients (all p < 0.05). In addition, abundances of the five sputum bacteria were not associated with the age, gender, ethnic group, tumor stage, and smoking status of the patients (all p > 0.05), except histology and location of primary lung tumors (all p > 0.05).

Comparison of Abundances of Bacteria in Tumor Tissues of Lung Cancer Patients and Sputum of Lung Cancer Patients and Cancer-Free Smokers
The change in Acidovorax abundance had a similar trend in SCC tissues as in sputum of lung SCC patients (Figure 3) (Spearman correlation test, p = 0.023). Furthermore, Capnocytophaga had a similar trend in AC tissues as in sputum of lung AC patients (Spearman correlation test, p = 0.017). The altered abundances of the two bacterial genera (Acidovorax and Capnocytophaga) in sputum might directly reflect those in lung tumor tissues. However, the reduced abundances of Haemophilus and Fusobacterium were only observed in lung AC tissue specimens compared with their normal counterparts (Figure 3A). The increased abundances of Streptococcus and Veillonella were solely discovered in sputum of lung SCC patients, and decreased abundances of Helicobacter were found in sputum of lung AC patients, as compared with their normal counterparts (Figure 3B).

Development of Sputum Bacteria Biomarkers for NSCLC
Sputum is a noninvasively obtained body fluid. It contains bronchial epithelial cells from the lungs and lower respiratory tract and thus has advantages as a surrogate material for specifically diagnosing lung cancer. We evaluated whether the five bacteria, which were readily detected in sputum and associated with lung cancer, could be used as noninvasive biomarkers for NSCLC. In cohort 1 of sputum specimens, the five bacteria exhibited AUC values of 0.56-0.88 in distinguishing NSCLC patients from controls (Table 4). We used a stepwise logistic regression analysis to select the optimal panels of biomarkers. Two bacteria, Acidovorax and Veillonella, were selected as the best biomarkers for lung SCC. The two bacterial biomarkers used in combination produced 0.91 AUC (Figure 4A) in the diagnosis of lung SCC, with 80.00% sensitivity and 89.26% specificity (Table 4). The estimated correlation between the levels of the two bacteria was very low (Spearman correlation test, p = 0.53), implying that the two biomarkers provide complementary classification when integrated. Furthermore, the use of Capnocytophaga as a sputum biomarker could detect lung AC with 0.85 AUC (Figure 4B), 72.73% sensitivity, and 85.19% specificity (Table 4).
In addition, the use of Acidovorax as a sputum biomarker had 0.86 AUC (Figure 4C) with 63.64% sensitivity and 96.30% specificity for distinguishing between SCC and AC of the lungs (Table 5). The bacterial biomarkers had no association with age, gender, and smoking status of the participants, or with stages of lung tumors (Pearson's correlation coefficient test, all p > 0.05), except for location and histology of primary lung tumors (Supplementary Table S1).
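How a two-marker panel yields a sensitivity and specificity can be sketched as follows. This is an illustration under stated assumptions, not the fitted model from this study: the logistic coefficients, intercept, cutoff, and abundance values are hypothetical placeholders, since the fitted values are not reported in the text.

```python
import math


def logistic_score(markers, coefficients, intercept):
    """Combine marker abundances into a probability with a logistic model."""
    z = intercept + sum(c * m for c, m in zip(coefficients, markers))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(scores, labels, cutoff):
    """Call a sample positive when score >= cutoff.

    labels are 1 for cases and 0 for controls.
    Returns (sensitivity, specificity).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical coefficients and abundances (two markers per sample).
coefficients, intercept = [0.8, 0.5], -3.0
samples = [[3.1, 2.4], [2.7, 1.9], [1.0, 0.8], [0.6, 1.1]]
labels = [1, 1, 0, 0]
scores = [logistic_score(m, coefficients, intercept) for m in samples]
sens, spec = sensitivity_specificity(scores, labels, cutoff=0.5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Moving the cutoff trades sensitivity against specificity, which is why each reported pair corresponds to a single operating point on the ROC curve.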

Validating the Bacterial Biomarkers in an Independent Set of Lung Cancer Patients and Controls
The sputum bacterial biomarkers developed from cohort 1 were tested using the same procedures to diagnose lung cancer in cohort 2, consisting of 69 NSCLC patients and 79 controls. Consistent with the findings in cohort 1, abundances of Acidovorax, Streptococcus, and Veillonella were higher in sputum of lung SCC patients compared with lung AC patients and cancer-free smokers (all p < 0.05). The abundance of Helicobacter was lower in sputum of lung SCC patients compared with lung AC patients and cancer-free smokers (p = 0.018). Capnocytophaga was overrepresented in sputum of lung AC patients compared with lung SCC patients and cancer-free smokers (p = 0.046).
Furthermore, the bacterial biomarkers displayed similar diagnostic values in cohort 2 as in cohort 1 (Figure 5). In particular, Acidovorax and Veillonella used in combination could diagnose lung SCC with 0.89 AUC, producing 75.76% sensitivity and 88.61% specificity (Table 5). In addition, the sputum Capnocytophaga biomarker could detect lung AC with 0.83 AUC, yielding 69.44% sensitivity and 84.42% specificity (Table 5). Moreover, the use of Acidovorax as a sputum biomarker had 0.83 AUC with 66.67% sensitivity and 89.86% specificity for distinguishing between SCC and AC of the lungs. There was no association of sputum bacterial genera with age, gender, and smoking status of the participants, or with stages of lung tumors (all p > 0.05), except for location and histology of primary lung tumors (Supplementary Table S2).
There was no statistical difference between cohort 1 and cohort 2 in the sensitivity and specificity of the combined use of Acidovorax and Veillonella for diagnosis of SCC or of Capnocytophaga for detection of AC (all p > 0.05). However, sputum Acidovorax had a lower specificity in cohort 2 than in cohort 1 for distinguishing between SCC and AC of the lungs (89.86% vs. 96.30%, p = 0.02), while maintaining a similar sensitivity (66.67% vs. 63.64%) (Table 5).

Discussion
Our present study confirms that certain microbes, at genus level, are differentially abundant in lung tumor vs. normal lung tissues. Furthermore, we demonstrate the abundances of genera could be quantitatively measured in sputum by using ddPCR and the altered abundances of some sputum bacteria are associated with lung cancer. We further develop Acidovorax and Veillonella as a sputum biomarker panel for lung SCC, regardless of the stages. In addition, a single sputum bacterial biomarker, Capnocytophaga, could be used for detection of lung AC. Moreover, the use of Acidovorax as a sputum biomarker could distinguish between SCC and AC, the two major histological types of NSCLC. Therefore, the sputum microbiota might have the potential use as noninvasive biomarkers for diagnosis and classification of lung cancer at the early stage.
Previous studies have shown that diverse airway microbial profiles exist at different airway sites of lung cancer patients [42,68,69]. However, the comparison of bacterial profiles in primary tumor tissues and sputum of lung cancer patients has not been performed. Our comparison of bacterial abundances in tumor tissues and sputum of lung cancer patients suggests that the altered bacterial genera could be classified into three categories. (1) Lung tumor microbes, which comprise Capnocytophaga and Haemophilus. Aberrant abundances of these bacterial genera were exclusively found within lung tumors. The intratumoral microbes of lung cancer might be directly involved in the development and progression of NSCLC; however, the imbalance in their abundance is not detectable in sputum. (2) Sputum microbes of lung cancer patients, such as Streptococcus, Veillonella, and H. pylori, whose aberrations were solely observed in sputum of NSCLC patients. Changes of these microbes in sputum might not simply mirror those in primary lung tumors. The discovery is in line with a previous observation [70]: the analysis of bladder tumors and the paired urine samples showed that aberrations of certain bacteria existed in urine rather than in the tumor tissues, yet they had diagnostic significance for malignancy [70]. This category of microbiota might indirectly promote tumor susceptibility and development by altering the respiratory bacterial environment, modulating inflammation, inducing DNA damage, and producing metabolites involved in oncogenesis or tumor suppression [2]. (3) Bacterial genera whose changes in sputum were in the same direction as those in tumor tissues, including Acidovorax and Capnocytophaga. The aberrant bacterial abundances in sputum could directly reflect those in primary lung tumors. We have also found that altered abundances of the bacterial genera in sputum are histologically dependent. In particular, the abundances of Acidovorax, Streptococcus, H. pylori, and Veillonella in sputum are related to lung SCC, whereas increased Capnocytophaga abundance in sputum is related to lung AC. Nevertheless, an extensive and deep investigation of the microbiota is needed to better understand the pathogenesis of NSCLC and to provide new diagnostic and therapeutic targets for the disease.
Overall, the potential sputum bacterial biomarkers have a higher sensitivity for lung SCC than for lung AC (80.00% vs. 72.73%, p = 0.032). The findings are in good agreement with our previous studies [6,23,27,30,32,34,40,46,50,52], in which we have shown that sputum-based molecular biomarkers have a higher sensitivity in identifying central SCC compared with peripheral AC of the lungs. A possible reason is that sputum is secreted from the large airways or main bronchi, where SCC more commonly arises. Conversely, lung AC tumors often arise in peripheral lung tissue and originate from the smaller airways of the lungs. Future integration of the sputum-based assay with LDCT could overcome the weakness of the imaging analysis by improving accuracy for the early detection of lung SCC.
Among the bacterial genera analyzed, Acidovorax was found by Greathouse et al. to have an elevated abundance in lung SCC tissues with TP53 mutation [8]. Furthermore, there was a significant increase in lung tumor volume in mice inoculated with Acidovorax temperans. Acidovorax temperans could contribute to lung carcinogenesis in the presence of activated Kras and mutant p53 and, thus, act as a promoter in the development and progression of the disease [8]. Our current study supports this early report [8], and more importantly, suggests that Acidovorax might provide a sputum biomarker for lung SCC. Capnocytophaga species were proposed to be involved in lung carcinogenesis and lower respiratory tract infections [15]. Furthermore, Capnocytophaga might induce long-term immune response/infection to the organ or cancer growth environment, which favors the growth of these bacteria in the airways [68]. Our study also suggests that Capnocytophaga abundance is significantly higher in NSCLC vs. normal lung tissues. Tsay et al. found an increased abundance of Streptococcus and Veillonella in the lower respiratory tract of NSCLC patients, which was associated with upregulation of the ERK and PI3K signaling pathways [12]. It has been well accepted that H. pylori is a risk factor for gastric and several other cancers [2]. Our present study demonstrates a close association of H. pylori with lung SCC. However, rigorous investigations regarding the H. pylori-lung cancer association remain to be performed.
Smoking causes most lung cancers, but lung cancer can be found in never smokers [1]. Interestingly, the abundances of the genera tested in this present study were not associated with the smoking status of the patients. The result suggests that the microbiota aberrations might play an important role in lung tumorigenesis of nonsmokers. The observation is in line with previous studies [16,17], in which lung cancer patients who were never smokers had a long history of bacterial respiratory tract infection. Dysregulation of the genera could be involved in the development and progression of NSCLC via a specific manner that is beyond tobacco-smoking-related carcinogenesis.
The sputum biomarkers were further tested in an independent (validation) cohort of cases and controls. The diagnostic significance of the bacterial biomarkers for diagnosis of SCC and AC of the lungs was confirmed. However, although the use of sputum Acidovorax for distinguishing between SCC and AC had a similar sensitivity, its specificity was reduced in the validation cohort. A possible explanation for the difference is that the sputum specimens of the validation cohort were collected five years ago, whereas sputum samples in cohort 1 were fresh and collected within six months. DNA quality might decline significantly with long storage duration, leading to a lower specificity of the sputum biomarker for distinguishing between SCC and AC of the lungs.
This study has some limitations. (1) The sample size is small. We will prospectively validate the sputum biomarkers in a large cohort. (2) We only assessed 25 bacterial genera whose changes were previously suggested to be associated with lung cancer. Although the results show promise, the sensitivity and specificity of the biomarkers are not sufficient for routine laboratory settings. We will evaluate more lung tumor-associated bacteria to identify additional bacterial biomarkers that can be added to the current ones so that the diagnostic efficacy of the sputum tests can be improved.

Conclusions
We show that aberrant microbial composition, at genus level, is present in lung tumor and sputum of lung cancer patients. We have for the first time developed sputum bacterial biomarkers that could be potentially used for the early detection and classification of lung cancer, though a larger sample study is needed to validate the findings.
Supplementary Materials: The following are available online at https://www.mdpi.com/2075-4418/11/3/407/s1, Table S1. The association of abundances of bacterial genera in sputum of cohort 1 with the age, gender, ethnic group, tumor stage and location, and smoking status of the patients determined by Pearson's correlation coefficient test. A p-value < 0.05 is statistically significant; Table S2. The association of abundances of bacterial genera in sputum of cohort 2 with the age, gender, ethnic group, tumor stage and location, and smoking status of the patients determined by Pearson's correlation coefficient test. A p-value < 0.05 is statistically significant.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Maryland Baltimore.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon a reasonable request.