Sputum Microbiome Composition in Patients with Squamous Cell Lung Carcinoma

Background: Recent findings indicate that the host microbiome can have a significant impact on the development of lung cancer by inducing an inflammatory response, causing dysbiosis, and generating genome damage. The aim of this study was to search for bacterial communities specifically associated with lung squamous cell carcinoma (LUSC). Methods: The taxonomic composition of the sputum microbiome of 40 men with untreated LUSC was compared with that of 40 healthy controls. Next-generation sequencing of bacterial 16S rRNA genes was used to determine the taxonomic composition of the respiratory microbiome. Results: There were no differences in alpha diversity between the LUSC and control groups. Meanwhile, the structure of bacterial communities (β diversity) in sputum samples differed significantly between patients and controls (pseudo-F = 1.53; p = 0.005). The genera Streptococcus, Bacillus, Gemella, and Haemophilus were significantly enriched in patients with LUSC compared to the control subjects, while 19 bacterial genera were significantly reduced, indicating a decrease in beta diversity in the microbiome of patients with LUSC. Conclusions: Among other candidates, Streptococcus (Streptococcus agalactiae) emerges as the most likely LUSC biomarker, but more research is needed to confirm this assumption.


Introduction
Interactions between the host and the commensal microbiota are complex and insufficiently understood. In cancer, diverse microbial ecosystems have been documented to induce metabolic changes in the tumor microenvironment, promote dysbiosis, directly induce oncogenic transformation, or modulate the immunotherapy response [1][2][3]. Comprehensive metagenomics approaches enable precise mapping of the tumor-associated microbiome and help to unveil the mechanisms of bacterial influence on cancer occurrence and progression [4]. In addition, recent efforts have identified microbial signatures characteristic of certain cancer types, which may serve as tumor diagnostic biomarkers [5].
Lung cancer (LC) arises in the lung parenchyma or bronchi, and is annually diagnosed in approximately 1.2 million people worldwide, with >1 million associated deaths during this period [6]. Although all forms of LC originate from epithelial cells of the airway mucosa, the current classification of LC includes several different histological types of this disease [7]. LC is usually divided into small cell lung cancer and non-small cell lung cancer (NSLC), which accounts for 85% of all bronchogenic tumors [8]. NSLC is further subdivided into large cell lung cancer, adenocarcinoma of the lung (AD), and lung squamous cell carcinoma (LUSC). LUSC accounts for about 30% of all NSLC cases. It is associated with a poor prognosis, and no targeted therapy is available so far [9].
Life 2022, 12, 1365
The mortality rate from LUSC remains high, partly due to the lack of diagnostic biomarkers for early detection, including metagenomic ones. However, the search for bacteria associated with the risk of LC development has intensified tremendously in recent years, especially due to the wide application of the newest DNA sequencing technologies [10,11].
Previous studies have shown that changes in the abundance of specific microbiota taxa in bronchoalveolar lavage fluid, lung tissue, and saliva samples may be associated with LC, but results from these studies are largely inconsistent [12][13][14][15][16][17][18][19][20]. Another source of information on the composition of the respiratory tract microbiota is sputum, which has been poorly studied in patients with LC in general and particularly in those with LUSC [21][22][23][24][25]. Even though sputum does not reflect the microbiome of any particular part of the respiratory tract, it can still be very useful as a source of metagenomic biomarkers, since its collection is easy and non-invasive.
Different histological types of LC are characterized by different biological patterns, molecular markers, and treatment strategies [26]; however, very few studies have so far examined the relationship between the respiratory tract microbiome and individual histological types of LC.
In this report, we compare, for the first time, the taxonomic composition of the sputum microbiome in LUSC patients and healthy control donors, all residents of the Kuzbass region of Western Siberia.

Cohort Information
The composition of the sputum bacterial microbiome was studied in 40 patients with newly diagnosed LUSC (male only, average age 59.9 ± 6.9 years) who were admitted to the Kemerovo Regional Oncology Center (Kemerovo, Russian Federation) and in 40 healthy male donors, residents of Kemerovo (average age 54.0 ± 5.3 years). Samples were collected between March 2018 and August 2020. Active smokers made up 75% of LUSC patients and 55% of control subjects. Smoking pack-years did not differ between groups. For LUSC patients, the disease stage was determined in accordance with the TNM classification [27]: 18 patients (45%) were stage I-II, and 22 patients (55%) were stage III-IV. A questionnaire was completed for each participant, covering place and date of birth, living environment, occupation, exposure to occupational hazards, health status, dietary habits, intake of medications (use of antibiotics at least four weeks prior to sampling), X-ray procedures, and smoking and drinking status. For patients with LUSC, the results of clinical and histological analyses were additionally taken into account.
Inclusion criteria were male sex, age ≥40 years, and willingness to participate in the study, to donate sputum, and to sign written informed consent. Exclusion criteria were any acute or chronic condition that would limit the ability of the patient to participate in the study, use of antibiotics within 4 weeks prior to collection, failure to obtain a sputum sample, or refusal to give informed consent.
All procedures undertaken were in accordance with the ethical standards of the Helsinki Declaration (1964 and amended in 2013) of the World Medical Association. All participants (patients and control subjects) were informed about the aim, methodology, and possible risks of the study; informed consent was signed by each donor.

Sample Collection, Process, and Storage
To analyze the composition of the microbiome of the respiratory tract, sputum samples obtained from LC patients and control subjects were used. Sputum from patients was obtained on the first day of hospitalization, prior to all diagnostic or therapeutic procedures. Before sputum collection, patients were asked to rinse their mouths. Sputum samples were collected non-invasively through participant-induced coughing (i.e., without induction) and represent the oropharyngeal secretion. Randomly selected sputum samples were examined by microscopy of Giemsa-stained cytological slides, which confirmed the presence of columnar airway epithelial cells. Samples were immediately placed in sterile plastic vials and frozen (−20 °C). Frozen samples were transported to the laboratory and stored at −80 °C.
Libraries were again purified using Agencourt AMPure XP beads (Beckman Coulter, Bray, Houston, TX, USA) according to the Illumina 16S metagenomic sequencing library protocol. Sample PCR products were then pooled in equimolar amounts, purified using AMPure XP beads (Beckman Coulter, Bray, Houston, TX, USA), and quantified using a fluorometer (Quantus Fluorometer dsDNA; Promega, Madison, WI, USA). Molarity was then brought to 4 nM, and the libraries were denatured and diluted to a final concentration of 8 pM with a 10% PhiX spike for sequencing on the Illumina MiSeq platform [28].
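The dilution from 4 nM to 8 pM described above corresponds to a 500-fold reduction in concentration. As a minimal sketch of that arithmetic only, the following uses the generic relation C1·V1 = C2·V2; the function name and the 1000 µL final volume are hypothetical, and the intermediate denaturation steps and the PhiX spike of the actual protocol are deliberately left out.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock volume, diluent volume) from C1*V1 = C2*V2.

    Concentrations must share one unit; volumes are returned in the
    unit of final_volume.
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


STOCK_PM = 4000.0   # 4 nM expressed in pM
TARGET_PM = 8.0     # 8 pM loading concentration
FINAL_UL = 1000.0   # hypothetical final volume, not taken from the paper

fold = STOCK_PM / TARGET_PM
stock_ul, diluent_ul = dilution_volumes(STOCK_PM, TARGET_PM, FINAL_UL)
print(f"{fold:.0f}-fold dilution: {stock_ul} uL library + {diluent_ul} uL diluent")
```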

Taxonomy Quantification Using 16S rRNA Gene Sequences and Statistical Methods
Sequence data were processed using the QIIME2 software [29]. A quality check was carried out and a sequence library was generated. The sequences were clustered into operational taxonomic units (OTUs) based on a 99% nucleotide similarity threshold using the Greengenes (version …) and SILVA (version 132) reference sequence libraries, followed by the removal of singletons (OTUs containing only one sequence). The total diversity of prokaryotic communities (alpha diversity) of sputum was estimated by the number of allocated OTUs (an analogue of species richness) and by the Shannon index ($H = -\sum_i p_i \ln p_i$, where $p_i$ is the proportion of the $i$-th species in a community). For the calculation of diversity indices, samples were normalized to 1045 sequences (the minimum number of sequences obtained per sample). The variation in the structure of the bacterial community of different samples (beta diversity) was analyzed using the Bray-Curtis dissimilarity metric [30], a method common in microbial ecology that estimates the difference between communities based on the abundance relationships of the taxa present in the samples.
In addition, the Mann-Whitney U test was used to assess the significance of differences in the relative percentage of individual bacterial taxa in sputum, and the False Discovery Rate (FDR) correction was applied to account for multiple comparisons. Spearman's correlation coefficient was used to calculate correlations. Calculations were performed using the STATISTICA 10 software package (StatSoft, Tulsa, OK, USA). Multiple linear regression (MLR) was performed to assess the relationship between the relative abundance of individual bacteria in LUSC patients' sputum and lifestyle/disease factors.
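To make the two diversity measures named above concrete, the following is a minimal sketch in plain Python, not the QIIME2 implementation used in the study. It computes the Shannon index with the natural logarithm and the Bray-Curtis dissimilarity between two samples; the function names and the OTU counts are hypothetical illustrations.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample contains no sequences")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # proportion of the i-th taxon
            h -= p * math.log(p)   # natural logarithm
    return h


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must list the same taxa in the same order")
    denom = sum(u) + sum(v)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Hypothetical OTU counts for two samples normalized to the same depth.
sample_a = [50, 30, 20, 0]
sample_b = [10, 10, 40, 40]

print(shannon_index(sample_a), shannon_index(sample_b))
print(bray_curtis(sample_a, sample_b))
```

A dissimilarity of 0 means identical abundance profiles and 1 means that the two samples share no taxa.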
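The text does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure and returns adjusted p-values in the order of the input; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from per-taxon Mann-Whitney U tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```

A taxon is then reported as significantly different when its adjusted p-value falls below the chosen threshold (0.05 in this illustration).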

Results
Here we profiled the composition of the sputum bacterial microbiome in 40 patients with LUSC and 40 healthy donors, all residents of Kemerovo. We used next-generation sequencing of the V3-V4 region of the bacterial 16S rRNA gene in DNA purified from the sputum samples of the two groups. A summary of the demographic information regarding LUSC and control subjects is shown in Table 1. Mean age differed between patients and controls (p < 0.05). The LUSC and healthy control groups were sex-matched and did not differ in living environment, alcohol consumption, or smoking pack-years.
For the LUSC group, the average number of analyzed sequences was 76,776 (range: 9694−181,146). For the healthy control group, the average number of analyzed sequences was 72,613 (range: 12,537−160,232). We identified a total of 11 bacterial phyla with relative frequencies above 0.1%. The prevailing phyla in our dataset were Firmicutes, Bacteroidetes, Actinobacteria, and Proteobacteria (Figure 1), consistent with results from previous studies [21,22,31,32].
Regarding alpha diversity, neither the number of allocated OTUs nor the Shannon indices showed significant differences between LUSC and control groups. Overall, the bacterial communities were fairly diverse in the two groups, as indicated by the Shannon index at the genus level (5.267 in LUSC vs. 5.439 in control groups). This suggests that any changes in the sputum microbiome of LUSC patients are not large-scale shifts in the bacterial community.
Differences in the structure of bacterial communities in sputum samples of lung cancer patients and healthy subjects are shown in Figure 2. The first two principal components explained 14.47% and 7.182% of the total variation. Compositional similarity within the phylum-level taxa was displayed among individual samples using the bar plot (Figure 1). The PERMANOVA (Adonis) test using the difference matrix, constructed by the Bray-Curtis method, showed a significant difference in the prokaryotic communities in sputum from healthy subjects and patients with LUSC (pseudo-F = 1.53; p = 0.005).
We then compared the frequencies of major bacterial phyla in our sputum specimens. Samples from LUSC patients revealed a significant increase in the representatives of the Firmicutes phylum as compared to control subjects (56.77 ± 15.29% vs. 47.34 ± 10.65%, respectively; p = 0.004); in contrast, four other major bacterial phyla (Bacteroidetes, Fusobacteria, TM7, and Spirochaetes) were overrepresented in the sputum of healthy subjects in comparison with that from LUSC patients (Figure 3).
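For readers unfamiliar with the pseudo-F statistic reported for the PERMANOVA test, the following is a minimal one-way sketch in plain Python. It is not the Adonis implementation used in the study: it computes pseudo-F from a square distance matrix and obtains a p-value by permuting group labels. The function names, the toy distance matrix, and the number of permutations are hypothetical.

```python
import random


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a symmetric distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, computed group by group.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss = 0.0
        for k, i in enumerate(members):
            for j in members[k + 1:]:
                ss += dist[i][j] ** 2
        ss_within += ss / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Toy example: four samples placed on a line, two per group.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
labels = ["LUSC", "LUSC", "control", "control"]
f_stat, p_value = permanova(dist, labels, permutations=199, seed=1)
print(f_stat, p_value)
```

In practice the distance matrix would hold Bray-Curtis dissimilarities between all pairs of samples, and far more than four samples are needed for the permutation p-value to be informative.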
Analysis of the composition of microbial communities in LUSC and control sputum enabled us to annotate the core microbiome of our sputum samples, which consisted of 67 genera and 32 species. Bacterial genera and species that differed significantly between groups are listed in Tables 2 and 3 (23 and 17, respectively, after FDR correction). We observed a considerable variation in the percentages of all genera and species presented. Representatives of the other 17 species were significantly more common in the microbiome of healthy controls than in patients with LUSC, as shown in Table 3.
We found no specific association of any bacterial taxon in the sputum with the age of patients or control donors participating in the study.
The influence of smoking status on the microbiota composition in patients with LUSC and control subjects was also investigated. For LUSC patients, no significant difference was found in the bacterial genera or species in sputum between smokers and non-smokers. In contrast, controls differing in smoking status showed significant differences in the occurrence of several genera and species in the sputum. Control group smokers (Figure 4) had less Neisseria than non-smokers (0.56 ± 1.16% vs. 3.94 ± 5.63%; p = 0.00006), as well as less Fusobacterium (1.4 ± 1.55% vs. 3.39 ± 3.01%; p = 0.02), Prevotella nigrescens (0.35 ± 1.38% vs. 0.52 ± 0.68%; p = 0.01), and Peptostreptococcus anaerobius (0.04 ± 0.1% vs. 0.39 ± 0.71%; p = 0.02). At the same time, control group smokers had more Streptobacillus in their sputum compared to non-smokers (3.62 ± 2.8% vs. 1.92 ± 2.28%; p = 0.03). Comparison of the total composition of the microbiome in patients with different stages of LUSC (I-II and III-IV), as well as between subgroups with different localization of the primary tumor, revealed no differences.
Conditional logistic regression models adjusted for age, smoking status, alcohol consumption status, living environment, occupational exposure, family cancer history, chronic diseases (heart and vessels, bronchitis, COPD, stomach, diabetes and obesity), and the genera Streptococcus, Bacillus, Gemella, and Haemophilus were constructed. In these models, heart and vessel diseases (p = 0.0001), bronchitis (p = 0.008), COPD (p = 0.003), and presence of Streptococcus (p = 0.009) were strongly associated with LUSC as compared to healthy subjects.

Discussion
The respiratory tract microbiome is closely linked to the onset of lung diseases, including LC. It has been previously shown that there are changes in the microecology of the lungs in patients with lung cancer compared to healthy subjects. In addition, the abundance of certain bacterial species correlates with pathology, suggesting their potential use as microbial markers for the detection of lung cancer. However, until now, the composition of the lung microbiome in patients with different histological types of lung cancer has not been determined.
In this study, we examined the difference between the microbiome of sputum samples from patients with LUSC and healthy controls. In general, the sputum microbiota in men with lung cancer showed a significant decrease in beta diversity, which is consistent with the results of previous studies [13,21,33,34]. At the level of bacterial phyla, the most notable finding in our patients with LUSC was an abundance of Firmicutes to the detriment of Proteobacteria. The dominance of Proteobacteria in healthy lung microbiota was also detected by others [18,35]. In a pairwise comparison, representatives of four bacterial phyla (Bacteroidetes, Fusobacteria, TM7, and Spirochaetes) and the 19 genera shown in Table 2 were significantly enriched in healthy control samples as compared to LUSC patients. On the other hand, we found that Streptococcus, belonging to the Firmicutes phylum, demonstrated the highest abundance in LUSC patients in comparison with controls. Two other genera from the Firmicutes phylum (Bacillus and Gemella) and Haemophilus, a representative of Proteobacteria, were also overrepresented in the sputum of LUSC patients compared to controls. We believe that all four genera may be considered potential bacterial biomarkers of LUSC.
An increased prevalence of Streptococcus in the sputum of patients with lung cancer has previously been reported in several publications [21,22,26]; however, a high abundance of the Bacillus, Gemella, and Haemophilus genera has not previously been reported. Indeed, a recent study using ddPCR found a significant increase in Streptococcus load in the sputum of seven patients with LUSC compared with ten control patients [36]. Interestingly, a significant increase in Veillonella was found in the sputum of the same patients in comparison to control participants. In our study, however, representatives of this bacterial genus were more evidently enriched in controls than in LUSC patients (Table 2). Finally, in an earlier report the amount of Haemophilus in the sputum of patients and controls was almost equal [31], while in our cohort this facultative anaerobe was significantly more common in LUSC patients than in healthy donors. Another recent study of the respiratory microbiome (saliva and bronchial biopsy specimens) in 25 patients with central lung cancer from Spain [37] found a significant increase in Streptococcus, Rothia, Gemella, and Lactobacillus, which partially agrees with our results (for Streptococcus and Gemella). Thus, it appears that Streptococcus is a major bacterial marker in the airways associated with lung cancer, although this could depend on the histopathological type and stage of the disease. For example, it was reported that Streptococcus and Neisseria were predominant in the sputum of patients with lung adenocarcinoma, that Streptococcus followed by Veillonella dominated the microbiome of LUSC patients, and that Neisseria and, to a lesser extent, Streptococcus were the most frequently found genera in the sputum of small cell lung cancer patients [21].
A comparison of sputum microbiome composition in subgroups of LUSC patients with different TNM stages, central or peripheral tumor localization, and smoking status revealed no significant differences in bacterial content. However, in the group of healthy donors, we observed a prominent decrease in Neisseria in the sputum of smokers compared to non-smokers (Figure 4), which is consistent with previously published results [38]. The effect of smoking on the sputum microbiota remains unclear, according to the latest published data [39], and requires further study.
As shown in Table 3, Streptococcus agalactiae was the only bacterial species that significantly increased in patient sputum, according to sequencing data and analysis of two databases (Greengenes and SILVA). It should be noted that streptococcal species are difficult to identify using 16S rRNA gene sequencing alone, and this identification requires further validation using ddPCR. Previous studies have reported the prevalence of Streptococcus viridans in the sputum of patients with lung cancer [17]. In our study, Streptococcus agalactiae was the most frequently found bacterium in the sputum of both LUSC patients and controls, and its significant increase in LC patients suggests its utility as a possible biomarker, similar to Streptococcus gallolyticus subsp. in colorectal carcinoma [40]. Streptococcus agalactiae (also known as GBS) is an important opportunistic species that can cause pneumonia, sepsis, and meningitis in newborns and in immunocompromised subjects [41,42]. Cases of invasive GBS infections are frequently reported in the elderly and immunocompromised adults, including those with diabetes mellitus, alcoholism, and cancer [43]. In the respiratory tract, GBS occasionally contributes to community-acquired pneumonia and empyema in adults [44]. When GBS causes a pulmonary infection, it is usually defined as part of polymicrobial pneumonia [45].
GBS bacteria effectively attach to pulmonary epithelial cells and are capable of invasion. This is initiated by their attachment to extracellular matrix components such as agglutinin, fibronectin, fibrinogen, and laminin, which facilitates their attachment to host cell surface proteins, such as integrins. Thus, the invasive potential of GBS is influenced by changes in the surface proteome of host cells, which can be caused by various lung pathologies [46]. The molecular mechanisms of cytopathology caused by GBS bacteria in patients are currently being intensively studied. It was shown that GBS induces the generation of reactive oxygen species and loss of mitochondrial membrane potential [47]. In human endothelial cells, reactive oxygen species are generated via the NADPH oxidase pathway, which is accompanied by cytoskeletal reorganization through the PI3K/Akt pathway, and is generally associated with pathogen penetration, providing evidence for the involvement of oxidative stress in the pathogenesis associated with S. agalactiae [48].
Several limitations of this study should be noted. First, with 80 samples our study may be underpowered. Our results require confirmation in independent large-scale studies to further understand the role of the sputum microbiota in the development of lung cancer. Second, only men were included in the present study, so women with LUSC should be studied further. Finally, at this stage of the study, we cannot unequivocally identify the specific streptococcal species whose presence in patients' sputum is elevated compared with controls. Further analysis using ddPCR will address this limitation.

Conclusions
In this report, we used massively parallel sequencing of bacterial 16S ribosomal genes to compare the taxonomic composition of the sputum microbiome of patients with LUSC and healthy donors. The bacterial taxonomic groups detected in the microbiome of patients differed significantly from those of controls. The sputum of patients with LUSC contains significantly more members of the genera Streptococcus, Bacillus, Gemella, and Haemophilus. Streptococcus (Streptococcus agalactiae) is the most likely LUSC biomarker from this list, but more research is still required to validate this assumption.
In order to consider these bacteria as biomarkers for the risk of LUSC development, it is necessary to have information about their population dynamics in the respiratory microbiome from health to lung malignancy. This could be addressed, for example, by building a database of the respiratory microbiome in healthy individuals over a long period of time. Another possible and more accessible approach is to study the composition of the microbiome in the sputum of patients with chronic inflammatory diseases of the lungs. A recent study showed increased numbers of Streptococci in airway microbiome samples from patients with idiopathic pulmonary fibrosis and COPD. It is important, in this regard, that our logistic regression models showed a significant relationship between an increase in abundance of Streptococcus and chronic inflammatory lung diseases, such as bronchitis