1. Introduction
Chronic obstructive pulmonary disease (COPD) is a progressive and largely irreversible chronic airway disease that affects approximately 400 million individuals worldwide and currently ranks as the third leading cause of global mortality [1,2]. Despite its high prevalence and disease burden, early-stage COPD is frequently underdiagnosed because its initial clinical manifestations are often subtle and nonspecific [3]. Spirometry remains the diagnostic gold standard; however, its routine implementation in clinical practice is limited by technical complexity and accessibility constraints. As a result, many patients are diagnosed at advanced stages, when therapeutic options are less effective, leading to impaired quality of life and unfavorable long-term outcomes [4,5]. These challenges underscore the urgent need to identify novel biomarkers and therapeutic targets that enable earlier detection and more precise disease management.
The gut microbiome constitutes the largest and most extensively studied microbial ecosystem in the human body [6,7]. Increasing evidence supports a bidirectional interaction between the gut microbiota and pulmonary health, a relationship conceptualized as the “gut–lung axis”. This framework describes dynamic communication between the gastrointestinal tract and the lungs mediated through shared mucosal immune networks, systemic circulation, and microbiota-derived metabolites. Within this axis, alterations in gut microbial composition and function can influence pulmonary immunity and inflammation, while lung pathology can, in turn, disrupt intestinal microbial homeostasis [8,9].
Substantial progress has been made in elucidating the role of gut microbiota and microbial metabolites in COPD. Using 16S rRNA sequencing, Bowerman et al. reported a significant reduction in gut microbial diversity in COPD patients, accompanied by alterations in metabolite profiles, including decreased levels of anti-inflammatory short-chain fatty acids (SCFAs) [10]. In parallel, reductions in beneficial bacterial genera such as Bifidobacterium and Prevotella, together with increased abundance of potentially pathogenic taxa including Streptococcus and Escherichia-Shigella, have been observed, suggesting a link between gut dysbiosis and systemic inflammation in COPD [11]. Additional studies demonstrated that patients experiencing acute exacerbations of COPD exhibit further reductions in gut microbial alpha diversity and SCFA abundance compared with those in the stable phase [12,13]. Moreover, a Mendelian randomization analysis conducted by Cao et al. identified a positive causal association between the microbial metabolite phenylacetylglutamine and COPD, highlighting its potential relevance as a therapeutic target [14]. Collectively, these findings indicate that gut microbiota and their metabolites play an important regulatory role in COPD pathogenesis and disease progression.
However, the majority of prior investigations relied on 16S rRNA sequencing, which provides limited taxonomic resolution and does not allow comprehensive assessment of microbial functional capacity or metabolite-related pathways. Although a metagenomic-based study attempted to address these limitations, it was constrained by a small sample size drawn from an Australian cohort [10], thereby limiting the representativeness and generalizability of its conclusions. Therefore, integrated analyses incorporating metagenomics and metabolomics, particularly in Chinese populations, remain scarce and warrant further exploration.
In this context, the present study employed shotgun metagenomic sequencing to systematically characterize the compositional and functional features of the gut microbiota in patients with stable COPD from northern China. By comparing individuals with normal pulmonary function to those with COPD, the study aimed to elucidate the regulatory role of gut microbiota in COPD pathogenesis. In parallel, untargeted metabolomics was integrated to identify metabolic alterations and potential biomarkers associated with disease development and progression. Together, these analyses aim to provide a scientific foundation for improved clinical prediction, early diagnosis, and targeted intervention strategies for COPD.
3. Discussion
In this study, integrated metagenomic and metabolomic approaches were applied to comprehensively characterize gut microbiota composition, functional potential, and metabolic profiles in 74 patients with stable COPD and 30 healthy controls from northern China. By moving beyond 16S rRNA sequencing, which is limited in taxonomic and functional resolution, this work addresses key gaps in prior studies that were constrained by small sample sizes, insufficient functional analyses, and a lack of data from Chinese populations. The results demonstrate pronounced alterations in gut microbiota structure, function, and host-associated metabolic profiles in COPD, supporting an important role for gut microbial dysregulation in disease pathophysiology. At the compositional level, LEfSe analysis identified 23 taxa with significantly different abundances between COPD patients and healthy controls (LDA ≥ 3), indicating distinct microbial signatures associated with COPD. Functional profiling further revealed enrichment of genes involved in signal transduction, altered abundance of genes encoding specific glycoside hydrolases, and increased abundance of antibiotic resistance genes, including baeS. At the metabolic level, 497 differential fecal metabolites and 1260 differential serum metabolites were identified. By combining metagenomic sequencing with serum and fecal untargeted metabolomics, we obtained insights that could not be derived from any single omics dataset alone. Serum riboflavin levels were significantly reduced in COPD patients and were positively correlated with pulmonary function indices, including FEV1 and FVC (p < 0.05), as well as with the abundance of the key differential functional gene diaminohydroxyphosphoribosylaminopyrimidine deaminase/5-amino-6-(5-phosphoribosylamino)uracil reductase (K11752) (p < 0.05). Among all candidate biomarkers, serum eremopetasinorol exhibited the highest diagnostic performance (AUC = 0.947).
Collectively, these findings provide evidence supporting a regulatory role of the gut–lung axis in COPD among individuals from northern China and offer new perspectives for precision diagnosis and targeted intervention.
The gut–lung axis has emerged as a conceptual framework for elucidating the complex and bidirectional interactions between the gastrointestinal tract and pulmonary system [15]. Clinically, patients with COPD frequently exhibit comorbid chronic gastrointestinal disorders, including inflammatory bowel disease and irritable bowel syndrome [16]. These conditions are closely linked to gut microbial dysbiosis, increased intestinal permeability, and inflammatory cell infiltration [17]. Previous studies have also demonstrated that intestinal epithelial damage and compromised barrier function are present in patients with COPD, contributing to gastrointestinal functional disturbances under chronic inflammatory conditions [18]. Conversely, alterations in gut microbiota composition and microbe-derived metabolites may actively participate in COPD pathogenesis.
With respect to microbial diversity, this study observed no significant difference in gut microbiota alpha diversity between patients with COPD and healthy controls, a finding consistent with the report by Kou et al. [19]. In contrast, beta diversity analysis demonstrated a significant separation in overall gut microbiota composition between the COPD and control groups, indicating that COPD is associated with distinct alterations in microbial community structure. This observation aligns with the majority of existing studies reporting gut microbiota dysbiosis in patients with COPD [20,21] and suggests that structural remodeling of the gut microbiota may represent an important component of COPD-associated pathophysiology. Further compositional analysis revealed specific microbial shifts in COPD patients. Previous studies have reported a reduced relative abundance of Bacteroidetes and an increased abundance of Firmicutes in COPD patients compared with healthy controls [22,23,24], in agreement with the results of the present study. In addition, the genus Subdoligranulum was significantly enriched in COPD patients. Li et al. reported that patients with acute exacerbations of COPD are particularly susceptible to environmental exposures and identified positive correlations between air pollutants and microbial function, including a positive correlation between particulate matter (PM10) levels and Subdoligranulum abundance [25]. These observations suggest that enrichment of Subdoligranulum may be associated with disease severity or environmental sensitivity in COPD, supporting the relevance of this taxon in the present findings. Notably, at the genus level, Weissella emerged as an important discriminator between COPD patients and healthy individuals. Weissella has been reported to possess probiotic-related properties [26], including the production of antimicrobial substances, competitive exclusion of pathogenic bacteria, and modulation of host immune responses [27,28]. However, its specific role in the development and progression of COPD remains unclear and warrants further investigation.
At the functional level, this study identified significant enrichment of genes involved in signal transduction mechanisms in COPD patients, along with altered abundance of genes encoding glycoside hydrolases, including GH116 and GH130. In parallel, antibiotic resistance genes such as baeS and smeS were enriched, accompanied by an increased abundance of genes associated with resistance to pleuromutilin and sulfonamide antibiotics. Glycoside hydrolases, including GH116, participate in the degradation of complex carbohydrates, and shifts in their abundance may disrupt intestinal energy metabolism. Given that patients with COPD frequently exhibit increased energy expenditure and nutritional imbalance, such functional alterations may further influence metabolic homeostasis through gut–lung axis–related pathways. The enrichment of antibiotic resistance genes within the gut microbiota of COPD patients is also noteworthy and may reflect long-term or repeated exposure to antibiotic therapies. Previous studies have demonstrated that the annual frequency of antibiotic use is significantly higher in individuals with COPD than in healthy populations [29]. Antibiotic exposure can perturb gut microbial balance and promote the accumulation of resistance genes, potentially exacerbating immune dysregulation and inflammatory responses and contributing to a self-perpetuating cycle [30]. These findings underscore the importance of optimizing antibiotic treatment strategies in COPD, including consideration of resistance gene profiles to guide personalized therapy, with the aim of limiting resistance gene dissemination and reducing therapeutic complexity.
In addition, metabolomic analyses revealed pronounced disturbances in both fecal and serum metabolic profiles in patients with COPD. In fecal samples, 497 differential metabolites were identified, involving multiple metabolic pathways, including amino acid and lipid metabolism. In serum, 1260 metabolites exhibited significant alterations, encompassing key compounds such as riboflavin, calanolide A, and methyl jasmonate. Metabolite set enrichment analysis (MSEA) demonstrated that these differential metabolites were predominantly enriched in biologically relevant pathways, notably riboflavin metabolism, cysteine and methionine metabolism, and histidine metabolism. Among these pathways, the association between riboflavin (RF) and COPD has attracted increasing attention. Riboflavin, also known as vitamin B2, is a water-soluble vitamin with high thermal stability [31] and is indispensable for cellular metabolic processes [32]. Patients with COPD typically exhibit impaired antioxidant capacity, resulting in excessive production of reactive oxygen species (ROS), increased oxidative stress, and subsequent exacerbation of pulmonary inflammation and tissue injury. Riboflavin possesses antioxidant properties and contributes to ROS neutralization through its role as the precursor of flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD), which serve as essential coenzymes for numerous enzymes involved in oxidative metabolism, including the FAD-dependent enzyme glutathione reductase. Toyasaki et al. demonstrated that glutathione reductase requires riboflavin-derived FAD to catalyze the reduction of oxidized glutathione (GSSG) to reduced glutathione (GSH), a critical endogenous antioxidant that inactivates ROS. In the complementary glutathione peroxidase (GPx)-mediated reactions, GSH is oxidized to GSSG during the detoxification of lipid peroxides, which are reduced to the corresponding alcohols, thereby mitigating oxidative damage [33]. Accordingly, RF may alleviate oxidative stress by inhibiting lipid peroxidation and attenuating reperfusion-related oxidative injury. Given that oxidative stress is a central pathogenic mechanism in COPD, riboflavin-mediated maintenance of the glutathione redox cycle represents a critical antioxidant defense pathway in lung tissue [34,35].
Beyond its antioxidant role, riboflavin has also been implicated in the modulation of inflammatory responses. A recent study demonstrated that riboflavin attenuates NLRP3, NLRC4, AIM2, and non-canonical inflammasome activation by inhibiting mitochondrial ROS production and mitochondrial DNA release, thereby blocking caspase-1 activation and subsequent maturation of IL-1β and IL-18. Given the well-established involvement of the IL-1β/IL-18 axis in COPD airway inflammation and emphysema, this mechanism provides a direct link between riboflavin metabolism and COPD pathogenesis [36]. Verdrengh et al. reported that riboflavin supplementation reduced inflammatory cell infiltration, decreased neutrophil accumulation in lung tissue, attenuated inflammatory responses, and improved respiratory function in COPD patients [37]. Clinically, patients with COPD have been reported to exhibit lower riboflavin status than individuals without COPD, as measured by the erythrocyte glutathione reductase activation coefficient [38]. In vivo experimental studies have further supported the protective effects of riboflavin against pulmonary injury: in three distinct rat models of oxidant-mediated acute lung injury, riboflavin administration significantly reduced vascular permeability (by 31–56%), alveolar hemorrhage (by 51–76%), and lipid peroxidation products (by 45% in lung tissue) [39]. Collectively, these findings suggest that riboflavin may exert multifaceted protective effects in the pathogenesis and progression of COPD.
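The glutathione redox cycle invoked above can be summarized by two standard reactions. This is a simplified textbook summary rather than a result of this study: glutathione reductase carries FAD, derived from riboflavin, as its prosthetic group and regenerates GSH using NADPH, whereas glutathione peroxidase (most isoforms of which are selenium-dependent rather than flavin-dependent) consumes GSH to reduce lipid hydroperoxides, written generically here as ROOH, to the corresponding alcohols (ROH).

```latex
\begin{aligned}
&\text{Glutathione reductase (FAD-dependent):} && \mathrm{GSSG} + \mathrm{NADPH} + \mathrm{H^{+}} \longrightarrow 2\,\mathrm{GSH} + \mathrm{NADP^{+}} \\
&\text{Glutathione peroxidase:} && \mathrm{ROOH} + 2\,\mathrm{GSH} \longrightarrow \mathrm{ROH} + \mathrm{GSSG} + \mathrm{H_{2}O}
\end{aligned}
```

Because the two reactions form a cycle, a shortfall of FAD would be expected to limit GSH regeneration and, in turn, the capacity of GPx to detoxify peroxides.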
Humans are unable to synthesize riboflavin endogenously and therefore depend on dietary intake, intestinal absorption, and microbial biosynthesis within the gut. Notably, approximately 40% of riboflavin can be produced by the gut microbiota. A systematic genomic analysis of B-vitamin biosynthesis pathways demonstrated that complete riboflavin operons are present in the genomes of all Bacteroidetes and Fusobacteria, as well as in 36 (92%) of the Proteobacteria genomes analyzed, indicating that de novo riboflavin synthesis is nearly universal in these phyla. In contrast, within Actinobacteria, only two of the analyzed genomes, Corynebacterium ammoniagenes DSM 20306 and Bifidobacterium longum ATCC 15697, possess the genetic capacity for riboflavin biosynthesis [40]. George et al. further demonstrated that Escherichia coli harbors a complete riboflavin biosynthesis operon (rib operon) and can efficiently synthesize riboflavin under aerobic conditions [41]. However, in patients with COPD, this biosynthetic capacity may be markedly reduced as a result of intestinal congestion and associated hypoxic conditions. Together, these observations indicate that the gut microbiota represents a critical exogenous source of riboflavin and that its biosynthetic capacity is strongly influenced by both microbial community structure and host physiological status.
To elucidate the molecular mechanisms underlying the impaired riboflavin biosynthetic capacity of the gut microbiota in patients with COPD, the present study conducted a focused analysis of the riboflavin metabolism pathway. The results demonstrated a significant depletion of this pathway in COPD patients. Consistently, three key functional genes (KEGG orthologs) annotated to riboflavin biosynthesis, K11752, K00794, and K14652, were all significantly reduced in abundance in COPD patients. The enzymes encoded by these genes play essential roles in the microbial riboflavin pathway. K11752 corresponds to RibD, a bifunctional enzyme that catalyzes an early “deamination–reduction” step in riboflavin biosynthesis, converting an unstable pyrimidine precursor into a structurally stable intermediate that forms the foundation of the pathway [42]. K00794 (RibH, lumazine synthase) catalyzes the condensation of 5-amino-6-ribitylamino-2,4(1H,3H)-pyrimidinedione with 3,4-dihydroxy-2-butanone 4-phosphate (DHBP) to generate the pteridine derivative 6,7-dimethyl-8-ribityllumazine (DMRL), which serves as the sole substrate for the downstream riboflavin synthase [43]. K14652, designated RibA [44], integrates two rate-limiting enzymatic activities, GTP cyclohydrolase II and DHBP synthase, within a single polypeptide, thereby coordinating substrate flux across the parallel branches of the pathway and enabling efficient regulation of overall riboflavin production [45]. The coordinated depletion of the riboflavin metabolism pathway and its core biosynthetic genes provides a molecular basis for the reduced riboflavin synthesis capacity of the gut microbiota in COPD, ultimately diminishing the microbial contribution to host riboflavin supply.
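For orientation, the steps attributed to K14652, K11752, and K00794 can be placed in the general bacterial riboflavin pathway. The scheme below is a simplified sketch drawn from general biochemistry rather than from data in this study: water, NAD(P)H stoichiometry, and some side products are omitted; the dephosphorylation step is carried out by a phosphatase that is not one of the three genes analyzed; and the abbreviations DARPP (2,5-diamino-6-ribosylamino-4(3H)-pyrimidinone 5′-phosphate), ARPP (5-amino-6-ribitylamino-2,4(1H,3H)-pyrimidinedione 5′-phosphate), and ARP (its dephosphorylated form) are introduced here only for brevity.

```latex
\begin{aligned}
&\text{K14652 (RibA), GTP cyclohydrolase II:} && \mathrm{GTP} \longrightarrow \mathrm{DARPP} + \text{formate} \quad (\text{other products omitted}) \\
&\text{K14652 (RibA), DHBP synthase:} && \text{ribulose 5-phosphate} \longrightarrow \mathrm{DHBP} + \text{formate} \\
&\text{K11752 (RibD), deaminase/reductase:} && \mathrm{DARPP} \xrightarrow{\;-\mathrm{NH_{3}}\;} \text{5-amino-6-(5-phosphoribosylamino)uracil} \xrightarrow{\;\mathrm{NAD(P)H}\;} \mathrm{ARPP} \\
&\text{phosphatase (other gene):} && \mathrm{ARPP} \longrightarrow \mathrm{ARP} + \mathrm{P_{i}} \\
&\text{K00794 (RibH), lumazine synthase:} && \mathrm{ARP} + \mathrm{DHBP} \longrightarrow \mathrm{DMRL} + \mathrm{P_{i}} + 2\,\mathrm{H_{2}O} \\
&\text{riboflavin synthase:} && 2\,\mathrm{DMRL} \longrightarrow \text{riboflavin} + \mathrm{ARP}
\end{aligned}
```

Read this way, RibA supplies both substrates of RibH, so reduced RibA or RibD capacity would be expected to limit DMRL, and therefore riboflavin, formation.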
In parallel, riboflavin availability in humans also depends on dietary intake and intestinal absorption. Riboflavin is primarily obtained from animal-derived foods and fortified grains, with comparatively lower levels present in plant-based sources. Intestinal uptake is mediated by riboflavin transporters (RFVTs) expressed in the small intestinal mucosa [46]. Subramanian et al. demonstrated that mice with targeted knockout of RFVT3 developed severe riboflavin malabsorption, highlighting the essential role of these transporters in maintaining riboflavin homeostasis [47]. In patients with COPD, the presence of cor pulmonale frequently leads to intestinal congestion, resulting in mucosal ischemia and hypoxia. Chronic hypoxic conditions can suppress RFVT2 transcription through hypoxia-inducible factor–mediated pathways, while inflammatory mediators further inhibit RFVT expression [46], collectively reducing riboflavin absorption efficiency. Supporting the reversibility of this process, Gariballa et al. reported that riboflavin supplementation at a dose of 10 mg/day for 8 weeks increased plasma riboflavin concentrations to 312 ± 58 nmol/L, approaching levels observed in healthy individuals, while reducing fecal riboflavin excretion to 157 ± 41 µg, indicating partial restoration of intestinal absorption capacity [38]. These findings provide a mechanistic rationale for riboflavin supplementation in COPD patients. Taken together, insufficient dietary intake or impaired absorption, combined with reduced microbial biosynthetic capacity, likely contributes to the observed decline in serum riboflavin levels in COPD patients.
Importantly, integrated metagenomic and untargeted metabolomic analyses revealed a previously unrecognized link between gut microbial dysfunction, systemic metabolic alterations, and impaired pulmonary function in COPD. The abundance of the key microbial gene K11752 was significantly and positively correlated with serum riboflavin concentrations, and serum riboflavin levels were positively correlated with lung function indices. These findings support the existence of a gut microbiota–riboflavin metabolism–lung function regulatory axis. This framework not only offers a possible mechanistic explanation for riboflavin deficiency in COPD but also identifies modulation of gut microbial riboflavin biosynthesis as a potential therapeutic avenue to improve riboflavin metabolism and support disease management in COPD.
The identification of reliable biomarkers is crucial for the early diagnosis and longitudinal monitoring of COPD. In this study, random forest analysis combined with ROC curve evaluation identified the serum metabolite eremopetasinorol as exhibiting the strongest diagnostic performance for COPD (AUC = 0.947, 95% CI: 0.88–0.98), markedly outperforming fecal metabolites and gut microbiota–based indicators. Compared with microbiome-derived markers, serum biomarkers are generally less susceptible to inter-individual variability and environmental influences, thereby offering greater stability and reproducibility in clinical applications. Eremopetasinorol is an exceptionally rare metabolite; research on it is still at a preliminary stage, and evidence regarding its biological origin and physiological function remains limited. To infer the potential origin of eremopetasinorol, we performed a comprehensive cross-database analysis. BioDeep classified it as a human metabolite and an endogenous natural product, whereas HMDB noted its presence in dietary sources such as green vegetables but provided no evidence for microbial production. To date, only a single study has reported that alterations in eremopetasinorol abundance are associated with early-life in vivo exposure to perfluorooctane sulfonate (PFOS), and that PFOS exposure may induce liver inflammation by disrupting gut–liver crosstalk [48]. Although the mechanism of eremopetasinorol in COPD remains to be elucidated, its structural classification as an eremophilane-type sesquiterpenoid, a family known for anti-inflammatory and immunomodulatory activities [49], suggests potential involvement in the inflammatory pathways central to COPD pathogenesis. Future studies should further explore its potential mechanisms of action in patients with COPD and validate its suitability as a serum biomarker for COPD.
Our findings have several potential clinical implications. First, the reduced serum riboflavin levels and their positive correlation with lung function suggest that riboflavin supplementation could serve as a low-cost adjunctive strategy to alleviate oxidative stress in COPD, though optimal dosage and patient subgroups require further investigation in randomized controlled trials. Second, altered gut microbial taxa (e.g., Weissella) and functional genes (e.g., K11752 in riboflavin metabolism) raise the possibility of probiotic or prebiotic approaches aimed at restoring beneficial gut bacteria and enhancing microbial riboflavin biosynthesis. Third, serum eremopetasinorol demonstrated high diagnostic accuracy (AUC = 0.947), outperforming fecal metabolites and gut microbial features. Although this rare metabolite requires further biological characterization, its strong discriminatory performance supports future large-scale validation as a non-invasive early diagnostic biomarker for COPD.
Several limitations of this study should be acknowledged. First, the absence of long-term follow-up restricts the ability to comprehensively evaluate the sustained effects of riboflavin on disease progression. Second, metabolic and microbiome-related outcomes may vary across populations, and the influence of environmental exposures, dietary patterns, and regional factors warrants careful consideration. Specifically, the general dietary pattern in northern China is relatively homogeneous, characterized by wheat-based staples and coarse grains as the main sources of carbohydrates, along with relatively fixed patterns of vegetable, meat, and fermented food consumption [50]. Such dietary patterns are known to influence the composition of the gut microbiota, including the abundance of Prevotella [51,52]. All participants in this study were recruited from the same northern city, had long-term local residence, and shared broadly similar dietary habits; therefore, the confounding effect of dietary factors on intergroup comparisons is likely to be relatively small. Nevertheless, individual-level dietary data were not collected in this study, and the potential impact of dietary differences on the results cannot be completely excluded. Consequently, the generalizability of these findings to genetically diverse populations with different lifestyles and dietary habits may be limited. Third, although this study only enrolled patients receiving inhaled corticosteroids and bronchodilators and excluded those with recent use of systemic corticosteroids or antibiotics, we did not perform subgroup analyses based on the type or dose of inhaled medications. Therefore, the potential impact of inhaled medications on the gut microbiota and metabolic profiles cannot be completely ruled out. In addition, the associations observed in this study have not yet been validated in animal models or interventional clinical trials. Finally, as this was an observational rather than an interventional study, it was not possible to fully control for all factors that may influence gut microbiota composition and COPD disease status. Because the study captures a single time-point snapshot, it cannot establish causality or track intra-individual dynamics of the microbiota and metabolome during acute exacerbations or disease progression. Although sampling time was strictly controlled to reduce temporal heterogeneity, true longitudinal tracking is required. Future research should therefore prioritize large-scale, multicenter, prospective longitudinal cohort studies incorporating standardized dietary assessment tools to further validate the relationships between gut microbiota, microbial metabolites, and COPD.
Moreover, in-depth investigations examining the differential effects of riboflavin across ethnicities, age groups, and sexes are needed to enhance the generalizability and clinical relevance of these findings.
4. Materials and Methods
4.1. Study Population and Evaluation Criteria
Participants were recruited between June 2023 and May 2024 from the Department of Respiratory and Critical Care Medicine at Beijing Chaoyang Hospital, Capital Medical University (ClinicalTrials.gov registration number: NCT03044847). In total, 74 patients with COPD and 30 healthy controls were enrolled. Eligibility criteria for the COPD group included: (1) age > 40 years; (2) fulfillment of the GOLD 2021 diagnostic criteria, defined as a post-bronchodilator FEV1/FVC ratio < 0.70; (3) clinical stability with no acute exacerbations within the preceding 4 weeks; and (4) a documented history of smoking (smoking index ≥ 10 pack-years). Healthy controls were defined as individuals with normal pulmonary function confirmed by spirometry and no history of chronic respiratory disease. Exclusion criteria applied to both groups were as follows: (1) coexisting asthma, active pulmonary tuberculosis, interstitial pneumonia, or severe bronchiectasis; (2) severe comorbidities (acute infection, diabetes mellitus, stroke, cardiovascular disease, hepatic or renal insufficiency, malignancy, or autoimmune diseases); (3) a history of chronic diarrhea or constipation; (4) prior gastrointestinal surgery; (5) hyperlipidemia; (6) use of probiotics or antibiotics within 4 weeks prior to enrollment; (7) use of oral corticosteroids or Chinese herbal medicine within the preceding 3 months; and (8) pregnancy or lactation. All enrolled patients with COPD were in a stable disease state at the time of inclusion. Comprehensive demographic and clinical data were collected, including exposure to indoor and outdoor environmental pollutants, duration of COPD diagnosis, smoking status, medication use, dietary habits, and GOLD classification. Blood sampling and pulmonary function testing were performed for all participants. The study protocol was approved by the Ethics Committee of Beijing Chaoyang Hospital, Capital Medical University (Approval No. 2023-5-25-6), and written informed consent was obtained from all participants prior to enrollment.
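The spirometric and smoking criteria above can be expressed as a small screening function. This is a minimal sketch, not the study's software: the function and parameter names are hypothetical, the spirometric rule uses GOLD's fixed post-bronchodilator FEV1/FVC ratio of 0.70, the smoking threshold is assumed to be expressed in pack-years (cigarettes per day divided by 20, multiplied by years smoked), and the exclusion criteria are not modelled.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = (cigarettes per day / 20 cigarettes per pack) x years smoked."""
    return cigarettes_per_day / 20.0 * years_smoked


def meets_copd_spirometry(post_bd_fev1_l: float, post_bd_fvc_l: float) -> bool:
    """GOLD fixed-ratio criterion: post-bronchodilator FEV1/FVC < 0.70."""
    return post_bd_fev1_l / post_bd_fvc_l < 0.70


def eligible_copd(age: int, post_bd_fev1_l: float, post_bd_fvc_l: float,
                  cigarettes_per_day: float, years_smoked: float,
                  weeks_since_exacerbation: float) -> bool:
    """Hypothetical check of inclusion criteria (1)-(4); exclusions are not modelled."""
    return (
        age > 40
        and meets_copd_spirometry(post_bd_fev1_l, post_bd_fvc_l)
        and weeks_since_exacerbation >= 4  # no exacerbation in the preceding 4 weeks
        and pack_years(cigarettes_per_day, years_smoked) >= 10  # assumed pack-year reading
    )


print(pack_years(20, 30))                      # 30.0
print(eligible_copd(65, 1.6, 3.0, 20, 30, 8))  # True (FEV1/FVC = 0.53)
print(eligible_copd(65, 2.6, 3.2, 20, 30, 8))  # False (FEV1/FVC = 0.81)
```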
4.2. Sample Collection and Biochemical Index Assessment
Peripheral blood samples were collected from healthy individuals and patients with COPD. Following standardized preprocessing procedures, serum was isolated by centrifugation at 3000 rpm for 10 min, and the supernatant was collected. The centrifugation speed and time were strictly controlled to avoid hemolysis. Biochemical indices were measured using a fully automated biochemical analyzer. Remaining serum samples were stored at −80 °C until further analysis. Hemolyzed serum samples were discarded to ensure the quality of metabolomic analysis. Fresh fecal samples (approximately 6 g) were collected from each participant, with material obtained from the central portion of the stool to minimize environmental contamination. Samples were immediately aliquoted into sterile cryotubes (approximately 3 g per tube) and stored at −80 °C for subsequent analyses.
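Because a speed given in rpm corresponds to different forces on different centrifuges, protocols are often reported as relative centrifugal force (×g). The sketch below converts between the two with the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², where r is the rotor radius in centimetres; the 10 cm radius in the example is hypothetical, because the rotor used in this study is not reported.

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) = 1.118e-5 * radius_cm * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse relation: the rpm needed to reach a given x g on a given rotor."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


# Hypothetical rotor radius of 10 cm.
print(round(rcf_from_rpm(3000, 10.0)))  # 1006, i.e. roughly 1000 x g
```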
4.3. Metagenomic Analysis
Total fecal DNA was extracted using the cetyltrimethylammonium bromide method according to standard protocols, and three technical replicates were set for each sample to ensure extraction reproducibility. DNA concentration was quantified using a Qubit 3.0 Fluorometer (Invitrogen, Qubit™ dsDNA HS Assay Kit), and DNA integrity was assessed by 1% agarose gel electrophoresis. For library preparation, 10 ng of high-quality genomic DNA, as quantified by Qubit, was transferred into a 96-well plate and adjusted to the required volume with nuclease-free water. The DNA then underwent enzymatic fragmentation, end repair, ligation product purification, PCR amplification, and size selection of amplified fragments. Library fragment quality was assessed using the Qsep-400 system, and library concentration was re-quantified using the Qubit 3.0 Fluorometer. Prepared libraries were sequenced on an Illumina NovaSeq 6000 platform using the NovaSeq 6000 S4 Reagent Kit (Illumina, San Diego, CA, USA). Because the raw sequencing reads contained low-quality sequences, they were quality-filtered before downstream analysis.
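Normalizing each library to a fixed 10 ng input is simple arithmetic on the Qubit concentration. The helper below is a hypothetical illustration: the 20 µL final volume and the 2.5 ng/µL reading in the example are assumptions, since the protocol does not state them.

```python
def dilution_volumes(target_ng: float, conc_ng_per_ul: float, final_volume_ul: float):
    """Return (DNA volume, water volume) in µL that place target_ng in final_volume_ul."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    dna_ul = target_ng / conc_ng_per_ul
    if dna_ul > final_volume_ul:
        raise ValueError("sample too dilute for the requested final volume")
    return dna_ul, final_volume_ul - dna_ul


# Hypothetical Qubit reading of 2.5 ng/µL and a hypothetical 20 µL final volume.
dna, water = dilution_volumes(10, 2.5, 20)
print(dna, water)  # 4.0 16.0
```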
4.4. Bioinformatics Analysis
Raw sequencing reads were initially quality-filtered using Trimmomatic to generate high-quality clean reads. Host-derived sequences were subsequently removed by aligning the filtered reads to the human reference genome using Bowtie2. De novo metagenomic assembly was performed using MEGAHIT (version 1.1.2) with default parameters, and assembled contigs shorter than 300 bp were excluded from further analysis. Assembly quality was evaluated using QUAST (version 2.3) with default settings. Open reading frames were predicted from assembled contigs using MetaGeneMark (version 3.26; parameters: -A -D -f G). To reduce redundancy, predicted gene sequences were clustered using MMseqs2, applying a sequence identity threshold of 95% and a coverage threshold of 90% to generate a non-redundant gene catalog. Protein sequences derived from this catalog were aligned against the NCBI non-redundant (Nr) protein database using DIAMOND BLAST (version 0.9.29.130), with an E-value cutoff of ≤ 1 × 10⁻⁵. For each gene, functional annotation was assigned based on the best-matching reference sequence. Functional annotation and abundance profiling were conducted by aligning non-redundant protein sequences against the KEGG, eggNOG, CAZy, and CARD databases. Differentially abundant taxonomic features between the COPD and control groups were identified using LEfSe, with an LDA score threshold of ≥ 3.0. The raw metagenomic sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under the BioProject accession number PRJNA1454801.
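The best-hit assignment described above can be sketched as follows. This assumes DIAMOND's 12-column tabular output (`--outfmt 6`, the BLAST tabular layout in which column 11 is the E-value and column 12 the bit score); the function name and the example rows are hypothetical, and ties in bit score are resolved by keeping the first hit seen.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)}, keeping the top-scoring passing hit.

    Assumes tab-separated rows: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the E-value <= 1e-5 cutoff
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows; subject IDs are placeholders, not real database entries.
rows = [
    "gene_1\tsubject_A\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-120\t450.0",
    "gene_1\tsubject_B\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-100\t380.0",
    "gene_2\tsubject_C\t40.0\t80\t48\t2\t1\t80\t5\t84\t0.01\t35.0",
]
print(best_hits(rows))  # {'gene_1': ('subject_A', 1e-120, 450.0)}
```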
4.5. Metabolomic Analysis
Metabolites were extracted from fecal and serum samples using methanol-based protein precipitation, with three technical replicates per sample. A quality control (QC) sample was prepared by mixing equal volumes of all samples to monitor the stability of the extraction process. Untargeted metabolomic profiling was performed using a liquid chromatography–mass spectrometry (LC–MS) platform comprising a Waters Acquity I-Class PLUS ultra-performance liquid chromatography (UPLC) system coupled to a Waters Xevo G2-XS quadrupole time-of-flight (QTOF) high-resolution mass spectrometer. Chromatographic separation was achieved using an Acquity UPLC HSS T3 column (1.8 µm, 2.1 mm × 100 mm). The Xevo G2-XS QTOF mass spectrometer was operated in MSe acquisition mode under the control of MassLynx version 4.2 software (Waters), enabling simultaneous collection of precursor (MS1) and fragment ion (MS2) spectra. Raw LC–MS data were processed using Progenesis QI software (version 4.0) for peak detection, retention time alignment, normalization, and data filtering. Metabolite identification was performed by matching accurate mass, retention time, and fragmentation patterns against the online METLIN database, publicly available metabolite databases, and a curated in-house database integrated within Progenesis QI, with additional confirmation based on theoretical fragment ion matching. After database searching, overall data quality was further assessed on the basis of intra-group reproducibility (evaluated by PCA clustering) and inter-group differences, with greater separation in PCA space and lower inter-group correlation indicating stronger group discrimination. The fecal and serum untargeted metabolomics data have been deposited in the MetaboLights database under the accession numbers MTBLS14336 and MTBLS14338, respectively.
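Accurate-mass matching is usually expressed as a mass error in parts per million (ppm). The sketch below computes that error and filters candidates by a tolerance; the 10 ppm tolerance and the candidate masses are hypothetical, and the retention-time and MS2 fragment matching that Progenesis QI also uses are not modelled.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_by_mass(observed_mz: float, candidates: dict, tol_ppm: float = 10.0) -> list:
    """Names of candidates whose theoretical m/z lies within tol_ppm of the observed value."""
    return [name for name, mz in candidates.items()
            if abs(ppm_error(observed_mz, mz)) <= tol_ppm]


# Hypothetical candidate list; the 10 ppm tolerance is an assumption.
candidates = {"candidate_A": 300.0000, "candidate_B": 300.0100}
print(round(ppm_error(300.0030, 300.0000), 3))  # 10.0
print(match_by_mass(300.0020, candidates))      # ['candidate_A']
```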
4.6. Statistical Analysis
All statistical analyses were performed in the R programming environment. Within-sample (alpha) diversity was assessed with the Shannon and Simpson indices, and differences between the COPD and control groups were evaluated using Welch’s t-test. Between-sample (beta) diversity was quantified with Bray–Curtis distance matrices and visualized by principal coordinate analysis (PCoA), and group-level differences in beta diversity were tested with analysis of similarities (ANOSIM). Differentially abundant taxonomic features between groups were identified using LEfSe. For functional analysis, the non-redundant gene catalog was annotated against functional gene and metabolic databases, including KEGG, eggNOG, CAZy, and CARD, using DIAMOND. Differences in functional module abundance between groups were evaluated with non-parametric statistical tests, and correlations between microbial species were assessed using Spearman’s rank correlation coefficient.
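The diversity quantities named above can be written out in a few lines of Python. This is a sketch under stated assumptions: the Simpson index is taken in its Gini–Simpson form, 1 − Σp², although software packages also report Σp² or its reciprocal; Bray–Curtis dissimilarity is computed on raw abundance vectors; and the ANOSIM function returns only Clarke's R statistic (mean between-group rank minus mean within-group rank, divided by M/2, where M = n(n − 1)/2 is the number of sample pairs), without the permutation test that supplies the p value. The study's analyses were run in R, not with this code.

```python
import math
from itertools import combinations
from typing import List, Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero abundance."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts: Sequence[float]) -> float:
    """Simpson diversity in the Gini-Simpson form 1 - sum(p_i^2) (assumed variant)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def _average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(samples: List[Sequence[float]], groups: List[str]) -> float:
    """Clarke's ANOSIM R computed from ranked Bray-Curtis dissimilarities."""
    pairs = list(combinations(range(len(samples)), 2))
    dists = [bray_curtis(samples[i], samples[j]) for i, j in pairs]
    ranks = _average_ranks(dists)
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)
```

R approaches 1 when every between-group dissimilarity outranks every within-group dissimilarity and is near 0 when group labels carry no information about community composition.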
For the untargeted metabolomics data, PCA and Spearman correlation analysis were used to assess within-group sample repeatability and the stability of the QC samples. Identified metabolites were annotated with classification and pathway information by querying the KEGG, HMDB, and LipidMaps databases. Fold changes (FC) between groups were calculated for each metabolite, and statistical significance was determined using t-tests. Orthogonal partial least squares discriminant analysis (OPLS-DA) was performed with the R package ropls, and model robustness was assessed with 200 permutation tests. Variable importance in projection (VIP) scores were calculated using cross-validation. Differential metabolites were identified by integrating FC, p value, and VIP score, with the screening criteria FC > 1, p < 0.05, and VIP > 1. KEGG pathway enrichment analysis of the differential metabolites was performed using the hypergeometric test.
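For a single pathway, the hypergeometric test asks how likely it is to observe k or more pathway members among n differential metabolites drawn from a background of N metabolites, K of which belong to the pathway: P(X ≥ k) = Σ from i = k to min(K, n) of C(K, i)·C(N − K, n − i) / C(N, n). A minimal sketch follows; what counts as the background set (for example, all annotated metabolites) is an assumption, because the text does not define it, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total: int, in_pathway: int, selected: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    total:      metabolites in the background set (N)
    in_pathway: background metabolites mapped to the pathway (K)
    selected:   differential metabolites (n)
    hits:       differential metabolites mapped to the pathway (k)
    """
    upper = min(in_pathway, selected)
    denom = comb(total, selected)
    numer = sum(comb(in_pathway, i) * comb(total - in_pathway, selected - i)
                for i in range(hits, upper + 1))
    return numer / denom
```

For instance, with N = 10, K = 5, and n = 5, observing all five differential metabolites in the pathway has probability 1/252, whereas P(X ≥ 0) is always 1.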
To integrate the metagenomic and metabolomic datasets, Spearman correlation analysis was performed in R to examine associations of differentially abundant microbial species and differential functional genes with differential metabolites. Random forest analysis combined with receiver operating characteristic (ROC) curve analysis was used to identify candidate biomarkers from the microbial and metabolic profiles and to evaluate their diagnostic performance.
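Two building blocks of this integration step can be sketched in Python: Spearman's correlation, computed as the Pearson correlation of average ranks, and the area under the ROC curve, computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties counting one half). The random forest model itself is not reproduced here; the scores passed to roc_auc are assumed to be classifier outputs such as predicted probabilities, and the 1/0 coding of COPD and control is a hypothetical convention.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random case > score of a random control); ties count 0.5.

    labels: 1 for COPD (case), 0 for control (hypothetical coding).
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because Spearman's rho works on ranks, any monotonic relationship between a taxon and a metabolite (for example, y = x²) yields ρ = 1, which suits skewed abundance data better than Pearson correlation.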