1. Introduction
Goat cheese has long been an integral component of culinary traditions across many regions worldwide and is valued for its distinctive aroma and flavor. Its sensory properties are shaped by multiple factors, including milk composition, production conditions, and—critically—the metabolic activity of microorganisms during fermentation and ripening. In recent years, goat milk-derived products have attracted increasing consumer interest as alternatives to cow’s milk cheeses, often due to differences in digestibility, protein composition, and amino acid profiles [
1,
2].
The microbial community associated with goat cheese plays a central role in shaping its biochemical and sensory properties through complex metabolic interactions during fermentation and ripening. Several metagenomic and multi-omics studies in cheese systems have demonstrated that microbial succession and metabolic activity directly influence proteolysis, volatile formation, and flavor development during ripening [
3,
4,
5]. However, many earlier investigations relied primarily on 16S rRNA gene amplicon sequencing, which captures broad taxonomic patterns but provides limited resolution at the strain level and does not enable direct functional gene profiling or mobile genetic element detection.
These limitations are particularly pertinent to artisanal cheeses made from raw milk, where intricate and dynamic microbial consortia evolve under the influence of local environmental conditions, production methods, and ripening-associated stressors, such as salt concentration, pH fluctuations, and nutrient depletion. Previous investigations of raw-milk and artisanal cheeses have demonstrated that production environment, processing technology, and spatial differentiation within the cheese matrix significantly shape microbial succession and community structure [
6]. Although this microbial diversity is frequently associated with distinctive sensory attributes, it remains inadequately characterized at the genomic and functional levels, particularly concerning strain-level diversification, genome plasticity, and reweighting of metabolic pathways during the ripening process. In particular, while recent studies have begun to incorporate shotgun metagenomic and genome-resolved approaches in fermented dairy systems, comprehensive strain-level analyses integrating functional pathway restructuring in artisanal goat cheese remain limited [
4]. For example, Bertuzzi [
3] applied whole-metagenome shotgun sequencing to investigate microbial succession and metabolic potential on the surface of smear-ripened cheese, correlating strain-level dynamics with volatile compound production. While this study provided important insights into species- and strain-level succession in a model surface-ripened system, it focused primarily on inoculated smear communities and surface-associated microbiota rather than on genome-resolved reconstruction across defined ripening stages of artisanal raw-milk cheeses. Moreover, integration of strain-level genome reconstruction with resistome profiling, bacteriophage–host interactions, and pathway-level metabolic reallocation in artisanal goat cheese remains insufficiently explored.
Despite increasing interest in cheese microbiome research, genome-resolved investigations integrating strain-level diversification with functional pathway restructuring during artisanal goat cheese ripening remain scarce. To our knowledge, this study represents the first comprehensive genome-resolved metagenomic analysis of artisanal Polish raw-milk goat cheese, combining strain-level genomic reconstruction with pathway-level functional profiling. The potential contribution of mobile genetic elements to metabolic variability and strain-level diversification in traditional dairy systems remains insufficiently characterized, particularly in artisanal goat cheese.
Accordingly, we hypothesized that cheese ripening involves not only taxonomic succession but also strain-level genomic diversification and coordinated reallocation of metabolic pathways, potentially associated with genome plasticity and mobile genetic elements. To test this, we applied shotgun metagenomic sequencing and genome-resolved analyses to characterize microbial succession, functional potential, and genomic plasticity across successive ripening stages.
2. Results
2.1. Overall Quality of Sequencing Data
Metagenomic sequencing generated an average of 47.1 ± 5.6 million paired-end reads per sample, corresponding to 6.93 ± 0.79 Gb of data (range: 37.9–57.6 million reads; 5.67–8.59 Gb). Taxonomic classification using Kraken2 demonstrated consistently high assignment rates across all samples, with an average of 96.9% of reads classified (range: 92.4–98.6%), indicating that the vast majority of sequenced reads were of microbial origin.
Sequencing quality was consistently maintained across all analyzed samples, as evidenced by the comprehensive quality control and assembly statistics summarized in the MultiQC report (
Supplementary Figure S1). Following read filtering and quality control, no samples were excluded from subsequent analyses, indicating a uniform level of data quality throughout the dataset.
Metagenomic assembly metrics demonstrated substantial assembly continuity, as reflected by total assembly length and N50 values calculated using QUAST v5.3.0. Contig-level statistics obtained using MEGAHIT v1.2.9, including the number of contigs, average contig length, and N50, further confirmed stable assembly performance across all samples.
The alignment of a substantial proportion of reads to assembled contigs using Bowtie2/HISAT2 demonstrated sufficient sequencing depth and high assembly quality. Taxonomic classification with Kraken2 consistently identified dominant taxa, with only a minor fraction of reads remaining unclassified.
Quality control metrics generated by fastp further substantiated the robustness of the sequencing data, as reflected by high read retention after filtering, consistent GC content, minimal adapter contamination, low duplication levels, and high base quality scores (
Figure S1).
In summary, the sequencing data exhibited high and consistent quality across all samples, providing a reliable basis for subsequent taxonomic profiling, functional characterization, and genome-resolved analyses.
2.2. Microbial Community Composition
All alpha-diversity comparisons were performed on the rarefied dataset to ensure equal sequencing depth across samples. Alpha diversity was assessed using richness estimators (Observed, Chao1, ACE) and diversity/evenness metrics (Shannon, Simpson, Pielou) (
Table S1, sheet: Alpha;
Figure 1).
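The indices listed above can be computed directly from a per-sample count vector. The following is a minimal sketch, not the pipeline used in this study: the function name `alpha_diversity` and the example counts are hypothetical, Simpson is taken in its Gini-Simpson form (1 − Σp²), Chao1 uses the bias-corrected formula, and ACE is omitted for brevity.

```python
import math


def alpha_diversity(counts):
    """Return alpha-diversity indices for one sample's taxon counts."""
    counts = [c for c in counts if c > 0]       # ignore absent taxa
    if not counts:
        raise ValueError("sample has no non-zero counts")
    total = sum(counts)
    observed = len(counts)                       # observed richness
    props = [c / total for c in counts]

    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1.0 - sum(p * p for p in props)    # Gini-Simpson (assumed form)
    pielou = shannon / math.log(observed) if observed > 1 else 0.0

    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # Bias-corrected Chao1 richness estimator
    chao1 = observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

    return {
        "observed": observed,
        "chao1": chao1,
        "shannon": shannon,
        "simpson": simpson,
        "pielou": pielou,
    }


# Hypothetical rarefied counts for a single sample (not study data)
example = alpha_diversity([880, 60, 25, 10, 2, 2, 1, 1])
```

In this toy sample one taxon dominates, so richness can rise through rare taxa while evenness stays low, which is the kind of pattern the richness and evenness indices are meant to separate.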
Beta diversity (community structure). Group differences in Bray–Curtis beta diversity were tested by PERMANOVA (adonis2). The sample group (series: ripening vs. product) explained 19.0% of the variance in community dissimilarities (R² = 0.1901), but the effect was not statistically significant (F = 2.347, p = 0.23, 99 permutations;
Table S1, sheet: Permanova). Multivariate dispersions likewise did not differ between groups (
p = 0.9068;
Table S1, sheet: Permanova), indicating that the PERMANOVA result was not confounded by unequal within-group dispersion. Pairwise comparisons likewise showed no significant separation (
Table S1, sheet: Permanova).
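To make the reported R², pseudo-F, and permutation p-value concrete, the sketch below re-implements the idea of a one-way PERMANOVA on Bray–Curtis distances in plain Python. It is an illustration only and is not the `adonis2` routine from the R package vegan used in this study; the function names, the random seed, and the six toy abundance profiles are all assumptions.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


def pseudo_f(dist, groups):
    """Return (pseudo-F, R2) for a one-way design on a distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        idx = [i for i in range(n) if groups[i] == label]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]
        ) / len(idx)
    ss_between = ss_total - ss_within
    f_stat = (ss_between / (len(labels) - 1)) / (ss_within / (n - len(labels)))
    return f_stat, ss_between / ss_total


def permanova(dist, groups, n_perm=99, seed=0):
    """Permutation test: shuffle group labels and recompute pseudo-F."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Toy relative-abundance profiles (hypothetical, not study data)
samples = [
    [90, 8, 2], [88, 10, 2], [85, 12, 3],      # "ripening"
    [60, 30, 10], [55, 35, 10], [62, 28, 10],  # "product"
]
groups = ["ripening"] * 3 + ["product"] * 3
dist = [[bray_curtis(a, b) for b in samples] for a in samples]
f_obs, r2, p_value = permanova(dist, groups)
```

With 99 permutations the smallest attainable p-value is 1/100 = 0.01, so the permutation count sets the resolution of the test.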
The microbial communities associated with the final product demonstrated greater alpha diversity than those observed in samples collected during the ripening phase (
Figure 1). Indices related to richness and diversity, such as ACE, Chao1, observed richness, and Shannon diversity, consistently exhibited higher values in the final product samples. In contrast, Pielou’s evenness and Simpson diversity indices showed similar values between the ripening and final product samples.
At the genus level, the microbial communities of both groups were dominated by lactic acid bacteria. A comprehensive genus-level overview of relative abundance profiles for all detected genera is provided as
Supplementary Table S1, sheet: Genus_all, and the Top 30 genera by mean relative abundance (±SD) for each group are summarised in
Table S1, sheet: Genus_top_30. Across both ripening and product samples,
Lactococcus was the most abundant genus (mean 88.768% in ripening vs. 79.926% in product;
Table S1, sheet: Genus_top_30). Several additional genera contributed at lower proportions and showed group-wise shifts, including
Lactiplantibacillus (6.310% vs. 10.551%),
Lacticaseibacillus (2.236% vs. 3.545%), and
Streptococcus (1.102% vs. 1.549%) (
Table S1, sheet: Genus_top_30). Other genera occurred at markedly lower mean abundances (generally <1% each) but collectively accounted for a larger “tail” of low-abundance taxa (
Table S1, sheet: Genus_all, Genus_top_30).
Following the genus-level community overview, we evaluated differential features between ripening and product groups using LEfSe, reporting effect sizes (LDAmean) together with BH–FDR corrected
q-values (
Table S1, sheet: Genus_LDA) and contextualizing these results with group-wise relative abundances (
Table S1, sheet: Genus_all). Differential abundance analysis using an LDA-based approach identified multiple taxa with non-zero effect sizes differentiating ripening and product samples (
Table S1, sheet: Genus_LDA;
Figure 2). The largest LDA effect sizes (LDAmean) were observed for
Orthohepadnavirus (LDAmean = 0.165, q = 2.98 × 10⁻¹), Mycobacterium (0.114, q = 2.98 × 10⁻¹), Spiroplasma (0.099, q = 3.06 × 10⁻¹), Mesorhizobium (0.084, q = 2.98 × 10⁻¹), and Micrococcus (0.071, q = 2.98 × 10⁻¹), while additional genera (e.g., Lactobacillus, Pseudomonas, Pluralibacter, Streptococcus, Paenibacillus, Riemerella) showed smaller effect sizes (LDAmean ≈ 0.019–0.059, q ≈ 2.67 × 10⁻¹–2.95 × 10⁻¹). Importantly, these taxa occurred at low mean relative abundances at the genus level compared with dominant genera (e.g.,
Pseudomonas 0.014% vs. 0.046%;
Acinetobacter 0.016% vs. 0.042%;
Table S1, sheet: Genus_top_30), underscoring that the between-group differences are driven largely by low-abundance community members. Although several taxa had nominal
p-values < 0.05, after Benjamini–Hochberg correction for multiple testing, all taxa had
q-values > 0.05 (minimum
q ≈ 0.135). Therefore, none of the taxa met the predefined FDR threshold for statistical significance, and the differences visualized in
Figure 3 should be interpreted as exploratory patterns rather than statistically supported biomarkers.
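The gap between nominal p-values and FDR-corrected q-values follows from the Benjamini–Hochberg step-up procedure. The sketch below shows that procedure on four invented p-values; the function name and the input values are hypothetical and do not come from Table S1.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Invented p-values: three are nominally below 0.05
nominal = [0.02, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(nominal)
significant = [q <= 0.05 for q in adjusted]
```

In this toy case three tests have p < 0.05, yet none of the adjusted values falls at or below 0.05, which is the same situation described for the genus-level comparison.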
Hierarchical clustering utilising Bray–Curtis dissimilarity demonstrated a partial distinction between samples collected during the ripening phase and those obtained from the final product (
Figure 4), although some overlap was observed between the two groups.
Principal coordinates analysis (PCoA) based on Bray–Curtis dissimilarity supported the hierarchical clustering pattern and indicated a partial differentiation between microbial communities from the ripening and product stages (2). The primary coordinate (PCoA1 = 46.59%) captured the major gradient in community composition, while PCoA2 (24.53%) and PCoA3 (9.89%) further resolved within-group variation. Although some overlap occurred near the ordination origin, the product samples displayed a broader dispersion than the more compact ripening cluster, indicating increased heterogeneity of the final product microbiota. The taxa contributing most strongly to the ordination included lactic acid bacteria commonly associated with fermentation (Lactococcus lactis, OTU_1; Lactiplantibacillus plantarum, OTU_117; Lacticaseibacillus paracasei, OTU_122), as well as Staphylococcus saprophyticus (OTU_388), Staphylococcus succinus (OTU_391) and Enterobacter sp. T2 (OTU_4578), suggesting that variation in both fermentative and accompanying taxa contributed to between-sample differences.
Discriminant analysis revealed several taxa exhibiting differential associations with ripening and final product samples (
Figure 3). Genera such as
Mycobacterium,
Pseudomonas,
Spiroplasma, and
Sphingobium were enriched in ripening samples, whereas
Limosilactobacillus,
Streptomyces,
Rhizobium, and
Acinetobacter were more prevalent in the final products.
2.3. MAGs and Genome-Resolved Functional Analysis
In this study, a genome-resolved analysis was conducted to reconstruct metagenome-assembled genomes (MAGs) representing the predominant and consistently detected constituents of the cheese microbiome. Genome reconstruction was performed independently with two assemblers, MEGAHIT and SPAdes, each combined with two binning algorithms, MetaBAT2 and MaxBin2. This design allowed the robustness of genome recovery to be evaluated across methodological workflows.
In total, 37 MAGs were recovered from samples collected during the ripening stage, whereas 141 MAGs were reconstructed from the final product, corresponding to an approximately 3.8-fold increase in the total number of reconstructed genomes in mature cheese samples.
Taxonomic classification of reconstructed MAGs revealed higher genomic richness in the final product compared to the ripening stage. The final product comprised 10 genera and 13 species, whereas the ripening stage included 6 genera and 7 species. The distribution of MAGs per genus and species across production stages is summarised in
Supplementary Table S2. Although certain taxa, including
Staphylococcus and
Rothia, were exclusively reconstructed from the final product, Fisher’s exact test followed by Benjamini–Hochberg correction did not identify statistically significant differences in MAG representation at either the genus or species level (all
q > 0.05). These findings indicate quantitative expansion of genomic diversity in the final product rather than statistically robust taxonomic restructuring between stages. The higher number of reconstructed MAGs in the final product likely reflects increased overall genomic complexity and cumulative microbial contributions during cheese maturation rather than stage-specific enrichment of particular taxa.
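A per-taxon comparison of this kind can be framed as a 2×2 table (MAGs of one genus versus all other MAGs, by stage) and tested with a two-sided Fisher's exact test. The sketch below is a stdlib-only illustration; the function name and the toy counts are hypothetical and are not the counts in Table S2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    p_value = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Toy table: a genus absent from 10 "ripening" MAGs, present in 4 of 20 "product" MAGs
p_toy = fisher_exact_two_sided(0, 10, 4, 16)
```

Even though the genus appears in only one group of the toy table, the p-value stays well above 0.05, illustrating why exclusive recovery in one stage need not translate into a statistically supported difference when counts are small.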
In all assembly–binning combinations,
Lactiplantibacillus plantarum,
Lacticaseibacillus paracasei, and
Lactococcus lactis were the most frequently reconstructed species (
Figure 5,
Figure 6 and
Figure 7). Assembly statistics of representative high-quality MAGs corresponding to dominant lactic acid bacteria (LAB) species in mature ripened cheese are summarized in
Table 1.
Comparative genomic alignment of reconstructed MAGs against corresponding reference genomes was performed using BLASTn and visualized as circular genome maps generated with Proksee [
7] (
Figure 5,
Figure 6 and
Figure 7). The alignments demonstrated near-continuous genome-wide coverage, indicating strong conservation of chromosomal architecture. GTDB-Tk analysis further confirmed high genomic similarity between reconstructed MAGs and their closest reference genomes. The
Lactococcus lactis MAG exhibited an average nucleotide identity (ANI) of 98.95% with an alignment fraction of 0.943. Similarly, the
Lacticaseibacillus paracasei and
Lactiplantibacillus plantarum MAGs showed ANI values of 98.96% and 98.98%, with alignment fractions of 0.877 and 0.878, respectively. These results indicate that the majority of reference genomic content was represented in the reconstructed genomes.
Localized regions of reduced alignment coverage were observed in circular genome maps, suggesting the presence of strain-specific genomic segments. Notably, each of these taxa consistently produced multiple distinct MAGs, regardless of the reconstruction strategy employed, indicating significant strain-level heterogeneity within the dominant cheese-associated populations. The repeated recovery of multiple MAGs per species further illustrates that their genomes were highly abundant in the metagenomic datasets and amenable to reliable genome reconstruction.
To quantitatively assess intra-species genomic diversity, high-quality MAGs were dereplicated at a 99% ANI threshold. This analysis resolved multiple strain-level clusters within dominant taxa. Specifically,
Lactiplantibacillus plantarum (
n = 35 MAGs) formed 2 clusters,
Lacticaseibacillus paracasei (
n = 29) formed 9 clusters,
Lactococcus lactis (
n = 23) formed 2 clusters,
Lentilactobacillus parabuchneri (
n = 18) formed 10 clusters,
Staphylococcus equorum (
n = 14) formed 5 clusters, and
Streptococcus thermophilus (
n = 21) formed 10 clusters (
Figure S2).
These results indicate substantial intra-species genomic heterogeneity, demonstrating that the reconstructed MAGs represent genomically distinct strain-level populations rather than redundant assemblies.
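Dereplication at a fixed ANI threshold amounts to grouping genomes whose pairwise ANI meets the cutoff. The sketch below uses single-linkage grouping via union-find, which is an assumption: the clustering method used by the dereplication tool in this study may differ (for example, average linkage). MAG names and ANI values are invented.

```python
def cluster_by_ani(names, ani, threshold=99.0):
    """Group genomes connected by at least one pairwise ANI >= threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)   # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical pairwise ANI values (percent) among four MAGs of one species
mags = ["mag_1", "mag_2", "mag_3", "mag_4"]
pairwise_ani = {
    ("mag_1", "mag_2"): 99.6,
    ("mag_2", "mag_3"): 99.2,
    ("mag_1", "mag_3"): 98.7,
    ("mag_1", "mag_4"): 97.5,
    ("mag_2", "mag_4"): 97.8,
    ("mag_3", "mag_4"): 97.9,
}
strain_clusters = cluster_by_ani(mags, pairwise_ani)
```

Under single linkage, `mag_1` and `mag_3` end up in the same cluster through `mag_2` even though their direct ANI is below 99%; a stricter linkage rule would split them, so the number of clusters depends on this choice.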
The number of MAGs retrieved for individual species varied across different assemblers and binning tools (
Figure 8), highlighting the impact of methodological selection on the efficiency of genome recovery. Despite these variations, the same three species consistently predominated in the MAG datasets across all reconstruction methodologies and at both production stages. This convergence across independent workflows strongly supports the focus of downstream genome-resolved analyses on these taxa as representative constituents of the core cheese microbiome.
For each of the three predominant species, Lactiplantibacillus plantarum, Lacticaseibacillus paracasei, and Lactococcus lactis, a single high-quality metagenome-assembled genome (MAG) with greater than 80% completeness and less than 5% contamination was selected from the final product for comprehensive comparative analysis against the corresponding reference genome. MAGs were selected based on the highest sequencing coverage among the available representatives, thereby ensuring maximal genomic completeness and analytical robustness.
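The selection rule described above (completeness above 80%, contamination below 5%, then highest coverage per species) can be sketched as a simple filter. The record fields, MAG identifiers, and values below are hypothetical and serve only to illustrate the logic.

```python
def select_representatives(mags, min_completeness=80.0, max_contamination=5.0):
    """Pick, per species, the passing MAG with the highest coverage."""
    best = {}
    for mag in mags:
        # Apply the quality thresholds stated in the text
        if mag["completeness"] <= min_completeness:
            continue
        if mag["contamination"] >= max_contamination:
            continue
        current = best.get(mag["species"])
        if current is None or mag["coverage"] > current["coverage"]:
            best[mag["species"]] = mag
    return best


# Hypothetical MAG records (not study data)
candidate_mags = [
    {"id": "bin_01", "species": "Lactococcus lactis",
     "completeness": 97.2, "contamination": 1.1, "coverage": 210.0},
    {"id": "bin_02", "species": "Lactococcus lactis",
     "completeness": 99.0, "contamination": 6.3, "coverage": 480.0},
    {"id": "bin_03", "species": "Lactococcus lactis",
     "completeness": 91.5, "contamination": 0.8, "coverage": 350.0},
    {"id": "bin_04", "species": "Lacticaseibacillus paracasei",
     "completeness": 78.0, "contamination": 0.5, "coverage": 120.0},
]
representatives = select_representatives(candidate_mags)
```

Here `bin_02` has the highest coverage but is rejected for contamination, so `bin_03` is chosen; the second species yields no representative because its only MAG falls below the completeness threshold.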
Comparative genomic alignments identified distinct genomic regions characterised by diminished synteny and sequence similarity compared to reference genomes. Functional annotation revealed that these regions of low synteny are predominantly linked to prophage-related elements and other mobile genetic elements. The presence of such variable genomic regions underscores strain-level genomic heterogeneity within dominant cheese-associated taxa and highlights the added value of genome-resolved metagenomics beyond taxonomic profiling. This genomic variability is consistent with ongoing microevolutionary processes within dominant cheese-associated populations.
2.4. Resistome and Safety Markers
A total of 433 ARG hits corresponding to 44 unique ARO entries and 27 AMR gene families were identified across all samples (
Supplementary Table S3, tab:
Stage_Summary). The final product stage harbored a higher number of ARG hits (
n = 346) compared to the ripening stage (
n = 87). Per-sample comparisons confirmed significantly higher ARG counts in the final product for total hits (median 39.5 vs. 22.0;
q = 0.012), unique AROs (
q = 0.013), and AMR gene families (
q = 0.013) (
Supplementary Table S3, tab:
Stage_MW_tests).
Beta-diversity analysis based on Jaccard (presence/absence) and Bray–Curtis (abundance) distances further indicated significant compositional differences between ripening and final product resistomes (PERMANOVA,
p = 0.003;
Supplementary Table S3, tab:
PERMANOVA).
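The two distance measures capture different aspects of resistome composition: Jaccard considers only which genes are present, whereas Bray–Curtis also weighs how often they are detected. The sketch below contrasts them on invented ARG count vectors; function and variable names are hypothetical.

```python
def jaccard_distance(a, b):
    """Presence/absence distance: 1 - |shared features| / |features in either|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Abundance-based distance: sum |a_i - b_i| / sum (a_i + b_i)."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


# Invented ARG hit counts per gene family for two samples
sample_a = [12, 0, 5, 1]
sample_b = [3, 4, 5, 0]

# Same gene families present, but at different counts
sample_c = [10, 0, 2, 0]
sample_d = [1, 0, 1, 0]

jac_ab = jaccard_distance(sample_a, sample_b)
bc_ab = bray_curtis_distance(sample_a, sample_b)
jac_cd = jaccard_distance(sample_c, sample_d)
bc_cd = bray_curtis_distance(sample_c, sample_d)
```

For `sample_c` and `sample_d` the Jaccard distance is zero while the Bray–Curtis distance is not, which is why reporting both measures helps distinguish gene gain or loss from shifts in gene counts.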
As illustrated in
Figure 9, antibiotic efflux and antibiotic inactivation were the predominant resistance mechanisms, present in most samples across both stages. At the drug-class level, resistance determinants were mainly associated with tetracyclines and aminoglycosides, whereas sulfonamide- and β-lactam-related genes were less abundant. Detailed prevalence analyses for resistance mechanisms, drug classes and AMR gene families (Fisher’s exact tests with BH–FDR correction) are provided in
Supplementary Table S3 (sheets:
Fisher_Mechanisms,
Fisher_Drug_Classes, and
Fisher_Gene_Families).
The distribution of AMR gene families across individual samples is shown in
Figure 10. Although the overall resistome structure remained broadly consistent between ripening and final product stages, a quantitative enrichment of ARG diversity and gene family counts was observed in the final product group (LSK38:45).
Although statistical comparisons between ripening and final product stages were performed at the level of ARG counts, richness, and community composition, the aggregation of biological replicates limits fine-scale inference at the individual strain level. Importantly, the detected resistance-associated genes represent the genetic potential of the cheese microbiome and do not imply phenotypic resistance or antimicrobial susceptibility of the corresponding bacteria.
2.5. Phages and Host Interactions
A total of 672 putative viral sequences were initially identified through metagenomic analysis. Viral genome quality was assessed using CheckV, and only contigs classified as High-quality (90–100% completeness; checkv_quality field) were retained for downstream analyses. This stringent filtering reduced the dataset to 29 high-quality viral genomes (4.3% of all detected viral contigs).
Putative host assignment was performed using PHIST, a k-mer–based virus–host prediction tool that infers associations based on the number of shared 25-mers (k = 25) between viral contigs and candidate bacterial genomes. Viral sequences were queried against reconstructed bacterial MAGs with completeness > 80%. For each viral contig, PHIST reports the top-scoring host genome, the number of shared k-mers, and associated p-values with multiple-testing correction.
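The core idea of k-mer–based host prediction, counting exact 25-mers shared between a viral contig and each candidate host genome, can be sketched as follows. This is a simplified illustration and not PHIST's implementation: it considers one strand only, counts distinct k-mers, and computes no p-values. All sequences and names are invented.

```python
def kmers(sequence, k=25):
    """Return the set of distinct k-mers in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_count(virus, host, k=25):
    """Number of distinct k-mers found in both sequences."""
    return len(kmers(virus, k) & kmers(host, k))


def best_host(virus, hosts, k=25):
    """Return (host name, shared k-mer count) for the top-scoring host."""
    scores = {name: shared_kmer_count(virus, seq, k) for name, seq in hosts.items()}
    top = max(scores, key=scores.get)
    return top, scores[top]


# Invented sequences: host_a carries an integrated copy of the viral contig
viral_contig = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"
candidate_hosts = {
    "host_a": "TTTT" + viral_contig + "GGGG",
    "host_b": "A" * 40,
}
predicted_host, n_shared = best_host(viral_contig, candidate_hosts)
```

A host carrying an integrated prophage shares long exact matches with the viral contig and therefore scores highest, which is why such predictions reflect sequence similarity rather than demonstrated infection.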
All reported host–phage associations were statistically significant after multiple-testing correction (adjusted p-values < 0.05), and the number of shared k-mers ranged from several thousand to over 50,000, supporting strong sequence-level similarity between viral contigs and predicted hosts.
At the genus level, the majority of high-quality phages were assigned to dominant lactic acid bacteria, including
Lactococcus (7 phages),
Lacticaseibacillus (7 phages), and
Lactiplantibacillus (5 phages). Additional associations were observed for
Lentilactobacillus (2 phages) and
Staphylococcus (2 phages). At the species level, the most frequently predicted hosts were
Lactococcus lactis and
Lacticaseibacillus paracasei (7 phages each), followed by
Lactiplantibacillus plantarum (5 phages). Six high-quality viral genomes did not yield confident host assignments (
Table 2).
The preferential association of high-quality bacteriophages with dominant lactic acid bacteria suggests that phage–host interactions are primarily structured around core members of the cheese microbiome. Such interactions may contribute to genome plasticity through lysogenic integration, recombination events, and horizontal gene transfer, thereby shaping population-level genomic variability within technologically relevant taxa.
All host prediction outputs, including detailed virus–host matches with k-mer counts, are provided in
Supplementary Table S4 (sheet:
PHIST_predictions_with_taxa_depth and
PHIST_taxa_counts).
The host assignments presented herein are putative and based on sequence similarity rather than experimentally validated phage–host interactions.
2.6. Functional Potential of the Microbiome
In this study, the analysis of KEGG-annotated gene families focused on carbohydrate, amino acid, and lipid metabolism, as these functional categories represent the primary biochemical processes involved in microbial growth, substrate utilisation, and metabolite production during cheese ripening (
Figure 11).
Within KEGG level A category 09100 (Metabolism), a total of 11 KEGG level B subsystems and 123 KEGG level C pathways were detected, corresponding to 726 unique KEGG Orthologs (KOs) across all ripening stages (
Supplementary Table S5, sheet: Global_Overview and PerSample_Overview). The overall number of detected subsystems remained comparable across LSK12–LSK15, indicating stable global metabolic richness during ripening.
Notably, the observed changes at the pathway level (
Figure 11) indicate shifts in the relative functional weighting of metabolic modules rather than uniform alterations in total gene abundance, suggesting pathway-level restructuring within dominant microbial populations.
Throughout all stages of ripening, carbohydrate metabolism (KEGG level B: 09101) consistently constituted the largest portion of the metabolic repertoire (
Figure 11). When normalised to the parent category Metabolism (09100), its relative contribution exhibited a gradual increase during the ripening process, rising from approximately 22.5% in LSK12 to over 24% in LSK15. Effect size comparisons between the first and fourth week confirmed a positive shift in proportional representation (
Supplementary Table S5, sheet: EffectSizes_B_W4vsW1), indicating an increasing functional emphasis on carbohydrate-associated processes during maturation.
Genes associated with amino acid metabolism (09105) represented the second most abundant metabolic category (
Figure 11). Although their absolute CPM values diminished during ripening, their proportional contribution to total metabolism remained relatively constant, comprising approximately 18.8–19.5% across all samples. This pattern suggests the preservation of amino acid-related functional capacity despite broader alterations in the metabolic gene repertoire.
Lipid metabolism (09103) constituted the smallest segment of the metabolic gene repertoire (
Figure 11). Although a slight reduction in absolute abundance was observed, its relative contribution to the metabolic category remained stable at approximately 6.0–6.2% throughout all ripening stages. This indicates a consistently limited but persistent representation of lipid-associated functions within the cheese microbiome.
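The distinction between absolute abundance (CPM) and proportional contribution within the parent category can be illustrated with a short calculation. In the sketch below, the CPM values are invented, the category keys are hypothetical, and the shift is expressed as a simple difference in percentage points, which may not match the effect-size definition used in Table S5.

```python
def proportions_within_parent(cpm_by_category):
    """Express each category's CPM as a percentage of the parent total."""
    total = sum(cpm_by_category.values())
    return {name: 100.0 * cpm / total for name, cpm in cpm_by_category.items()}


def percentage_point_shift(early, late):
    """Difference in percentage share (late minus early) per category."""
    return {name: late[name] - early[name] for name in early}


# Invented CPM values for KEGG level B categories at two time points
week1_cpm = {"carbohydrate": 2250, "amino_acid": 1900, "lipid": 600, "other": 5250}
week4_cpm = {"carbohydrate": 1944, "amino_acid": 1539, "lipid": 486, "other": 4131}

week1_share = proportions_within_parent(week1_cpm)
week4_share = proportions_within_parent(week4_cpm)
shift = percentage_point_shift(week1_share, week4_share)
```

In this toy example every category loses absolute CPM, yet the amino acid and lipid shares stay constant and the carbohydrate share rises, showing how a stable or increasing proportion can coexist with declining absolute abundance.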
Drawing from the KEGG level B overview of metabolic gene distribution (
Figure 11), an analysis of representative KEGG level C pathways was conducted to determine how specific functional modules shaped the observed patterns (
Figure 12;
Table S5, sheet: C_in_B_all and C_targets_view).
Galactose metabolism (KEGG level C: 00052), a significant subpathway within carbohydrate metabolism, demonstrated a progressive increase in its proportional contribution during ripening (
Figure 12). When normalised to the KEGG carbohydrate metabolism category, its relative representation increased from approximately 15.3% in LSK12 to 23.2% in LSK15. Effect size estimates confirmed a marked positive shift between the first and fourth week (
Table S5, tab: EffectSizes_C_W4vsW1), indicating enhanced functional prominence of galactose-utilising pathways during later stages of cheese ripening.
In contrast, phenylalanine, tyrosine and tryptophan biosynthesis (KEGG level C: 00400) maintained a stable proportional contribution within amino acid metabolism throughout ripening (approximately 16.2–17.1%;
Figure 12), despite gradual reductions in absolute gene abundance. This observation suggests preservation of aromatic amino acid biosynthetic potential during community restructuring.
Fatty acid biosynthesis (KEGG level C: 00061) was the most substantial subcomponent of lipid metabolism, accounting for approximately 32.4–33.1% of the KEGG lipid metabolism category at each ripening stage (
Figure 12). This consistent proportional contribution underscores fatty acid biosynthesis as a structurally stable and functionally central module within lipid metabolism, notwithstanding the overall lower abundance of lipid-associated genes observed at KEGG level B.
Collectively, the combined evidence from KEGG level B (
Figure 11) and KEGG level C analyses (
Figure 12) indicates that cheese ripening is characterised by redistribution of metabolic weighting among dominant subsystems rather than by loss of global functional diversity. Because biological replicates were pooled prior to sequencing (one metagenomic profile per time point), formal hypothesis testing was not performed; instead, proportional analyses and effect size comparisons were used to quantify functional shifts (
Table S5).
3. Discussion
This study aimed to thoroughly characterise the microbial community structure and functional potential of artisanal goat cheese produced from unpasteurized milk throughout successive ripening stages and in the final product, with a particular focus on molecular mechanisms that may underlie microbiome adaptation. By employing shotgun metagenomic sequencing in conjunction with functional profiling and genome-resolved analyses, we captured taxonomic changes and processes occurring at genomic and metabolic pathway levels that may contribute to microbial persistence during cheese ripening.
Integration of taxonomic, functional, and genome-resolved evidence suggests coordinated succession accompanied by putative functional reallocation within the community. Rather than indicating a simple replacement of dominant taxa, the observed patterns are consistent with progressive strain-level diversification within key bacterial lineages, which may occur within a relatively stable core microbiome under ripening-associated selective pressures (e.g., nutrient limitation, increased salinity, prolonged residence in the cheese matrix). Such interpretation is in line with prior functional and shotgun studies reporting that cheese ecosystems can maintain a conserved functional core despite compositional shifts [
3,
8,
9,
10,
11,
12,
13].
Alpha-diversity analyses revealed significantly greater taxonomic richness and Shannon diversity in the final product compared with ripening-stage samples, whereas indices related to evenness and dominance remained largely unchanged. This pattern suggests that increased microbial complexity in mature cheese is primarily driven by accumulation of additional low-abundance taxa rather than reorganization of community dominance. The higher alpha diversity observed in some final product samples is consistent with increased heterogeneity and potential additional microbial inputs at the final processing/handling stage (e.g., contact with equipment/packaging environment). Because alpha diversity was calculated after rarefaction to a common sequencing depth, these differences are unlikely to be driven by library-size variation [
9,
10,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23].
The partial separation observed in Bray–Curtis ordination, together with the broader dispersion of final product samples, suggests a stage-associated shift in community composition accompanied by increased heterogeneity at the product stage, potentially reflecting additional microbial inputs and/or more variable microenvironments during late processing and handling. Published studies examining alpha-diversity dynamics during cheese ripening report heterogeneous and occasionally contrasting trajectories, contingent on cheese type, production protocol, and degree of environmental exposure during ripening. Therefore, the increase in richness observed here should be regarded as system-specific rather than a universal ripening pattern [
9,
10,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23].
Discriminant analysis identified taxa associated with distinct ripening stages, suggesting stage-dependent restructuring of the microbial community. However, no taxa remained significant after FDR correction. The absence of FDR-significant taxa likely reflects limited statistical power and inter-sample variability; therefore, larger sample sizes and/or targeted validation (e.g., qPCR, culture-based assays, targeted shotgun sequencing) will be needed to confirm potential biomarkers suggested by LDA rankings [
9,
14,
21,
22,
23].
Among discriminant genera,
Pseudomonas was linked to earlier stages of ripening, consistent with its frequent presence in raw milk and early processing environments [
14,
17,
21,
23]. The decline of Pseudomonas during maturation is commonly attributed to limited tolerance to ripening-associated conditions and competitive exclusion by lactic acid bacteria, and our patterns are consistent with this ecological filtering [
10,
11,
17,
19,
23,
24]. From a genome-centric perspective, this may indicate that Pseudomonas lineages present early in production are less likely to persist under prolonged ripening pressure, although this interpretation remains putative without strain-resolved tracking of Pseudomonas populations across time [
14,
17,
21,
23].
In contrast,
Limosilactobacillus exhibited a stronger association with the final products. As a non-starter lactic acid bacterium (NSLAB),
Limosilactobacillus and related NSLAB lineages are widely recognized for persistence during prolonged ripening and for contributing to late-stage metabolism under nutrient-limited and stress-rich conditions [
10,
15,
20,
25,
26]. This enrichment may reflect strain-level functional diversification, including differences in carbohydrate utilization and stress-response repertoires, which can enable niche partitioning among closely related strains. By contrast, for genera such as
Mycobacterium,
Spiroplasma,
Sphingobium,
Streptomyces, and
Rhizobium, the literature provides limited evidence for consistent stage-specific roles during cheese ripening, and their contribution to group separation likely captures subtle, production-specific signatures rather than universal ripening drivers [
14,
16,
21,
23,
27].
To strengthen inference below the species level and assess whether stage-associated signatures align with stable bacterial populations at the genomic scale, we performed genome-resolved metagenomic analysis using metagenome-assembled genomes (MAGs), an approach increasingly applied in studies of ripened cheeses. Here, MAG reconstruction was performed using multiple independent assembly and binning strategies, which supports the robustness of the reconstructed dominant lineages across workflows [
11,
12,
22,
27,
28,
29].
Although a higher total number of MAGs and greater taxonomic richness were observed in the final product compared to the ripening stage, statistical testing did not reveal significant differences in MAG representation at the genus or species level after correction for multiple testing. This suggests that the final product exhibits a quantitative expansion of genomic diversity rather than a fundamental restructuring of the dominant bacterial community.
Across reconstruction strategies,
Lactiplantibacillus plantarum,
Lacticaseibacillus paracasei, and
Lactococcus lactis were most frequently reconstructed. The recovery of multiple MAGs assigned to the same species supports the presence of strain-level heterogeneity rather than reliance on single genomic representatives, consistent with other longitudinal and genome-resolved cheese studies [
11,
12,
21,
27,
28,
29,
30]. Dereplication at a 99% ANI threshold further resolved multiple intra-species clusters within dominant taxa, including 9 clusters in
Lacticaseibacillus paracasei and 10 in
Lentilactobacillus parabuchneri, as well as distinct clusters in
Lactiplantibacillus plantarum,
Lactococcus lactis,
Staphylococcus equorum, and
Streptococcus thermophilus. These results provide quantitative support for substantial intra-species genomic heterogeneity and indicate that reconstructed MAGs represent genomically distinct strain-level populations rather than redundant assemblies.
Importantly, the presence of multiple genomically distinct strains within dominant cheese-associated species can provide a substrate for functional differentiation (e.g., accessory gene differences, mobile elements, stress-response traits) and may contribute to ecological robustness by buffering functional performance under fluctuating conditions. While such patterns are consistent with strain-level diversification and adaptive responses, the extent to which they reflect within-system microevolution versus coexistence of multiple introduced lineages should be considered putative in the absence of explicit SNP-trajectory or pangenome dynamics across time [
9,
12,
28,
29,
31].
Sequence-based host prediction indicated that most high-quality viral contigs were associated with lactic acid bacteria, including
Lactococcus lactis,
Lacticaseibacillus paracasei, and
Lactiplantibacillus plantarum. Similar LAB-centred phage associations have been reported in genome-resolved and longitudinal studies of cheese ecosystems [
12,
29,
31,
32,
33]. However, these phage–host assignments should be interpreted as putative because they depend on sequence-based host inference rather than direct demonstration of infection in situ [
28,
32,
33,
34,
35].
Metagenomic profiling identified a resistome dominated by antibiotic efflux and inactivation mechanisms, with resistance determinants mainly linked to tetracyclines and aminoglycosides. Comparable resistome profiles have been reported in cheese and other fermented food systems when analyzed via shotgun metagenomics [
21,
23,
34]. Importantly, detection of resistance genes reflects genetic potential rather than confirmed phenotypic resistance or clinical risk, and therefore should be interpreted cautiously [
3,
21,
24,
34,
35].
Functional profiling of KEGG-annotated gene families indicates that cheese ripening is characterized not by complete functional turnover but by gradual reallocation of metabolic capacities within a relatively stable functional core. Carbohydrate metabolism predominated within the metabolic gene pool, consistent with functional surveys of ripening cheeses [
9,
10,
11]. Pathway-resolved analyses suggested dynamic changes in galactose metabolism in later ripening stages, and when interpreted alongside genome-resolved results, these patterns are consistent with a model in which pathway-level shifts may be supported by strain-level heterogeneity within dominant lineages. While numerous shotgun studies report carbohydrate metabolism as central to ripened cheese ecosystems, KEGG module–resolved time-series dynamics for specific modules such as galactose utilization remain less commonly detailed, supporting the value of reporting pathway-level resolution in artisanal systems [
11,
21].
Amino acid metabolism constituted the second most prominent functional category and maintained a stable proportional contribution despite reduced absolute gene abundance. This trend aligns with the well-documented significance of proteolysis-linked amino acid metabolism for ripening progression and flavor development, and with longitudinal studies linking microbial succession to amino acid pathway dynamics [
8,
10,
13,
15,
20,
28]. Lipid metabolism represented a smaller but stable fraction of the metabolic repertoire, with fatty acid biosynthesis as a dominant lipid-associated subcomponent, consistent with shotgun functional surveys of ripened cheeses and fermented dairy products [
8,
9,
10,
12,
13,
21].
Overall, integration of diversity analyses, discriminant screening, genome-resolved reconstruction, and functional profiling supports a model in which artisanal goat cheese ripening is driven by a conserved core community dominated by lactic acid bacteria, accompanied by increased low-abundance diversity and strain-level heterogeneity that may enable functional buffering and ecological stability. These conclusions emphasize putative mechanisms supported by metagenomic evidence while acknowledging current limits of inference for causality, within-wheel spatial heterogeneity, and phage/ARG activity without targeted validation.
4. Materials and Methods
4.1. Cheese Production and Sample Collection
The experimental material consisted of goat cheese produced from unpasteurized milk obtained from a single artisanal producer located in the Masuria region of Poland. The cheese was manufactured without the use of commercial starter cultures, relying exclusively on autochthonous microbiota naturally present in raw milk. The ripening process was conducted for four weeks at 12 °C, and the final product was analyzed after five weeks of ripening.
Sampling was performed weekly throughout the ripening period (weeks 1–4) and additionally from the final product (week 5). At each sampling point, three independent biological replicates of 1 g of cheese each were collected. Samples were transported under refrigerated conditions within 12 h and stored at −80 °C until further processing.
Prior to DNA extraction, samples were thawed at 4 °C and homogenized at a ratio of 1 g cheese to 5 mL sterile physiological saline (0.9% NaCl). For the final product, samples were collected from multiple depths and locations within the cheese matrix using aseptic core drilling with sterilized high-speed steel (HSS) drill bits. Spatial differences within the cheese were not analyzed separately; instead, collected subsamples were treated as biological replicates representing the overall microbial community of the cheese at the given ripening stage.
Samples collected during the ripening process were designated as LSK12–LSK15, corresponding to weeks 1–4 of ripening, while samples obtained from the final product were designated as LSK38–LSK45. An overview of sample designation and experimental design is provided in
Table 3.
4.2. DNA Extraction and Sequencing
Cheese samples (1 g) were homogenized in bead-beating tubes supplied with the Qiagen DNeasy PowerFood Kit using a TissueLyser III (Qiagen, Hilden, Germany). DNA extraction was performed according to the manufacturer’s protocol, with the addition of an RNase A treatment (10 µg/µL, 15 min, 37 °C). DNA was eluted in 100 µL of elution buffer (EB).
DNA quality and quantity were assessed using a Qubit 4 fluorometer (dsDNA HS Assay Kit, Invitrogen, Carlsbad, CA, USA), NanoDrop One spectrophotometer, and 1.5% agarose gel electrophoresis.
For each ripening stage, DNA obtained from three biological replicates was pooled after extraction to generate a composite sample representing the stage-specific microbial community. This strategy was adopted to increase effective sequencing depth per ripening stage and to enhance metagenome assembly quality and genome reconstruction efficiency within the genome-resolved analytical framework. As the primary objective of the study was to characterize stage-level microbial succession and functional restructuring rather than within-stage variability, pooling was considered appropriate for the experimental design.
Shotgun metagenomic libraries were prepared using the TruSeq DNA PCR-Free Library Preparation Kit (Illumina, San Diego, CA, USA) and sequenced on an Illumina NovaSeq X platform using a paired-end configuration (2 × 150 bp).
4.3. Bioinformatic Processing
4.3.1. Quality Control
Raw sequencing reads were initially assessed for quality using FastQC (
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 1 July 2024) [
36]. Adapter sequences and low-quality bases were removed using fastp (v0.23.4) [
37], which performs automatic adapter detection, per-read quality filtering, and correction of mismatched bases in overlapping paired-end reads. Quality filtering was applied with a minimum Phred score threshold of Q20. Reads shorter than 50 bp after trimming were discarded. Post-filtering quality metrics were reassessed with FastQC, and summary statistics were compiled using MultiQC (
https://seqera.io/multiqc/, accessed on 1 July 2024) [
38].
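The two thresholds stated above (Q20 and a minimum length of 50 bp) can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding and a simple 3'-end trimming rule; the function names are hypothetical, and fastp's own filtering logic is more elaborate and is not reproduced here.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_and_filter(seq: str, qual: str, min_q: int = 20, min_len: int = 50):
    """Trim 3'-end bases below min_q; return None if the read becomes too short.

    Illustrative simplification only, not fastp's algorithm.
    """
    scores = phred_scores(qual)
    end = len(scores)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None  # read discarded: shorter than min_len after trimming
    return seq[:end], qual[:end]
```

A read with 55 high-quality bases followed by 5 low-quality bases would be trimmed to 55 bp and kept, whereas a read with only 45 high-quality bases would be discarded.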
4.3.2. Taxonomic Assignment and Diversity
Taxonomic identification was performed using Kraken2 (with the standard database, version dated 28 December 2024). Diversity statistics were calculated in R (v4.4.2) [
39] using the phyloseq [
40], vegan, ggplot2, and ggpubr packages.
To control for sequencing depth effects in alpha-diversity comparisons, all samples were rarefied to an even sequencing depth using phyloseq::rarefy_even_depth (random seed = 1024), and alpha-diversity indices were computed on the rarefied phyloseq object. Rarefaction reduced the number of retained taxa from 9157 to 8285, as expected due to the loss of very low-abundance taxa during subsampling.
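The rarefaction step and the subsequent index calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the phyloseq implementation: reads are drawn without replacement, the function names are hypothetical, and because Python's random number generator differs from R's, the seed 1024 will not reproduce the subsample obtained in R.

```python
import math
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, rng: random.Random) -> dict[str, int]:
    """Subsample a taxon count table to a fixed depth (without replacement)."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the library size")
    return dict(Counter(rng.sample(pool, depth)))


def richness(counts: dict[str, int]) -> int:
    """Number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts: dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)
```

Very low-abundance taxa may receive zero reads in the subsample, which is why the number of retained taxa can decrease after rarefaction.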
For β-diversity analysis, Hellinger transformation, Bray–Curtis distance, and hierarchical clustering were applied (MicrobiotaProcess) [
41]. To screen for taxa differentiating ripening and product groups, we performed a LEfSe-like analysis implemented in MicrobiotaProcess using the Kruskal–Wallis test (nominal
p ≤ 0.05) and an LDA effect-size threshold of ≥0.01. Taxa were ranked by LDA effect size.
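The Hellinger transformation and Bray–Curtis distance used for β-diversity can be written out for two samples as a minimal Python sketch. The function names are hypothetical and the code only illustrates the standard formulas; it is not the MicrobiotaProcess implementation.

```python
import math


def hellinger(counts: list[float]) -> list[float]:
    """Hellinger transformation: square root of each relative abundance."""
    total = sum(counts)
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(x: list[float], y: list[float]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator
```

Two samples with identical profiles give a dissimilarity of 0, and two samples sharing no taxa give 1.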
Within this LDA-based workflow, feature-level
p-values were adjusted for multiple comparisons using the Benjamini–Hochberg procedure, and results are reported as false discovery rates (FDR, q-values). Taxa were considered statistically significant at
q ≤ 0.05; taxa with
q > 0.05 are presented as exploratory trends.
Figure 2 and
Table S1 report
q-values (BH–FDR) alongside LDA effect sizes.
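The Benjamini–Hochberg adjustment described above can be sketched in Python as follows. The function name is hypothetical and the input p-values in the usage note are invented for illustration; they are not results from this study.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` yields q-values of approximately 0.02, 0.04, 0.04, and 0.02, so all four hypothetical features would pass a q ≤ 0.05 threshold.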
4.3.3. Assembly and MAGs
Filtered reads were assembled using the MEGAHIT assembler [
42] in conjunction with metaSPAdes v4.0 [
43]. Gene prediction was performed using Prodigal [
44], and viral contigs were identified using geNomad [
45]. Contigs shorter than 1000 bp were discarded. Binning was performed using both MetaBAT2 [
46] and MaxBin2 [
47]. Read mapping to contigs was performed using Bowtie2 [
48], while coverage calculations were conducted using SAMtools v.21 [
49]. MAGs were assessed using CheckM v1.2.3 [
50] (with completeness thresholds set at ≥80% and contamination levels at <5%) and BUSCO v1.0.0 [
51]. High-quality MAGs were selected for further analysis. Taxonomic classification was performed using GTDB-Tk (GTDB R124) [
52], and functional annotation was performed using Prokka v1.14.5 [
53].
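The MAG quality criteria stated above (completeness ≥ 80% and contamination < 5%) amount to a simple filter, sketched here in Python. The class and field names are hypothetical and do not correspond to the CheckM output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MagQuality:
    """Quality estimates for one MAG (hypothetical record layout)."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_quality(mag: MagQuality,
                   min_completeness: float = 80.0,
                   max_contamination: float = 5.0) -> bool:
    """True if completeness >= 80% and contamination < 5% (thresholds from the text)."""
    return mag.completeness >= min_completeness and mag.contamination < max_contamination


def select_high_quality(mags: list[MagQuality]) -> list[MagQuality]:
    """Keep only MAGs that meet both thresholds."""
    return [mag for mag in mags if passes_quality(mag)]
```

Note that the completeness bound is inclusive while the contamination bound is strict, matching the thresholds as written.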
To assess intra-species genomic redundancy and strain-level diversity, MAGs meeting quality criteria (≥80% completeness and <5% contamination) were dereplicated using dRep (v3.4.2) [
54]. Primary clustering was performed using Mash-based genome distance estimation, followed by secondary clustering at a 99% average nucleotide identity (ANI) threshold. Secondary cluster counts were used to quantify strain-level genomic heterogeneity within dominant taxa.
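The idea of counting intra-species clusters at a 99% ANI threshold can be illustrated with a small Python sketch. It assumes a precomputed table of pairwise ANI values and groups genomes by single linkage with a union-find structure; this is a deliberate simplification, not dRep's clustering procedure, and all names and values are hypothetical.

```python
def cluster_by_ani(genomes: list[str],
                   ani: dict[tuple[str, str], float],
                   threshold: float = 99.0) -> list[list[str]]:
    """Group genomes linked by pairwise ANI >= threshold (single linkage)."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())
```

The number of lists returned for the MAGs of one species corresponds to the cluster count used here as a proxy for strain-level heterogeneity.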
4.3.4. Functional Profiling of Metabolic Pathways
HUMAnN 3.0 [
55] was used for functional profiling, which comprised mapping to UniRef90, regrouping into KEGG Orthologues (KOs), KEGG pathway analysis, and normalization to counts per million (CPM). HUMAnN (v3.9) was run with default parameters, including MetaPhlAn-based taxonomic prescreening, nucleotide-level alignment against a ChocoPhlAn pangenome database constructed from prescreened taxa (Bowtie2 v2.5.4) [
48], and translated alignment of unmapped reads using DIAMOND (v2.1.9) [
56] against the UniRef90 database (release 201901b).
The resulting profiles were analysed using a custom pipeline developed in Python (version 3.12; Python Software Foundation, Wilmington, DE, USA) and R (version 4.3.2; R Core Team, Vienna, Austria, 2023), which also used the pathview package v1.47.1 [
57] for the generation of heatmaps and functional categorization, including stratified results. Absolute abundances (CPM) were visualized to reflect the functional potential of the microbiome, while relative contributions (%) were calculated within KEGG category 09100 (Metabolism) to facilitate comparative interpretation across ripening stages.
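The two quantities reported here, CPM-normalized abundances and percentage contributions within a parent category, can be expressed as a minimal Python sketch. The function names and the example values are hypothetical and are not taken from the custom pipeline or from the study data.

```python
def to_cpm(abundances: dict[str, float]) -> dict[str, float]:
    """Scale abundances so that they sum to one million (counts per million)."""
    total = sum(abundances.values())
    return {feature: value / total * 1_000_000 for feature, value in abundances.items()}


def relative_contribution(cpm_by_category: dict[str, float]) -> dict[str, float]:
    """Percentage share of each subcategory within its parent category."""
    total = sum(cpm_by_category.values())
    return {name: 100.0 * value / total for name, value in cpm_by_category.items()}
```

For instance, subcategories with hypothetical CPM values of 600, 300, and 100 contribute 60%, 30%, and 10%, respectively. A category can therefore keep a stable percentage share even when its CPM value decreases, provided the other categories decrease proportionally.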
4.3.5. Resistance, Virulence and Phages
Antibiotic resistance genes in MAGs were identified using AMRFinder v3.12.8 [
58] and RGI (CARD) v6.0.3 [
59].
Viral contigs were detected with geNomad v1.8.0 (default parameters) [
45], and their quality and completeness were assessed with CheckV [
60]. Only high-quality viral sequences with CheckV completeness > 90% were retained for downstream analyses.
Putative host assignment was performed using PHIST v1.2.1 [
60], which predicts virus–host associations based on shared k-mers between viral contigs and candidate host genomes. Viral sequences were queried against bacterial MAGs reconstructed in this study with completeness > 80%. PHIST was executed with default parameters (k = 25), and for each viral contig the top-scoring host, number of shared k-mers, and associated
p-value (including multiple-testing correction) were recorded.
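The filtering and host-selection logic described above can be sketched in Python. The sketch assumes that completeness estimates and shared k-mer counts have already been parsed into plain Python structures; the function names and record layout are hypothetical and do not reflect the CheckV or PHIST output formats.

```python
def retain_high_quality(completeness: dict[str, float],
                        min_completeness: float = 90.0) -> list[str]:
    """Keep viral contigs with completeness strictly above the threshold."""
    return [contig for contig, value in completeness.items() if value > min_completeness]


def top_hosts(hits: list[tuple[str, str, int]]) -> dict[str, tuple[str, int]]:
    """For each viral contig, keep the host sharing the most k-mers.

    Each hit is (viral contig, candidate host, shared k-mer count).
    """
    best: dict[str, tuple[str, int]] = {}
    for virus, host, shared in hits:
        if virus not in best or shared > best[virus][1]:
            best[virus] = (host, shared)
    return best
```

Ties are resolved in favour of the first hit encountered, which is an arbitrary choice made for this illustration.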