Article

Mapping Global Biodiversity and Habitat Distribution of Lactobacillaceae Using NCBI Sequence Metadata

by Tatiana S. Sokolova 1, Zorigto B. Namsaraev 1,2, Ekaterina R. Wolf 1, Mikhail A. Kulyashov 1, Ilya R. Akberdin 1,* and Aleksey E. Sazonov 1
1 Scientific Center of Genetics and Life Sciences, Sirius University of Science and Technology, Sirius 354340, Russia
2 National Research Centre “Kurchatov Institute”, Moscow 123182, Russia
* Author to whom correspondence should be addressed.
Diversity 2025, 17(11), 776; https://doi.org/10.3390/d17110776
Submission received: 4 October 2025 / Revised: 31 October 2025 / Accepted: 3 November 2025 / Published: 4 November 2025

Abstract

The Lactobacillaceae family encompasses microorganisms of exceptional ecological and biotechnological importance, serving as central agents in food fermentations, health applications, and nutrient cycling across diverse environments. Despite their broad functional and phylogenetic diversity, the global distribution and ecological specialization of Lactobacillaceae are not yet fully understood. In this study, we performed a comprehensive analysis of over 2 million records from the NCBI database to survey and trace the ecological landscape of Lactobacillaceae across thousands of distinct habitats. Our results reveal that food products and animal hosts represent the primary ecological niches for members of this family. The examined taxa exhibit a broad spectrum of ecological strategies, ranging from generalists with wide environmental adaptability to specialists with strict niche preferences. Notably, our findings highlight a profound geographical and ecological sampling bias, with unclassified taxids frequent in animal gastrointestinal tracts, soils, and especially in living plant tissues—habitats identified as promising frontiers for discovering novel biodiversity. The obtained results emphasize the urgent need for expanded sampling efforts in underexplored geographic regions such as Africa, Antarctica, the Arctic, South America, and Central Asia to capture a more complete picture of Lactobacillaceae diversity. The study underscores the necessity of implementing standardized, metadata-rich data deposition practices to enable unbiased, large-scale ecological and evolutionary analyses. Ultimately, these insights not only deepen our fundamental knowledge of Lactobacillaceae diversity but also provide a strategic framework for future bioprospecting, fostering the discovery of novel strains and expanding the biotechnological potential of this influential bacterial family.

1. Introduction

The family Lactobacillaceae comprises a group of microorganisms with profound industrial and ecological significance. As facultative anaerobes, they excel at fermenting organic compounds, a metabolic capability that enables them to thrive in diverse oxygen-limited habitats and assumes a critical role in nutrient cycling [1,2]. These bacteria are cornerstones of the food industry, serving as essential starter cultures for a vast array of fermented products. For instance, they are indispensable in dairy manufacturing for producing yogurt, cheese, and kefir, and are central to the creation of sourdough bread, fermented vegetables like sauerkraut and kimchi, and cured sausages, where they contribute not only to preservation through acidification but also to the development of unique flavors and textures [3,4]. Beyond food, their applications are extensive. In biotechnology, they are leveraged for the synthesis of organic acids, exopolysaccharides, and other bioactive compounds [5], while they serve as agents for the biodetoxification of mycotoxins in animal feeds in agriculture [6] and are increasingly utilized as plant biostimulants in sustainable farming to improve crop yield and soil health, reducing the reliance on chemical fertilizers [7]. Furthermore, their role in health is increasingly recognized, with specific strains employed as probiotics to modulate the gut microbiome [8] and as psychobiotics to positively influence the gut–brain axis and mental well-being [9,10].
Despite their well-documented importance and remarkable functional diversity, a comprehensive, data-driven analysis of their global ecological distribution has been lacking. This knowledge gap presents a significant barrier, as it limits our ability to understand the full spectrum of their diversity and to strategically target novel environments for bioprospecting. Without a clear map of their habitat preferences, the search for new strains with unique, technologically valuable properties remains largely unsystematic. Metadata from public sequence databases offer a powerful, yet underutilized, tool for analyzing the distribution of both cultivated and uncultivated taxa directly from environmental samples.
Recent large-scale taxonomic revisions, driven by whole-genome analysis, have dramatically reshaped the Lactobacillaceae family phylogeny, establishing numerous new genera and clarifying evolutionary relationships [11,12,13]. The high number of taxa holding an Unclassified status further indicates that a vast, yet-to-be-cultivated diversity exists within this group. While members of this family are well-documented inhabitants of fermented foods and animal microbiomes, recent studies have reported their presence in non-traditional habitats, such as soil and aquatic sediments [14]. This raises fundamental questions about the ecological boundaries of Lactobacillaceae: What are their primary habitats, and which environments are most promising for the discovery of novel taxa?
In this study, we aim to bridge this critical knowledge gap. By analyzing over 2 million records from the NCBI database, we systematically map the presence of Lactobacillaceae across thousands of distinct habitat types. The significance of this work is twofold: first, it provides a foundational understanding of the family’s ecological landscape, and second, it serves as a practical guide for future applications. The results of this investigation not only expand our fundamental knowledge but also identify promising and underexplored frontiers—such as living plant tissues and the gut of understudied animals—for discovering new species and genera. This targeted approach has the potential to accelerate the development of next-generation probiotics, novel food cultures, and other biotechnological solutions with significant industrial and health benefits.

2. Materials and Methods

2.1. Data Acquisition and Primary Database Construction

The entire methodological workflow, from data acquisition and database construction to final analysis and visualization, is schematically outlined in Figure 1. The data acquisition and processing pipeline was developed in Python 3.12. Data retrieval was handled by the Biopython library’s Entrez module [15], which provides an interface to the NCBI E-utilities API. Taxonomic hierarchy and name resolution were managed with the ETE3 toolkit [16] through its NCBITaxa interface to the NCBI Taxonomy database. Data structuring and manipulation were performed with the pandas library. The process was initiated by specifying a target taxonomic family (e.g., Lactobacillaceae) via a command-line argument. The script first translated the family name into its corresponding NCBI Taxonomy ID (taxid) using ete3.NCBITaxa. Subsequently, the get_descendant_taxa function was used to recursively retrieve a comprehensive list of all associated taxids within that family, including all intermediate nodes, genera, and species. For each descendant taxid, the script performed a Bio.Entrez.esearch query against the NCBI Nucleotide database using a search term of the form txid<taxid>[Organism]. This query returned a complete list of all nucleotide record identifiers associated with that specific taxon, and the corresponding records were then downloaded. The entire workflow, from identifying descendant taxa to fetching their records, was parallelized with a ThreadPoolExecutor managing a queue of taxa to be processed. Each downloaded record was parsed using a custom function built on regular expressions, designed to extract key metadata fields from the unstructured text of the record. The targeted fields included the primary accession number (ACCESSION), the authors of the associated publication (AUTHORS), the original source material from which the sample was isolated (isolation_source), the geographical location (geo_loc_name), and the collection date (collection_date).
Additionally, the script retrieved the type of molecule sequenced, such as ‘genomic DNA’ (mol_type), and the definition or title of the sequence record (DEFINITION). If a specific field was not present in a record, it was stored as a null value to maintain a consistent data structure.
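The regular-expression parsing step described above can be illustrated with a minimal sketch. It is not the authors' function: the patterns, the helper name parse_record, and the sample record (including its accession XX000001) are assumptions made for illustration, and the sketch relies only on the standard layout of GenBank flat-file text, in which top-level keywords start in the first column, sub-keywords such as AUTHORS are indented by two spaces, and continuation lines are indented further.

```python
import re

# A value ends at the next line that has at most three leading spaces,
# i.e. at the next keyword or sub-keyword rather than a continuation line.
_END = r"(?=\n {0,3}\S)"

PATTERNS = {
    "accession": re.compile(r"^ACCESSION\s+(\S+)", re.M),
    "definition": re.compile(r"^DEFINITION\s+(.+?)" + _END, re.M | re.S),
    "authors": re.compile(r"^ {2}AUTHORS\s+(.+?)" + _END, re.M | re.S),
    "isolation_source": re.compile(r'/isolation_source="([^"]*)"'),
    "geo_loc_name": re.compile(r'/geo_loc_name="([^"]*)"'),
    "collection_date": re.compile(r'/collection_date="([^"]*)"'),
    "mol_type": re.compile(r'/mol_type="([^"]*)"'),
}


def parse_record(text):
    """Return one metadata row; fields absent from the record become None."""
    row = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        # Collapse line wraps and repeated spaces inside multi-line values.
        row[field] = " ".join(match.group(1).split()) if match else None
    return row


# Hypothetical record text, written only to exercise the patterns.
SAMPLE = "\n".join([
    "LOCUS       XX000001     1500 bp    DNA     linear   BCT 01-JAN-2020",
    "DEFINITION  Lactiplantibacillus plantarum gene for 16S ribosomal RNA,",
    "            partial sequence.",
    "ACCESSION   XX000001",
    "REFERENCE   1",
    "  AUTHORS   Doe,J. and Roe,R.",
    "  TITLE     Example title",
    "FEATURES             Location/Qualifiers",
    "     source          1..1500",
    '                     /mol_type="genomic DNA"',
    '                     /isolation_source="kimchi"',
    '                     /geo_loc_name="South Korea"',
])

row = parse_record(SAMPLE)
print(row)
```

Because the sample record carries no collection_date qualifier, that field comes back as None, which mirrors the null-value convention described in the text.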
The extracted metadata for every record was structured into a pandas DataFrame. This process was repeated for each taxid. The final output of the pipeline was a collection of individual tab-separated values (TSV) files. Each file corresponded to a single taxid and was named using the convention [Taxonomy ID]_[Taxon Name].tsv (e.g., 1579_Lactobacillus_acidophilus.tsv), with any special characters in the taxon name sanitized to prevent errors. This collection of TSV files constitutes the primary, non-normalized database for subsequent analysis.
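The per-taxon fan-out and file-naming convention can be sketched as follows. This is an illustration under stated assumptions rather than the published script: fake_fetch is a hypothetical offline stand-in for the Entrez search and download round trip, the sanitization rule (replace every run of non-alphanumeric characters with an underscore) is one plausible reading of "sanitized", and the second taxon in the list is invented to show the effect of sanitization.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def build_query(taxid):
    """Entrez search term restricting results to one taxon."""
    return f"txid{taxid}[Organism]"


def tsv_filename(taxid, taxon_name):
    """Apply the [Taxonomy ID]_[Taxon Name].tsv convention (assumed rule)."""
    safe_name = re.sub(r"[^A-Za-z0-9]+", "_", taxon_name).strip("_")
    return f"{taxid}_{safe_name}.tsv"


def fake_fetch(query):
    """Hypothetical stand-in for the network calls; returns one dummy row."""
    return [{"accession": "XX000001", "query": query}]


def process_taxon(taxon):
    """Fetch the records of one taxon and pair them with their file name."""
    taxid, name = taxon
    return tsv_filename(taxid, name), fake_fetch(build_query(taxid))


# 1579 is the example taxid from the text; 999999 is a made-up taxon.
taxa = [(1579, "Lactobacillus acidophilus"), (999999, "Lactobacillus sp. X-1/a")]

# Each taxon is handled by a worker thread, which suits I/O-bound API calls.
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = dict(pool.map(process_taxon, taxa))

for filename in sorted(outputs):
    print(filename, len(outputs[filename]))
```

In a real run the stand-in would be replaced by the actual retrieval calls, and the worker count would need to respect the request-rate limits of the NCBI service.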

2.2. Database Curation and Verification

The curation and verification of the database focused on standardizing, cleaning, and deduplicating metadata to ensure high-quality and reliable outputs for downstream analysis. Programmatic methods were used to address inconsistencies in the raw data and to eliminate redundancies while maintaining analytical integrity. The curation workflow applied a series of transformations to the dataset to correct formatting issues, remove irrelevant information, and consolidate variations across key metadata fields. The fields “Isolation Source,” “Location,” “Date,” “Authors,” and “Definition” underwent text standardization and cleaning to improve consistency and ensure compatibility with subsequent analytical steps. This included applying uniform case-folding, removing extraneous whitespace and technical characters, and correcting minor typographical errors. Deduplication was then performed to identify and eliminate records with overlapping information, based on specific metadata fields, including taxonomy identifiers, isolation sources, and geographical origins. Each unique record retained after deduplication is hereafter referred to as a sample. This step minimized redundancy and enhanced the precision of analyses by ensuring that only unique data entries were retained in the final dataset.
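A minimal sketch of the cleaning and deduplication logic is given below. The field names, the set of "technical characters" removed, and the example records are assumptions for illustration; the text does not specify the exact rules used.

```python
import re


def clean_text(value):
    """Drop stray bracket/pipe characters, collapse whitespace, case-fold."""
    if value is None:
        return None
    value = re.sub(r"[\[\]{}|]", " ", value)
    return " ".join(value.split()).casefold()


def deduplicate(records, keys=("taxid", "isolation_source", "location")):
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    samples = []
    for record in records:
        cleaned = dict(record)
        for key in keys:
            if isinstance(cleaned.get(key), str):
                cleaned[key] = clean_text(cleaned[key])
        signature = tuple(cleaned.get(key) for key in keys)
        if signature not in seen:
            seen.add(signature)
            samples.append(cleaned)
    return samples


# Hypothetical records: the first two differ only in case and spacing.
records = [
    {"taxid": 1590, "isolation_source": "  Kimchi ", "location": "South Korea"},
    {"taxid": 1590, "isolation_source": "kimchi", "location": "south  korea"},
    {"taxid": 1590, "isolation_source": "sourdough", "location": "Italy"},
]

samples = deduplicate(records)
print(len(samples), "samples retained")
```

Cleaning before comparison matters here: without it, the first two records would be counted as separate samples even though they describe the same taxon, source, and location.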

2.3. Habitat Classification and Dictionary Development

To translate the unstructured Isolation Source metadata into a robust and analyzable dataset, a hierarchical classification system and a controlled vocabulary were developed. All unique Isolation Source strings extracted from the NCBI Nucleotide database were manually reviewed, during which similar and redundant descriptions were grouped and obvious synonyms were consolidated. Based on this initial review, a hierarchical dictionary was constructed, organizing habitats into logical nested categories to serve as the controlled vocabulary for the study. The primary, top-level categories were defined based on the fundamental origin of the isolation source. These categories include: Product Group, encompassing fermented or raw products, processed food items, beverages, and other manufactured goods; Animal Group, which covers animal tissues (e.g., the gastrointestinal tract, lungs, and skin), live animal samples, and biological specimens (e.g., feces, urine, blood, and breast milk) collected directly from animals; Environment Group, including samples from natural or artificial environments (e.g., soil, water, air, industrial surfaces, agricultural fields); and Plant Group, comprising sources related to plant materials (e.g., leaves, roots, fruits, seeds). This hierarchical framework provides a standardized and reproducible approach to classifying microbial sources for ecological and microbiological studies.
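The idea of a nested controlled vocabulary can be illustrated with a small keyword-based classifier. The four top-level groups follow the text, but the subcategories, keywords, and the longest-keyword-first matching rule are assumptions chosen for the example; the actual dictionary was built by manual review and is far larger.

```python
# Hypothetical excerpt of a hierarchical habitat dictionary:
# top-level group -> subcategory -> keywords.
HABITATS = {
    "Product": {
        "Dairy": ["cheese", "yogurt", "milk"],
        "Fermented vegetables": ["kimchi", "sauerkraut"],
    },
    "Animal": {
        "Feces": ["feces", "fecal"],
        "Breast milk": ["breast milk"],
    },
    "Environment": {"Soil": ["soil"], "Water": ["water"]},
    "Plant": {"Leaves": ["leaf", "leaves"], "Roots": ["root"]},
}

# Flatten to (keyword, group, subcategory) and try longer keywords first,
# so that "breast milk" is matched before the more general "milk".
_RULES = sorted(
    (
        (keyword, group, subcategory)
        for group, subcategories in HABITATS.items()
        for subcategory, keywords in subcategories.items()
        for keyword in keywords
    ),
    key=lambda rule: -len(rule[0]),
)


def classify(isolation_source):
    """Map a free-text isolation source to (group, subcategory)."""
    text = isolation_source.casefold()
    for keyword, group, subcategory in _RULES:
        if keyword in text:
            return group, subcategory
    return "Unclassified", None


for source in ["Human breast milk", "raw milk cheese", "rhizosphere soil"]:
    print(source, "->", classify(source))
```

Plain substring matching is deliberately simple and will misfile some strings, which is one reason a manually reviewed dictionary is needed in practice.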

2.4. Data Analysis and Visualization

The analysis workflow uses the Python libraries pandas for data cleaning, merging, and processing; plotly [17] for generating interactive visualizations; and geopy [18] for geocoding and mapping. Visualization techniques include interactive bar charts, hierarchical sunburst charts showing taxonomic distribution at different classification levels, heatmaps for comparing taxonomic distributions across habitat groups, and geographical scatter maps of sample origins. All visualizations are exported as HTML files, which allows interactive exploration and supports reproducible ecological and taxonomic analysis.
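As an illustration of how classified samples might be prepared for a hierarchical chart, the sketch below aggregates samples into parallel lists of identifiers, labels, parents, and values. The function name and sample rows are hypothetical; lists of this shape correspond to the inputs that sunburst-style traces generally expect (for example the ids, labels, parents, and values arguments of plotly's Sunburst trace), but the plotting call itself is left out so that the example runs without third-party packages.

```python
from collections import Counter


def sunburst_table(samples):
    """Aggregate samples into parent/child rows for a two-level hierarchy."""
    group_counts = Counter(s["group"] for s in samples)
    leaf_counts = Counter((s["group"], s["genus"]) for s in samples)

    table = {"ids": [], "labels": [], "parents": [], "values": []}
    # Inner ring: habitat groups, which have no parent.
    for group, count in sorted(group_counts.items()):
        table["ids"].append(group)
        table["labels"].append(group)
        table["parents"].append("")
        table["values"].append(count)
    # Outer ring: genera nested under their habitat group.
    for (group, genus), count in sorted(leaf_counts.items()):
        table["ids"].append(f"{group}/{genus}")
        table["labels"].append(genus)
        table["parents"].append(group)
        table["values"].append(count)
    return table


# Hypothetical classified samples.
samples = [
    {"group": "Product", "genus": "Lactiplantibacillus"},
    {"group": "Product", "genus": "Leuconostoc"},
    {"group": "Product", "genus": "Lactiplantibacillus"},
    {"group": "Animal", "genus": "Limosilactobacillus"},
]

table = sunburst_table(samples)
print(table)
```

Each group value equals the sum of its children, so the table is suited to charts in which a parent sector represents the total of its branches.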

3. Results

3.1. General Habitat Distribution of Lactobacillaceae

The members of the Lactobacillaceae family exhibit remarkable ecological diversity and adaptability, colonizing a wide spectrum of habitats classified into four primary categories: Product, Animal, Environment, and Plant. The proportional distribution of Lactobacillaceae representatives across these major habitat categories is visually summarized in the pie chart (Figure 2), while the specific distribution of individual genera is detailed in the bar chart (Figure 3).
The predominant category is Product-associated habitats, which accounts for 56.4% of all isolates. This highlights the family’s profound significance in food microbiology and industrial processes [12]. This group includes a vast number of fermented foods and beverages, such as dairy products (cheese, yogurt, etc.), fermented meats, sourdough, and plant-based fermentations (sauerkraut, pickles, silage, etc.). As shown in Figure 3, genera like Lactobacillus, Pediococcus, and Leuconostoc are particularly prevalent in these niches. Their dominance could be attributed to their robust metabolic capabilities, including rapid acidification through lactic acid production, which effectively preserves food and contributes to the development of unique flavors and textures. The Animal-associated categories represent 28.3% of the total distribution. The Animal-associated group comprises species that are integral members of the microbiota of various hosts, including mammals, birds, and insects. They are commonly found in the gastrointestinal tract, oral cavity, and urogenital system, where they often establish symbiotic relationships. Genera such as Limosilactobacillus and Ligilactobacillus, as seen in Figure 3, demonstrate a strong association with animal hosts, reflecting their evolutionary adaptation to these anaerobic, nutrient-rich environments where they contribute to host health by aiding digestion and competitively excluding pathogens [19]. The Environmental category includes isolates from non-host-associated niches like soil, water, and decaying plant matter. The significant presence of Lactobacillaceae in these habitats underscores their role in broader ecological processes, such as nutrient cycling and the decomposition of organic materials. Genera like Weissella and Fructobacillus show a notable prevalence in these habitats. Finally, the Plant-associated category, though the smallest at 3.7%, is a critical reservoir for species involved in plant-based fermentations. 
These bacteria can be found as epiphytes on the surfaces of leaves, fruits, and roots. Representatives of genera such as Leuconostoc and Fructobacillus are frequently isolated from plant material, serving as the initial inoculum for many natural food fermentations.

3.2. Distribution of Top Isolation Sources

The analysis of the top 20 isolation sources reveals a clear prevalence of samples originating from mammalian-associated environments and fermented food products (Figure 4). The most prominent source, with a count of 1844 samples, is feces, which stands out as the primary reservoir by a considerable margin. This finding underscores the importance of the gastrointestinal tract as a major habitat for these microorganisms. Following the leading source, there is a cluster of high-frequency sources dominated by fermented foods and the gut microbiome [20]. Kimchi and gut are nearly equal in frequency, yielding 739 and 719 samples, respectively, highlighting a strong connection between dietary fermented products and the gut environment. It is methodologically important to clarify that the distinction observed in this section between the ‘feces’ and ‘gut’ categories is a direct result of our keyword-driven classification of the raw, unstructured “Isolation Source” metadata. For this analysis, the ‘Feces’ category includes entries where the term was explicitly used (e.g., “feces,” “fecal sample”). In contrast, the ‘Gut’ category (which also includes “gut microbiome”) captures entries that used more general anatomical or environmental terms (e.g., “gut,” “intestine”) but lacked the specific “feces” keyword. While these categories are closely linked ecologically (as feces are the standard proxy for studying the gut microbiome), their separation in our results directly reflects the different terminology choices made by the original data submitters in the NCBI database. The list continues with a diverse array of fermented foods, including sourdough (613 samples), cheese (577 samples), milk (468 samples), and pickles (427 samples), demonstrating that fermentation processes across different types of food substrates provide rich environments for these microorganisms. Further down the list, other human and animal-associated sources reinforce this trend.
Breast milk is a notable source with 393 samples, followed by urine (265 samples) and silage (264 samples), the latter being fermented animal feed. The list also includes various parts of the digestive and urogenital tracts, such as the vagina (178 samples), intestine (167 samples), and stomach (166 samples), illustrating the systemic presence of these microbes within a host. At the lower end of the top 20 list are additional food sources such as fermented food in general (262 samples), fermented milk (216 samples), curd (203 samples), wine (191 samples), and traditional Chinese fermented vegetables (163 samples). In summary, the data clearly delineate two principal ecological hotspots for strain isolation: the mammalian body, particularly the gastrointestinal system, and a wide spectrum of fermented foods. The sheer number of fecal samples points to the gut as the major habitat for this group, while the extensive representation of fermented products highlights their crucial role as a source of these microorganisms.
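The keyword precedence behind the ‘Feces’ and ‘Gut’ split, and the ranking of sources by frequency, can be sketched as follows. The rule list and the input strings are hypothetical; only the precedence itself (an entry containing a feces keyword is never counted as ‘Gut’) is taken from the description above.

```python
from collections import Counter

# Rules are checked in order, so the feces keywords take precedence.
RULES = [
    ("Feces", ("feces", "fecal")),
    ("Gut", ("gut", "intestine")),
]


def source_label(isolation_source):
    """Return the consolidated label for one cleaned isolation source."""
    text = isolation_source.casefold()
    for label, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return text  # sources without a rule keep their own cleaned name


def top_sources(sources, n):
    """Rank consolidated labels by the number of samples."""
    return Counter(source_label(s) for s in sources).most_common(n)


# Hypothetical isolation-source strings.
sources = [
    "human feces", "Fecal sample", "feces from gut",
    "gut", "pig intestine", "kimchi",
]

ranking = top_sources(sources, 2)
print(ranking)
```

The entry "feces from gut" shows why rule order matters: it mentions both terms but is counted once, under ‘Feces’.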

3.3. Distribution of Genera and Species Across Specific Habitats

3.3.1. Genus-Level Distribution and Habitat Preferences

The taxonomic distribution of Lactobacillaceae genera, as shown in the heatmap (Figure 5), emphasizes distinct habitat preferences rather than a uniform presence across different ecological groups. Genera with the highest prevalence, such as Lactiplantibacillus, are most abundant in food-related habitats (“Product”), with significantly fewer representatives in animal-associated and environmental niches. Notably, Lactiplantibacillus plantarum dominates food environments due to its adaptability to diverse substrates and fermentation processes [21].
Importantly, an uneven sample distribution must be considered. The dataset includes a disproportionately large number of isolates from food sources, potentially amplifying the observed dominance of food-related genera. Despite this, the absence of genera uniformly distributed across all ecological contexts reinforces the notion that Lactobacillaceae primarily adapt to specific niches. Similar ecological patterns are observed in genera like Pediococcus, which are mostly linked to fermented plant materials (group “Product”) and silage [22], and Leuconostoc, most frequently found in dairy products and vegetables [23]. These findings highlight their specialized roles in distinct types of habitats. This analysis reveals that while food environments serve as a major reservoir for these genera, the data reflects habitat-specific evolutionary adaptations rather than general ecological ubiquity.

3.3.2. Specialization Within Food-Associated Habitats

A detailed species-level analysis of the dataset reveals distinct patterns of association between habitat and species distribution within food environments. The genus Lactobacillus is the most prominent, with isolates sourced from a wide array of substrates (Figure 6). However, a closer look at the species level demonstrates significant specialization (Supplementary Table S1). The Lactobacillus casei group, encompassing L. casei, L. paracasei, and L. rhamnosus, exhibits a strong preference for dairy environments. The overwhelming majority of these isolates originate from cheese, yogurt, and fermented milks, highlighting their specialized metabolic pathways tailored to milk proteins and lactose, which are fundamental to the ripening and flavor development of these products [24]. Similarly, species like Lactobacillus delbrueckii and Lactobacillus helveticus are almost exclusively found in dairy, where they serve as key starter cultures for yogurt and Swiss-type cheeses, respectively [25,26]. The genus Leuconostoc also displays clear habitat preferences, primarily occupying dairy and meat substrates. Leuconostoc mesenteroides is frequently isolated from both fermented dairy, where it contributes to flavor through the production of diacetyl, and from the initial stages of vegetable fermentation, where it helps create the acidic conditions necessary for preservation [27]. Leuconostoc citreum and Leuconostoc pseudomesenteroides also show a prominent presence in dairy and meat, indicating their importance in developing the sensory characteristics of products like buttermilk, cheese, and cured sausages. The genus Weissella is characterized by its broad ecological distribution, particularly across plant-based substrates and meats. Species such as Weissella cibaria and Weissella confusa are frequently identified in sourdough starters, fermented vegetables like kimchi, and fruits. 
Their ability to thrive in these environments is linked to their capacity to metabolize a diverse range of plant-derived carbohydrates [28]. However, they are also commonly found in vacuum-packaged meat products, where their presence can be associated with both fermentation and spoilage, depending on the context. This species-level breakdown confirms that while broad generic classifications are useful, a specific understanding of each species’ preferred habitat is crucial for predicting its functional role in food fermentation, preservation, and quality.

3.3.3. Distribution Within Animal-Associated Habitats and Host Specificity

In the animal-associated habitats, the data demonstrate that the gastrointestinal tract and feces serve as the primary habitats for the majority of genera. These environments provide rich, nutrient-dense conditions that support the growth of commensal and symbiotic species. For genera such as Lactobacillus, Limosilactobacillus, Ligilactobacillus, and Lacticaseibacillus, these habitats dominate the distributions, as illustrated by the red (Feces) and green (Gastrointestinal tract) segments in Figure 7. These bacteria are crucial components of the gut microbiome across mammals, insects, and birds, performing functions ranging from carbohydrate fermentation to pathogen inhibition [29]. Species-level analysis confirms these patterns (Supplementary Table S2). Lactobacillus acidophilus, Lactobacillus gasseri, and Lactobacillus crispatus are consistently isolated from human gastrointestinal and genitourinary tracts, where their probiotic benefits include regulating flora composition and supporting metabolic processes. A prominent example of a habitat specialist is Lactobacillus iners, a dominant and critical species of the human vaginal microbiome. Our analysis confirms that records for this species are almost exclusively associated with this specific niche, reinforcing its well-established role as a core member of the vaginal flora [30]. Similarly, Lactobacillus delbrueckii is frequently observed in the gut of poultry, emphasizing its role within avian hosts. Insects harbor distinctive specialists, such as Apilactobacillus kunkeei in honeybees and Fructobacillus fructosus in fruit flies. These bacteria are integral to digestive functions in their respective hosts, particularly in metabolizing nectar and fructose [31].
The distribution by host type further reveals a substantial proportion of “Unspecified” sources. These entries reflect cases where host metadata was either incomplete or absent in repositories such as NCBI’s Nucleotide database. Many of these isolates are likely of human origin, given the predominance of human-focused microbiome research. However, in the absence of explicit confirmation, they are conservatively labeled as “Unspecified.” This underscores the need for more precise metadata reporting to improve ecological contextualization. Despite these limitations, the data provide critical insights into host-microbe interactions. For instance, Apilactobacillus and Bombilactobacillus are tightly coupled with insect hosts, underscoring specialized co-evolutionary dynamics within this habitat (Figure 8). Conversely, species like Lactobacillus acidophilus and Lactobacillus rhamnosus demonstrate broader adaptability, thriving in humans and other mammals as probiotics, reflective of their commensal nature. Thus, the dataset not only delineates habitat and host specificity but also highlights evolutionary adaptations that define the ecological success of these bacterial species in animal-associated environments.
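The conservative handling of missing host information can be illustrated with a short sketch. The list of placeholder strings treated as missing is an assumption, as is the function name; the point is only that absent or uninformative host values are assigned to “Unspecified” rather than guessed.

```python
from collections import Counter

# Assumed placeholders that carry no host information.
MISSING_VALUES = {"", "missing", "not applicable", "not collected", "unknown", "n/a"}


def host_label(host):
    """Return the host name, or "Unspecified" when no usable value exists."""
    if host is None:
        return "Unspecified"
    cleaned = " ".join(host.split())
    if cleaned.casefold() in MISSING_VALUES:
        return "Unspecified"
    return cleaned


# Hypothetical host fields from animal-associated records.
hosts = ["Homo sapiens", None, "  ", "Not collected", "Apis mellifera", "Homo sapiens"]

host_counts = Counter(host_label(h) for h in hosts)
print(host_counts.most_common())
```

Keeping these records in a separate category preserves them for habitat-level counts while avoiding unsupported assumptions about the host.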

3.3.4. Ecological Associations in Soil, Silage, Water, and Organic Wastes

The analysis of bacterial taxids from environmental sources highlights diverse ecological associations across a range of habitats, including soil, silage, water, and organic wastes. The data reveals that certain bacterial genera exhibit distinct patterns of distribution, reflecting unique adaptations to their respective environments. The soil environment, particularly rhizospheric and polluted soils, proves to be a rich reservoir for genera like Weissella and Lentilactobacillus (Figure 9). These bacteria likely thrive here due to their ability to metabolize organic matter and withstand fluctuating conditions [32]. Rhizospheric soils, in particular, support microbial strains involved in plant-microbe interactions, where their presence contributes to nutrient cycling and soil fertility. In silage, Lactobacillus buchneri is a dominant player (Supplementary Table S3), crucial for preserving the fermentative quality of livestock feed [33]. It ensures long-term stability by producing lactic and acetic acid, inhibiting spoilage organisms such as yeasts and molds. This habitat showcases the targeted role of bacterial strains in agricultural applications, particularly those that benefit livestock nutrition. Aquatic environments, including wastewater, appear to be a more transient habitat rather than a primary niche for most of these genera. However, strains of Weissella, Leuconostoc, and Lactobacillus are found in these settings, likely representing populations washed in from surrounding soils, agricultural runoff, and organic wastes. Their presence in these habitats speaks to their resilience, but they are probably not specialists of aquatic ecosystems in the way other bacteria are. Organic wastes, including food and farming residues, provide a nutrient-rich habitat for a broad range of genera, including Lactiplantibacillus and Leuconostoc. 
These genera play a role in the decomposition of organic matter, breaking down polysaccharides, and contributing to waste management systems through natural degradation and fermentation processes. Figure 9 and Figure 10 further illustrate the ecological versatility of many bacterial strains. For instance, Lactobacillus demonstrates significant distribution across not only soil and waste environments but also biofilms and polluted ecosystems, indicating its resilience and metabolic adaptability. Genera such as Ligilactobacillus and Pediococcus also exhibit strong ties to biofilm formation in environmental niches, supporting their survival in challenging conditions like surface-exposed or chemically contaminated sites [34]. In conclusion, bacterial strains in environmental habitats demonstrate diverse functional roles, from supporting agricultural ecosystems to participating in waste degradation. Their distribution reflects adaptations to specific substrates and emphasizes their ecological significance in both natural and anthropogenic ecosystems.

3.3.5. Plant-Associated Habitats

The plant group of habitats presents a vast reservoir for Lactobacillaceae representatives, although it ranks as a less common source of isolates compared to animal hosts or food substrates. Within this category, plants represent a particularly under-sampled niche despite their significant ecological relevance (Figure 11). Many isolates associated with plants either inhabit their surfaces as epiphytes, colonizing leaves, stems, fruits, and seeds, or participate in the decomposition and recycling of plant biomass. These species make essential contributions to both natural ecosystems and fermentation-based food systems. For instance, Lactiplantibacillus plantarum dominates multiple plant surfaces, as evidenced by significant percentages in “Shoot” (28%) and “Seeds” (13%). This species plays indispensable roles in fermenting foods like kimchi and sauerkraut, aided by its capacity to degrade complex plant polysaccharides and rapidly acidify environments for preservation. Genera like Leuconostoc (e.g., Leuconostoc mesenteroides) excel in fresh vegetables and fruits, contributing to early aromatic and flavor profiles during fermentation. This genus is highly prominent in “Flower,” “Root,” and “Shoot” niches, demonstrating its ecological versatility. Meanwhile, the rhizosphere and soil habitats are enriched by genera such as Limosilactobacillus and Lactiplantibacillus. Their presence in “Roots” and “Leaves and fruit” supports nutrient cycling and plant health. For example, Limosilactobacillus fermentum shows high adaptability across “Seeds,” “Flowers,” and “Shoots” (Supplementary Table S4), promoting nutrient breakdown [19]. High-sugar environments like fruits and nectar are reservoirs for Fructobacillus, which thrives on their carbohydrate-rich substrates. This genus is notably well represented in the “Fruit” and “Shoot” categories, which are pivotal for nutrient recycling in natural and human food systems.
In addition to ecological benefits, bacteria such as Pediococcus and Weissella exhibit dominance in “Flower” niches, facilitating both biomass recycling and fermentation processes. Pediococcus pentosaceus, for example, significantly occupies “Shoots” and “Seeds,” emphasizing its biotechnological relevance. Thus, plant-associated bacterial taxa act as a crucial link between natural ecosystems and agricultural applications, representing a vast, untapped resource for fermentation and biotechnology. Their widespread adaptability across seeds, roots, fruits, and other plant tissues illustrates their broad ecological niche.

3.3.6. Synthesis: A Spectrum from Habitat Generalists to Extreme Specialists

The comprehensive analysis of Lactobacillaceae distribution reveals a clear pattern of habitat specialization, in which distinct genera and species have adapted to thrive in specific habitats rather than being universally distributed. While food production environments, particularly dairy and fermented plant matter, represent the most significant reservoir of isolates in the current dataset, this is partially influenced by a sampling bias towards human-centric applications. Nevertheless, even within this context, a spectrum of specialization is evident. Animal-associated habitats, primarily the gastrointestinal tract, host a different but equally specialized cohort of species essential for host health, ranging from mammalian probiotics to insect symbionts. Environmental and plant-based niches, though less sampled, harbor unique species adapted for roles in nutrient cycling, decomposition, and agricultural processes like silage fermentation. Collectively, the data underscore that the evolutionary trajectory of Lactobacillaceae is driven by adaptation to specific nutritional and environmental conditions, leading to a rich diversity of functional roles across food, health, and environmental ecosystems. A deeper understanding of these habitat-specific adaptations is crucial for harnessing their full biotechnological potential.
This ecological gradient is clearly demonstrated at the species level, ranging from highly specialized organisms to versatile generalists. Lactiplantibacillus plantarum exemplifies the ultimate habitat generalist, with remarkable adaptability across dairy, cereals, meats, fermented plants, and natural environments. Similarly, Weissella species and Leuconostoc mesenteroides show broad ecological ranges, thriving in diverse plant-based fermentations, meats, and dairy products. In stark contrast, many species exhibit narrow and highly restricted habitats. Lactobacillus delbrueckii and Lactobacillus helveticus function as extreme specialists, almost exclusively confined to dairy environments where they are essential starter cultures. The Lactobacillus casei group also shows a strong preference for dairy, highlighting metabolic pathways tailored to milk substrates. Further specialization is seen in species with tight co-evolutionary relationships with specific hosts, such as Apilactobacillus kunkeei in honeybees and Fructobacillus fructosus in fruit flies. In agricultural contexts, Lactobacillus buchneri acts as a targeted specialist, primarily found in silage where its metabolic activity is critical for feed preservation, underscoring the functional diversity shaped by habitat adaptation.
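The generalist-to-specialist gradient described above can be made quantitative with a niche-breadth index. The following is a minimal sketch, not the method used in this study: it computes the standardized Levins' niche breadth from per-habitat record counts. The function name, taxon labels, and counts are hypothetical and serve only as illustration.

```python
def levins_breadth(counts):
    """Standardized Levins' niche breadth B_A in [0, 1].

    counts: mapping habitat -> number of records for one taxon.
    0 means all records come from a single habitat (specialist);
    1 means records are spread evenly over all habitats (generalist).
    """
    total = sum(counts.values())
    n = len(counts)
    if total == 0 or n < 2:
        raise ValueError("need at least two habitats and one record")
    proportions = [c / total for c in counts.values()]
    b = 1.0 / sum(p * p for p in proportions)  # Levins' B, between 1 and n
    return (b - 1.0) / (n - 1.0)


# Hypothetical record counts per habitat (illustrative, not taken from the dataset).
generalist_like = {"dairy": 25, "plants": 25, "meat": 25, "gut": 25}
specialist_like = {"dairy": 97, "plants": 1, "meat": 1, "gut": 1}

for name, counts in (("generalist-like", generalist_like),
                     ("specialist-like", specialist_like)):
    print(f"{name}: B_A = {levins_breadth(counts):.3f}")
```

Because the index depends on how many habitat categories are distinguished, values are only comparable between taxa scored against the same habitat classification, and they inherit any sampling bias present in the underlying records.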

3.4. Geographical Trends and Sampling Biases in Lactobacillaceae Research

The geographical distribution and intensity of Lactobacillaceae isolation sources show substantial regional variation and sampling bias (Figure 12). The analysis of the dataset, along with its visualization, reveals that the majority of isolates originate from a few well-studied regions, while vast parts of the globe remain underrepresented. East Asia emerges as the most intensively sampled region, with South Korea and China leading the contributions, particularly from fermented food sources such as kimchi and traditional Chinese fermented vegetables. These sources are closely tied to the region’s rich culinary traditions and research focus on food microbiology. Europe follows closely with a high concentration of isolates derived from dairy products, such as cheese, yogurt, and milk, largely reflecting the continent’s agricultural and food production practices. Germany and Italy stand out as prominent contributors of these dairy isolates. North America also demonstrates significant research activity on Lactobacillaceae, with a strong emphasis on human-associated sources such as feces and gut, underscoring the region’s focus on microbiome research and health-related studies. However, major gaps in geographical coverage are evident. Africa, South America, Central and Northern Asia, Antarctica, and Oceania are significantly underrepresented. While there are scattered sampling points in countries like Brazil, Australia, and South Africa, these data points are sparse compared to the densely sampled regions in East Asia, Europe, and North America. Additionally, the Middle East and large portions of Russia show minimal sampling, further emphasizing the uneven global research focus. The intensity map further reveals disparities in taxonomic richness among sampled regions, with East Asia having the most diverse and concentrated datasets, followed by specific hotspots in Europe and North America.
Locations with high taxon counts per source highlight regions of advanced and detailed sampling, such as South Korea for kimchi and fermented vegetables, and Germany for cheese and sourdough. Overall, in our opinion, the dataset mirrors the global research effort more than the true ecological distribution of Lactobacillaceae. The obvious bias towards industrialized regions and human-centric applications underscores the need to broaden sampling efforts to neglected regions, which may harbor valuable untapped microbial diversity. A more equitable global sampling strategy would significantly enhance our understanding of Lactobacillaceae and their ecological roles.
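The regional imbalance discussed above can be summarized as the share of records per macro-region. The sketch below is only an illustration of that tally and does not reproduce the authors' pipeline. The country-to-region mapping, the function name, and the record list are assumptions made up for the example.

```python
from collections import Counter

# Hypothetical mapping from country to macro-region (an assumption for illustration).
REGION_OF = {
    "South Korea": "East Asia",
    "China": "East Asia",
    "Germany": "Europe",
    "Italy": "Europe",
    "USA": "North America",
    "Brazil": "South America",
    "South Africa": "Africa",
}


def region_shares(countries):
    """Return {region: share of records}, most sampled region first.

    Countries missing from REGION_OF are grouped under 'Unassigned'.
    """
    counts = Counter(REGION_OF.get(c, "Unassigned") for c in countries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: n / total for region, n in counts.most_common()}


# Made-up country labels standing in for per-record metadata.
records = (["South Korea"] * 5 + ["China"] * 3 + ["Germany"] * 2
           + ["USA"] * 2 + ["Brazil"])
shares = region_shares(records)
for region, share in shares.items():
    print(f"{region}: {share:.0%}")
```

Shares computed this way describe where sequences were deposited from, not where the organisms actually occur, which is the distinction the section draws.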

3.5. Analysis of Unclassified and Environmental Samples Taxids

One of the most important groups analyzed in our study comprises records labeled as “unclassified” and those derived directly from “environmental samples”. These categories, often overlapping, represent a significant portion of the dataset and highlight ecosystems rich in novel, yet-to-be-characterized bacterial diversity. The data clearly show that unclassified taxa are most frequently identified in complex animal-associated and environmental habitats, a trend visually summarized in Figure 13. Within animal hosts, the gastrointestinal tract and feces stand out as the primary reservoirs for these unidentified taxids. The intricate and highly competitive nature of the gut microbiome means that many resident bacteria have not yet been isolated in pure culture or fully characterized taxonomically. Their detection is often only possible through culture-independent methods like metagenomic sequencing, which explains why so many species remain “unclassified”. Similarly, environmental samples, especially from soil, water, and various plant-associated niches (rhizosphere, shoots), are significant hotspots for unclassified taxa. The term “environmental samples” in this context often refers to data obtained directly from a source without first cultivating the organisms in a lab. This approach provides a direct snapshot of the microbial community in its natural setting. The prevalence of unclassified taxa in these samples indicates that these ecosystems harbor a vast, untapped reservoir of novel microorganisms. These bacteria likely possess unique metabolic capabilities adapted to their specific environments, with potential applications in agriculture, bioremediation, and biotechnology.
The presence of many unclassified taxa is not random. As detailed in Figure 14, which ranks the top ecological niches by their proportion of uncharacterized isolates, these taxa are predominantly associated with ecosystems that either pose significant challenges for laboratory cultivation (such as the animal gut) or are extremely diverse and complex in nature (such as soil environments). These environmental samples, rich in unclassified sequences, represent the foremost frontiers for microbial exploration. Focused investigation in these areas, using both advanced culturing techniques and deeper high-throughput sequencing, is essential for expanding our taxonomic understanding of the Lactobacillaceae family and for discovering novel species with potentially groundbreaking functional properties.
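A ranking of niches by their proportion of uncharacterized records, of the kind summarized in Figure 14, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the marker strings used to flag a record as unclassified, the function names, and the example records are all hypothetical.

```python
from collections import defaultdict

# Assumed substrings that flag a taxon label as not fully classified.
UNCLASSIFIED_MARKERS = ("unclassified", "environmental samples", "uncultured")


def is_unclassified(label):
    """True if the taxon label contains one of the assumed marker strings."""
    text = label.lower()
    return any(marker in text for marker in UNCLASSIFIED_MARKERS)


def rank_habitats(records, min_records=1):
    """Rank habitats by the fraction of unclassified records.

    records: iterable of (habitat, taxon_label) pairs.
    Returns a list of (habitat, fraction_unclassified, n_records),
    highest fraction first; habitats with fewer than min_records are skipped.
    """
    totals = defaultdict(int)
    unclassified = defaultdict(int)
    for habitat, label in records:
        totals[habitat] += 1
        if is_unclassified(label):
            unclassified[habitat] += 1
    ranked = [
        (habitat, unclassified[habitat] / n, n)
        for habitat, n in totals.items()
        if n >= min_records
    ]
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Made-up records for illustration only.
records = [
    ("gut", "uncultured Lactobacillus sp."),
    ("gut", "Limosilactobacillus reuteri"),
    ("soil", "unclassified Lactobacillaceae"),
    ("soil", "uncultured Weissella sp."),
    ("cheese", "Lactobacillus helveticus"),
    ("cheese", "Lactobacillus delbrueckii"),
]
for habitat, fraction, n in rank_habitats(records):
    print(f"{habitat}: {fraction:.0%} of {n} records unclassified")
```

The `min_records` threshold is included because a proportion computed from a handful of records is unstable; a real analysis would need to choose it deliberately.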

4. Discussion

4.1. The Necessity of Data Standardization and Comprehensive Metadata for Ecological Analysis

Data standardization and complete metadata are essential when depositing scientific information, such as sequences in repositories like NCBI, to ensure the accuracy, usability, and broader ecological relevance of the resulting datasets. A comprehensive analysis of over 2 million records from the NCBI database for members of the Lactobacillaceae family highlighted two critical barriers to effective scientific research: inconsistent data standards and incomplete metadata. The study revealed that much of the deposited data lacked structured and comprehensive metadata, particularly in categories like “isolation source,” geographic coordinates, and isolation date. For example, the “isolation source” category often contained unstructured text, making it difficult to extract meaningful ecological information. Another hurdle is that a significant proportion of records, particularly those likely originating from human-associated environments, were labeled merely as “unspecified,” indicating insufficient contextual descriptions. This missing information limits the level of detail at which researchers can analyze and interpret the ecological contexts of these organisms. The consequences of such inconsistencies are profound. The analysis suggests that geographical distribution data for Lactobacillaceae are heavily skewed toward regions of greater scientific sampling activity (such as North America, Europe, and East Asia) instead of reflecting accurate global diversity patterns. This bias complicates efforts to assess actual habitat diversity and evolutionary adaptation. Furthermore, new discoveries, particularly from under-sampled and complex environments like soil or non-model organisms, can be overshadowed by data quality issues, impeding unbiased conclusions. Standardization offers a proven pathway to resolve these challenges.
Implementing globally accepted frameworks for data deposition, such as the use of controlled vocabularies and ontologies for describing habitats, robust taxonomic classifiers, and mandatory fields for comprehensive metadata, is essential. Frameworks such as MIxS (Minimum Information about any (x) Sequence) [35] demonstrate the feasibility and benefits of harmonized data submission and encourage the provision of all relevant information about an organism’s origin, isolation environment, and context. Furthermore, making geographical details and collection methods mandatory as part of sequence metadata deposition would ensure higher-quality data for ecological, biotechnological, and evolutionary studies. In conclusion, while current databases provide invaluable resources for understanding microbial ecology and discovering new taxa, their utility is curtailed by the lack of standardized metadata and data curation. Addressing these issues via collaborative efforts among the scientific community will not only simplify analyses but also enhance resource accessibility and support global research initiatives aimed at uncovering the complexities of microbial life. Standardization and mandatory metadata represent the cornerstone for advancing ecological research and ensuring that databases evolve in line with the growing demands of interdisciplinary studies.
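To illustrate what a controlled vocabulary and mandatory fields mean in practice, the sketch below maps free-text isolation sources onto the four habitat groups used in this study and reports which required metadata fields are empty. It is a simplified stand-in rather than the hierarchical dictionary the authors built: the keyword lists, the field names (loosely modelled on MIxS-style terms), and the function names are assumptions chosen for the example.

```python
# Hypothetical keyword dictionary; the four group names follow the paper,
# the keywords themselves are illustrative only.
HABITAT_KEYWORDS = {
    "Product": ("kimchi", "cheese", "yogurt", "sourdough", "sauerkraut"),
    "Animal": ("feces", "gut", "intestine", "honeybee"),
    "Environment": ("soil", "water", "sediment"),
    "Plant": ("leaf", "flower", "root", "seed"),
}

# Assumed set of mandatory fields for this example.
REQUIRED_FIELDS = ("isolation_source", "geo_loc_name", "collection_date")


def classify_source(text):
    """Return the first habitat group with a keyword in the text, else 'Unspecified'."""
    lowered = text.lower()
    for group, keywords in HABITAT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return group
    return "Unspecified"


def missing_fields(record):
    """List required fields that are absent, None, or blank in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Made-up record for illustration.
record = {
    "isolation_source": "Traditional kimchi, homemade",
    "geo_loc_name": "South Korea",
    "collection_date": "",
}
print(classify_source(record["isolation_source"]))  # habitat group for the free text
print(missing_fields(record))                       # fields a submitter left empty
```

Plain substring matching is deliberately crude and will misfire on strings that merely contain a keyword, which is one reason a curated ontology applied at submission time is preferable to post hoc text mining.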

4.2. Correlation Between Taxonomy and Habitat

The analysis of the relationship between taxonomic classification and habitat preferences among members of the Lactobacillaceae family reveals a complex scenario in which ecological specialization becomes more pronounced as one progresses from the genus to the species level. The findings suggest that phylogenetic proximity is not always a reliable predictor of ecological niche, and distinct patterns are observed across different taxonomic levels. At the genus level, most Lactobacillaceae members exhibit characteristics of ecological generalists, demonstrating adaptability to a wide range of environments. A notable example is the genus Lactiplantibacillus, whose species inhabit diverse settings, including plants, food products, and the gastrointestinal tracts of animals. However, there are significant exceptions. Certain genera display a clear tendency toward specialization. For instance, the genus Lacticaseibacillus is closely associated with dairy products, indicating metabolic adaptation to this specific niche. At the species level, the picture changes dramatically, with ecological specialization becoming markedly more pronounced. Many species exhibit strict habitat preferences. Classic examples include Lactobacillus delbrueckii and Lactobacillus helveticus, which are almost exclusively confined to dairy products and are rarely found in other environments. Moreover, even within a single genus, both generalist and highly specialized species can coexist. For example, while Lactiplantibacillus plantarum is a generalist, other members of the same genus, such as Lactiplantibacillus pentosus, display narrower preferences, favoring plant-based habitats. In conclusion, the correlation between taxonomy and habitat in Lactobacillaceae is multi-layered.
While ecological plasticity and generalism dominate at the genus level, with few exceptions, specialization becomes more rigid at the species level, driven by adaptive evolution and metabolic optimization for specific ecological niches. These findings underscore the importance of considering taxonomic levels when studying adaptation and niche specialization in microbial communities.
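The difference between genus-level and species-level patterns can be illustrated with a toy calculation. The sketch below pools hypothetical species-level habitat counts to the genus level and compares the share of the dominant habitat at each level. The taxon names and counts are invented, and the "dominant habitat share" is a simple measure introduced here for illustration, not one used in the study.

```python
from collections import Counter, defaultdict


def dominant_share(counts):
    """Return (habitat, share) for the most frequent habitat in a Counter."""
    total = sum(counts.values())
    habitat, n = counts.most_common(1)[0]
    return habitat, n / total


# Invented counts: three species of one genus, each concentrated in a different habitat.
species_counts = {
    "Genus_A species_1": Counter({"dairy": 90, "plants": 5, "gut": 5}),
    "Genus_A species_2": Counter({"dairy": 5, "plants": 90, "gut": 5}),
    "Genus_A species_3": Counter({"dairy": 5, "plants": 5, "gut": 90}),
}

# Pool species-level counts by genus (first word of the name).
genus_counts = defaultdict(Counter)
for species, counts in species_counts.items():
    genus_counts[species.split()[0]].update(counts)

for species, counts in species_counts.items():
    habitat, share = dominant_share(counts)
    print(f"{species}: {share:.0%} in {habitat}")

habitat, share = dominant_share(genus_counts["Genus_A"])
print(f"Genus_A pooled: {share:.0%} in its top habitat")
```

In this constructed case every species is concentrated in one habitat, yet the pooled genus looks evenly spread, which shows why conclusions about specialization depend on the taxonomic level at which records are aggregated.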

4.3. Hotspots for Novel Taxa Discovery

The synthesis of results obtained from the analysis of unclassified and uncultured taxa allows us to identify ecological niches with the highest potential for discovering new species and genera within the family Lactobacillaceae. These categories represent frontiers in microbial research, pointing to ecosystems rich in new, yet-to-be-characterized bacterial diversity. Animal-associated environments, particularly the gastrointestinal tract, are not only among the most abundant habitats for known isolates of Lactobacillaceae but also major reservoirs for unclassified strains. The complex and highly competitive nature of the gut microbiome means that many of its inhabitants have not yet been isolated in pure culture or fully taxonomically characterized. Their detection is often feasible only through culture-independent methods, such as metagenomic sequencing. The high frequency of unclassified taxa in fecal and intestinal samples emphasizes that a significant portion of microbial diversity in host organisms remains unexplored.
Natural environments represent another significant frontier for discovery. Specifically, soil (including the rhizosphere and contaminated soils), water, and various plant niches are important hotspots in the search for novel taxa. The prevalence of unidentified sequences in these samples indicates that these ecosystems contain vast, untapped reservoirs of new microorganisms. These bacteria likely possess unique metabolic capabilities adapted to their specific conditions and could be applied in agriculture, bioremediation, and biotechnology. Interestingly, plant-associated habitats (e.g., shoots, flowers, roots) remain a minor group in terms of current research focus, despite the fact that plant-based products constitute a broad and industrially significant category. This discrepancy suggests that plants, as habitats, are still underexplored and represent a promising area for upcoming discoveries. While fermented plant-based foods (e.g., kimchi, sauerkraut) have been extensively studied, the natural microbial communities inhabiting living plants remain under-characterized. This gap highlights the potential value of investigating plant surfaces and internal tissues as sources of novel Lactobacillaceae taxa. The question of whether the search for yet-uncultured bacteria should be focused on less explored habitats cannot be definitively answered due to bias in the data. The existing dataset largely reflects global research efforts rather than the real ecological distribution of Lactobacillaceae, with a clear skew toward industrialized regions and anthropocentric applications (e.g., fermented foods and the human microbiome).
Nevertheless, the analysis of the prevalence of unclassified groups strongly suggests that targeted exploration of complex, diverse, and under-researched habitats, such as soil, wild plant ecosystems, and microbiomes of understudied animals, is a more promising strategy for discovering new taxa than further investigation of already well-characterized niches. Plants, in particular, emerge as an underexplored frontier, offering significant potential for discovering novel species and genera within the Lactobacillaceae family.

4.4. Ecological Significance and Future Directions

The results of our study underscore the vast ecological diversity of the family Lactobacillaceae and may provide insights into the evolutionary processes that have shaped their distribution across different environments. The ability to thrive in such varied conditions as the nutrient-rich gastrointestinal tracts of animals, fermentable plant substrates, and dairy products reflects the remarkable metabolic flexibility of this family. This ecological plasticity has likely contributed to their ubiquity and domestication in food systems throughout human history. The observed species-level specialization, particularly concerning specific hosts (e.g., Apilactobacillus kunkeei in honeybees) or substrates (dairy-associated Lactobacillus species), points to long-standing co-evolutionary relationships. Genomic studies indicate that host adaptation is often associated with genome reduction, where bacteria shed genes essential for survival in other environments in exchange for an optimized existence within the stable, nutrient-rich environment provided by the host [36,37,38]. Understanding the genetic underpinnings of these adaptations is key to unlocking the full biotechnological potential of these microorganisms. Despite the extensive data, our study reveals significant knowledge gaps, requiring an interconnected strategy for future research. A crucial first step is to address the identified sampling bias by expanding geographical and ecological surveys. Targeted collection efforts in under-explored regions like Africa, Antarctica, the Arctic, South America, and Central Asia, as well as in atypical habitats such as soil, marine sediments, and exotic plants, are likely to yield novel species and genera. Strains from these new sources will become prime candidates for functional genomics and metabolomics, aiming to uncover unique metabolic pathways, enzymes, and bioactive compounds with potential applications in agriculture and biotechnology.
As genomic data accumulate, comparative analyses between generalist and specialist species will elucidate the genetic basis of habitat adaptation, including substrate utilization, stress tolerance, and host-microbe interactions. Furthermore, given that a large number of sequences in our dataset remain “unclassified,” a significant portion of Lactobacillaceae diversity is likely yet uncultured. Future research must therefore integrate advanced cultivation techniques with metagenomic analysis to characterize this elusive microbial “majority” and understand its ecological roles. Such a comprehensive approach is essential to fully realize the biotechnological potential of this important bacterial family.

5. Conclusions

This comprehensive analysis of over 2 million NCBI records systematically mapped the ecological landscape of the Lactobacillaceae family, revealing the dominance of habitats such as food products and animal hosts. Our findings confirm a wide range of ecological strategies characteristic of members of the Lactobacillaceae family, ranging from versatile generalists to highly adapted specialists confined to narrow niches, demonstrating that the family’s evolutionary trajectory is probably shaped by niche-specific adaptation. Crucially, the study sheds light on the significant geographical and ecological sampling bias in deposited data. This bias, combined with the high prevalence of unclassified records in complex environments, pinpoints the gastrointestinal tracts of understudied animals, soil, and especially living plant tissues as critical frontiers for discovering novel biodiversity. Ultimately, this data-driven ecological map not only deepens our fundamental understanding of Lactobacillaceae biology but also provides a strategic guide for future bioprospecting, paving the way to harness untapped microbial diversity for next-generation biotechnological solutions.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/d17110776/s1, Supplementary Table S1: Distribution of Lactobacillaceae species isolated from Product group; Supplementary Table S2: Distribution of Lactobacillaceae species from Animal group; Supplementary Table S3: Distribution of Lactobacillaceae species from Environment group; Supplementary Table S4: Distribution of Lactobacillaceae species isolated from different plant parts.

Author Contributions

I.R.A., A.E.S. and Z.B.N. conceptualized and designed the study, establishing the research goals and methodological framework. T.S.S. led the data acquisition from the NCBI database, implementing pipeline tools for metadata extraction and curation. T.S.S., E.R.W. and M.A.K. developed the hierarchical habitat classification system and controlled vocabulary, ensuring methodological consistency for habitat mapping. T.S.S., I.R.A. and Z.B.N. performed data analysis and created the taxonomic and ecological visualizations, including categorical and geographical distribution maps. T.S.S., Z.B.N., E.R.W., M.A.K., I.R.A. and A.E.S. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant from the state program of the «Sirius» Federal Territory «Scientific and technological development of the «Sirius» Federal Territory» (Agreement No. 18-03 dated 10 September 2024).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets generated during the current study are available in the supplementary files. Additional interactive diagrams that expand upon the findings can be accessed at http://t95004fy.beget.tech/index.html (accessed on 4 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NCBI: National Center for Biotechnology Information
LAB: Lactic acid bacteria
TSV: Tab-separated value
MIxS: Minimum Information about any (x) Sequence

References

  1. Davray, D.; Deo, D.; Kulkarni, R. Plasmids encode niche-specific traits in Lactobacillaceae. Microb. Genom. 2021, 7, 472.
  2. George, F.; Daniel, C.; Thomas, M.; Singer, E.; Guilbaud, A.; Tessier, F.J.; Revol-Junelles, A.-M.; Borges, F.; Foligné, B. Occurrence and Dynamism of Lactic Acid Bacteria in Distinct Ecological Niches: A Multifaceted Functional Health Perspective. Front. Microbiol. 2018, 9, 2899.
  3. Akpoghelie, P.O.; Edo, G.I.; Ali, A.B.; Yousif, E.; Zainulabdeen, K.; Owheruo, J.O.; Isoje, E.F.; Igbuku, U.A.; Essaghah, A.E.A.; Makia, R.S.; et al. Lactic acid bacteria: Nature, characterization, mode of action, products and applications. Process Biochem. 2025, 152, 1–28.
  4. Anumudu, C.K.; Miri, T.; Onyeaka, H. Multifunctional Applications of Lactic Acid Bacteria: Enhancing Safety, Quality, and Nutritional Value in Foods and Fermented Beverages. Foods 2024, 13, 3714.
  5. Yadav, M.K.; Song, J.H.; Vasquez, R.; Lee, J.S.; Kim, I.H.; Kang, D.-K. Methods for Detection, Extraction, Purification, and Characterization of Exopolysaccharides of Lactic Acid Bacteria—A Systematic Review. Foods 2024, 13, 3687.
  6. Zoghi, A.; Todorov, S.D.; Khosravi-Darani, K. Bio-detoxification of mycotoxin-contaminated feedstuffs: Using lactic acid bacteria and yeast. Appl. Environ. Biotechnol. 2024, 9, 62–75.
  7. Štyriaková, D.; Hajnal-Jafari, T.; Žunić, V.; Šuba, J.; Prekopová, M.; Yetik, A.K.; Štyriaková, L. Influence of biostimulant applications on vegetative growth and yield of strawberry under full and reduced fertilization. DYSONA-Appl. Sci. 2026, 7, 41–49.
  8. Ayed, L.; M’hir, S.; Nuzzolese, D.; Di Cagno, R.; Filannino, P. Harnessing the Health and Techno-Functional Potential of Lactic Acid Bacteria: A Comprehensive Review. Foods 2024, 13, 1538.
  9. Novikova, V.A.; Bondarenko, K.D.; Sazonov, A.E.; Rozanov, A.S. The Effect of Probiotic Lactic Acid Bacteria on the Symptoms of Mental Disorders. Nanobiotechnol. Rep. 2024, 19, 645–666.
  10. Garzone, S.; Charitos, I.A.; Mandorino, M.; Maggiore, M.E.; Capozzi, L.; Cakani, B.; Lopes, G.C.D.; Bocchio-Chiavetto, L.; Colella, M. Can We Modulate Our Second Brain and Its Metabolites to Change Our Mood? A Systematic Review on Efficacy, Mechanisms, and Future Directions of “Psychobiotics”. Int. J. Mol. Sci. 2025, 26, 1972.
  11. Qiao, N.; Wittouck, S.; Mattarelli, P.; Zheng, J.; Lebeer, S.; Felis, G.E.; Gänzle, M.G. After the storm—Perspectives on the taxonomy of Lactobacillaceae. JDS Commun. 2022, 3, 222–227.
  12. Walter, J.; O’Toole, P.W. Microbe Profile: The Lactobacillaceae. Microbiology 2023, 169, 1414.
  13. Zheng, J.; Wittouck, S.; Salvetti, E.; Franz, C.M.A.P.; Harris, H.M.B.; Mattarelli, P.; O’Toole, P.W.; Pot, B.; Vandamme, P.; Walter, J.; et al. A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae. Int. J. Syst. Evol. Microbiol. 2020, 70, 2782–2858.
  14. Yu, A.O.; Leveau, J.H.J.; Marco, M.L. Abundance, Diversity and Plant-Specific Adaptations of Plant-Associated Lactic Acid Bacteria. Environ. Microbiol. Rep. 2020, 12, 16–29.
  15. Cock, P.J.A.; Antao, T.; Chang, J.T.; Chapman, B.A.; Cox, C.J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25, 1422–1423.
  16. Huerta-Cepas, J.; Serra, F.; Bork, P. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol. Biol. Evol. 2016, 33, 1635–1638.
  17. Data Apps for Production|Plotly. Available online: https://plotly.com/ (accessed on 24 September 2025).
  18. GeoPy’s Documentation. 2025. Available online: https://geopy.readthedocs.io/en/stable/ (accessed on 24 September 2025).
  19. Freire, M.O.d.L.; Neto, J.P.R.C.; Lemos, D.E.d.A.; de Albuquerque, T.M.R.; Garcia, E.F.; de Souza, E.L.; Alves, J.L.d.B. Limosilactobacillus fermentum Strains as Novel Probiotic Candidates to Promote Host Health Benefits and Development of Biotherapeutics: A Comprehensive Review. Probiotics Antimicrob. Proteins 2024, 16, 1483–1498.
  20. Schneider, E.; Balasubramanian, R.; Ferri, A.; Cotter, P.D.; Clarke, G.; Cryan, J.F. Fibre & fermented foods: Differential effects on the microbiota-gut-brain axis. Proc. Nutr. Soc. 2024, 1–16.
  21. Yilmaz, B.; Bangar, S.P.; Echegaray, N.; Suri, S.; Tomasevic, I.; Lorenzo, J.M.; Melekoglu, E.; Rocha, J.M.; Ozogul, F. The Impacts of Lactiplantibacillus plantarum on the Functional Properties of Fermented Foods: A Review of Current Knowledge. Microorganisms 2022, 10, 826.
  22. Jiang, D.; Li, B.; Zheng, M.; Niu, D.; Zuo, S.; Xu, C. Effects of Pediococcus pentosaceus on fermentation, aerobic stability and microbial communities during ensiling and aerobic spoilage of total mixed ration silage containing alfalfa (Medicago sativa L.). Grassl. Sci. 2020, 66, 215–224.
  23. Benkerroum, N.; Misbah, M.; Sandine, W.E.; Elaraki, A.T. Development and use of a selective medium for isolation of Leuconostoc spp. from vegetables and dairy products. Appl. Environ. Microbiol. 1993, 59, 607–609.
  24. Stefanovic, E.; Kilcawley, K.; Rea, M.; Fitzgerald, G.; McAuliffe, O. Genetic, enzymatic and metabolite profiling of the Lactobacillus casei group reveals strain biodiversity and potential applications for flavour diversification. J. Appl. Microbiol. 2017, 122, 1245–1261.
  25. Moser, A.; Schafroth, K.; Meile, L.; Egger, L.; Badertscher, R.; Irmler, S. Population Dynamics of Lactobacillus helveticus in Swiss Gruyère-Type Cheese Manufactured With Natural Whey Cultures. Front. Microbiol. 2018, 9, 637.
  26. Marasco, R.; Gazzillo, M.; Campolattano, N.; Sacco, M.; Muscariello, L. Isolation and Identification of Lactic Acid Bacteria from Natural Whey Cultures of Buffalo and Cow Milk. Foods 2022, 11, 233.
  27. Özcan, E.; Selvi, S.S.; Nikerel, E.; Teusink, B.; Öner, E.T.; Çakır, T. A genome-scale metabolic network of the aroma bacterium Leuconostoc mesenteroides subsp. cremoris. Appl. Microbiol. Biotechnol. 2019, 103, 3153–3165.
  28. Fusco, V.; Quero, G.M.; Cho, G.S.; Kabisch, J.; Meske, D.; Neve, H.; Bockelmann, W.; Franz, C.M.A.P. The genus Weissella: Taxonomy, ecology and biotechnological potential. Front. Microbiol. 2015, 6, 155.
  29. He, Y.; Zhu, L.; Chen, J.; Tang, X.; Pan, M.; Yuan, W.; Wang, H. Efficacy of Probiotic Compounds in Relieving Constipation and Their Colonization in Gut Microbiota. Molecules 2022, 27, 666.
  30. Zheng, N.; Guo, R.; Wang, J.; Zhou, W.; Ling, Z. Contribution of Lactobacillus iners to Vaginal Health and Diseases: A Systematic Review. Front. Cell. Infect. Microbiol. 2021, 11, 792787.
  31. Vergalito, F.; Testa, B.; Cozzolino, A.; Letizia, F.; Succi, M.; Lombardi, S.J.; Tremonte, P.; Pannella, G.; Di Marco, R.; Sorrentino, E.; et al. Potential Application of Apilactobacillus kunkeei for Human Use: Evaluation of Probiotic and Functional Properties. Foods 2020, 9, 1535.
  32. Chen, Y.-S.; Yanagida, F.; Shinohara, T. Isolation and identification of lactic acid bacteria from soil using an enrichment procedure. Lett. Appl. Microbiol. 2005, 40, 195–200.
  33. Rabelo, C.H.S.; Basso, F.C.; Lara, E.C.; Jorge, L.G.O.; Härter, C.J.; Mesquita, L.G.; Silva, L.F.P.; Reis, R.A. Effects of Lactobacillus buchneri as a silage inoculant and as a probiotic on feed intake, apparent digestibility and ruminal fermentation and microbiology in wethers fed low-dry-matter whole-crop maize silage. Grass Forage Sci. 2017, 73, 67–77.
  34. Chmielewski, R.A.N.; Frank, J.F. Biofilm Formation and Control in Food Processing Facilities. Compr. Rev. Food Sci. Food Saf. 2003, 2, 22–32.
  35. Yilmaz, P.; Kottmann, R.; Field, D.; Knight, R.; Cole, J.R.; Amaral-Zettler, L.; Gilbert, J.A.; Karsch-Mizrachi, I.; Johnston, A.; Cochrane, G.; et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat. Biotechnol. 2011, 29, 415–420.
  36. Sarton-Lohéac, G.; da Silva, C.G.N.; Mazel, F.; Baud, G.; de Bakker, V.; Das, S.; El Chazli, Y.; Ellegaard, K.; Garcia-Garcera, M.; Glover, N.; et al. Deep Divergence and Genomic Diversification of Gut Symbionts of Neotropical Stingless Bees. mBio 2023, 14, e0353822.
  37. Duar, R.M.; Lin, X.B.; Zheng, J.; Martino, M.E.; Grenier, T.; Pérez-Muñoz, M.E.; Leulier, F.; Gänzle, M.; Walter, J. Lifestyles in transition: Evolution and natural history of the genus Lactobacillus. FEMS Microbiol. Rev. 2017, 41, S27–S48.
  38. Sun, Z.; Harris, H.M.B.; McCann, A.; Guo, C.; Argimón, S.; Zhang, W.; Yang, X.; Jeffery, I.B.; Cooney, J.C.; Kagawa, T.F.; et al. Expanding the biotechnology potential of lactobacilli through comparative genomics of 213 strains and associated genera. Nat. Commun. 2015, 6, 8322.
Figure 1. Generalized schematic of the workflow for the large-scale analysis of Lactobacillaceae ecological niches. The pipeline begins with data acquisition from the NCBI Nucleotide database using a custom Python script, followed by database curation to standardize metadata. A hierarchical dictionary was developed to classify unstructured “isolation source” data into four primary ecological groups. The final stage involved data analysis and the generation of interactive visualizations to map the family’s taxonomic and geographical distribution.
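The dictionary-based classification step described for Figure 1 can be pictured with a minimal Python sketch. It is not the authors' script or dictionary: the keyword lists, the two-level layout of `HABITAT_DICTIONARY`, the helper names `build_index` and `classify`, and the longest-keyword-first matching rule are all illustrative assumptions.

```python
import re

# Hypothetical two-level dictionary: habitat group -> subcategory -> keywords.
# The entries are illustrative only and do not reproduce the published dictionary.
HABITAT_DICTIONARY = {
    "Product": {
        "dairy": ["cheese", "yogurt", "milk"],
        "fermented vegetables": ["kimchi", "sauerkraut"],
    },
    "Animal": {
        "breast milk": ["breast milk"],
        "feces": ["feces", "stool"],
        "gastrointestinal tract": ["gut", "intestine"],
    },
    "Environment": {
        "soil": ["soil"],
        "liquids": ["wastewater", "sewage"],
    },
    "Plant": {
        "flower": ["flower", "nectar"],
        "root": ["root"],
    },
}


def build_index(dictionary):
    """Flatten the dictionary into (keyword, group, subcategory) tuples.

    Longer keywords come first, so that "breast milk" is tested before "milk".
    """
    index = [
        (keyword, group, subcategory)
        for group, subcategories in dictionary.items()
        for subcategory, keywords in subcategories.items()
        for keyword in keywords
    ]
    return sorted(index, key=lambda entry: len(entry[0]), reverse=True)


def classify(isolation_source, index):
    """Map a free-text isolation source to (group, subcategory)."""
    text = isolation_source.strip().lower()
    for keyword, group, subcategory in index:
        # Whole-word match avoids hits inside unrelated longer words.
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            return group, subcategory
    return "Unclassified", "Unspecified"


index = build_index(HABITAT_DICTIONARY)
for source in ["Human breast milk", "raw cow milk", "Kimchi", "not provided"]:
    print(source, "->", classify(source, index))
```

Testing longer keywords first is one simple way to keep a host-derived source such as breast milk out of the dairy subcategory; a real dictionary would need far more entries and manual review of ambiguous strings.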
Figure 2. Proportional distribution of Lactobacillaceae across four major habitat groups. The analysis of over 2 million NCBI records reveals that Product-associated habitats (e.g., fermented foods) are the predominant source, accounting for 57% of all isolates, followed by Animal (28%), Environment (11%), and Plant (4%) sources. A small fraction of records (less than 1%), highlighted in yellow, could not be classified to a specific source and are grouped separately. Interactive versions of this figure and the figures below are available at http://t95004fy.beget.tech/comparsion.html (accessed on 4 October 2025).
Figure 3. Distribution of major Lactobacillaceae genera across the four primary habitat groups. The bar chart illustrates the percentage of records for each genus within the Product, Animal, Environment, and Plant groups. The number shown in each colored bar gives the count of NCBI records for that genus in the corresponding habitat group.
Figure 4. The top 20 most frequently recorded isolation sources for Lactobacillaceae. This horizontal bar chart ranks the most common sources (y-axis) by their corresponding number of samples (x-axis) retrieved from the NCBI database. The data show that most isolates originate from two main categories: mammalian-associated environments (e.g., feces, gut, breast milk) and fermented food products (e.g., kimchi, sourdough, cheese). Feces are the single most frequent source, underscoring the gastrointestinal tract as a primary habitat.
Figure 5. Heatmap showing the relative abundance of Lactobacillaceae genera across different ecological groups. The y-axis lists the major genera, while the x-axis represents the four primary habitat groups. The color intensity in each cell corresponds to the number of individual records for each genus within a specific habitat; darker shades indicate a higher prevalence. The heatmap reveals clear patterns of specialization, showing that most genera, such as Lactiplantibacillus and Pediococcus, are predominantly associated with specific niches (e.g., “Product”) rather than being uniformly distributed.
Figure 6. Niche specialization of Lactobacillaceae genera within food-associated habitats. Each horizontal bar represents a single genus, with the total length corresponding to 100% of its isolation sources from the “Product” group. The colored segments within each bar show the percentage breakdown of isolates from specific food subcategories (e.g., dairy, fermented vegetables, meat). The numbers within the segments indicate the count of records for each subcategory. This chart illustrates the genus-level preferences for specific types of food substrates.
Figure 7. Niche specialization of Lactobacillaceae genera within animal-associated habitats. This chart details the distribution of key genera across specific niches within the “Animal” group. Each horizontal bar represents a genus, and the colored segments illustrate the proportion of isolates from distinct sources such as the gastrointestinal tract, feces, oral cavity, and blood. The number within each segment indicates the count of records.
Figure 8. Distribution of Lactobacillaceae genera across different animal host types. The chart demonstrates host specificity for key genera isolated from the “Animal” group. Each horizontal bar represents a single genus (normalized to 100%), and the colored segments show the proportion of isolates obtained from different host types, such as mammals, insects, birds, or sources classified as “Unspecified.” The number of records is provided in each segment.
Figure 9. Niche specialization of Lactobacillaceae genera within specific environmental habitats. This chart provides a detailed view of the ecological associations of major genera within the “Environment” category. Each horizontal bar represents a genus, with colored segments indicating the proportion of isolates from distinct niches like rhizospheric soil, wastewater, and biofilms. The number of records is shown in each segment.
Figure 10. Distribution of Lactobacillaceae genera across environmental habitats. The chart breaks down the distribution of genera within the “Environment” group. Each horizontal bar represents a genus, and the colored segments show the proportion of isolates from specific sub-habitats, including soil, liquids and organic waste. The number inside each segment corresponds to the count of records.
Figure 11. Distribution of Lactobacillaceae genera across different plant-associated habitats. This chart illustrates the prevalence of key genera within the “Plant” group. Each horizontal bar represents a genus, with colored segments showing the proportion of isolates from various plant parts, including the shoot, seeds, flower, and root. The number within each segment indicates the count of records.
Figure 12. Global geographic distribution of the top 20 Lactobacillaceae isolation sources from the NCBI database. The map illustrates the location and density of sampling, with the size of each dot corresponding to the number of isolation sources identified. A pronounced sampling bias is evident: isolation sources are concentrated in East Asia, Europe, and North America, whereas polar areas, Africa, South America, and Central Asia are markedly underrepresented.
Figure 13. Distribution of unclassified and environmental samples of Lactobacillaceae across the four habitat groups. This chart shows the proportion of records designated as “unclassified” or “environmental samples” at a specific taxonomic level within each of the four major habitat groups. A small fraction of records (less than 1%), highlighted in yellow, could not be classified to a specific source and are grouped separately.
Figure 14. Proportional composition of sample types within the top 30 ecological niches with the highest abundance of uncharacterized Lactobacillaceae representatives. Each horizontal bar represents all records from a specific habitat (normalized to 100%) and is segmented to show the relative proportion of isolation sources that are taxonomically classified, unclassified, or derived directly from environmental samples. Habitats are ranked in descending order by the absolute number of unclassified and environmental samples, placing the most promising niches for novel diversity discovery at the top.
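The ranking and normalization described for Figure 14 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy records, the status labels in `STATUSES`, and the function name `rank_habitats` are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed status labels for each record; the real metadata may encode these differently.
STATUSES = ("classified", "unclassified", "environmental")


def rank_habitats(records, top_n=30):
    """Rank habitats by the absolute count of unclassified plus environmental
    records, then express each habitat's composition as percentages."""
    counts = defaultdict(Counter)
    for habitat, status in records:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        counts[habitat][status] += 1

    ranked = sorted(
        counts.items(),
        key=lambda item: item[1]["unclassified"] + item[1]["environmental"],
        reverse=True,
    )

    result = []
    for habitat, tally in ranked[:top_n]:
        total = sum(tally[status] for status in STATUSES)
        shares = {status: 100.0 * tally[status] / total for status in STATUSES}
        result.append((habitat, shares))
    return result


# Toy records, invented purely for illustration.
records = (
    [("feces", "classified")] * 6
    + [("feces", "unclassified")] * 3
    + [("feces", "environmental")] * 1
    + [("leaf", "classified")] * 2
    + [("leaf", "unclassified")] * 2
    + [("cheese", "classified")] * 5
)

ranking = rank_habitats(records)
for habitat, shares in ranking:
    print(habitat, shares)
```

In this toy data, "feces" ranks above "leaf" even though a smaller share of its records is uncharacterized, because the ordering uses absolute counts while the bar segments use proportions.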
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sokolova, T.S.; Namsaraev, Z.B.; Wolf, E.R.; Kulyashov, M.A.; Akberdin, I.R.; Sazonov, A.E. Mapping Global Biodiversity and Habitat Distribution of Lactobacillaceae Using NCBI Sequence Metadata. Diversity 2025, 17, 776. https://doi.org/10.3390/d17110776
