Article

An Automated Bioinformatic Pipeline to Analyze Biodiversity Data for Conservation Purposes: A Test Case for Colorado Macrofungi

1 Department of Biological Sciences, Purdue University Northwest, Hammond, IN 46323, USA
2 Pikes Peak Mycological Society, Colorado Springs, CO 80919, USA
3 Colorado Natural Heritage Program, Colorado State University, Fort Collins, CO 80521, USA
4 Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
5 Department of Research and Conservation, Denver Botanic Gardens, Denver, CO 80206, USA
* Author to whom correspondence should be addressed.
Conservation 2025, 5(2), 24; https://doi.org/10.3390/conservation5020024
Submission received: 26 April 2025 / Revised: 20 May 2025 / Accepted: 20 May 2025 / Published: 26 May 2025

Abstract

Fungi are of critical importance in supporting biodiversity and the world’s ecosystems, yet their conservation status has only been assessed relatively recently as part of the IUCN’s Red List of threatened species. While there are several challenges to evaluating fungi for conservation purposes, there is an urgent need to bring fungi more broadly into the conservation framework. Here, we present an automated bioinformatic pipeline for processing data from one of the largest fungal biodiversity datasets to assess species conservation status using a test case of conspicuous macrofungi from the state of Colorado. This pipeline can rapidly process existing data from both specimen- and observation-based records available through MyCoPortal for making conservation status assessments, and the approach presented employs ‘fuzzy matching’ techniques for correcting commonly encountered misspelled taxonomic names in the data. Such assessments are required for integrating fungi into the NatureServe conservation status framework. The pipeline can easily be scaled to produce robust assessments, even at the national level, which can be valuable in focusing field activity for verification purposes. Of the available 117,006 biodiversity data records from Colorado, our processing test case produced a final processed dataset of 36,637 macrofungal records from the state. From this, a focus list of 1613 rarely documented Colorado species was produced for consideration, with 30 of these also being found on the Red List. A more comprehensive conservation status assessment based on scoring in the NatureServe framework was then produced that provided status ranking for 2438 unique, valid, and current taxonomic names for Colorado macrofungi in the processed dataset.

1. Introduction

Fungi play important roles in the natural world as decay organisms, pathogens, and symbionts in all aquatic and terrestrial ecosystems studied to date [1,2,3,4,5]. For example, fungal symbiotic relationships occur with organisms across the tree of life (e.g., animals, bacteria, and plants), with fungi mediating linkages between these diverse organisms and the ecosystems they inhabit [6]. Mutualistic mycorrhizal fungi, for instance, form symbiotic relationships with nearly all land plants [7] and these associations help in plant establishment and survival [8,9], fostering a healthy and sustainable ecology [10], as well as improving plant responses to global change factors [11]. Fungi also perform a large portion of the decomposition and nutrient recycling processes on the planet [12] and play a critical role in maintaining soil structure, contributing to aeration and water infiltration, as well as overall soil health [13]. Thus, fungi and their direct relationships with other organisms continue to ensure functioning ecosystems, which is essential for all of life.
Fungal species diversity on Earth is thought to be outnumbered only by that of the insects [14], yet despite their central role in maintaining ecosystems globally, understanding of fungal biodiversity remains incomplete. While there are approximately 155,000 described species of fungi, estimates of fungal diversity range from 1.5 to 3.8 million [15,16,17,18]; thus, roughly 90% or more of all fungal species remain unknown to science [19]. Despite their overall importance in numbers and ecology, fungi have only been included within the conservation framework relatively recently. The International Union for Conservation of Nature (IUCN) was established in 1948 as the leading international organization working on the conservation of natural resources [20]. The IUCN’s ‘Red List’, an inventory of the conservation status and extinction risk for biological species, was initiated in 1964 [21]. Fungi were not evaluated, however, as part of the Red List efforts until 2013 [22]. By 2024, only 818 fungal species had been evaluated by the IUCN’s trained panels of ‘Assessors’, with 340 being ranked as threatened [23] (see Table 1a). A recent 2025 update examined an additional 482 species, with 1300 fungal species now evaluated in total, and the threatened number being raised to 411 [24] (see Table 1a). Thus, less than 1% of all known fungal species, a small portion of the total estimated fungal diversity, have been evaluated for conservation status, with roughly one-third of those assessed (~32%) being ranked as threatened on the IUCN Red List. This suggests that a high number of known fungal species are at risk; therefore, a fungal conservation framework may be crucial not only for preserving ecosystem function overall but also for the survival of species in general.
Progress, however, has been slow, and the urgent need to more extensively evaluate fungal species for threatened status is recognized within the IUCN, with calls being made for ‘better data’ to facilitate ‘meaningful action to protect fungi’ [25].
There are many challenges to incorporating fungi into conservation frameworks, including the insufficient documentation of fungal biodiversity and poor understanding of ranges for numerous fungal species [4]. Historically, biodiversity assessments have been linked to specimen-based records in natural history collections; however, support for these institutions continues to diminish, with recent specimen digitization efforts fortuitously preserving existing biodiversity data and making them available online [26]. For fungi, specimen digitization was funded as part of the NSF’s Advancing Digitization of Biological Collections, with the Mycology Collections Portal (MyCoPortal, http://mycoportal.org) acting as the central repository for these data [27]. Concurrent with institutional digitization, the potential for community scientists to aid in fungal conservation efforts is increasingly being recognized [28]. The MyCoPortal now hosts ca. 9.8 million records of fungal specimens from 136 natural history, governmental, and university institutions across the globe, but mainly representing the United States. The MyCoPortal also aggregates observational records from recognized web-based community scientist initiatives, such as iNaturalist (https://www.inaturalist.org) and Mushroom Observer (https://mushroomobserver.org). As such, the MyCoPortal represents one of the largest datasets of specimen- and observation-based records ever assembled that provides information on the number and ranges of fungal species, and it has demonstrated utility for fungal biodiversity assessments [29].
Here, we respond to the urgent call for improved data [25] and collaboration [30] for assessing species of fungi for conservation status by presenting a modern automated systematic approach for analyzing existing biodiversity data to understand the abundance and rarity of fungal species populations. In addition to leveraging available web-based application program interfaces (APIs), which allow for automated programmatic access to the target databases for requesting and retrieving information, our approach also relied on the conservation status assessment methods previously developed by Master [31] and NatureServe [32], which partners with the IUCN. NatureServe currently documents conservation status and locations of organisms through their database Biotics, which is made available to the public via NatureServe Explorer (https://explorer.natureserve.org). As a ‘test case’ for our conservation assessment approach, we focused on non-lichenized macrofungi, a visible and important component of fungal biodiversity, in the state of Colorado, which represents an ecoregion recognized for high species diversity [33].

2. Materials and Methods

We developed a bioinformatic pipeline that relies on available fungal records from specimens (e.g., metadata from physical, dried collections documenting the presence of a fungus in a particular area and residing in natural history museums or institutes) and observations (e.g., metadata recorded by community scientists documenting the presence of a fungus in a particular area but typically lacking a physical specimen). We processed these data for conservation purposes using the R statistical software (version 4.5.0) platform [34] and the RStudio (version 3.6.0) graphical user interface [35]. Our pipeline (Figure 1) gathers these records automatically from the MyCoPortal’s [27] Symbiota [36] API (https://www.mycoportal.org/portal/api/v2/documentation). In this test case, we focused solely on putative fungal records originating from the state of Colorado; however, other states can be selected in the R script for automated record gathering and processing in the bioinformatic pipeline. Our approach used the following R packages in pipeline processing: data.table, dplyr, ggplot2, httr, jsonlite, measurements, pracma, RCurl, rvest, stringdist, tcltk, and XML.
Despite Symbiota’s data cleaning tools available in MyCoPortal, we encountered data quality issues in the recovered records, primarily misspelled taxonomic names. The potential for errors is certainly recognized within MyCoPortal (i.e., see https://www.mycoportal.org/portal/misc/usagepolicy.php), and such issues are well known for biodiversity data; however, best practices [37] can be followed in addressing them. To rectify the erroneous spelling problem, after pulling record data from the selected state (Colorado in this case) via MyCoPortal API [38], the first step involved validating the spelling of the generic names (Figure 2). This step used exact matching to generic names from the Index Fungorum (IF; an online index for all fungal taxonomic names; https://www.indexfungorum.org) API (https://www.indexfungorum.org/ixfwebservice/fungus.asmx), which also allowed us to flag non-fungal genera, primarily slime molds (e.g., Eukaryota, Amoebozoa), for removal from the dataset.
Once generic names were validated, the next step involved validating taxonomic names at the species level (Figure 2). First, we combined the validated genus-level name with unvalidated specific epithet (i.e., species name) and used exact matching to this concatenated binomial via the fdex API to check the spelling. This API provided access to the fdex database (https://www.fdex.org), which aggregates and standardizes data from the primary fungal taxonomic name databases of Index Fungorum and MycoBank [39] (https://www.mycobank.org). As we were unable to validate many taxonomic names at the level of genus or species, we included two additional steps that attempted to determine the correct spelling for the binomial (Figure 2). These steps used a ‘fuzzy’ matching approach [see 37] with name comparisons under different spelling suggestion strategies. Here, we included scoring for approximate string matching (amatch in stringdist) based on Jaccard distances to then select and evaluate the closest match. The spelling suggestion strategies used word banks of possible binomials pulled from the fdex API for the first pass, while the second pass leveraged the Microsoft Bing (https://www.bing.com) search engine for spelling suggestions.
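The fuzzy matching idea described above can be illustrated with a short Python sketch. This is not the authors' R code: it only mimics the general behavior of approximate matching on Jaccard distances computed over character q-grams. The q-gram size (q = 2), the distance cutoff (0.4), and the three-name word bank are all assumptions chosen for illustration.

```python
def qgrams(text, q=2):
    """Return the set of overlapping q-character substrings of text (case-insensitive)."""
    text = text.lower()
    if len(text) < q:
        return {text}
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def jaccard_distance(a, b, q=2):
    """Jaccard distance between two strings: 1 - |A & B| / |A | B| over q-gram sets."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return 1.0 - len(grams_a & grams_b) / len(union)


def closest_match(name, word_bank, max_dist=0.4, q=2):
    """Return (candidate, distance) for the nearest name in word_bank,
    or (None, None) when no candidate lies within max_dist (assumed cutoff)."""
    best, best_dist = None, None
    for candidate in word_bank:
        dist = jaccard_distance(name, candidate, q)
        if best_dist is None or dist < best_dist:
            best, best_dist = candidate, dist
    if best_dist is None or best_dist > max_dist:
        return None, None
    return best, best_dist


# Hypothetical word bank of validated binomials (illustration only).
word_bank = ["Amanita muscaria", "Amanita pantherina", "Boletus edulis"]

# A misspelled epithet is mapped to its nearest validated binomial.
match, dist = closest_match("Amanita muscarea", word_bank)

# A name with no sufficiently close candidate is left unresolved.
unresolved, _ = closest_match("Pinus edulis", word_bank)
```

A cutoff is needed because the nearest candidate is not always a correct one; names that fall outside it would be passed to the next strategy or flagged as problematic.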
After the initial genus- and species-level validation and fuzzy matching steps, we then further validated taxonomic names below the level of species (e.g., varieties or forms), again using the fdex API for exact matching on a validated binomial plus the infraspecific rank (corrected and standardized when applicable) and name. This later step also collapsed infraspecific autonyms to the species level (e.g., Amanita pantherina var. pantherina to Amanita pantherina). With each validating step using the fdex API, we were also able to determine the ‘current name’ following the Index Fungorum convention from each of the ‘original’ taxonomic names provided in the dataset of gathered records. The IF current name convention also condensed all infraspecific names to the level of species. With these steps completed, we used the FUNGuild [40] (https://www.funguild.org) API to determine generic names that represented lichenized (i.e., lichens) or microfungal (e.g., rusts and smuts) taxa, and all the names associated with these genera were then flagged for removal from the dataset.
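As a minimal sketch of the autonym-collapsing step, the Python function below reduces a name to its binomial when the infraspecific epithet repeats the specific epithet. The rank spellings in `RANK_ALIASES` are hypothetical examples of standardization, not the list used in the authors' R script.

```python
# Hypothetical mapping of rank spellings to standardized abbreviations.
RANK_ALIASES = {
    "var.": "var.", "var": "var.", "variety": "var.",
    "f.": "f.", "forma": "f.", "form": "f.",
    "subsp.": "subsp.", "ssp.": "subsp.", "subspecies": "subsp.",
}


def standardize_rank(token):
    """Return the standardized rank abbreviation, or None if token is not a known rank."""
    return RANK_ALIASES.get(token.lower())


def collapse_autonym(name):
    """Collapse 'Genus epithet rank epithet' to 'Genus epithet' when the
    infraspecific epithet repeats the specific epithet (an autonym).
    Other names are returned with only the rank token standardized."""
    parts = name.split()
    if len(parts) != 4:
        return name
    genus, epithet, rank, infra = parts
    std_rank = standardize_rank(rank)
    if std_rank is None:
        return name
    if infra == epithet:
        return f"{genus} {epithet}"
    return f"{genus} {epithet} {std_rank} {infra}"


collapsed = collapse_autonym("Amanita pantherina var. pantherina")
kept = collapse_autonym("Amanita muscaria variety formosa")
```

Here `collapsed` becomes the bare binomial, while `kept` retains its infraspecific name with the rank written as `var.`.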
Once these ‘data cleaning’ steps (see Figure 1 and Figure 2) were completed, we then tallied the number of records associated with each valid and current macrofungal binomial. From this tally, we were able to determine and plot the most abundantly documented taxa from the state selected (e.g., the ‘Top-50’ most abundant taxa of macrofungal species for Colorado, in this case) as well as recover the more rarely documented macrofungal taxa, which included those binomials associated with only one or two collection records from the state. As the NatureServe conservation status assessment methods use the ‘element occurrence’ concept to approximate species populations for organisms, we further determined if binomials associated with two records were based on observations or specimens collected more than one kilometer apart, a standard NatureServe convention for delineating distinct species populations of plants [41]. This step used the Haversine formula available in the pracma R package to determine metric linear distance when both records included georeferenced data. When latitude/longitude were not available for the records, we indicated that the records represented two distinct fungal element occurrences (i.e., different populations) when the records were recorded from different counties in the state (viz. a reasonable, generalized estimate).
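The distance test for two-record taxa can be sketched in Python as follows. The Haversine formula itself is standard; the record layout (dictionaries with `lat`, `lon`, and `county` keys), the mean Earth radius of 6371 km, and the `None` return for undecidable pairs are assumptions made for this illustration rather than details taken from the authors' R script.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distinct_occurrences(rec_a, rec_b, separation_km=1.0):
    """Decide whether two records represent distinct element occurrences.

    Uses coordinates when both records have them; otherwise falls back to
    comparing counties. Returns None when neither test can be applied."""
    coords = [rec_a.get("lat"), rec_a.get("lon"), rec_b.get("lat"), rec_b.get("lon")]
    if all(c is not None for c in coords):
        return haversine_km(*coords) > separation_km
    county_a, county_b = rec_a.get("county"), rec_b.get("county")
    if county_a and county_b:
        return county_a != county_b
    return None


# Approximate coordinates for Denver and Colorado Springs (illustration only).
denver = {"lat": 39.7392, "lon": -104.9903, "county": "Denver"}
springs = {"lat": 38.8339, "lon": -104.8214, "county": "El Paso"}
far_apart = distinct_occurrences(denver, springs)

# Two records about 0.56 km apart are treated as one element occurrence.
near = distinct_occurrences({"lat": 39.000, "lon": -105.0}, {"lat": 39.005, "lon": -105.0})

# Without coordinates, the county comparison is used instead.
by_county = distinct_occurrences({"county": "Boulder"}, {"county": "Larimer"})
```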
In the final step of this automated pipeline process, we compared the list of putatively rare taxa to the fungal species included in the IUCN Red List, with those data downloaded from the Red List website (https://www.iucnredlist.org) as a csv file. This later step allowed us to generate a ‘rapid list’ of what were considered to be rare and possibly threatened macrofungal species from the state, with these needing further investigation. Finally, we exported the list of processed (viz. ‘clean’) macrofungal records and their associated metadata from the R environment as a .csv file for further use in conservation status ranking calculations (see below) within the NatureServe framework. The Colorado data generated via the automated bioinformatic pipeline are publicly available (http://www.stbates.org/supplemental.html), while the R script for the pipeline can be accessed via GitHub (https://github.com/stbates/automated_bioinformatic_pipeline/).
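The Red List comparison amounts to a lookup of each putatively rare name in the downloaded table. The Python sketch below shows one way to do this; the column names (`scientificName`, `redlistCategory`) and the inline two-row CSV are assumptions standing in for the real download, and `Genus exemplaris` is a placeholder rather than a real taxon.

```python
import csv
import io

# Toy stand-in for the Red List download; column names are assumed.
RED_LIST_CSV = """scientificName,redlistCategory
Craterellus cornucopioides,Least Concern
Hydnellum mirabile,Vulnerable
"""


def load_red_list(handle, name_column="scientificName"):
    """Read a Red List CSV and index its rows by scientific name."""
    return {row[name_column].strip(): row for row in csv.DictReader(handle)}


def flag_red_listed(rare_names, red_list):
    """Return the sorted subset of rare names that also appear on the Red List."""
    return sorted(name for name in rare_names if name in red_list)


red_list = load_red_list(io.StringIO(RED_LIST_CSV))

# Hypothetical list of rarely documented taxa from the cleaned records.
rare_names = {"Craterellus cornucopioides", "Genus exemplaris"}
rapid_list = flag_red_listed(rare_names, red_list)
```

Exact string lookup only works here because the names were validated and reduced to current names beforehand; unreconciled synonyms would be missed.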
For the conservation status ranking calculations, we used a ‘RankingMetricRules.xlsx’ Microsoft Excel spreadsheet and accompanying Python programming language [42] script for ‘bulk’ calculation (referred to here as the ‘bulk calculator’) of range extent for occurrences (EOO), area of occupancy for occurrences (AOO), and number of element occurrences, which are related to the ‘rarity’ factor category in the NatureServe conservation framework [32] (see Table 9 in that publication). Since the bulk calculator is designed to work rapidly in an automated manner without manual review from a qualified biologist, it only ranks taxa according to rarity, bypassing the ‘threats’ and ‘trends’ factor categories that require informed assessment and/or additional data. From these calculated values, each valid and current macrofungal binomial included in the processed records file generated via the pipeline (see Macrofungi_of_Colorado_Condensed.csv in the public directory) was assigned a conservation status rank. The bulk calculator used a one km separation distance (see above) for clustering species into hypothetical element occurrences and assigned conservation status ranks of S1–S5 (i.e., with the potential to be ‘critically imperiled’, ‘imperiled’, ‘vulnerable’, ‘apparently secure’, or ‘secure’, respectively, according to rank) based on rarity [32] (see Table 10 in that publication).
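To illustrate how a one km separation distance can turn georeferenced records into a count of hypothetical element occurrences, the Python sketch below groups points by single-linkage clustering with a union-find structure. This is a stand-in under stated assumptions, not the bulk calculator's method: the actual tool relies on ArcGIS Pro, and both the linkage rule and the mean Earth radius used here are assumptions.

```python
import math


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def count_element_occurrences(points, separation_km=1.0):
    """Count clusters of points, merging any two points that lie within
    separation_km of each other (merges are transitive: single linkage)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= separation_km:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Three points chained at roughly 0.67 km intervals plus one distant point.
records = [(39.000, -105.0), (39.006, -105.0), (39.012, -105.0), (40.000, -105.0)]
n_occurrences = count_element_occurrences(records)
```

The first and third points are more than one km apart but are joined through the middle point, so the chain counts as one occurrence and the distant record as a second. The pairwise loop is quadratic in the number of records per taxon, which is acceptable for a sketch but would call for a spatial index at larger scales.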
The bulk calculator required a user license for ArcGIS Pro (Esri, Redlands, California) to perform the spatial calculations needed to estimate the number of distinct element occurrences according to the georeferenced data for each record. The ranking rules for the bulk calculator relied on the ‘RULES’ sheet within the NatureServe Element Rank Estimator [41] Microsoft Excel macro workbook (https://www.natureserve.org/products/conservation-rank-calculator/download). The bulk calculator used an equal-area global geospatial projection from the IUCN EOO Calculator ArcMap Tool to ensure that results were comparable to IUCN’s GeoCAT [43] (https://nc.iucnredlist.org/redlist/content/attachment_files/EOO_Calculator_v1.5.zip). The calculations were run on a computer where the user was logged into an ArcGIS Pro account before running the script, which also required a Python interpreter for handling. The Colorado Natural Heritage Program (CNHP) has previously used the bulk calculator to assess conservation status for arthropods, bryophytes, and vascular plants in the state. The Python script and bulk calculator (and related files) are publicly available (https://github.com/chollenb-cnhp/BulkRarityCalculator/).

3. Results and Discussion

The specimen and observation record data gathering step in our automated bioinformatic pipeline recovered 117,006 Colorado records from the MyCoPortal API. After the initial validation steps and removing names representing slime molds, there were 37,598 records that remained, with 5497 unique names being associated with these records. At this point, there were 1286 unique names that were problematic (i.e., validation below the level of genus was not possible), putatively comprising 588 species-level names and 698 infraspecific-level names.
After the next round of processing (i.e., fuzzy matching, further validation at or below the species level, and removal of records not associated with macrofungi), 36,637 records (~31% of the original Colorado records) remained in the complete dataset for Colorado that were associated with 3081 unique, valid, and current macrofungal taxonomic names. Of these records, 24,464 (~67%) represented specimen-based records and 12,082 (~33%) represented observation-based records, with 91 (<1%) records providing no ‘basis of record’ data. These numbers demonstrate the importance of the historical specimen-based data, largely made available through fungal specimen digitization funding provided by the NSF, but they also highlight the valuable contribution of community scientists in providing data relevant for conservation purposes. Overall, there was roughly a 56% reduction in the unique names, removing those that were misspelled, not ultimately representing fungal groups, and collapsing all synonymous names to the current name recognized by Index Fungorum. The pipeline also removed all records associated with taxonomic names of lichenized fungi (10,862 records) and microfungi (43,682), as well as slime molds (9847). An additional 1230 records were removed that were associated with 476 unique names that remained problematic in one way or another. For example, some of these names represented plant species (e.g., Pinus edulis) mistakenly entered in the MyCoPortal, while others consisted of ‘herbarium names’ (e.g., Amanita stannea nom. prov.) that have never been validly published.
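The record shares quoted above can be checked with a few lines of Python arithmetic. The counts are taken directly from the text; only the variable names are introduced here.

```python
# Record counts as reported in the text.
total_state_records = 117_006
final_records = 36_637
by_basis = {"specimen": 24_464, "observation": 12_082, "unspecified": 91}

# The three basis-of-record groups should account for every retained record.
basis_total = sum(by_basis.values())

# Share of the original Colorado records retained after processing.
retained_share = 100 * final_records / total_state_records

# Share of retained records contributed by each basis of record.
basis_share = {basis: 100 * count / final_records for basis, count in by_basis.items()}

for basis, share in basis_share.items():
    print(f"{basis}: {share:.2f}% of retained records")
print(f"retained: {retained_share:.2f}% of all Colorado records")
```

The three groups sum to the 36,637 retained records, and the computed shares round to the ~31%, ~67%, ~33%, and <1% figures reported.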
The ‘Top-50’ most abundant macrofungal taxa from Colorado (i.e., those represented by the most specimen- and observation-based records) are given in Figure 3. These numbers are hard to analyze given the potential for various collectors to focus on different taxa for one reason or another, such as Amanita muscaria (the most abundantly collected species in Colorado), which has conspicuous sporocarps as well as ethnomycological uses. However, these abundance data provide inventories that can be instructive for institutional collections, such as for identifying species that may be over-represented in specimen numbers. Conversely, our approach focused on rarely documented taxa for conservation purposes. Analysis processing in the automated bioinformatic pipeline produced a list of 1613 putatively rare macrofungal taxa in Colorado, which were taxonomic names associated with one or two element occurrences (Table 1). Of these, 1099 represented only one element occurrence, while 216 represented two distinct element occurrences. An additional 251 taxa had two non-distinct element occurrences represented by two different records collected at distances less than one kilometer apart. Of all these putatively rare taxa, 30 were also found among the fungi included on the IUCN Red List. There were an additional 46 macrofungal taxa that were more abundant (i.e., represented by more than two records) and were also included in the Red List (Table 1). These recovered data represented a rapid, automated assessment of macrofungal element occurrences within the NatureServe and Red List conservation frameworks. This informative rarity and Red List status assessment can then be used to focus future collecting efforts to both confirm these results and provide further guidance for monitoring species populations and distributions.
Monitoring fungal populations is an important endeavor, especially considering that fungi are ideal indicator candidates for further focus and conservation efforts [30]. The combination of our pipeline rarity assessment with the previously established Red List conservation status was central to this step in our approach as this assessment provides a method for quickly identifying candidate species for further critical examination (see example below).
The bulk calculator further refined assessment more broadly across the recovered records according to the NatureServe scoring methodology for ranking, providing conservation status rank assessments for 2438 macrofungal taxa (79% of the original 3081 unique names) in Colorado that were part of the dataset processed in the pipeline and that included georeferenced data required for the ranking calculations. These taxonomic names were ranked for state conservation status within Colorado as follows: 9 ‘apparently secure’ (S4), 187 ‘vulnerable’ (S3), 433 ‘imperiled’ (S2), 1809 ‘critically imperiled’ (S1), and 0 taxa were found to be ‘secure’ (S5). The high number of imperiled and critically imperiled fungal taxa in bulk calculator ranking highlights the need to include fungi more intensively in statewide, as well as national, conservation programs, but it also indicates gaps in knowledge of fungal diversity that prevent more robust conservation assessments. These gaps can only be addressed through more thorough biodiversity inventorying and monitoring efforts. Ranking via the bulk calculator was, however, more or less in line with our rapid rarity and Red List status assessment, suggesting there are over one thousand fungal taxa in Colorado that need additional scrutiny. More broadly, our overall assessment of fungal species in the state provides important additional data that can be used to evaluate taxa for incorporation within the NatureServe Explorer Biotics database for conservation purposes. In addition, these data assessments can provide guidance in other areas of conservation such as providing sets of fungal names for further evaluation under the IUCN’s Red List, especially after applying the methodology developed here to assess rarity more broadly across the nation.
While there is currently a web-based GeoCAT platform from the IUCN that calculates several component factors in NatureServe’s conservation status assessment, GeoCAT requires uploading individual species data, one at a time, and is not designed for processing multi-species datasets for an entire taxonomic group. This is the first time the CNHP has used an automated systematic approach to assess conservation status en masse across a single group. Historically, species-by-species and manual approaches have been used, which are labor intensive, consequently leaving acute gaps in our understanding of conservation needs. The most innovative feature of the bulk calculator is that it can be used across different taxonomic groups, while Symbiota software (version 3.3.0), the platform used by MyCoPortal, is involved in hosting biodiversity data for a wide range of groups, such as bryophytes (e.g., https://bryophyteportal.org), lichens (e.g., https://lichenportal.org), and vascular plants (e.g., https://swbiodiversity.org). Thus, the automated bioinformatic pipeline presented here could be modified to focus assessment on these non-fungal groups, while minor adjustment in the R script for the pipeline could easily produce assessments for other fungal groups, such as lichens or microfungi.
Mueller and colleagues [22] stressed that ‘our knowledge of the threat status of fungi remains woefully incomplete’. Our automated bioinformatic pipeline approach was able to rapidly provide abundance and rarity metrics for fungal species populations in Colorado for conservation status assessment under the NatureServe and IUCN Red List frameworks. This test case also demonstrates the potential for using the pipeline on a state-by-state basis in more comprehensively analyzing one of the largest biodiversity datasets for fungi compiled to date in the MyCoPortal. Accordingly, this more global approach could go a long way in helping produce a more complete ‘threat status’ assessment for fungi. Conversely, while our approach did easily and rapidly provide assessment metrics for conservation purposes, the real work of ground truthing the resulting data is just beginning. The legacy and complexity of specimen and observational data, such as those observed here, provide additional challenges in this area; however, these obstacles are not insurmountable, particularly when specimens are available in institutional collections for study. For example, Craterellus cornucopioides (the black trumpet mushroom) was found among the rarely documented macrofungal taxa reported from Colorado and was also included on the Red List for threatened species with a ‘Least Concern’ status. The range for C. cornucopioides extends primarily from the East Coast of the U.S. to the Midwest, with some representation on the West Coast; thus, a report from the Rocky Mountain region was noteworthy. After examining the specimen (DBG-F-023109), it was determined that this collection represents Polyozellus multiplex, which is known to be mycorrhizal with the abundant spruce and firs found in Colorado.
Conversely, another mycorrhizal Red List taxon reported from Colorado was Hydnellum mirabile (Figure 4 and Figure 5), which has ‘Vulnerable’ status within Europe due to destruction by the logging of habitats for its conifer host species. While the occurrence of this taxon in Colorado and other parts of the U.S. has been verified through voucher specimens, it is not reported from the U.S. on the IUCN Red List; thus, there is a key gap in the assessment of this species for conservation purposes at the international level. This is an example of a species that would be an excellent candidate for targeted collection within Colorado, so that its conspecificity with the European populations could be confirmed through detailed morphological, ecological, and genetic analysis from fresh material.
The approach demonstrated here also holds potential to offer actionable data metrics for conservation planners and land managers that could directly inform conservation prioritization and policy development within any state. For example, the resultant conservation metrics and associated biodiversity metadata could be combined with those of plants and/or animals for integration into spatial prioritization tools (e.g., Marxan [44] and Zonation 5 [45]) for identifying areas of high conservation value. Similarly, regions exhibiting high concentrations of rare or endemic fungal taxa could be designated as microrefugia or hotspots requiring urgent protection. Land managers could also use the abundance and rarity data generated through the pipeline to tailor forest management interventions, where areas supporting rare fungi could warrant low-impact management (e.g., selective thinning rather than clear cutting), restrictions on soil disturbance, or the establishment of buffer zones. Conversely, areas with high fungal abundance but low rarity may support sustainable use or educational/ecotourism initiatives without compromising conservation value. Data from our pipeline also hold the potential to inform environmental policy by contributing to Red List evaluation, habitat restoration goals, and biodiversity offsetting calculations. For example, rarity-weighted fungal scores can be used to evaluate potential habitat loss or guide reintroduction efforts for sensitive taxa. Furthermore, integration of fungal conservation status metrics into existing biodiversity indices would help align fungal conservation with national biodiversity targets and international frameworks, such as the CBD post-2020 targets [46]. 
Finally, our automated bioinformatic pipeline approach holds the potential to facilitate longitudinal monitoring, as with the ability to track changes in fungal rarity and abundance over time, managers could evaluate the effectiveness of conservation actions, detect early signs of ecosystem degradation, and adjust strategies proactively.

4. Conclusions

The automated bioinformatic approach presented here represents an important step in moving toward a more robust assessment of fungal biodiversity for conservation purposes. This methodology can now be applied to generate a baseline of fungal species abundance and rarity more broadly across the United States. With this information, species monitoring efforts may begin, allowing land managers and conservation organizations to more readily incorporate key fungal species into their work. The data from fungal monitoring efforts will then be used to build a more comprehensive fungal conservation assessment and add to our understanding of global vs. statewide rarity and strengthen data on fungal biodiversity, ecology, and phenology.
The examples presented here constitute important illustrations of the type of work that needs to be carried out across the U.S. This paper highlights the need for targeted collecting that focuses on addressing key knowledge gaps, especially for fungal populations with more critical Red List status. While our results are limited by the restricted approach for the Colorado test case, they demonstrate a methodology that can be easily and rapidly scaled and show how, with little effort, critical information can be gathered. Further, the approach can be adapted for application to other groups, more narrowly within the fungi, or more broadly for non-fungal groups, especially where biodiversity documentation relies on Symbiota software. As we refine and apply our methods, the community of established partnerships (i.e., NatureServe, the Network of Natural Heritage Programs and Conservation Data Centers, the IUCN, etc.) can begin to build a more globally relevant assessment of conservation status for fungi. Given the substantial importance of these organisms for maintaining ecosystem function and their role in the survival of other species, the time is ripe to expand the scope of fungal conservation assessment.

Author Contributions

Conceptualization, S.T.B., A.H., and A.W.W.; methodology, S.T.B. and C.H.; software, S.T.B. and C.H.; validation, S.T.B., J.C., and C.H.; formal analysis, S.T.B., A.H., and C.H.; investigation, S.T.B., C.H., A.H., and A.W.W.; resources, S.T.B. and D.A.; data curation, S.T.B.; writing—original draft preparation, S.T.B., C.H., and A.H.; writing—review and editing, S.T.B., C.H., A.H., A.W.W., and D.A.; visualization, S.T.B.; supervision, S.T.B., A.H., A.W.W., and D.A.; project administration, S.T.B.; funding acquisition, S.T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated through the bioinformatic pipeline for Colorado are available in a publicly shared directory (http://www.stbates.org/supplemental.html), while the related scripts/code are available through respective GitHub repositories for the pipeline (https://github.com/stbates/automated_bioinformatic_pipeline/) and the bulk calculator (https://github.com/chollenb-cnhp/BulkRarityCalculator/).

Acknowledgments

The authors thank the Western Natural Heritage Programs leadership for their valuable discussion and feedback on our approach to conservation assessment and encouragement of fungal conservation efforts in general. We also thank Johan Nitare for graciously sharing his image of Hydnellum mirabile from Sweden. One author (S.T.B.) acknowledges continued support from the Nils K. Nelson Endowment provided through the Purdue University Northwest College of Engineering and Sciences within the Department of Biological Sciences.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Fisher, M.C.; Henk, D.A.; Briggs, C.J.; Brownstein, J.S.; Madoff, L.C.; McCraw, S.L.; Burrr, S.J. Emerging fungal threats to animal, plant and ecosystem health. Nature 2012, 484, 186–194. [Google Scholar] [CrossRef] [PubMed]
  2. Peay, K.G.; Kennedy, P.G.; Talbot, J.M. Dimensions of biodiversity in the Earth mycobiome. Nat. Rev. Microbiol. 2016, 14, 434–447. [Google Scholar] [CrossRef]
  3. Kumar, V.; Sarma, V.V.; Thambugala, K.M.; Huang, J.-J.; Li, X.-Y.; Hao, G.-F. Ecology and evolution of marine fungi with their adaptation to climate change. Front. Microbiol. 2021, 12, 719000. [Google Scholar] [CrossRef]
  4. Niskanen, T.; Lücking, R.; Dahlberg, A.; Gaya, E.; Suz, L.M.; Mikryukov, V.; Liimatainen, K.; Druzhinina, I.; Westrip, J.R.; Mueller, G.M.; et al. Pushing the frontiers of biodiversity research: Unveiling the global diversity, distribution, and conservation of fungi. Annu. Rev. Environ. Resour. 2023, 48, 149–176. [Google Scholar] [CrossRef]
  5. Seena, S.; Baschien, C.; Barros, J.; Sridhar, K.R.; Graça, M.A.S.; Mykrä, H.; Bundschuh, H. Ecosystem services provided by fungi in freshwaters: A wake-up call. Hydrobiologia 2023, 850, 2779–2794. [Google Scholar] [CrossRef]
  6. Bahram, M.; Netherway, T. Fungi as mediators linking organisms and ecosystems. FEMS Microbiol. Rev. 2022, 46, fuab058. [Google Scholar] [CrossRef] [PubMed]
  7. van der Heijden, M.G.A.; Martin, F.M.; Selosse, M.-A.; Sanders, I.R. Mycorrhizal ecology and evolution: The past, the present, and the future. New Phytol. 2015, 205, 1406–1423. [Google Scholar] [CrossRef] [PubMed]
  8. van Der Heijden, M.G.A. Arbuscular mycorrhizal fungi as support systems for seedling establishment in grassland. Ecol. Lett. 2004, 7, 293–303. [Google Scholar] [CrossRef]
  9. Liang, M.; Shi, L.; Burslem, D.F.R.P.; Johnson, D.; Fang, M.; Zhang, X.; Yu, S. Soil fungal networks moderate density-dependent survival and growth of seedlings. New Phytol. 2021, 230, 2061–2071. [Google Scholar] [CrossRef]
  10. Martin, F.M.; van der Heijden, M.G.A. The mycorrhizal symbiosis: Research frontiers in genomics, ecology, and agricultural application. New Phytol. 2024, 242, 1486–1506. [Google Scholar] [CrossRef]
  11. Tang, B.; Man, J.; Lehmann, A.; Rillig, M.C. Arbuscular mycorrhizal fungi benefit plants in response to major global change factors. Ecol. Lett. 2023, 26, 2087–2097. [Google Scholar] [CrossRef] [PubMed]
  12. Leifheit, E.F.; Camenzind, T.; Lehmann, A.; Andrade-Linares, D.R.; Fussan, M.; Westhusen, S.; Wineberger, T.M.; Rillig, M.C. Fungal traits help to understand the decomposition of simple and complex plant litter. FEMS Microbiol. Ecol. 2024, 100, fiae033. [Google Scholar] [CrossRef] [PubMed]
  13. Rillig, M.C.; Mummey, D.L. Mycorrhizas and soil structure. New Phytol. 2006, 171, 41–53. [Google Scholar] [CrossRef] [PubMed]
  14. Purvis, A.; Hector, A. Getting the measure of biodiversity. Nature 2000, 405, 212–219. [Google Scholar] [CrossRef]
  15. Hawksworth, D.L. The magnitude of fungal diversity: The 1.5 million species estimate revisited. Mycol. Res. 2001, 105, 1422–1432. [Google Scholar] [CrossRef]
  16. Hawksworth, D.L.; Lücking, R. Fungal diversity revisited: 2.2 to 3.8 million species. Microbiol. Spectr. 2017, 5, 10-1128. [Google Scholar] [CrossRef]
  17. Cheek, M.; Lughadha, E.N.; Kirk, P.; Lindon, H.; Carretero, J.; Looney, B.; Douglas, B.; Haelewaters, D.; Gaya, E.; Llewellyn, T. New scientific discoveries: Plants and fungi. Plants People Planet 2020, 2, 371–388. [Google Scholar] [CrossRef]
  18. Baldrian, P.; Větrovský, T.; Lepinay, C.; Lepinay, C.; Kohout, P. High-throughput sequencing view on the magnitude of global fungal diversity. Fungal Divers. 2022, 114, 539–547. [Google Scholar] [CrossRef]
  19. Antonelli, A.; Fry, C.; Smith, R.J.; Eden, J.; Govaerts, R.H.A.; Kersey, P.; Nic Lughadha, E.; Onstein, R.E.; Simmonds, M.S.J.; Zizka, A.; et al. State of the World’s Plants and Fungi 2023; Royal Botanic Gardens: Kew, UK, 2023; pp. 14–25. [Google Scholar] [CrossRef]
  20. Springer, J.; Campese, J.; Nakangu, B. The Natural Resource Governance Framework; IUCN: Gland, Switzerland, 2021. [Google Scholar] [CrossRef]
  21. Rodrigues, A.S.; Pilgrim, J.D.; Lamoreux, J.F.; Hoffmann, M.; Brooks, T.M. The value of the IUCN Red List for conservation. Trends Ecol. Evol. 2006, 21, 71–76. [Google Scholar] [CrossRef]
  22. Mueller, G.M.; Cunha, K.M.; May, T.W.; Allen, J.L.; Westrip, J.R.S.; Canteiro, C.; Costa-Rezende, D.H.; Drechsler-Santos, E.R.; Vasco-Palacios, A.M.; Ainsworth, A.M.; et al. What do the first 597 global fungal Red List assessments tell us about the threat status of Fungi? Diversity 2022, 14, 736. [Google Scholar] [CrossRef]
  23. The IUCN Red List of Threatened Species. Version 2024-2. Available online: https://nc.iucnredlist.org/redlist/content/attachment_files/2024-2_RL_Table_1a.pdf (accessed on 16 May 2025).
  24. The IUCN Red List of Threatened Species. Version 2025-1. Available online: https://nc.iucnredlist.org/redlist/content/attachment_files/2025-1_RL_Table_1a.pdf (accessed on 16 May 2025).
  25. First 1,000 Fungi on IUCN Red List Reveal Growing Threats—IUCN Red List. Available online: https://iucn.org/press-release/202503/first-1000-fungi-iucn-red-list-reveal-growing-threats-iucn-red-list (accessed on 16 May 2025).
  26. Hilton, E.J.; Watkins-Colwell, G.J.; Huber, S.K. The expanding role of natural history collections. Ichthyol. Herpetol. 2021, 109, 379–391. [Google Scholar] [CrossRef]
  27. Miller, A.N.; Bates, S.T. The mycology collections portal (MyCoPortal). IMA Fungus 2017, 8, A65–A66. [Google Scholar] [CrossRef]
  28. Haelewaters, D.; Quandt, C.A.; Bartrop, L.; Cazabonne, J.; Crockatt, M.E.; Cunha, S.P.; De Lange, R.; Dominici, L.; Douglas, B.; Drechsler-Santos, E.R.; et al. The power of citizen science to advance fungal conservation. Conserv. Lett. 2024, 17, e13013. [Google Scholar] [CrossRef] [PubMed]
  29. Bates, S.T.; Miller, A.N.; the Macrofungi Collections and Microfungi Collections Consortia. The protochecklist of North American nonlichenized Fungi. Mycologia 2018, 110, 1222–1348. [Google Scholar] [CrossRef]
  30. Heilmann-Clausen, J.; Barron, E.S.; Boddy, L.; Dahlberg, A.; Griffith, G.W.; Nordén, J.; Ovaskainen, O.; Perini, C.; Senn-Irlet, B.; Halme, P. A fungal perspective on conservation biology. Conserv. Biol. 2015, 29, 61–68. [Google Scholar] [CrossRef]
  31. Master, L.L. Assessing threats and setting priorities for conservation. Conserv. Biol. 1991, 5, 559–563. [Google Scholar] [CrossRef]
  32. Faber-Langendoen, D.; Nichols, J.; Master, L.; Snow, K.; Tomaino, A.; Bittman, R.; Hammerson, G.; Heidel, B.; Ramsay, L.; Teucher, A.; et al. NatureServe Conservation Status Assessments: Methodology for Assigning Ranks; NatureServe: Arlington, VA, USA, 2012. [Google Scholar]
  33. Ricketts, T.H.; Dinerstein, E.; Olson, D.M.; Loucks, C. Who’s where in North America? BioScience 1999, 49, 369–381. [Google Scholar] [CrossRef]
  34. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018; Available online: https://www.R-project.org/ (accessed on 1 August 2024).
  35. Posit Team. RStudio: Integrated Development Environment for R; Posit Software; PBC: Boston, MA, USA, 2024; Available online: http://www.posit.co/ (accessed on 1 August 2024).
  36. Gries, C.; Gilbert, E.E.; Franz, N.M. Symbiota—A virtual platform for creating voucher-based biodiversity information communities. Biodivers. Data J. 2014, 2, e1114. [Google Scholar] [CrossRef]
  37. Grenié, M.; Berti, E.; Carvajal-Quintero, J.; Dädlow, G.M.; Sagouis, A.; Winter, M. Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods Ecol. Evol. 2023, 14, 12–25. [Google Scholar] [CrossRef]
  38. MyCoPortal. 2025. Available online: http://www.mycoportal.org/portal/index.php (accessed on 1 March 2025).
  39. Crous, P.W.; Gams, W.; Stalpers, J.A.; Robert, V.; Stegehuis, G. MycoBank: An online initiative to launch mycology into the 21st century. Stud. Mycol. 2004, 50, 19–22. [Google Scholar]
  40. Nguyen, N.H.; Song, Z.; Bates, S.T.; Branco, S.; Tedersoo, L.; Menke, J.; Schilling, J.S.; Kennedy, P.G. FUNGuild: An open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecol. 2016, 20, 241–248. [Google Scholar] [CrossRef]
  41. NatureServe. Element Occurrence Data Standard; NatureServe in cooperation with the Network of Natural Heritage Programs and Conservation Data Centers: Rosslyn, VA, USA, 2002; Available online: http://www.natureserve.org/prodServices/eodata.jsp (accessed on 25 March 2025).
  42. van Rossum, G. Python Tutorial, Technical Report CS-R9526; Centrum voor Wiskunde en Informatica (CWI): Amsterdam, The Netherlands, 1995. [Google Scholar]
  43. Bachman, S.; Moat, J.; Hill, A.W.; de la Torre, J.; Scott, B. Supporting Red List threat assessments with GeoCAT: Geospatial conservation assessment tool. ZooKeys 2011, 150, 117–126. [Google Scholar] [CrossRef] [PubMed]
  44. Serra, N.; Kockel, A.; Game, E.T.; Grantham, H.; Possingham, H.P.; McGowan, J. Marxan User Manual: For Marxan Version 2.43 and Above; The Nature Conservancy: Arlington, VA, USA; Pacific Marine Analysis and Research Association: Victoria, BC, Canada, 2020; Available online: https://marxansolutions.org/wp-content/uploads/2021/02/Marxan-User-Manual_2021.pdf (accessed on 19 May 2025).
  45. Moilanen, A.; Lehtinen, P.; Kohonen, I.; Virtanen, E.; Jalkanen, J.; Kujala, H. Novel methods for spatial prioritization with applications in conservation, land use planning and ecological impact avoidance. Methods Ecol. Evol. 2022, 13, 1062–1072. [Google Scholar] [CrossRef]
  46. Friedman, K.; Bridgewater, P.; Agostini, V.; Agardy, T.; Arico, S.; Biermann, F.; Brown, K.; Cresswell, I.D.; Ellis, E.C.; Failler, P.; et al. The CBD Post-2020 biodiversity framework: People’s place within the rest of nature. People Nat. 2022, 4, 1475–1484. [Google Scholar] [CrossRef]
Figure 1. Data flow schematic for automated record gathering and processing (‘data cleaning’) in the bioinformatic pipeline that incorporated various application programming interfaces (APIs). The pipeline accessed specimen- and observation-based records from the selected state (CO in this case), validated fungal taxonomic names (* see Figure 2 for the name handling process), and removed taxa that were not macrofungi (e.g., lichens, microfungi, and slime molds).
Figure 2. Data flow schematic for the pipeline name handling process. Names were first validated by exact matching to existing genera using the IF API. Specific epithets were validated by exact matching to existing binomials using the fdex API, and when applicable, these were corrected to the current name according to the IF convention (as indicated in fdex, e.g., A. livida = A. vaginata). When binomials did not match, the fdex API was accessed to generate a list of potential correct names from the valid generic name plus the first three letters of the incorrect name, with the closest match determined and scored to ensure the selection of the correct name. Remaining problematic names were then checked against the Bing search engine, which offered corrected spelling suggestions (these were also scored to ensure the selection of the correct name and, when applicable, updated to the current name according to the IF convention). Remaining names were further screened manually (e.g., revealing A. stannea as a provisional name used in the herbarium).
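The name handling steps in Figure 2 can be illustrated with a minimal Python sketch. This is not the authors' pipeline code: the in-memory `VALID_NAMES` list and `SYNONYMS` map are hypothetical stand-ins for the IF and fdex API lookups, the similarity score from `difflib` is an assumed scoring method (the caption does not specify one), and the 0.85 acceptance threshold is arbitrary. The Bing spelling check and the manual screening step are not modeled; names that fail are simply returned unresolved.

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins for the IF/fdex API responses (assumptions for illustration).
VALID_NAMES = ["Amanita livida", "Amanita muscaria", "Amanita vaginata"]
SYNONYMS = {"Amanita livida": "Amanita vaginata"}  # A. livida = A. vaginata (Figure 2)


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; an assumed scoring method."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_name(name, valid_names=VALID_NAMES, synonyms=SYNONYMS, threshold=0.85):
    """Return (current_name, score), or (None, score) if the name stays unresolved."""
    # Step 1: exact match, then update to the current name where a synonym is known.
    if name in valid_names:
        return synonyms.get(name, name), 1.0

    # Step 2: candidate list from the genus plus the first three letters of the epithet.
    genus, _, epithet = name.partition(" ")
    prefix = f"{genus} {epithet[:3]}".lower()
    candidates = [v for v in valid_names if v.lower().startswith(prefix)]
    if not candidates:
        return None, 0.0  # left for later checks or manual screening

    # Step 3: score the candidates and accept the closest one only above the threshold.
    best = max(candidates, key=lambda v: similarity(name, v))
    score = similarity(name, best)
    if score < threshold:
        return None, score
    return synonyms.get(best, best), score


if __name__ == "__main__":
    for raw in ["Amanita muscara", "Amanita livida", "Amanita stannea"]:
        print(raw, "->", resolve_name(raw))
```

In this sketch the misspelled "Amanita muscara" resolves to "Amanita muscaria", "Amanita livida" is updated to "Amanita vaginata", and "Amanita stannea" has no candidate and is left unresolved, mirroring the kind of name that the caption describes as requiring manual screening.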
Figure 3. The ‘Top-50’ most abundantly documented macrofungal species in Colorado determined from our automated bioinformatic pipeline for processing biodiversity data.
Figure 4. One of two Colorado records for Hydnellum mirabile recovered by our bioinformatic pipeline from MyCoPortal (https://www.mycoportal.org); this species is listed as Vulnerable on the IUCN Red List, with its range of occurrence recorded only from Europe (Austria, Czech Republic, Finland, France, Italy, Norway, Russian Federation, Sweden, and Switzerland).
Figure 5. A field image of Hydnellum mirabile from Sweden (image credit: Johan Nitare; https://redlist.info/iucn/species_view/100892). Further work on freshly collected material is required to verify the conspecificity of populations from the United States with those in Europe.
Table 1. Data summary of putatively rare and/or threatened taxa based on the NatureServe elemental occurrence convention, where elemental occurrences tentatively represent distinct populations of fungal species, and Red List status for fungal conservation.
Elemental Occurrences | # of Taxa | # on Red List | Notes
One | 1099 | 21 | Records representing collections from one distinct location.
One (+) | 251 | 4 | Records representing collections from two distinct locations, which were less than 1 km apart.
Two | 216 | 5 | Records representing collections from two distinct locations, which were greater than 1 km apart.
Two (+) | 46 | 46 | Records representing collections from more than two locations but that appear on the Red List.
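The categories in Table 1 can be expressed as a short Python sketch. This is an illustration under stated assumptions and not the authors' bulk calculator: the function names are hypothetical, locations are assumed to be (latitude, longitude) pairs in decimal degrees, "distinct" is taken to mean non-identical coordinates, great-circle (haversine) distance is assumed as the distance measure, and a separation of exactly 1 km, which the table leaves unspecified, is assigned to "Two".

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def occurrence_category(locations, on_red_list=False, separation_km=1.0):
    """Map a taxon's record locations to a Table 1 category, or None if outside the focus list."""
    distinct = sorted(set(locations))  # assumption: distinct means non-identical coordinates
    if not distinct:
        return None
    if len(distinct) == 1:
        return "One"
    if len(distinct) == 2:
        # Two locations closer than the separation distance count as "One (+)".
        return "One (+)" if haversine_km(*distinct) < separation_km else "Two"
    # More than two locations: retained only when the taxon is on the Red List.
    return "Two (+)" if on_red_list else None


if __name__ == "__main__":
    site_a = (39.7392, -104.9903)   # hypothetical collection sites
    site_b = (39.7400, -104.9903)   # roughly 0.09 km north of site_a
    site_c = (40.0150, -105.2705)   # tens of km from site_a
    print(occurrence_category([site_a]))
    print(occurrence_category([site_a, site_b]))
    print(occurrence_category([site_a, site_c]))
    print(occurrence_category([site_a, site_b, site_c], on_red_list=True))
```

Keeping the separation distance as a parameter makes it straightforward to test how sensitive the category counts are to the 1 km convention.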

Share and Cite

MDPI and ACS Style

Bates, S.T.; Chelin, J.; Hollenberg, C.; Honan, A.; Wilson, A.W.; Anderson, D. An Automated Bioinformatic Pipeline to Analyze Biodiversity Data for Conservation Purposes: A Test Case for Colorado Macrofungi. Conservation 2025, 5, 24. https://doi.org/10.3390/conservation5020024
