1. Introduction
Fungi play important roles in the natural world as decay organisms, pathogens, and symbionts in all aquatic and terrestrial ecosystems studied to date [
1,
2,
3,
4,
5]. For example, fungal symbiotic relationships occur with organisms across the tree of life (e.g., animals, bacteria, and plants), with fungi mediating linkages between these diverse organisms and the ecosystems they inhabit [
6]. Mutualistic mycorrhizal fungi, for instance, form symbiotic relationships with nearly all land plants [
7] and these associations help in plant establishment and survival [
8,
9], fostering a healthy and sustainable ecology [
10], as well as improving plant responses to global change factors [
11]. Fungi also perform a large portion of the decomposition and nutrient recycling processes on the planet [
12] and play a critical role in maintaining soil structure, contributing to aeration and water infiltration, as well as overall soil health [
13]. Thus, fungi and their direct relationships with other organisms continue to ensure functioning ecosystems, which is essential for all of life.
Fungal species diversity on Earth is thought to be exceeded only by that of the insects [
14], yet despite their central role in maintaining ecosystems globally, understanding of fungal biodiversity remains incomplete. While there are approximately 155,000 described species of fungi, estimates of fungal diversity range from 1.5 to 3.8 million [
15,
16,
17,
18]; thus, roughly 90% or more of all fungal species remain unknown to science [
19]. Despite their overall importance in numbers and ecology, fungi have only been included within the conservation framework relatively recently. The International Union for Conservation of Nature (IUCN) was established in 1948 as the leading international organization working on the conservation of natural resources [
20]. The IUCN’s ‘Red List’, an inventory of the conservation status and extinction risk for biological species, was initiated in 1964 [
21]. Fungi were not evaluated, however, as part of the Red List efforts until 2013 [
22]. By 2024, only 818 fungal species had been evaluated by the IUCN’s trained panels of ‘Assessors’, with 340 being ranked as threatened [
23] (see Table 1a). A recent 2025 update examined an additional 482 species, with 1300 fungal species now evaluated in total, and the threatened number being raised to 411 [
24] (see Table 1a). Thus, less than 1% of all known fungal species, a small portion of the total estimated fungal diversity, have been evaluated for conservation status, with roughly one-third of those assessed (~32%) being ranked as threatened. This suggests that a high number of known fungal species are at risk; therefore, a fungal conservation framework may be crucial not only for preserving ecosystem function overall but also for the survival of species in general. Progress, however, has been slow, and the urgent need to more extensively evaluate fungal species for threatened status is recognized within the IUCN, with calls being made for ‘better data’ to facilitate ‘meaningful action to protect fungi’ [
25].
There are many challenges to incorporating fungi into conservation frameworks, including the insufficient documentation of fungal biodiversity and poor understanding of ranges for numerous fungal species [
4]. Historically, biodiversity assessments have been linked to specimen-based records in natural history collections; however, support for these institutions continues to diminish, with recent specimen digitization efforts fortuitously preserving existing biodiversity data and making it available online [
26]. For fungi, specimen digitization was funded as part of the NSF’s Advancing Digitization of Biological Collections, with the Mycology Collections Portal (MyCoPortal,
http://mycoportal.org) acting as the central repository for these data [
27]. Concurrent with institutional digitization, the potential for community scientists to aid in fungal conservation efforts is increasingly being recognized [
28]. The MyCoPortal now hosts ca. 9.8 million records of fungal specimens from 136 natural history, governmental, and university institutions across the globe, although most records are from the United States. The MyCoPortal also aggregates observational records from recognized web-based community scientist initiatives, such as iNaturalist (
https://www.inaturalist.org) and Mushroom Observer (
https://mushroomobserver.org). As such, the MyCoPortal represents one of the largest datasets of specimen- and observation-based records ever assembled that provides information on the number and ranges of fungal species, and it has demonstrated utility for fungal biodiversity assessments [
29].
Here, we respond to the urgent call for improved data [
25] and collaboration [
30] for assessing species of fungi for conservation status by presenting a modern automated systematic approach for analyzing existing biodiversity data to understand the abundance and rarity of fungal species populations. In addition to leveraging available web-based application programming interfaces (APIs), which allow for automated programmatic access to the target databases for requesting and retrieving information, our approach also relied on the conservation status assessment methods previously developed by Master [
31] and NatureServe [
32], which partners with the IUCN. NatureServe currently documents conservation status and locations of organisms through their database Biotics, which is made available to the public via NatureServe Explorer (
https://explorer.natureserve.org). As a ‘test case’ for our conservation assessment approach, we focused on non-lichenized macrofungi, a visible and important component of fungal biodiversity, in the state of Colorado, which represents an ecoregion recognized for high species diversity [
33].
2. Materials and Methods
We developed a bioinformatic pipeline that relies on available fungal records from specimens (e.g., metadata from physical, dried collections documenting the presence of a fungus in a particular area and residing in natural history museums or institutes) and observations (e.g., metadata recorded by community scientists documenting the presence of a fungus in a particular area but typically lacking a physical specimen) and processes these data for conservation purposes using the R statistical software (version 4.5.0) platform [
34] and the RStudio (version 3.6.0) graphical user interface [
35]. Our pipeline (
Figure 1) gathers these records automatically from the MyCoPortal’s [
27] Symbiota [
36] API (
https://www.mycoportal.org/portal/api/v2/documentation). In this test case, we focused solely on putative fungal records originating from the state of Colorado; however, other states can be selected in the R script for automated record gathering and processing in the bioinformatic pipeline. Our approach used the following R packages in pipeline processing: data.table, dplyr, ggplot2, httr, jsonlite, measurements, pracma, RCurl, rvest, stringdist, tcltk, and XML.
Despite Symbiota’s data cleaning tools available in MyCoPortal, we encountered data quality issues in the recovered records, primarily misspelled taxonomic names. The potential for errors is certainly recognized within MyCoPortal (i.e., see
https://www.mycoportal.org/portal/misc/usagepolicy.php), and such issues are well known for biodiversity data; however, best practices [
37] can be followed in addressing them. To rectify the erroneous spelling problem, after pulling record data from the selected state (Colorado in this case) via MyCoPortal API [
38], the first step involved validating the spelling of the generic names (
Figure 2). This step used exact matching to generic names from the Index Fungorum (IF; an online index for all fungal taxonomic names;
https://www.indexfungorum.org) API (
https://www.indexfungorum.org/ixfwebservice/fungus.asmx), which also allowed us to flag non-fungal genera, primarily slime molds (e.g., Eukaryota, Amoebozoa), for removal from the dataset.
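The genus-level check described above can be illustrated with a minimal Python sketch (the pipeline itself is written in R). Here, two small in-memory sets stand in for the responses of the Index Fungorum web service; the function name `classify_genus`, the status labels, and the contents of both sets are hypothetical and serve only to show exact matching and the flagging of non-fungal genera.

```python
# Hypothetical stand-ins for genus lists that would come from Index Fungorum.
FUNGAL_GENERA = {"Amanita", "Boletus", "Hydnellum"}
NON_FUNGAL_GENERA = {"Fuligo", "Physarum"}  # slime mold genera, flagged for removal


def classify_genus(genus, fungal_genera=FUNGAL_GENERA,
                   non_fungal_genera=NON_FUNGAL_GENERA):
    """Exact-match a genus name against the two reference sets.

    Returns "non-fungal" (flag for removal), "valid" (spelling confirmed),
    or "unmatched" (candidate for later fuzzy matching).
    """
    name = genus.strip()
    if name in non_fungal_genera:
        return "non-fungal"
    if name in fungal_genera:
        return "valid"
    return "unmatched"


if __name__ == "__main__":
    for candidate in ["Amanita", "Fuligo", "Amanitta"]:
        print(candidate, "->", classify_genus(candidate))
```

Names returned as "unmatched" are the ones that would move on to the spelling-correction steps.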
Once generic names were validated, the next step involved validating taxonomic names at the species level (
Figure 2). First, we combined the validated genus-level name with the unvalidated specific epithet (i.e., species name) and used exact matching of this concatenated binomial via the fdex API to check the spelling. This API provided access to the fdex database (
https://www.fdex.org), which aggregates and standardizes data from the primary fungal taxonomic name databases of Index Fungorum and MycoBank [
39] (
https://www.mycobank.org). As we were unable to validate many taxonomic names at the level of genus or species, we included two additional steps that attempted to determine the correct spelling for the binomial (
Figure 2). These steps used a ‘fuzzy’ matching approach [see 37] with name comparisons under different spelling suggestion strategies. Here, we included scoring for approximate string matching (amatch in stringdist) based on Jaccard distances to then select and evaluate the closest match. The spelling suggestion strategies used word banks of possible binomials pulled from the fdex API for the first pass, while the second pass leveraged the Microsoft Bing (
https://www.bing.com) search engine for spelling suggestions.
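As a rough illustration of the fuzzy matching step, the following Python sketch scores a misspelled binomial against a word bank using the Jaccard distance between sets of character q-grams and keeps the closest candidate. It is not the authors' R code: the q-gram size (`q=2`), the acceptance threshold (`max_dist=0.4`), the function names, and the example word bank are all assumptions made for the example.

```python
def qgrams(text, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    text = text.lower()
    if len(text) < q:
        return {text} if text else set()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def jaccard_distance(a, b, q=2):
    """Jaccard distance: 1 - |A & B| / |A | B| over the two q-gram sets."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return 1.0 - len(grams_a & grams_b) / len(union)


def closest_name(query, word_bank, max_dist=0.4, q=2):
    """Return (name, distance) for the nearest candidate, or None if too far."""
    best = None
    for candidate in word_bank:
        dist = jaccard_distance(query, candidate, q)
        if best is None or dist < best[1]:
            best = (candidate, dist)
    if best is None or best[1] > max_dist:
        return None
    return best


if __name__ == "__main__":
    bank = ["Amanita muscaria", "Amanita pantherina", "Boletus edulis"]
    print(closest_name("Amanita muscarria", bank))
```

A threshold of this kind keeps the step conservative: a name with no sufficiently close candidate is left unresolved rather than forced onto the nearest entry.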
After the initial genus- and species-level validation and fuzzy matching steps, we then further validated taxonomic names below the level of species (e.g., varieties or forms), again using the fdex API for exact matching on a validated binomial plus the infraspecific rank (corrected and standardized when applicable) and name. This latter step also collapsed infraspecific autonyms to the species level (e.g.,
Amanita pantherina var.
pantherina to
Amanita pantherina). With each validating step using the fdex API, we were also able to determine the ‘current name’ following the Index Fungorum convention from each of the ‘original’ taxonomic names provided in the dataset of gathered records. The IF current name convention also condensed all infraspecific names to the level of species. With these steps completed, we used the FUNGuild [
40] (
https://www.funguild.org) API to determine generic names that represented lichenized (i.e., lichens) or microfungal (e.g., rusts and smuts) taxa, and all the names associated with these genera were then flagged for removal from the dataset.
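The collapsing of autonyms mentioned above can be sketched in a few lines of Python. This is only an illustration: the regular expression, the set of rank abbreviations it accepts (`var.`, `f.`, `subsp.`), and the function name `collapse_autonym` are assumptions, and the actual pipeline also standardizes rank labels through the fdex API.

```python
import re

# Genus, specific epithet, rank abbreviation, infraspecific epithet.
# The accepted rank abbreviations are an assumption for this sketch.
INFRASPECIFIC = re.compile(
    r"^(?P<genus>[A-Z][a-z-]+)\s+(?P<species>[a-z-]+)\s+"
    r"(?P<rank>var\.|f\.|subsp\.)\s+(?P<infra>[a-z-]+)$"
)


def collapse_autonym(name):
    """Collapse an autonym (infraspecific epithet repeats the specific
    epithet) to the plain binomial; return other names unchanged."""
    cleaned = name.strip()
    match = INFRASPECIFIC.match(cleaned)
    if match and match.group("infra") == match.group("species"):
        return f"{match.group('genus')} {match.group('species')}"
    return cleaned


if __name__ == "__main__":
    print(collapse_autonym("Amanita pantherina var. pantherina"))
```

Names whose infraspecific epithet differs from the specific epithet pass through untouched, so later steps can still resolve them to a current name.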
Once these ‘data cleaning’ steps (see
Figure 1 and
Figure 2) were completed, we then tallied the number of records associated with each valid and current macrofungal binomial. From this tally, we were able to determine and plot the most abundantly documented taxa from the state selected (e.g., the ‘Top-50’ most abundant taxa of macrofungal species for Colorado, in this case) as well as recover the more rarely documented macrofungal taxa, which included those binomials associated with only one or two collection records from the state. As the NatureServe conservation status assessment methods use the ‘element occurrence’ concept to approximate species populations for organisms, we further determined if binomials associated with two records were based on observations or specimens collected more than one kilometer apart, a standard NatureServe convention for delineating distinct species populations of plants [
41]. This step used the Haversine formula available in the pracma R package to determine metric linear distance when both records included georeferenced data. When latitude/longitude were not available for the records, we indicated that the records represented two distinct fungal element occurrences (i.e., different populations) when the records were recorded from different counties in the state (viz. a reasonable, generalized estimate).
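The two-record test described above can be sketched in Python as follows. The haversine function is the standard great-circle formula; the record layout (dictionaries with `lat`, `lon`, and `county` keys), the function names, and the handling of two records from the same county without coordinates (treated here as not distinct) are assumptions for illustration rather than details taken from the R script.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def has_coords(record):
    return record.get("lat") is not None and record.get("lon") is not None


def distinct_occurrences(rec_a, rec_b, separation_km=1.0):
    """True if two records count as distinct element occurrences,
    False if not, None if neither coordinates nor counties allow a call."""
    if has_coords(rec_a) and has_coords(rec_b):
        dist = haversine_km(rec_a["lat"], rec_a["lon"],
                            rec_b["lat"], rec_b["lon"])
        return dist > separation_km
    county_a, county_b = rec_a.get("county"), rec_b.get("county")
    if county_a and county_b:
        # Assumption: same county without coordinates is treated as not distinct.
        return county_a != county_b
    return None


if __name__ == "__main__":
    a = {"lat": 39.7392, "lon": -104.9903, "county": "Denver"}
    b = {"lat": 40.0150, "lon": -105.2705, "county": "Boulder"}
    print(distinct_occurrences(a, b))
```

Returning `None` when neither test applies keeps undetermined pairs visible for manual review instead of silently assigning them to one category.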
In the final step of this automated pipeline process, we compared the list of putatively rare taxa to the fungal species included in the IUCN Red List, with those data downloaded from the Red List website (
https://www.iucnredlist.org) as a .csv file. This latter step allowed us to generate a ‘rapid list’ of what were considered to be rare and possibly threatened macrofungal species from the state, with these needing further investigation. Finally, we exported the list of processed (viz. ‘clean’) macrofungal records and their associated metadata from the R environment as a .csv file for further use in conservation status ranking calculations (see below) within the NatureServe framework. The Colorado data generated via the automated bioinformatic pipeline are publicly available (
http://www.stbates.org/supplemental.html), while the R script for the pipeline can be accessed via GitHub (
https://github.com/stbates/automated_bioinformatic_pipeline/).
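The tallying and Red List comparison steps can be summarized with a small Python sketch. The record counts and the CSV text below are toy data invented for the example, and the column name `scientificName` is an assumption about the layout of the downloaded Red List file; the function names are likewise hypothetical.

```python
import csv
import io
from collections import Counter


def rare_taxa(names, max_records=2):
    """Tally records per binomial and keep those with at most max_records."""
    tally = Counter(names)
    return {name: n for name, n in tally.items() if n <= max_records}


def red_list_names(csv_text, column="scientificName"):
    """Read the set of names from Red List CSV text (column name assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def rapid_list(names, csv_text):
    """Rarely documented names that also appear on the Red List, sorted."""
    listed = red_list_names(csv_text)
    return sorted(name for name in rare_taxa(names) if name in listed)


if __name__ == "__main__":
    # Toy data: one name per cleaned record.
    records = (["Amanita muscaria"] * 5
               + ["Hydnellum mirabile"] * 2
               + ["Craterellus cornucopioides"])
    red_list_csv = (
        "scientificName,redlistCategory\n"
        "Hydnellum mirabile,Vulnerable\n"
        "Craterellus cornucopioides,Least Concern\n"
    )
    print(rapid_list(records, red_list_csv))
```

Because the comparison is an exact match on names, it depends on the earlier cleaning steps having already resolved every record to its current name.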
For the conservation status ranking calculations, we used a ‘RankingMetricRules.xlsx’ Microsoft Excel spreadsheet and accompanying Python programming language [
42] script for ‘bulk’ calculation (referred to here as the ‘bulk calculator’) of range extent for occurrences (EOO), area of occupancy for occurrences (AOO), and number of element occurrences, which are related to the ‘rarity’ factor category in the NatureServe conservation framework [
32] (see Table 9 in that publication). Since the bulk calculator is designed to work rapidly in an automated manner without manual review from a qualified biologist, it only ranks taxa according to rarity, bypassing the ‘threats’ and ‘trends’ factor categories that require informed assessment and/or additional data. From these calculated values, each valid and current macrofungal binomial included in the processed records file generated via the pipeline (see Macrofungi_of_Colorado_Condensed.csv in the public directory) was assigned a conservation status rank. The bulk calculator used a one km separation distance (see above) for clustering species into hypothetical element occurrences and assigned conservation status ranks of S1–S5 (i.e., with the potential to be ‘critically imperiled’, ‘imperiled’, ‘vulnerable’, ‘apparently secure’, or ‘secure’, respectively, according to rank) based on rarity [
32] (see Table 10 in that publication).
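To make the clustering idea concrete, the sketch below groups the georeferenced records of one taxon into hypothetical element occurrences using single-linkage clustering with a one km separation distance. It is a pure-Python stand-in and not the bulk calculator itself, which performs its calculations in ArcGIS Pro; the union-find approach, the use of haversine distance, and the function names are assumptions. The thresholds that map occurrence counts, EOO, and AOO to S1–S5 ranks reside in the NatureServe rules sheet and are not reproduced here.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) tuples."""
    phi1, phi2 = radians(p[0]), radians(q[0])
    dphi = phi2 - phi1
    dlambda = radians(q[1] - p[1])
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def count_element_occurrences(points, separation_km=1.0):
    """Count single-linkage clusters: records within separation_km of any
    member of a cluster join that cluster (union-find over all pairs)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= separation_km:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i
    return len({find(i) for i in range(len(points))})


if __name__ == "__main__":
    # Three records chained at roughly 0.56 km spacing plus one distant record.
    pts = [(39.000, -105.0), (39.005, -105.0), (39.010, -105.0), (40.0, -105.0)]
    print(count_element_occurrences(pts))
```

With single linkage, a chain of nearby records merges into one occurrence even when its endpoints are more than one km apart, which is one reason the choice of clustering rule matters for the resulting counts.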
The bulk calculator required a user license for ArcGIS Pro (Esri, Redlands, California) to perform the spatial calculations needed to estimate the number of distinct element occurrences according to the georeferenced data for each record. The ranking rules for the bulk calculator relied on the ‘RULES’ sheet within the NatureServe Element Rank Estimator [
41] Microsoft Excel macro workbook (
https://www.natureserve.org/products/conservation-rank-calculator/download). The bulk calculator used an equal-area global geospatial projection from the IUCN EOO Calculator ArcMap Tool to ensure that results were comparable to IUCN’s GeoCAT [
43] (
https://nc.iucnredlist.org/redlist/content/attachment_files/EOO_Calculator_v1.5.zip). The calculations were run on a computer where the user was logged into an ArcGIS Pro account before running the script, which also required a Python interpreter to run. The Colorado Natural Heritage Program (CNHP) has previously used the bulk calculator to assess conservation status for arthropods, bryophytes, and vascular plants in the state. The Python script and bulk calculator (and related files) are publicly available (
https://github.com/chollenb-cnhp/BulkRarityCalculator/).
3. Results and Discussion
The specimen and observation record data gathering step in our automated bioinformatic pipeline recovered 117,006 Colorado records from the MyCoPortal API. After the initial validation steps and removing names representing slime molds, there were 37,598 records that remained, with 5497 unique names being associated with these records. At this point, there were 1286 unique names that were problematic (i.e., validation below the level of genus was not possible), with 588 of these putatively being species-level names and 698 being infraspecific-level names.
After the next round of processing (i.e., fuzzy matching, further validation at or below the species level, and removal of records not associated with macrofungi), 36,637 records (~31% of the original Colorado records) remained in the complete dataset for Colorado that were associated with 3081 unique, valid, and current macrofungal taxonomic names. Of these records, 24,464 (~67%) represented specimen-based records and 12,082 (~33%) represented observation-based records, with 91 (<1%) records providing no ‘basis of record’ data. These numbers demonstrate the importance of the historical specimen-based data, largely made available through fungal specimen digitization funding provided by the NSF, but they also highlight the valuable contribution of community scientists in providing data relevant for conservation purposes. Overall, there was roughly a 56% reduction in the unique names, removing those that were misspelled, not ultimately representing fungal groups, and collapsing all synonymous names to the current name recognized by Index Fungorum. The pipeline also removed all records associated with taxonomic names of lichenized fungi (10,862 records) and microfungi (43,682), as well as slime molds (9847). An additional 1230 records were removed that were associated with 476 unique names that remained problematic in one way or another. For example, some of these names represented plant species (e.g., Pinus edulis) mistakenly entered in the MyCoPortal, while others consisted of ‘herbarium names’ (e.g., Amanita stannea nom. prov.) that have never been validly published.
The ‘Top-50’ most abundant macrofungal taxa from Colorado (i.e., those represented by the most specimen- and observation-based records) are given in
Figure 3. These numbers are difficult to interpret because collectors may favor particular taxa for one reason or another, such as
Amanita muscaria (the most abundantly collected species in Colorado), which has conspicuous sporocarps as well as ethnomycological uses. However, these abundance data provide inventories that can be instructive for institutional collections, such as for identifying species that may be over-represented in specimen numbers. Conversely, our approach focused on rarely documented taxa for conservation purposes. Processing in the automated bioinformatic pipeline produced a list of 1613 putatively rare macrofungal taxa in Colorado, which were taxonomic names associated with one or two element occurrences (
Table 1). Of these, 1099 represented only one element occurrence, while 216 represented two distinct element occurrences. An additional 251 taxa had two non-distinct element occurrences represented by two different records collected at distances less than one kilometer apart. Of all these putatively rare taxa, 30 were also found among the fungi included on the IUCN Red List. There were an additional 46 macrofungal taxa that were more abundant (i.e., represented by more than two records) and were also included in the Red List (
Table 1). These recovered data represented a rapid, automated assessment of macrofungal element occurrences within the NatureServe and Red List conservation frameworks. This informative rarity and Red List status assessment can then be used to focus future collecting efforts to both confirm these results and provide further guidance for monitoring species populations and distributions. Monitoring fungal populations is an important endeavor, especially considering that fungi are ideal indicator candidates for further focus and conservation efforts [
30]. The combination of our pipeline rarity assessment with the previously established Red List conservation status was central to this step in our approach as this assessment provides a method for quickly identifying candidate species for further critical examination (see example below).
The bulk calculator further refined assessment more broadly across the recovered records according to the NatureServe scoring methodology for ranking, providing conservation status rank assessments for 2438 macrofungal taxa (79% of the original 3081 unique names) in Colorado that were part of the dataset processed in the pipeline and that included georeferenced data required for the ranking calculations. These taxonomic names were ranked for state conservation status within Colorado as follows: 9 ‘apparently secure’ (S4), 187 ‘vulnerable’ (S3), 433 ‘imperiled’ (S2), 1809 ‘critically imperiled’ (S1), and 0 taxa were found to be ‘secure’ (S5). The high number of imperiled and critically imperiled fungal taxa in bulk calculator ranking highlights the need to include fungi more intensively in statewide, as well as national, conservation programs, but it also indicates gaps in knowledge of fungal diversity that prevent more robust conservation assessments. These gaps can only be addressed through more thorough biodiversity inventorying and monitoring efforts. Ranking via the bulk calculator was, however, more or less in line with our rapid rarity and Red List status assessment, suggesting there are over one thousand fungal taxa in Colorado that need additional scrutiny. More broadly, our overall assessment of fungal species in the state provides important additional data that can be used to evaluate taxa for incorporation within the NatureServe Explorer Biotics database for conservation purposes. In addition, these data assessments can provide guidance in other areas of conservation such as providing sets of fungal names for further evaluation under the IUCN’s Red List, especially after applying the methodology developed here to assess rarity more broadly across the nation.
While there is currently a web-based GeoCAT platform from the IUCN that calculates several component factors in NatureServe’s conservation status assessment, GeoCAT requires uploading individual species data, one at a time, and is not designed for processing multi-species datasets for an entire taxonomic group. This is the first time the CNHP has used an automated systematic approach to assess conservation status en masse across a single group. Historically, species-by-species and manual approaches have been used, which are labor intensive, consequently leaving acute gaps in our understanding of conservation needs. The most innovative feature of the bulk calculator is that it can be used across different taxonomic groups, while Symbiota software (version 3.3.0), the platform used by MyCoPortal, is involved in hosting biodiversity data for a wide range of groups, such as bryophytes (e.g.,
https://bryophyteportal.org), lichens (e.g.,
https://lichenportal.org), and vascular plants (e.g.,
https://swbiodiversity.org). Thus, the automated bioinformatic pipeline presented here could be modified to focus assessment on these non-fungal groups, while minor adjustment in the R script for the pipeline could easily produce assessments for other fungal groups, such as lichens or microfungi.
Mueller and colleagues [
22] stressed that ‘our knowledge of the threat status of fungi remains woefully incomplete’. Our automated bioinformatic pipeline approach was able to rapidly provide abundance and rarity metrics for fungal species populations in Colorado for conservation status assessment under the NatureServe and IUCN Red List frameworks. This test case also demonstrates the potential for using the pipeline on a state-by-state basis in more comprehensively analyzing one of the largest biodiversity datasets for fungi compiled to date in the MyCoPortal. Accordingly, this more global approach could go a long way in helping produce a more complete ‘threat status’ assessment for fungi. Conversely, while our approach did easily and rapidly provide assessment metrics for conservation purposes, the real work of ground truthing the resulting data is just beginning. The legacy and complexity of specimen and observational data, such as those observed here, provide additional challenges in this area; however, these obstacles are not insurmountable, particularly when specimens are available in institutional collections for study. For example,
Craterellus cornucopioides (the black trumpet mushroom) was found among the rarely documented macrofungal taxa reported from Colorado that was also included on the Red List for threatened species with a ‘Least Concern’ status. The range for
C. cornucopioides extends primarily from the East Coast of the U.S. to the Midwest, with some representation on the West Coast; thus, a report from the Rocky Mountain region was noteworthy. After examining the specimen (DBG-F-023109), it was determined that this collection represents
Polyozellus multiplex, which is known to be mycorrhizal with the abundant spruce and firs found in Colorado. Conversely, another mycorrhizal Red List taxon reported from Colorado was
Hydnellum mirabile (
Figure 4 and
Figure 5), which has ‘Vulnerable’ status within Europe due to destruction by the logging of habitats for its conifer host species. While the occurrence of this taxon in Colorado and other parts of the U.S. has been verified through voucher specimens, it is not reported from the U.S. on the IUCN Red List; thus, there is a key gap in the assessment of this species for conservation purposes at the international level. This is an example of a species that would be an excellent candidate for targeted collection within Colorado, so that its conspecificity with the European populations could be confirmed through detailed morphological, ecological, and genetic analysis from fresh material.
The approach demonstrated here also holds potential to offer actionable data metrics for conservation planners and land managers that could directly inform conservation prioritization and policy development within any state. For example, the resultant conservation metrics and associated biodiversity metadata could be combined with those of plants and/or animals for integration into spatial prioritization tools (e.g., Marxan [
44] and Zonation 5 [
45]) for identifying areas of high conservation value. Similarly, regions exhibiting high concentrations of rare or endemic fungal taxa could be designated as microrefugia or hotspots requiring urgent protection. Land managers could also use the abundance and rarity data generated through the pipeline to tailor forest management interventions, where areas supporting rare fungi could warrant low-impact management (e.g., selective thinning rather than clear cutting), restrictions on soil disturbance, or the establishment of buffer zones. Conversely, areas with high fungal abundance but low rarity may support sustainable use or educational/ecotourism initiatives without compromising conservation value. Data from our pipeline also hold the potential to inform environmental policy by contributing to Red List evaluation, habitat restoration goals, and biodiversity offsetting calculations. For example, rarity-weighted fungal scores can be used to evaluate potential habitat loss or guide reintroduction efforts for sensitive taxa. Furthermore, integration of fungal conservation status metrics into existing biodiversity indices would help align fungal conservation with national biodiversity targets and international frameworks, such as the CBD post-2020 targets [
46]. Finally, our automated bioinformatic pipeline approach holds the potential to facilitate longitudinal monitoring, as with the ability to track changes in fungal rarity and abundance over time, managers could evaluate the effectiveness of conservation actions, detect early signs of ecosystem degradation, and adjust strategies proactively.
4. Conclusions
The automated bioinformatic approach presented here represents an important step in moving toward a more robust assessment of fungal biodiversity for conservation purposes. This methodology can now be applied to generate a baseline of fungal species abundance and rarity more broadly across the United States. With this information, species monitoring efforts may begin, allowing land managers and conservation organizations to more readily incorporate key fungal species into their work. The data from fungal monitoring efforts will then be used to build a more comprehensive fungal conservation assessment and add to our understanding of global vs. statewide rarity and strengthen data on fungal biodiversity, ecology, and phenology.
The examples presented here constitute important illustrations for the type of work that needs to be carried out across the U.S. This paper highlights the need for targeted collecting that focuses on addressing key knowledge gaps, especially for fungal populations with more critical Red List status. While our results are limited by the restricted scope of the Colorado test case, they demonstrate a methodology that can be easily and rapidly scaled and show how, with little effort, critical information can be gathered. Further, the approach can be adapted for application to other groups, more narrowly within the fungi, or more broadly for non-fungal groups, especially where biodiversity documentation relies on Symbiota software. As we refine and apply our methods, the community of established partnerships (i.e., NatureServe, the Network of Natural Heritage Programs and Conservation Data Centers, the IUCN, etc.) can begin to build a more globally relevant assessment of conservation status for fungi. Given the substantial importance of these organisms for maintaining ecosystem function and their role in the survival of other species, the time is ripe to expand the scope of fungal conservation assessment.