Building of an Internal Transcribed Spacer (ITS) Gene Dataset to Support the Italian Health Service in Mushroom Identification

This study aims at building an ITS gene dataset to support the Italian Health Service in mushroom identification. The target species were selected among those mostly involved in regional (Tuscany) poisoning cases. For each target species, all the ITS sequences already deposited in GenBank and BOLD databases were retrieved and accurately assessed for quality and reliability by a systematic filtering process. Wild specimens of target species were also collected to produce reference ITS sequences. These were used partly to set up and partly to validate the dataset by BLAST analysis. Overall, 7270 sequences were found in the two databases. After filtering, 1293 sequences (17.8%) were discarded, with a final retrieval of 5977 sequences. Ninety-seven ITS reference sequences were obtained from 76 collected mushroom specimens: 15 of them, obtained from 10 species with no sequences available after the filtering, were used to build the dataset, with a final taxonomic coverage of 96.7%. The other 82 sequences (66 species) were used for the dataset validation. In most of the cases (n = 71; 86.6%) they matched with identity values ≥ 97–100% with the corresponding species. The dataset was able to identify the species involved in regional poisoning incidents. As some of these species are also involved in poisonings at the national level, the dataset may be used for supporting the National Health Service throughout the Italian territory. Moreover, it can support the official control activities aimed at detecting frauds in commercial mushroom-based products and safeguarding consumers.


Introduction
Fungi represent a clade of eukaryotic heterotrophic organisms with a vast ecological and economic impact and whose diversity is likely to be in the millions of species [1,2]. Some of them, belonging to the macro-fungi group, can produce mushrooms, distinctive fruiting bodies from an underground mycelium [3], which have always been consumed worldwide. Wild mushrooms are collected in developing countries and rural communities to contribute to their diet or for local financial benefits [4][5][6]; on the other hand, a small group of mushroom species is cultivated on an industrial scale and marketed at global level. Mushrooms are increasingly appreciated since they fully fit in the current food trends addressing health and sustainable choices. They have, in fact, optimum nutritional values, as gas chromatography-ion trap mass spectrometry of mushroom metabolites [26] or protein profiling based on matrix-assisted laser desorption/ionization mass spectrometry (MALDI-TOF MS) [27], among others.
However, DNA analysis complementing phenotypic observations is the method of choice for specific identification of mushroom species, thanks to its rapidity and relatively low cost [21,28]. Current methods are almost entirely based on analyses of PCR-amplified nuclear rRNA genes, particularly the Internal Transcribed Spacer (ITS) region [2], which has been chosen as a universal DNA barcode marker for fungi due to its high inter-species variation [29]. The ITS region (~600-800 bp) contains two non-coding regions (ITS-1 and ITS-2) separated by the highly conserved small subunit 5.8S rRNA gene [30]. In general, this marker is sequenced and compared to public reference DNA libraries, that are imperative tools for taxonomic assignments. Crucial to this approach is a comprehensive and validated reference set of DNA sequence data from target species to provide accurate and verifiable molecular identification [31]. Several fungi ITS sequences are available in the NIH genetic sequence database, GenBank (https://www.ncbi.nlm.nih.gov/genbank/, accessed date 24 May 2021). In addition, the Barcode of Life Data System (BOLD) database (http://www. boldsystems.org/, accessed date 24 May 2021) is used. GenBank and BOLD, containing the genetic data of millions of specimens, are in fact the reference public databases for the analysis of all organisms [32]. However, pitfalls in both these online depositories are reported, mainly related to the degree of taxon coverage they offer, and the reliability of the sequence-associated identification [33][34][35][36][37]. The need for a well-validated dataset containing accurate sequences has been strongly remarked upon for ensuring the proper identification of fungi [2,22,28,38], since the success of the analysis greatly depends on the technical quality and the correct taxonomic identification of the deposited sequences [28]. Therefore, a preliminary analysis (filtering) aimed at assessing the reliability of the official DNA databases and the creation of curated sequence databases is fundamental to achieve the model of fungi sequence-based identification [2,22]. In particular, the development of an ITS dataset targeting specific groups of mushrooms represents a widely used approach for a rapid identification via barcoding [25].
This study represents the first attempt at building and validating an ITS gene dataset for supporting the National Health Service in mushroom species identification, especially focusing on those responsible for poisoning cases.

Target Species: Selection Criteria
The following criteria were used for selecting the target species to be included in the ITS gene dataset: (1) Mushroom species involved in poisoning incidents in Tuscany during the ten-year period 2007 to 2017 and reported in a list provided by the Tuscany Poison Control Centre (Tus-PCC) ( Table 1) (when only the genus was reported, all the species included in that genus and present in Tuscany were included). The correctness of the species nomenclature in the Tus-PCC list was verified by consulting the online database Index Fungorum (http://www.indexfungorum.org/, accessed date 24 May 2021), owned by the International Mycological Association and constantly updated with all the mycological nomenclatural novelties. Thus, for target species listed with obsolete or nonvalid nomenclature, the valid name was used; (2) species morphologically similar to those reported in the Tus-PCC list and present in the Tuscany Region; and (3) edible mushroom species commonly collected/harvested and other mushroom species (edible or not) widely distributed in Tuscany. Points 2 and 3 were performed by an expert mycologist belonging to TMGA (https://www.agmtmicologia.org/, accessed date 24 May 2021) and mycological inspectorates, which also contributed to the categorization of the selected target species into edible (E), suspected not-edible (SNE), not-edible (NE), toxic (T), and mortal (M).

ITS Sequences Retrieval from Public Genetic Databases
For each target species, all the available ITS sequences were searched and retrieved from GenBank (https://www.ncbi.nlm.nih.gov/genbank/, accessed date 24 May 2021). Additional ITS sequences were retrieved from BOLD (http://www.boldsystems.org/, accessed date 24 May 2021), only if not reporting the info "mined from GenBank", to avoid retrieving duplicates.

Systematic Double-Step ITS Sequences Filtering and Intra-Species Divergence Estimation
All sequences retrieved from the databases were systematically filtered. In detail, only sequences presenting pre-determined inclusion criteria (reported in the following sections) were maintained in the dataset, while the others were discarded.

First Step: Sequence Quality and Technical Check
Sequences were discarded if: (1) declared as "unverified"; (2) reporting biological naming conventions between the genus name and the species name such as "aff." (abbreviation for affinis) or "cf." (abbreviation for confer), which are qualifiers used in taxonomy to indicate different degrees of uncertainty of identification [39]; (3) not including both the ITS-1 and ITS-2 regions; (4) shorter than 550 bp in length; since the ITS region length varies from~600-800 bp in fungi [30], 550 bp was considered as the limit length under which sequences might result in being not informative enough; and (5) presenting IUPAC degenerate base symbols (over 4 N, B, D, H, and V and over 10 R, Y, S, W, K, and M) or homopolymers runs errors, that are associated with a scarce sequence quality. In this phase, sequences submitted with obsolete names were renamed with valid ones. Sequences belonging to species variants were attributed to the species level. All the sequences retained after the first filtering step were aligned with Bioedit 7.0 software [40]. A phylogenetic analysis was conducted in MEGA7 [41] by using the Neighbor Joining (NJ) method based on the Kimura 2 parameter model [42] with 1000 bootstrap replicates. The NJ method was selected as used in GenBank and BOLD for producing distance trees of results. Clusters with bootstrap values ≥ 70% were considered as strongly supported [43]. Sequences clustering with species or genus different from the respective cluster (with bootstrap values ≥ 70%) were queried against the Basic Local Analysis Search Tool (BLAST) on GenBank. They were discarded if they showed identity values ≥ 97-100% [25] with species different to the one declared.

Intra-Species Divergence Estimation
For each target species (for which more than one ITS sequence was maintained after the double-filtering process), the intra-species mean divergence was estimated using MEGA7 software [41].

Mushroom Samples Collection and Morphological Identification
The collection of fresh mushroom samples was performed during 2019 and 2020 by the TMGA in Tuscan territory. Only specimens from target species of the ITS gene dataset were collected. The collection was mainly focused on species for which no sequences or only one sequence was available after the ITS sequences' systematic filtering (Section 2.3), for increasing the ITS gene dataset taxonomic coverage, and on species included in the Tus-PCC list, more commonly involved in poisoning cases. All the samples were morphologically identified by the TMGA. The specimens were photographed in situ. Info on the specimens' collection sites and growth substrate were also annotated. Collected specimens were registered with an internal code and stored at −20 • C until further analysis.

Total DNA Extraction and Evaluation
Total DNA extraction was performed starting from~20 mg of tissue following the protocol described by Armani et al. [44]. The quality and quantity of the DNA extracted from each sample were determined with a NanoDrop ND-1000 spectrophotometer (Nan-oDrop Technologies, Wilmington, DE, US). Each DNA sample was stored at −20 • C until further analysis.

ITS Region Amplification, Sequencing, and Sequence Editing
The ITS region was amplified with the primers ITS1 (5 -TCCGTAGGTGAACCTGCG-3 ) and ITS4-B (5 -CAGGAGACTTGTACACGGTCCAG-3 ) [30,45] using the following PCR protocol: 20 µL reaction volume containing 2 µL of a 10× buffer (BiotechRabbit GmbH, Berlin, Germany), 100 mM of each dNTP (Euroclone Spa, Milano, Italy), 200 nM of forward primer, 200 nM of reverse primer, 1.0 U PerfectTaq DNA Polymerase (BiotechRabbit GmbH, Berlin, Germany), 100 ng of DNA, and DNase free water (Euroclone Spa, Milano, Italy). The following cycling program was applied: denaturation at 95 • C for 3 min; 35 cycles at 95 • C for 30 s, 54 • C for 30 s, and 72 • C for 30 s; and final extension at 72 • C for 7 min. Five microliters of each PCR product was checked by gel electrophoresis on a 2% agarose gel. The amplification of fragments of the expected length was assessed by making a comparison with the standard marker SharpMass™ 50-DNA ladder (Euroclone Spa, Milano, Italy), and the concentration of PCR products by making a comparison with the intensity of the bands of the DNA ladder. Positive reactions were sequenced. Chromatograms were checked, searching for putative reading errors, and these were corrected. The ITS reference sequences were deposited in GenBank (http://www.ncbi.nlm.nih.gov/genbank/, accessed date 24 May 2021). The ITS reference sequences obtained in this study from specimens belonging to species with no sequences available after the double-step filtering process were included in the ITS gene dataset constructed in Section 2.3 and the overall taxonomic coverage (number of species finally present in the database/number of target species) was calculated. The final ITS gene dataset was uploaded and used in Geneious Prime version 2020.0.4 [46].

Evaluation of ITS Region Efficiency in Species Identification
The ability of the ITS region to discriminate among target species included in the ITS gene dataset-and especially those causing poisonings at the national level (Tus-PCC list) -was evaluated according to Badotti et al. [28], who proposed an extensive comparative analysis based on the probability of correct identification of all Basidiomycota sequences deposited in GenBank using a selectively filtered dataset. Accordingly, they classified the ITS performance in species identification for genera of Basidiomycota into the following three distinct categories: good, intermediate, or poor [28].

ITS Gene Dataset Validation
For the validation process, all the reference ITS sequences obtained from this study (except those used to set up the ITS gene dataset in Section 2.5) were submitted to a BLAST analysis against the ITS gene dataset in Geneious Prime version 2020.0.4 [46]. An identity value ≥ 97-100% was selected as the threshold for species identification [25]. Results were discussed according to a comparison with outcomes from Section 2.5.
The Tus-PCC list reported 448 poisoning cases (Table 1), involving 41 species (78.8%) and 11 genera (21.2%). These were related to 38 poisoning cases (8.5%), highlighting the limits of the phenotypic identification during human intoxication. Often, mycologists are called to analyze clinical samples or leftovers of meals, where the key features of the whole specimens may lack. Moreover, despite the data referring to a recent past, obsolete names were provided for 9 of the 41 mushrooms species (21.9%) ( Table 1). This particularly involved the genus Boletus, with three species-B. lupinus, B. luridus, and B. pulchrotinctus -re-classified as Rubroboletus lupinus, Suillellus luridus, and S. pulchrotinctus (Table 1). In fact, recent phylogenetic studies led to major taxonomic rearrangements of the Boletaceae family, introducing several new genera and leaving in the genus Boletus, in its new restricted sense, only B. edulis and closely related species [48].
Overall, a total of 109 species (all Basidiomycota) were selected as targets for the ITS gene dataset from the Tus-PCC list: the 41 reported at the species level and 68 species (toxic or not, and present in the territory) included in the 11 genera. In detail, 21 species were included for Russula spp., 4 for Inocybe spp., 6 for Lepiota spp., 3 for Clitocybe spp., 2 for Entoloma spp., 4 for Macrolepiota spp., 1 for Panaeolus spp., 6 for Ramaria spp., 8 for Agaricus spp., 11 for Amanita spp., and 2 for Rhodocollybia spp. (Table S1). The remaining 133 target species were selected by the criteria reported in points 2 and 3 in Section 2.1.

ITS Sequence Retrieval from Public Databases
Overall, 7270 sequences were found, of which 6715 (92.4%) were found in GenBank and 555 (7.6%) in BOLD systems ( Figure 1; Table S1). Many sequence records appear in both the databases, which are intended to be complementary; in fact, the sequences submitted independently to GenBank are transmitted to BOLD periodically, while records from BOLD are submitted to GenBank when they are published [37]. In this phase, a taxonomic coverage of 93.8% was observed, with ITS sequences available for 227 out of the 242 target species. GenBank showed a taxonomic coverage of 92.6%, with sequences available for 224 species. The taxonomic coverage of BOLD could not be calculated, as the GenBank duplicate sequences were not considered (Section 2.2). However, absence of ITS sequences in this database was observed for many species during the collection phase (data not shown). BOLD factually poorly contributed to the overall taxonomic coverage (only three additional species). This evidence confirms the fact that this genetic database, although containing more curated data, is still not adequately populated [28]. In this respect, the BOLD handbook reported that the number of fungal sequences is relatively limited compared to the number of animal sequences, and thus a successful species match can be improved only if new sequences are added to the database (https:// v3.boldsystems.org/index.php/resources/handbook?chapter=2_databases.html, accessed date 24 May 2021). 242 target species. GenBank showed a taxonomic coverage of 92.6%, with sequences available for 224 species. The taxonomic coverage of BOLD could not be calculated, as the Gen-Bank duplicate sequences were not considered (Section 2.2). However, absence of ITS sequences in this database was observed for many species during the collection phase (data not shown). BOLD factually poorly contributed to the overall taxonomic coverage (only three additional species). This evidence confirms the fact that this genetic database, although containing more curated data, is still not adequately populated [28]. In this respect, the BOLD handbook reported that the number of fungal sequences is relatively limited compared to the number of animal sequences, and thus a successful species match can be improved only if new sequences are added to the database (https://v3.boldsystems.org/index.php/resources/handbook?chapter = 2_databases.html, accessed date 24 May 2021).
Cases of sequences deposited with obsolete names were observed. As also mentioned in Section 3.1, this aspect especially involved the re-classification of some Boletus spp. in other genera (Caloboletus spp.; Neoboletus spp., Rubroboletus spp.; Suillellus spp., Suillus spp., etc.).   Cases of sequences deposited with obsolete names were observed. As also mentioned in Section 3.1, this aspect especially involved the re-classification of some Boletus spp. in other genera (Caloboletus spp.; Neoboletus spp., Rubroboletus spp.; Suillellus spp., Suillus spp., etc.).

Systematic Double-
Step ITS Sequences Filtering, Intra-Species Divergence Estimation, and Evaluation of ITS Region Efficiency 3.3.1. First Step: Sequences Quality and Technical Check This step allowed us to discard 1087 sequences (14.9%), reducing the number of sequences to 6183 ( Figure 1; Table S1). All the inclusion criteria of this step were also considered in the ITS data filtering process performed by Badotti et al. [28], together with the check for the ITS sequences from permanent collections whose taxonomic identifications were curated by specialists (voucher specimens). Although the need for sequences to be associated with accurate specimen data and current species names had already been suggested [22], we decided to also include sequences from non-voucher specimens for assessing the actual data reliability within both public databases and also with the aim to expand the species taxonomic coverage of the dataset. Indeed, several cases of poor-quality sequences belonging to voucher specimens were highlighted in both GenBank and BOLD during the technical check. A number of sequences deposited with obsolete, erroneous, or imprecise names, so-called "dark taxa" [49], were encountered in both databases. This was more understandable in GenBank, since it essentially relies on users to accurately name their sequences [22].

Second Step: Phylogenetic Analysis
During the sequence alignment, one not-aligning sequence was found. The BLAST analysis proved that this sequence, deposited as Ramaria botrytis ITS (GenBank accession n.: KY626150), presented identity values > 99% with sequences of the R. botrytis large subunit RNA ribosomal region, proving it was wrongly deposited. The sequence was discarded, and the subsequent analysis was performed on the 6182 remaining sequences. In this step, 205 sequences (3.3%) clustered separately from the respective species/genus with bootstrap values ≥ 70% (some examples were given in Figure 2). Results from the BLAST analysis of these sequences show identity values ≥ 97-100%, mostly with other co-generic species (n = 168; 81.9%), e.g., the sequence AJ131127, deposited as Agaricus xanthodermus, showed 100% identity values with sequences of A. arvensis and ≤ 89.3% with all the A. xanthodermus sequences. Additionally, cases of matching with species from other genera were found (n = 37; 18%), e.g., the sequence MH855192, deposited as Lacrymaria lacrymabunda, showed > 99% identity values with Psathyrella candolleana and ≤ 84.1% with all the L. lacrymabunda sequences. As observed in Section 3.3.1, sequences from voucher specimens were also involved. Thus, these 205 sequences were discarded and 5977 sequences (Figure 1), belonging to 224 species from 58 genera, 23 families, 7 orders, 2 classes, and 2 phyla, were finally maintained in the ITS gene dataset. In this phase, the dataset showed an overall species coverage of 92.6% (224 out of the 242 target species) and included 12 (5.4%) mortal M, 72 (32.1%) T, 32 (14.3%) NE, 21 (9.4%) SNE, and 87 (38.8%) E species. The Basidiomycota order alone included 233 species belonging to 56 genera.  [42] with 1000 bootstrap replicates on ITS sequences maintained after the first filtering step. Examples of sequences which clustered separately from the respective species/genus with bootstrap values ≥ 70% (highlighted in yellow) were therefore removed from the dataset. In brackets, the number of ITS sequences presenting an intra-species variability < 0.01 with respect to the indicated sequence is reported.
Considering that in our study most of the target species belonged to Basidiomycota, the intra-species divergence value appeared in line with the values observed for this order [53].

Mushroom Sample Collection
Overall, 97 specimens from 76 target species (all Basidiomycota) were collected by the TMGA (Table 2), of which 31 (40.8%) were T, 25 (32.9%) E, 11 (14.5%) NE, 5 (6.6%) Figure 2. Three sections of the Neighbor Joining (NJ) phylogram constructed using the Kimura 2 parameter model [42] with 1000 bootstrap replicates on ITS sequences maintained after the first filtering step. Examples of sequences which clustered separately from the respective species/genus with bootstrap values ≥ 70% (highlighted in yellow) were therefore removed from the dataset. In brackets, the number of ITS sequences presenting an intra-species variability < 0.01 with respect to the indicated sequence is reported.
Overall, 1293 sequences (17.8%) were discarded at the end of the systematic filtering process (Figure 1). Both sequences from GenBank and BOLD were involved, despite the latter being commonly considered as more reliable, containing sequences that surely belong to vouchered specimens (sequences must in fact fulfil some requirements, such as voucher data, collection record, and trace files that are instead not mandatory in GenBank) [28]. Issues in public databases have been already reported: Nilsson et al. [50] highlighted a certain lack of reference libraries' accuracy, referring to the fact that about 20% of the entries deposited in GenBank were incorrectly identified to the species level, and that the majority of entries lacked descriptive and up-to-date annotations, especially regarding the taxonomic names; Schoch et al. [29] estimated that only approximately 50% of the ITS sequences that are deposited in public databases are annotated at the species level. Despite these considerations dating back to a decade ago, many of the mentioned issues still affect public databases.
Considering that in our study most of the target species belonged to Basidiomycota, the intra-species divergence value appeared in line with the values observed for this order [53].

Mushroom Sample Collection
Overall, 97 specimens from 76 target species (all Basidiomycota) were collected by the TMGA (Table 2), of which 31 (40.8%) were T, 25 (32.9%) E, 11 (14.5%) NE, 5 (6.6%) SNE, and 4 (5.3%) M. Sampling from this study confirmed that misidentification may occur when toxic mushrooms appear like edible ones [16]. Tricholoma sejunctum (T), for example, can look identical to T. scalpturatum (E), in shape, form, and color ( Figure 3). Fifteen specimens from 10 species (Amanita caesaria, Cantherellus ferruginascens, Cortinarius semisanguifluus, Desarmillaria tabescens, Macrolepiota venenata, Russula vinosobrunnea, Tricholoma basirubens, T. bresadolanum, T. gausapatum, and T. quercicola) out of the 18 for which no sequences were available after the systematic filtering process were collected. For the remaining eight species (Table S1), the collection was hindered by time limits and by difficulties in the specimen retrieval due to the specific picking season. The dataset will be further implemented in the future (see Section 3.6). Additionally, 12 specimens from 6 species (Amanita excelsa, Infundibulicybe mediterranea, Lepista glaucocana, Russula nobilis, R. torulosa, and Tricholoma fracticum) were collected. Twenty-nine species out of the 76 collected (38.2%) were among those selected from the Tus-PCC list (Section 3.1). In particular, almost all the species involved in the highest number of poisonings (Table  1) Table 2. Samples collected in this study (species and specimens number) and obtained ITS reference sequences used to set up and validate the ITS gene dataset. * sequences matching uniquely with the corresponding species but with identity values < 97%; ** sequences simultaneously matching with the corresponding species and one other of the same edibility, with identity values ≥ 98%.

DNA Extraction, ITS Region Amplification, and ITS Reference Sequences Production
The extraction protocol proposed by Armani et al. [44], although developed on seafood, was proven effective on fungal matrices. A good DNA quality and concentration was observed in all the samples, as the spectrophotometric analysis confirmed medium-high yield and quality (A260/A280 and A260/A230 ratio > 2.0) for all the collected samples (data not shown). The target ITS region was successfully amplified from all the DNA samples. The primer pair ITS1 and ITS4-B was selected since the primer ITS4-B [30], when paired with the universal primer ITS1 [45], efficiently amplifies DNA from all Basidiomycetes and Ascomycetes [30]. All the ITS amplicons were successfully sequenced. Overall, 97 sequences belonging to 76 species were produced (Table 2). After the editing phase, the sequences' length ranged from 591 to 772 bp, in line with the length range reported by Gardes and Bruns [30]. almost all the species involved in the highest number of poisonings (Table 1) w lected: Entoloma sinuatum, Omphalotus olearius, Russula spp. (we collected R. caerul R. heterophylla (E), R. nobilis (NE), R. persicina (NE), R. queletii (T), R. torulosa (T), R brunnea (E)), Rubroboletus satanas, Agaricus xanthodermus, Macrolepiota venenata, phalloides, A. muscaria, Inocybe spp. (we collected I. geophylla (T)), and Lepiota spp. lected L. brunneoincarnata (M), L. clypeolaria (T), and L. ignivolvata (T)).  Table 2. Samples collected in this study (species and specimens number) and obtained IT ence sequences used to set up and validate the ITS gene dataset. * sequences matching un with the corresponding species but with identity values < 97%; ** sequences simultaneous matching with the corresponding species and one other of the same edibility, with identit ≥ 98%.  The production of ITS reference sequences of species for which no sequences or only one sequence was available in the databases after the systematic filtering process allow us to calculate the intra-species overall mean divergence of an additional 10 species. In particular, the average intra-specific variability was 0% for Cantharellus ferruginascens, Cortinarius semisanguifluus, Infundibulicybe mediterranea, Macrolepiota venenata, and Tricholoma gausapatum, 1% from T. fracticum, 2% for Russula nobilis and R. torulosa, 4% for Lepista glaucocana, and 27% for Amanita excelsa. This latter result looks weird, especially considering that the observed average intra-specific variability of Amanita spp. ranged from 0 to 7% (Table S1). Given the low number of available ITS sequences for these additional 10 species (2-4 sequences each), these outcomes should be carefully considered, and the intra-species divergence should be further investigated.

Building of the Final ITS Gene Dataset
Of the 97 reference sequences, the 15 produced from the 10 species (Section 3.4.2) for which no sequences were available after the systematic filtering process were included in the ITS gene dataset for implementing its taxonomic coverage, while the other 82 (belonging to 66 species) were used for the validation phase ( Figure 1; Table 2).

Evaluation of ITS Region Efficiency in Species Identification
Although the ITS region is not equally variable in all groups of fungi [28,52,54,55], most fungal species have been identified based on this region and, consequently, most available sequences belong to this commonly used marker [28,55]. This means that, regardless of several limitations, the ITS region will likely remain the main marker of choice for fungal identification in the immediate future [22]. For these reasons, it was selected as a marker in this study.
As a general rule, a species was considered successfully identified if the minimum inter-specific distance was larger than its maximum intra-specific distance [56]. Applying the categories suggested by Badotti et al. [28] to the target species of this study, the ITS region was considered as a good marker for 19 out of the 56 (32.7%) Basidiomycota target genera considered in this study (Table S2). For the other 10 (17.8%) and 2 (3.6%) genera, the identification performance was considered as intermediate and poor, respectively. The ITS species identification performance was instead not evaluated by Badotti et al. [28] for 23 genera (41.1%) herein included (Table S2). This is probably due to the continuously evolving re-classification and allocation of mushroom genera.
Despite the ITS region being proven as a good marker only for a moderate percentage of the target genera included in this study, it should be noted that these included a plurality of the target species (n = 110; 47.2%); contrariwise, a lower number of species belonged to the genera included in the intermediate (n = 53; 22.7%) and poor (n = 18; 7.7%) categories (Table S2). Additionally, for most of the 109 species selected from the Tus-PCC list (Section 3.1) (n = 75; 68.8%) the ITS region can be considered as a good discrimination marker. For the remaining species, ITS performance was classified as intermediate in 18 species (16.5%), poor in only 1 species (0.9%), and not evaluated in 15 species (13.8%). Thus, especially considering the use for which the ITS gene dataset is intended, namely a tool for supporting the species identification in poisoning cases, we considered these findings particularly promising. To confirm this, a validation process was performed, as reported in the following section.

Validation of the Final ITS Gene Dataset
The ITS dataset validation process was performed in Geneious Prime version 2020.0.4 [38] by BLAST analysis of 82 reference sequences from 66 species (19 E, 4 SNE, 11 NE, 29 T, and 3 M) ( Table 2) which remained after removing the 15 sequences from 10 species used to improve the dataset taxonomic coverage (Section 3.5.1). Doing this, we also validate mushroom identifications by comparing molecular results with morphological identifications performed by an expert mycologist. In fact, only properly identified and labeled sequences should be used as references for accurate fungal identification [57].
In most cases (n = 71; 86.6%), the reference sequences properly matched with identity values ≥ 97-100% uniquely with the corresponding species. Five reference sequences (6.1%) matched uniquely with the corresponding species but with identity values < 97% (Table 2). This could not be explained by the observed average intra-species variability, ranging from 0 to 1% (Table S1). Given the low number of sequences available for these species (Table S1), the estimated intra-species mean divergence may not always be reliable, and more reference sequences should be produced. Additionally, four sequences (4.9%) simultaneously matched with the corresponding species and with others, with identity values ≥ 98%. The two sequences of A. excelsa showed higher identity values with A. pantherina than with the only available co-specific sequence. These two sequences were not deposited in GenBank. In fact, also considering the observed intra-species variability (Section 3.4.2), further investigation is needed. According to these outcomes, 55 out of the 66 species (83.3%) used to validate the ITS gene dataset were unequivocally allocated to a species. These included all the M and NE species and almost all the T (28 out of 29) and SNE (3 out of 4) species (Figure 4).
A. mellea and allied species ranking first [17]. Many of the mentioned species were also frequently involved in poisoning episodes in other EU countries, together with othe strictly local species [11,15,[60][61][62].
Therefore, the ITS gene dataset, opportunely integrated, may also be used by officia laboratories at both the national and European level, which only have to include se quences from reference local species for guaranteeing the territorial coverage.  At the end of the validation, 95 sequences from 75 species were deposited in GenBank (GenBank accession numbers MZ005473-MZ005566).
Despite the validation process being successful for most of these species, it is appropriate to observe that the ITS gene dataset is not 100% effective in identifying mushroom species. As already discussed, this is attributable to some limits of the genetic marker itself. Thus, the use of a multigene analysis relying on comparison of at least three loci had already been proposed for fungi identification [58].
However, by putting together the outcomes from the validation process and those from the evaluation of ITS region efficiency (Section 3.5.2), the ITS gene dataset was proved as particularly efficient in identifying the species mostly involved in poisoning cases. Overall, 373 out of the 448 poisoning cases (83.2%) involved T species; most of them (n = 310; 83.1%) referred to Entoloma sinuatum, Omphalotus olearius, and Clitocybe nebularis (Table 1). Eleven poisoning cases (2.9%) were related to M species, mostly (n = 7; 63.3%) to the well-known Amanita phalloides, which is one of the most common poisonous mushrooms, responsible for 90% of human fatal cases globally [59]. However, 15 poisoning cases (4%) were even related to E species, especially Armillaria mellea (n = 6; 40%), confirming the potential toxicity of any mushroom in certain circumstances [12,14]. This aspect highlights that the control system cannot only rely on the knowledge of a restricted group of "popular" toxic species, but it must necessarily take into consideration any possible mushroom species present in the territory for guaranteeing its effectiveness. Interestingly, many analogies were observed by comparing the Tus-PCC list with the national poisoning cases, as the involved species generally overlap, and the major number of poisonings are caused by E. sinuatum, O. olearius, and C. nebularis also at the national level [17,18]. Likewise, for the most part, the edible species that were proved able to cause poisonings with gastrointestinal syndrome at the national level are the same as of the Tus-PCC list, with A. mellea and allied species ranking first [17]. Many of the mentioned species were also frequently involved in poisoning episodes in other EU countries, together with other strictly local species [11,15,[60][61][62]. Therefore, the ITS gene dataset, opportunely integrated, may also be used by official laboratories at both the national and European level, which only have to include sequences from reference local species for guaranteeing the territorial coverage.

Conclusions
The ITS gene dataset built in this study represents the first attempt at collecting reliable data that could support mushroom species identification, especially for species responsible for poisoning. Its versatility, related to the possibility to constantly update the data, and consequently increase the taxonomic coverage and expand the application field from time to time, undoubtedly represent a strength. In addition, the availability of a dataset "depurated" from erroneous sequences would accelerate the interpretation of the results obtained after the query of official samples. In this light, the ITS gene dataset can represent a valid tool for the National Health Service to quickly react to poisoning incidents by the analysis of clinical samples. This is crucial, as a proper detection of the species involved in the poisoning can guide the most appropriate medical treatment. Moreover, the database can support the official control activities aimed at verifying the actual market condition, detecting fraud incidents, and safeguarding consumers. While many studies aimed at food authentication, in particular seafood, are available [63], few studies applying DNA barcoding for authentication of commercial mushroom products sold on the market have been conducted so far. Raja et al. [25] analyzed mushrooms used as food and/or dietary supplement, while Jensen-Vargas et al. [64] analyzed dried and fresh fungi that were sold in New York City supermarkets. Another recent survey analyzed more than 3500 mushroom samples collected in 35 countries across Yunnan Province [65]. This scarcity of studies could be due to the well-known difficulties in identifying mushrooms species such as, among others, the fact that the correct identification is entirely dependent on the availability of reliable sequences in public databases, the capacity of correct interpretation of BLAST results, and the resolution of the ITS marker alone for a precise identification [35]. It should be noted that, besides its use in standard sequencing methods (DNA barcoding), the ITS gene dataset can also be used in metabarcoding approaches related to high-throughput sequencing technologies, which are especially useful for food authentication from complex food matrices. The ITS gene dataset development especially fits the requirement of Article 98 of Regulation (EU) No. 2017/625 [66] addressing the responsibilities and tasks of European Union reference centers for the authenticity and integrity of the agri-food chain, which shall be responsible for different tasks, among which, where necessary, is establishing and maintaining collections or databases of authenticated reference materials.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/foods10061193/s1, Table S1: Target species of the ITS gene dataset with relative edibility, Table S2: Classification of the genera from Basidiomycota based on the ITS species identification performance reported by Badotti et al. [23] and comparison with target genera and species from this study.