Evaluating the Assembly Strategy of a Fungal Genome from Metagenomic Data: Solorina crocea (Peltigerales, Ascomycota) as a Case Study

García-Muñoz, Ana; Pino-Bodas, Raquel

doi:10.3390/jof11080596

by Ana García-Muñoz ¹,²,* and Raquel Pino-Bodas ¹,²,*

¹ Area of Biodiversity and Conservation, Department of Biology and Geology, Physics and Inorganic Chemistry, University Rey Juan Carlos, Móstoles, 28933 Madrid, Spain

² Global Change Research Institute (IICG-URJC), University Rey Juan Carlos, Móstoles, 28933 Madrid, Spain

* Authors to whom correspondence should be addressed.

J. Fungi 2025, 11(8), 596; https://doi.org/10.3390/jof11080596

Submission received: 30 June 2025 / Revised: 1 August 2025 / Accepted: 9 August 2025 / Published: 15 August 2025

(This article belongs to the Section Fungal Genomics, Genetics and Molecular Biology)

Abstract

The advent of next-generation sequencing technologies has given rise to considerably diverse techniques. However, integrating data from these technologies to generate high-quality genomes remains challenging, particularly when starting from metagenomic data. To provide further insight into this process, the genome of the lichenized fungus Solorina crocea was sequenced using DNA extracted from the thallus, which contains the genome of the mycobiont, along with those of the photobionts (a green alga and a cyanobacterium), and other associated microorganisms. Three different strategies were assessed for the assembly of a de novo genome, employing data obtained from Illumina and PacBio HiFi technologies: (1) hybrid assembly based on metagenomic data; (2) assembly based on metagenomic long reads and scaffolded with filtered mycobiont long and short reads; (3) hybrid assembly based on filtered mycobiont short and long reads. Assemblies were compared according to contiguity and completeness criteria. Strategy 2 achieved the most continuous and complete genome, with a size of 55.5 Mb, an N50 of 148.5 kb, and 519 scaffolds. Genome annotation and functional prediction were performed, including identification of secondary metabolite biosynthetic gene clusters. Genome annotation predicted 6151 genes, revealing a high number of genes associated with transport, carbohydrate metabolism, and stress response.

Keywords:

de novo genome assembly; lichenized fungi; metagenomics; functional annotation

1. Introduction

Lichens are complex symbiotic associations between a fungus (the mycobiont) and a green alga or cyanobacterium (the photobiont) [1,2]. Recent research has revealed that lichen thalli harbor a high biodiversity of microorganisms, including bacteria and other fungi [3,4,5,6,7,8]. Several photobionts have also been detected in a single lichen thallus [9]. This presents significant challenges in generating de novo whole genomes from mycobionts. The isolation and axenic culture of mycobionts is a difficult and frequently unsuccessful task [10,11], which has led to sequencing genomes primarily using metagenomic techniques [12,13,14,15,16,17,18,19]. However, the extraction of the mycobiont genome from a metagenomic assembly is not straightforward and depends on databases available for the taxonomic assignment [13]. Although there is a wide variety of bioinformatic programs and pipelines for this purpose, their accuracy strongly relies on the quality of the starting metagenomes [20]. For instance, MetaWRAP [21] and EasyMetagenome [22] are user-friendly pipelines developed to isolate and annotate genomes from metagenome short read data.

Third-generation sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have been demonstrated to increase the continuity of assembled genomes several hundred-fold [23,24]. Despite the existence of numerous genomes from lichenized fungi, only a limited number of these have been generated using long read sequencing [15,18,25,26,27]. Consequently, the number of chromosomal-level assemblies for these fungi remains limited. However, when data are derived from metagenomics, where diverse organisms are present in uneven proportions, long read sequencing can also lead to highly fragmented genomes [28]. The integration of long and short read sequencing technologies therefore emerges as a promising approach for generating high-quality genomes from metagenomic data, because it combines the advantages of short reads, which provide high base accuracy, and long reads, which provide high genome connectivity [29,30]. In contrast, producing similar results using only long reads requires deep sequencing, which increases the cost of genomes [31].

There are two main strategies for assembling a genome by combining short and long reads. The first uses short and long reads simultaneously in a hybrid assembly. The second performs an assembly using the long reads and then uses the short reads to polish the generated assembly. Comparisons between these two assembly strategies have been made in bacteria with data from organisms isolated in culture [32,33]. It is difficult to extrapolate between taxonomic groups and predict which pipeline will produce the highest-quality genomes, because the outcome largely depends on the data structure and the target genome [34]. A few genomes of lichenized fungi have been generated using both sequencing technologies [15,35], but with different assembly strategies. It therefore remains unclear which of the two strategies generates a higher-quality lichenized fungal genome. The assessment of different assembly methods for generating high-quality genomes of lichenized fungi from metagenomic data can serve as a framework for guiding future research initiatives.

The order Peltigerales encompasses approximately 1300 species [36], including some macrolichens characterized by their notably large thalli. Many of the species of this order have been associated with mature, well-preserved forests with stable environmental conditions. These lichenized fungi establish symbiosis either with cyanobacteria alone (bipartite lichens) or with a green alga as the primary photobiont and a cyanobacterium as the secondary photobiont (tripartite lichens) [37]. In the latter case, the generation of high-quality mycobiont genomes from metagenomic data may present an even harder challenge than in bipartite lichens.

In this study, a comparative analysis is conducted on three assembly strategies for the combination of long and short reads from the metagenomics of Solorina crocea (L.) Ach., a tripartite lichen symbiosis. The three strategies are: (a) hybrid assembly using long and short reads; (b) assembly based on long reads, using short reads for scaffolding and polishing the genome; and (c) hybrid assembly using reads from the mycobiont, after a filtration process. The prediction of repetitive elements and putative genes, as well as their functional annotation, was conducted in the final assembly, providing new genetic resources of this species for future research.

2. Materials and Methods

2.1. Characterization of Solorina crocea

Solorina crocea is a terricolous species characterized by a foliose thallus with an orange undersurface, associated with the green alga Coccomyxa and with the cyanobacterium Nostoc [38]. It inhabits snow-beds and is widely distributed across arctic–alpine regions of Eurasia and North America [39,40,41,42]. It is a rare species on the Iberian Peninsula, although the populations consist of numerous individuals [43].

2.2. Fungal Material

A few specimens were collected in the Puerto de la Quesera, Segovia (41°12′44″ N, 3°25′10″ W), which is located in the Ayllón massif, belonging to the Sistema Central, at an altitude of above 1700 m. The vegetation is composed of Erica australis L. heathland, and the higher areas are dominated by grassland on an acidic substrate. From a climatic perspective, the Ayllón massif is characterized by being the wettest in the Sistema Central, influenced by the Golfo de Vizcaya and the Northern Iberian System, with a considerable increase in summer rainfall [44]. Consequently, this area presents a high diversity of plant and lichenized fungi species with a boreal and boreo-alpine distribution [45,46,47]. The specimens studied have been deposited in the MACB herbarium (MACB 132680).

2.3. DNA Extraction and Quantification

Specimens were carefully cleaned to ensure the complete removal of all residual substrate. Prior to the DNA extraction, the lichen fragments were immersed in acetone for two hours to remove the secondary metabolites. Subsequently, the acetone extracts were used to identify the secondary metabolites by thin-layer chromatography (TLC), according to the standard procedure described in [48], with two solvent systems, A (180 mL toluene–45 mL dioxan–5 mL acetic acid) and C (170 mL toluene–30 mL acetic acid). The fragments were ground using 3 mm glass beads with a Precellys® 24 (Bertin Technologie, Berlin, Germany) tissue homogenizer. DNA extractions were carried out using the 2% cetyltrimethyl ammonium bromide (CTAB) method [49,50], and the DNA was resuspended in Tris-Cl (10 mM). The DNA extractions were quantified using a Quantus™ Fluorimeter (Promega, Madison, WI, USA), yielding a concentration ≥ 45 ng/µL of DNA for all extractions. The ITS rDNA region was amplified for all extractions using the protocol described in [51], and nucleotide BLAST searches v2.15.0 [52] were conducted to rule out contamination.

2.4. Library Preparation and Sequencing

Libraries and sequencing were carried out at Macrogen (Macrogen Inc., Daejeon, South Korea, www.macrogen.com). In order to generate the whole genome of the target species, a combination of short (Illumina Technology, San Diego, CA, USA) and long reads (PacBio Technology, Menlo Park, CA, USA) sequencing was carried out. The TruSeq Nano DNA Kit (Illumina, Inc., San Diego, CA, USA) was used to prepare the short read library and sequenced with 150 bp paired-end reads in NovaSeq 6000. Sequencing was designed to obtain 80× coverage using short reads and 10× coverage using long reads.

2.5. Quality Control and Reads Filter

Prior to assembly, adapters and low-quality regions were removed from the Illumina reads with Trimmomatic v0.39 [53]. Reads were trimmed using a sliding window of four bases with a minimum mean quality of 15; leading and trailing bases with a quality below 3 were removed; and reads shorter than 36 bp were discarded. For the long reads, no quality filtering or error correction step was performed prior to assembly.
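The trimming criteria can be illustrated with a minimal Python sketch. It is a simplified re-implementation written for illustration only, not Trimmomatic's code. The function name `trim_read` and the order of the steps (leading, trailing, sliding window, minimum length) are assumptions, since the order used in the study is not stated.

```python
def trim_read(quals, window=4, min_mean=15, leading=3, trailing=3, min_len=36):
    """Return the (start, end) slice of the read to keep, or None if discarded.

    quals is a list of per-base Phred quality scores.
    Simplified illustration of the thresholds in the text (assumed step order).
    """
    start, end = 0, len(quals)
    # Remove bases with quality below `leading` from the 5' end.
    while start < end and quals[start] < leading:
        start += 1
    # Remove bases with quality below `trailing` from the 3' end.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # Slide a window 5'->3' and cut where its mean quality first drops below min_mean.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_mean:
            end = i
            break
    # Discard reads that are too short after trimming.
    if end - start < min_len:
        return None
    return start, end


# Hypothetical read: two poor leading bases, 40 good bases, a low-quality tail.
example = [2, 2] + [30] * 40 + [5] * 10
print(trim_read(example))
```

In this toy read, the two leading bases and the low-quality tail are removed, and the remaining region is long enough to be kept.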

The universal fungal barcode ITS sequence (ITS1 and ITS2) was extracted using a Python script (extractITS.py; https://github.com/fantin-mesny, accessed on 29 June 2025) based on ITSx v1.1.3 software [54] to confirm the identity of the sequences by a BLASTn search [55].

2.6. Benchmarking of Strategies for Genome Assembly

2.6.1. Assembly Strategy 1: Hybrid Metagenome Assembly Using Short and Long Reads

Illumina and PacBio HiFi reads were used simultaneously as input for running a hybrid assembly with metaSPAdes v4.0.0 [56] using the default settings (Figure 1).

A taxonomic assignment of the metagenome contigs was made as a preliminary step for filtering mycobiont contigs. This was conducted using DIAMOND [57] with UniRef90 [58] as the database, and BLAST v2.15.0 searches (e-value cutoff = 1 × 10⁻²⁵) against a customized database. To build the latter database, all Peltigerales genomes available at JGI and NCBI were inspected using GC plots, which indicate whether a genome may be chimeric, i.e., may include contigs from other organisms. In total, the database includes 13 genomes of Lecanoromycetes, five of which belong to Peltigerales (Table S1). The mycobiont contigs were retained using BlobTools v.1.1.1 [59], which clusters contigs according to taxonomic assignment, GC content, and coverage. As inputs, we used a file containing read coverage data generated using Minimap2 v.2.28-r1209 [60] and two taxonomic assignment files from the DIAMOND v2.1.10 and BLAST v2.15.0 searches. The first round of BlobTools generated a summary of read coverage, GC content, and taxonomic identification at the phylum level for all metagenomic contigs. Contigs that matched “Ascomycota” or “no-hit” were retained using seqkit v2.9.0 [61]. Contigs with unusually high or low GC content were eliminated, as they are likely to be contaminants incorrectly assigned to Ascomycota. To ensure the removal of any remaining contaminants, a second round of BlobTools was run using the filtered contigs from the first round as input.
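The retention rule described above (phylum label plus a GC sanity check) can be sketched in Python as follows. The record layout, the function name `filter_contigs`, and in particular the GC bounds of 0.30 and 0.60 are hypothetical placeholders, because the study does not report the thresholds that were applied.

```python
def filter_contigs(contigs, keep_phyla=("Ascomycota", "no-hit"),
                   gc_min=0.30, gc_max=0.60):
    """Return names of contigs whose phylum label and GC content pass the filter.

    gc_min and gc_max are placeholder values, not thresholds from the study.
    """
    kept = []
    for contig in contigs:
        # Keep only contigs assigned to the target phylum or left unassigned.
        if contig["phylum"] not in keep_phyla:
            continue
        # Drop contigs with unusually low or high GC content.
        if not (gc_min <= contig["gc"] <= gc_max):
            continue
        kept.append(contig["name"])
    return kept


# Hypothetical per-contig summary, e.g. parsed from a BlobTools table.
table = [
    {"name": "c1", "phylum": "Ascomycota", "gc": 0.45},
    {"name": "c2", "phylum": "Cyanobacteria", "gc": 0.41},
    {"name": "c3", "phylum": "no-hit", "gc": 0.47},
    {"name": "c4", "phylum": "Ascomycota", "gc": 0.68},
]
print(filter_contigs(table))
```

Running the same function again on the retained contigs, with a refreshed taxonomy table, would correspond to the second filtering round.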

2.6.2. Assembly Strategy 2: Genome Assembly Using Long Reads and Scaffolding Using Short Reads

Three different assemblers, which support metagenomic data, were used to assemble PacBio HiFi long reads: Canu v2.2 [62], metaFlye v2.9.5-b1801 [63], and Hifiasm-meta v0.3-r074 [64]. Hifiasm-meta was developed exclusively for metagenomic data from PacBio HiFi reads [64]. The other assemblers support several sequencing technologies, and a specific configuration of settings for the management of PacBio HiFi reads was selected. In addition, the assemblies from metagenomes previously generated were merged in pairs using the Quickmerge software [65] (Figure 1).

The quality of each metagenome resulting from different assemblers (Canu, metaFlye, and Hifiasm-meta) and their combinations by Quickmerge (Canu + Hifiasm-meta, Canu + metaFlye, and Hifiasm-meta + metaFlye) was evaluated by comparing the number of contigs > 500 base pairs, the length of the largest contig, and the contig N50. These metrics were calculated in QUAST v5.0.2 [66]. In the assessment, the genome size was not considered because the fungal genome size is overestimated in the metagenome. Additionally, BUSCO (Benchmarking Universal Single-Copy Orthologs; [67]), with the Ascomycota_odb10 database, was used to assess the completeness of metagenomes. Based on the N50 and BUSCO values, the metagenome resulting from the merging of the Hifiasm-meta and metaFlye assemblies achieved the best quality and was selected for the subsequent filtering steps that generated the mycobiont assembly (Table 1).
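For readers unfamiliar with these contiguity metrics, the following Python sketch computes them from a list of contig lengths. It is an illustration only, not QUAST's implementation. The function name `contig_stats` and the inclusive 500 bp cutoff are assumptions.

```python
def contig_stats(lengths, min_len=500):
    """Return the contig count, largest contig, and N50 for contigs >= min_len."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    if not kept:
        return {"n_contigs": 0, "largest": 0, "n50": 0}
    total = sum(kept)
    running = 0
    n50 = 0
    # N50: length of the contig at which the cumulative sum of the
    # length-sorted contigs first reaches half of the total assembly length.
    for length in kept:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"n_contigs": len(kept), "largest": kept[0], "n50": n50}


# Hypothetical contig lengths in bp; the 100 bp contig is below the cutoff.
print(contig_stats([100, 600, 800, 1000, 2000]))
```

Because N50 depends on the total length of the retained contigs, it is sensitive to how many short contigs pass the length cutoff.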

The mycobiont contigs were retained from the selected metagenome by combining two approaches. The first approach was based on running two rounds of BlobTools, following the steps outlined in Strategy 1. The second approach was based on BUSCO v5.8.2: the metagenome was filtered using the complete BUSCO sequences to ensure that contigs carrying Ascomycota core genes were retained. The contigs obtained with both approaches were combined into a single file, and duplicated contigs were subsequently removed with a seqkit command. The mycobiont contigs were scaffolded with short reads to obtain a final mycobiont assembly.
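The combine-and-deduplicate step can be pictured with the short Python sketch below. It assumes that duplicates are recognized by contig name, which is plausible because both filters select from the same metagenome, although the exact seqkit command and matching criterion are not given in the text. The function name `merge_unique` and the toy sequences are hypothetical.

```python
def merge_unique(*contig_sets):
    """Merge dictionaries of {contig_name: sequence}, keeping each name once."""
    merged = {}
    for contigs in contig_sets:
        for name, seq in contigs.items():
            # setdefault keeps the first copy and ignores later duplicates.
            merged.setdefault(name, seq)
    return merged


# Hypothetical outputs of the two filtering approaches.
blobtools_contigs = {"ctg1": "ACGT", "ctg2": "GGCC"}
busco_contigs = {"ctg2": "GGCC", "ctg7": "TTAA"}
print(sorted(merge_unique(blobtools_contigs, busco_contigs)))
```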

2.6.3. Assembly Strategy 3: Hybrid Assembly Based on Mycobiont Reads Previously Filtered

Long and short reads belonging to the mycobiont were used to generate a new hybrid assembly using SPAdes v4.0.0 [68,69]. Mycobiont long reads were extracted from the raw PacBio HiFi reads by mapping them to filtered contigs (obtained with BlobTools and BUSCO as detailed in the previous section from Strategy 2). Mapping was performed using Minimap2.

To extract the mycobiont short reads, a metagenome was first generated by assembling the Illumina short reads. We tested the performance of two assemblers: MEGAHIT v2.19 [70], with default k-mer sizes (33, 55, 77, and 99), and metaSPAdes, also with default k-mer sizes (33, 55, 77, 99, and 127). According to the contiguity criterion, the best short read metagenome assembly was obtained with MEGAHIT. The two approaches (BlobTools and BUSCO) used to obtain the mycobiont contigs from the long read assembly were also used to filter mycobiont contigs from the MEGAHIT metagenome assembly. A file containing read coverage data was generated using BBMap v39.15 [71]. Additionally, CONCOCT [72], a metagenome binning tool, was used to cut the contigs into 10 kb sections. The binned contigs were combined with the BlobTools results and visualized with a script modified from [19] in R v.4.2.3 [73]. The script was based on the ggplot2 v.3.5.1 package [74]. Bins that matched Ascomycota with similar values for GC content and coverage were combined to generate a metagenome-assembled genome (MAG). The MAG was subjected to a second round of BlobTools to remove contaminants. Lastly, the results were filtered according to the criteria mentioned above using seqkit commands to extract the mycobiont contigs. These contigs were used to extract the short reads corresponding to the mycobiont by mapping the metagenomic Illumina reads against them with BBMap. These mycobiont short reads were then used in the hybrid assembly along with the mycobiont long reads in this strategy, and to scaffold the assemblies generated in Strategy 2, as described below.

2.7. Scaffolding and Polishing Mycobiont Assemblies

Scaffolding and gap-closing of the mycobiont assemblies obtained with Strategies 1, 2, and 3 were performed in Redundans v2.0.1 [75]. The mycobiont short and long reads were used for genome polishing. To improve the scaffolded assemblies, polishing tools were applied iteratively. First, the filtered long reads were mapped to the scaffolded assembly using Minimap2, generating the input for Racon v1.5.0 [76]. This step was repeated to run three rounds of Racon. The assembly resulting from Racon was additionally polished using the mycobiont short reads, which were mapped to the Racon assembly using BWA mem v0.7.18-r1243 [77] to generate the required file for Pilon v1.24 [78]. Mapping and polishing were repeated for three rounds of Pilon, yielding the final mycobiont assembly.

2.8. Quality Assessment of Mycobiont Assemblies

The mycobiont assemblies obtained with the different strategies were evaluated using QUAST v5.0.2 [66], according to the same criteria used for the metagenome selection. Additionally, genome completeness was assessed using BUSCO [79] against “Ascomycota_odb10”, which contains a set of 1706 Ascomycota core genes. BUSCO was run in genome mode with MetaEuk settings [80], as it is faster for fungi [81].
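As a small worked example of how a BUSCO count relates to a percentage of the 1706-gene set, the Python sketch below converts a count into a percentage. The function name `busco_percent` is hypothetical. The count of 1633 is the single-copy figure reported for Strategy 2 in Section 3.2.

```python
def busco_percent(n_found, n_total=1706):
    """Percentage of the BUSCO gene set represented by n_found genes."""
    return round(100 * n_found / n_total, 1)


# 1633 single-copy orthologs out of the 1706 Ascomycota core genes.
print(busco_percent(1633))
```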

2.9. Repetitive Element Library Construction

Genome assemblies generated with the three strategies were employed to create a customized repetitive element library. Both de novo and structure-based approaches were used for the identification of transposons and other repetitive elements. First, repetitive elements were identified de novo using RepeatModeler2 v2.0.3 [82]. Then, the structure-based program MITE Tracker [83] was used to identify Miniature Inverted-repeat Transposable Elements (MITEs). The identification of other structure-based repetitive elements was achieved using several programs implemented in the Extensive de novo TE Annotator (EDTA) package v2.2.2 [84]: HelitronScanner [85] for helitrons, TIR Learner v3 [86] for terminal inverted repeat (TIR) transposons, and LTRHarvest [87] and LTR_FINDER [88] for long terminal repeat (LTR) elements.

We removed potential non-transposon protein-coding genes from this repetitive element library. For this purpose, a transposon-free protein database was built. First, the available proteomes of Peltigerales species (Table S1: Lobaria pulmonaria, Lobaria immixta, Pseudocyphellaria aurata, Sticta canariensis, and Peltigera leucophlebia) were combined with the curated Uniprot–Swissprot database. A Blastp search of the resulting protein database against the RepeatPeps library, implemented in RepeatMasker, was performed to remove the transposons. Then, the obtained transposon-free protein database was used to filter the non-transposon protein-coding genes from the repetitive element library by running Blastp. Finally, redundancy was reduced with CD-HIT-EST [89] by clustering the sequences with a 95% similarity threshold, obtaining the final repetitive element library. The sequences in this library were used to determine the proportion of the genome covered by repetitive elements using RepeatMasker [90,91]. This program provides the relative proportion of the two types of transposable elements (Class I or retroelements and Class II or DNA transposons) as well as other types of repetitive elements.

The landscape of divergence between repetitive elements and their consensus sequences was calculated by parsing the RepeatMasker results with a Perl script (https://github.com/4ureliek/Parsing-RepeatMasker-Outputs, accessed on 29 June 2025). This analysis estimates the CpG-corrected Kimura two-parameter distance between each repetitive element and its consensus sequence. To obtain the consensus sequence of each repetitive element, their sequences were aligned and trimmed with MUSCLE [92] and TrimAl [93], respectively. The conserved regions were extracted using Gblocks v0.91b [94], and the final consensus sequences were created with the tool ‘em_cons’ v6.6.0.0 in EMBOSS [95].
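The distance underlying this landscape can be illustrated with a minimal Python sketch of the standard Kimura two-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The sketch omits the CpG correction applied by the script cited above. The function name `k2p_distance` and the toy sequences are hypothetical.

```python
import math


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences (no CpG correction)."""
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # The logarithm is undefined for saturated sequences (1 - 2p - q <= 0 or 1 - 2q <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical aligned pair with a single transition in ten positions.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Low values, such as those below 10% reported in the Results, indicate copies that have diverged little from their consensus.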

In order to compare the abundance of repetitive elements with phylogenetically related species, the same pipeline was employed to annotate the five genomes of the order Peltigerales available in NCBI and JGI. A repetitive element library was constructed for each species, which was used to calculate the proportion of their genomes that were covered by repetitive elements and to plot the landscape of repetitive elements. The completeness of these genomes was also assessed by BUSCO.

2.10. Gene Prediction and Functional Annotation

Gene prediction was performed by running four rounds in MAKER2 v3.01.04 [96]. The transcriptome of Lobaria pulmonaria (BioProject: PRJNA403314), a closely related species belonging to the same family, was used to provide RNA evidence. As protein evidence, the TE-free protein database of Peltigerales was used (see Section 2.9).

The repetitive regions of the genome were soft-masked using RepeatMasker with the customized TE database and the TE library implemented in RepeatMasker. Transcript and protein evidence were aligned to the genome using tBlastx and Blastx, respectively. Alignments were polished using the Exonerate v2.4.0 software [97] implemented in the MAKER2 pipeline. Only contigs longer than 10,000 bp were considered for annotation [98]. Ab initio gene prediction was achieved using SNAP 2006-07-28 [99] and AUGUSTUS v3.5.0 [100] pre-trained with Saccharomyces species parameters. The gene models predicted in each round were filtered to retain those with a maximum AED score of 0.25 and a minimum length of 50 bp, and these were used to train SNAP in the subsequent round. The fourth round of MAKER2 generated the final set of predicted gene models.
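The selection of gene models for retraining can be sketched in Python as below. The annotation edit distance (AED) ranges from 0, meaning full agreement with the evidence, to 1, meaning no evidence support. The record layout and the function name `select_training_models` are hypothetical, and the thresholds are those quoted in the text.

```python
def select_training_models(models, max_aed=0.25, min_length=50):
    """Return IDs of gene models that pass the AED and length thresholds."""
    return [
        m["id"]
        for m in models
        # Keep well-supported models (low AED) that are not too short.
        if m["aed"] <= max_aed and m["length"] >= min_length
    ]


# Hypothetical gene models from one annotation round.
round_models = [
    {"id": "g1", "aed": 0.05, "length": 900},
    {"id": "g2", "aed": 0.40, "length": 1200},
    {"id": "g3", "aed": 0.20, "length": 30},
    {"id": "g4", "aed": 0.25, "length": 50},
]
print(select_training_models(round_models))
```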

The functional annotation of the predicted genes was conducted using a range of databases implemented in the FunAnnotate v1.8.17 pipeline [101]. The Pfam domains and Gene Ontology (GO) terms were detected by InterProScan v5.73-104.0 [102], while proteases were identified by MEROPS [103] and carbohydrate-active enzymes or CAZymes by dbCAN [104]. Additional annotations were added using eggNOG-mapper (emapper v.2.1.12) [105] based on eggNOG orthology data [106] and DIAMOND for sequence searches. More detailed functional descriptions of genes were obtained by BLAST searches against the UniProtKB database. Secondary metabolite gene clusters were predicted using antiSMASH v7.1.0 [107].

The number of genes per scaffold was extracted from the annotation file. The GC content and length of each scaffold were calculated using seqkit. Gene density was calculated by dividing the number of genes on each scaffold by its length. Gene densities were plotted in R using the ggplot2 v.3.5.1 package [74].
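The density calculation can be written out as a short Python sketch. The conversion to genes per megabase matches the unit used in the Results. The function name `gene_density_per_mb` and the example values are hypothetical.

```python
def gene_density_per_mb(n_genes, scaffold_length_bp):
    """Number of genes per megabase for a single scaffold."""
    if scaffold_length_bp <= 0:
        raise ValueError("scaffold length must be positive")
    # Multiply first so that simple integer inputs give an exact result.
    return n_genes * 1_000_000 / scaffold_length_bp


# Hypothetical scaffold: 50 genes on 500 kb.
print(gene_density_per_mb(50, 500_000))
```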

3. Results

Illumina and PacBio HiFi sequencing generated a total of 27,029,432 and 71,378 reads, respectively. Illumina sequencing yielded consistent 151 bp paired-end short reads, while PacBio HiFi generated reads with an average length of 5134 bp. All assembly strategies used both types of reads but generated genomes with contrasting statistics.

3.1. Mycobiont Assembly Resulting from Strategy 1

The metagenome assembly generated with Strategy 1 was characterized by a high degree of fragmentation, with a high number of contigs (Table 1). However, it presented a high N50 value (Table 1). The mycobiont assembly resulting from this strategy comprised 4157 scaffolds > 500 bp, with an N50 value of 63.23 kb (Table 2). The majority of the scaffolds were shorter than 100 kb, with the shortest one being 0.7 kb (Figure S1a), and had gene densities below 200 genes/Mb (Figure S1b). The largest scaffolds contained a mean of 36.3 genes (Figure S1c). The largest scaffold was 347.7 kb, with a GC content of 36.1% (Figure S1d) and a gene density of 106 genes/Mb. BUSCO analysis indicated a completeness of 95.4% for this assembly, corresponding to the presence of 1627 (94%) complete single-copy orthologs.

The annotation predicted a total of 7352 genes (Table 3), with >50% of genes predicted to be well-supported (AED score < 0.1). The BUSCO analysis yielded an annotation completeness estimate of 89.4%. Functional annotation assigned 30,404 functional terms. The total number of genes with at least one annotated function was 6411 (87.2%). A total of 19 biosynthetic gene clusters (BGCs) were identified (Table 3). A total of 10,136 repetitive elements were identified, covering 22.53% of the mycobiont genome. Retrotransposons and DNA transposons were present at similar frequencies, covering 11.34% and 10.26% of the genome, respectively (Table S2).

3.2. Mycobiont Assembly Resulting from Strategy 2

All metagenome assemblies using exclusively long reads showed similar values of GC content and length of the largest contig. The assembly produced by metaFlye exhibited the greatest fragmentation, with an elevated number of contigs. The assemblies generated by merging Canu + metaFlye or Hifiasm-meta + metaFlye exhibited reduced fragmentation and enhanced continuity. Although the Canu + metaFlye assembly had fewer contigs and a longer largest contig than the Hifiasm-meta + metaFlye assembly, its N50 and BUSCO values were lower, indicating lower continuity and completeness. Therefore, the Hifiasm-meta + metaFlye assembly was considered to have the best quality for generating the mycobiont assembly.

This strategy yielded a mycobiont genome of higher quality than the other strategies in terms of continuity, containing 519 scaffolds > 500 bp and a higher N50 value. Most scaffolds were shorter than 200 kb (Figure S2a). The largest scaffold comprised 547.6 kb, with a GC content of 35.5% (Figure S2d). Most scaffolds presented gene densities greater than 150 genes/Mb (Figure S2b), and the largest scaffolds (500 kb–1 Mb) contained a mean value of 50 genes (Figure S2c). The completeness analysis, based on BUSCO, estimated a completeness of 96.7%, corresponding to 1633 (95.7%) conserved single-copy orthologs (Table 2).

The annotation predicted a total of 6151 genes (Table 3), with more than 60% of the predicted genes being well-supported. BUSCO analysis estimated an annotation completeness of 91.6%. Functional annotations yielded a total of 52,648 functional terms annotated with different databases (Table S3). The total number of genes with a predicted function annotation was 5649 (91.8%). AntiSMASH identified 18 biosynthetic gene clusters. A total of 813 repetitive elements were identified, covering 22.18% of the mycobiont genome (Table 3).

3.3. Mycobiont Assembly Resulting from Strategy 3

The mycobiont assembly resulting from this strategy was very fragmented, comprising 6280 scaffolds > 500 bp and with the lowest N50 value (Table 2). Most scaffolds were shorter than 50 kb (Figure S3a). The largest scaffold was 140.1 kb, with a GC content of 42.5% (Figure S3d) and a gene density of 186 genes/Mb. Most of the scaffolds had a gene density below 200 genes/Mb (Figure S3b), while the largest scaffolds contained a mean of 19 genes (Figure S3c). This assembly showed the lowest completeness, as indicated by the BUSCO analysis (Table 2).

Gene prediction identified a total of 2767 genes, and more than 60% of these were well-supported. BUSCO analysis estimated an annotation completeness of 33.5%. Functional annotation assigned 14,611 functional terms (Table 3). The total number of genes with at least one annotated function was 2368 (85%). Ten biosynthetic gene clusters were identified using antiSMASH (Table 3).

3.4. Functional Features of the Solorina crocea Genome

These features correspond to the annotation of the mycobiont assembly resulting from Strategy 2, which had a higher quality in terms of continuity and completeness (Table 2 and Table 3). The annotation of the assemblies from Strategies 1 and 3 can be found in Supplementary Materials (Table S3).

The number of DNA transposons identified was greater than that of retroelements. However, retroelements represented a higher percentage of the genome. Specifically, retrotransposons covered 7.65% of the total genome, while DNA transposons covered 5.85%. The most abundant subclass of retrotransposons was the LTR elements, which constituted 7.01% of the whole genome and were classified mainly into Gypsy/DIRS1 and Ty1/Copia. Non-LTR retrotransposons, such as LINEs, comprised only 0.64% of the genome, while SINEs were absent. Five superfamilies of DNA transposons were identified: Tc1-IS630-Pogo (3.64%), hobo-Activator (0.41%), MULE-MuDR (0.13%), PiggyBac (0.11%), and Tourist/Harbinger (0.04%). The proportion of repetitive elements that remained unclassified was 7.20%. Simple and low complexity repeats were present in very low proportions (0.62 and 0.18%, respectively), while no satellites were found. The landscape of divergence of the most frequent groups of repetitive elements, including the unclassified elements, showed a low level of divergence (<10%) from their consensus sequences (Figure S4). The proportion of repetitive elements found in the genomes of other Peltigerales species was: 18.06% in Sticta canariensis, 12.85% in Pseudocyphellaria aurata, 31.17% in Peltigera leucophlebia, 14.07% in Lobaria pulmonaria, and 15.92% in Lobaria immixta. Retroelements represented the major fraction of repetitive elements in all genomes (Table S2). The landscape of repetitive elements also showed a low divergence in all these species (Figure S4).

InterPro annotation, which comprised several databases, retrieved the majority of functional terms, especially from Pfam domains and Gene Ontology (GO). More than half of the genes matched GO terms categorized as molecular function, followed by the biological processes category (Figure 2a). The most frequently annotated terms within these categories were those involved in transport activity, carbohydrate metabolism, and protein activity (Figure 2b). Specifically, the most frequent terms in the biological processes category were “transmembrane transport”, “translation”, “protein phosphorylation”, and “carbohydrate metabolic process”, while the most frequent terms within the molecular function category were “ATP binding”, “protein binding”, “RNA binding”, and “structural constituent of ribosome” (Figure 2b). Pfam annotation mainly yielded protein domains for kinases, transporters, and stress-related genes (Figure 3). The most frequent term was “protein kinase domain”. Specific transport-related domains included “sugar (and other) transporter”, “Major Facilitator Superfamily”, and “Ammonium Transporter Family”, while stress-related domains included “Cytochrome P450” and “TCP-1/cpn60 chaperonin family” (Figure 3). Detailed functional descriptions from UniProtKB complemented the general processes identified by GO terms and Pfam domains. These descriptions involved stress responses, activation of signal transduction, and protein folding and stability (Figure 4). In particular, the most frequent terms were “T-complex protein 1 subunit epsilon”, “Ribosome-associated molecular chaperone SSB1”, “Peroxisomal hydratase-dehydrogenase-epimerase”, “Glycogen synthase kinase 1”, “ER-derived vesicles protein ERV14”, “Elongation factor 3”, “Chitin synthase export chaperone”, and “ATP-dependent DNA helicase II subunit 2” (Figure 4). Terms similar to those found in the other annotations were also identified by the homology search; for example, “Cytochrome P450 monooxygenase ORF9”, “Ammonium transporter 1”, and “GTP-binding protein ypt1” (Figure 4).

The antiSMASH analysis of the S. crocea genome revealed 18 regions containing biosynthetic gene clusters (BGCs): 5 terpene synthases, 4 fungal ribosomally synthesized and post-translationally modified peptide-like clusters (fungal-RiPP-like), 2 type I polyketide synthases (T1PKS), 1 type III polyketide synthase (T3PKS), 2 non-ribosomal peptide synthetases (NRPS), 1 NRP-metallophore/NRPS, 1 NRPS/indole, and 2 NRPS/T1PKS. Together, these 18 regions cover 784 kb, only 1.4% of the genome.
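The tally of regions and the genome fraction they occupy can be reproduced from the numbers given in the text. The sketch below is illustrative only: the dictionary keys are shorthand labels for the cluster types named above, and the genome size of 55.5 Mb is the assembly size reported for Strategy 2.

```python
# BGC region counts per type, as listed in the text.
bgc_regions = {
    "terpene": 5,
    "fungal-RiPP-like": 4,
    "T1PKS": 2,
    "T3PKS": 1,
    "NRPS": 2,
    "NRP-metallophore/NRPS": 1,
    "NRPS/indole": 1,
    "NRPS/T1PKS": 2,
}

total_regions = sum(bgc_regions.values())

# Fraction of the assembly covered by BGC regions.
bgc_span_kb = 784
genome_size_kb = 55.5 * 1000  # 55.5 Mb assembly
bgc_percent = 100 * bgc_span_kb / genome_size_kb

print(f"Total BGC regions: {total_regions}")
print(f"Genome fraction in BGC regions: {bgc_percent:.1f}%")
```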

4. Discussion

The assembly of genomes from metagenomic data is considerably more complex than conventional assembly, which starts from the DNA of an isolated organism. The presence of species at different abundances, and of phylogenetically related species that share genomic regions, are the most challenging factors when assembling genomes from metagenomic data [28,108]. Nevertheless, metagenomic techniques have become essential for sequencing the genomes of certain organisms, including lichenized fungi. In this study, the de novo genome of Solorina crocea was generated from metagenomic data sequenced using PacBio and Illumina technologies. This assembly was compared with those generated in previous studies of other Peltigerales species (see Table S1). Based on continuity and completeness criteria, this is currently the highest-quality genome reported for the order.

4.1. Assessing Different Assembly Strategies and Other Considerations

It is generally held that combining multiple sequencing technologies yields better assemblies than using only short or only long reads [109,110]. Following the strategies employed in previous studies that combined short and long reads for genome assembly [30,109,111], three strategies were evaluated in the present study. Our results clearly demonstrate that the assembly method has a great influence on assembly quality. Starting from the same dataset, very different results were obtained in terms of continuity and completeness, depending on the strategy and assembler used (Table 1, Table 2 and Table 3). The most effective strategy was Strategy 2, which generated an assembly based on long reads that was then improved through scaffolding and polishing with the filtered mycobiont reads. Hybrid strategies have nevertheless been shown to produce high-quality assemblies in other studies, even using metagenomic data [111]. In our analyses, however, both Strategy 1 and Strategy 3 resulted in highly fragmented assemblies, in contrast to Strategy 2. One plausible explanation for the fragmentation of the assembly resulting from Strategy 1 is the retention of numerous short contigs. Indeed, this assembly contained 10,136 scaffolds, most of them < 500 bp. Nevertheless, the final assembly of Strategy 1 showed higher quality metrics than the assembly generated by Strategy 3, as indicated by the number of scaffolds, N50, and BUSCO scores (Table 2). Regarding annotation, Strategies 1 and 2 appear to be similar. Indeed, the number of predicted genes in Strategy 1 is higher than that of Strategy 2 (Table 3). The most substantial discrepancy was observed in the number of repetitive elements identified, which was higher in the more fragmented assemblies.
The hybrid strategy was previously employed to assemble the genome of the mycobiont Lasallia pustulata from metagenomic data [35]. The assembly obtained was of notably high quality, with 49 scaffolds and an N50 = 1.8 Mb. The discrepancies with our results may be attributable to the depth of long-read sequencing. As demonstrated in [35], the sequencing coverage was critical for the quality of the final mycobiont assembly. Other studies have also shown that the number of scaffolds depends on the coverage of long reads sequenced by PacBio HiFi [20]. Nevertheless, PacBio sequencing remains expensive in comparison with Illumina sequencing. Therefore, adding the Illumina reads in later steps, such as scaffolding and polishing, could be a highly effective way to improve the quality of the final genome, as implemented in this study. Scaffolding the filtered contigs with other kinds of sequencing data has been demonstrated to yield high-quality genomes in lichenized fungi; for example, the Acarospora socialis genome was likewise assembled from PacBio reads and then scaffolded using Omni-C data [25]. In accordance with prior studies [111,112], our results indicate that the final polishing step (implemented here using Pilon) is pivotal in improving genome quality.
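Because contiguity is compared throughout this section by means of N50, a minimal sketch of how that statistic is computed may be helpful. The function below follows the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly). It is not the implementation used by QUAST, and the scaffold lengths are invented toy values rather than data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (bp), for illustration only.
toy_scaffolds = [100, 200, 300, 400, 500]
# The same toy assembly with many short contigs retained.
toy_fragmented = toy_scaffolds + [50] * 20

print("N50 without short contigs:", n50(toy_scaffolds))
print("N50 with short contigs retained:", n50(toy_fragmented))
```

In this toy case, retaining many short contigs lowers the N50 even though the long scaffolds are unchanged, which is one way in which short contigs can depress contiguity statistics.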

The critical step in Strategy 3 was the extraction of mycobiont reads after the binning process. This assembly was performed with 43.5% of the original long reads and 11.26% of the original short reads. Although these data derive from a metagenome, the mycobiont is the most abundant organism within the lichen thallus, so the recovery of a greater number of its reads was to be expected. The mapping process is, however, subject to bias arising from short reads that align to multiple positions [113].

Regarding the assemblers used in Strategy 2, large differences were observed among the assemblies. Marked differences in performance between long-read assemblers are to be expected and have been demonstrated by extensive benchmarking studies [112,114,115]. They may be explained by how efficiently each assembler handles metagenomic data. For example, metaFlye performs well with non-uniform read coverage and can assemble the genomes of species that are highly abundant in the sample [63]. In contrast, Hifiasm-meta is based on an algorithm that can also recover low-abundance organisms [116]. Hifiasm-meta has yielded successful results in the assembly of fungal genomes, including some derived from metagenomic data [117,118,119]. Indeed, its suitability for obtaining the best genomes and metagenomes from HiFi reads has been demonstrated by [120] in an exhaustive comparison across several datasets. On the other hand, the benchmarking of long-read assemblers by [112] demonstrated that differences in the genome quality achieved by different assemblers are stronger for low-depth data, which could explain the disparity found in this study.

An additional step was implemented in the pipeline, in which two long-read assemblies were merged. The best result was obtained by merging the metagenome assemblies of metaFlye and Hifiasm-meta, which produced a less fragmented assembly with larger contigs than the original ones and with high N50 and BUSCO values. The accuracy of these two assemblers working separately was previously demonstrated [112,120]. The combination of assemblies produced by different programs was expected to create more contiguous and accurate assemblies [65]. The same strategy was used to assemble the genomes of two species belonging to the lichenized genus Letharia, by combining the contigs generated with metaFlye and Canu [15].

The availability of enough genomes from phylogenetically related species is critical for the metagenome filtering step [121]. However, contamination has been detected in numerous published genomes [122]. Indeed, the customized Peltigerales database was considerably reduced because contamination was detected in many of the available genomes, including one from the target species, S. crocea. Other Peltigerales assemblies were excluded from this database because only the metagenomes have been published [123], and the mycobiont scaffolds are not available.

Although the assembly generated by Strategy 2 is less fragmented than the assemblies obtained by Strategies 1 and 3, the number of scaffolds remains high. This is because the reads obtained using PacBio were relatively short (average = 5012 bp). It should be noted that the length of the fragments obtained during DNA extraction has a significant impact on the subsequent library preparation and sequencing. Combining HiFi reads with Hi-C data or with ultra-long reads (ca. 4 Mb) generated by Nanopore might be more effective if the objective is to assemble a genome at the chromosomal level. Nevertheless, these technologies can be costly and are not essential for many genomic applications. In addition, higher-coverage sequencing using Illumina and PacBio could improve the S. crocea genome. RNA sequencing data from this species are needed to improve the quality of the genome annotation.

4.2. Genomic Features of Solorina crocea

The S. crocea assembly comprises 519 scaffolds and a genome size of 55.50 Mb, which is consistent with the genome size range of other Peltigerales genome assemblies (16.85–130.50 Mb) [17,124], although it is notably smaller than a previously assembled S. crocea genome (93.7 Mb). Unfortunately, no flow cytometry data exist for this species that would confirm which value is more accurate. Our examination of this other S. crocea assembly (BioProject PRJEB77567) revealed the presence of putative non-fungal scaffolds or other artifacts (Figure S5). This finding could potentially explain the difference in genome size between the two assemblies.
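Figure S5 compares the distribution of GC content among assemblies. As an illustration of the kind of per-scaffold calculation that underlies such a comparison, the following Python sketch computes the GC fraction of each record in a FASTA-formatted string. It is an assumption-laden example rather than the procedure used in this study: the function name and the toy sequences are hypothetical, and no threshold for flagging a scaffold as non-fungal is implied.

```python
def gc_content_per_scaffold(fasta_text):
    """Map each FASTA record name to its GC fraction.

    Only A, C, G, and T are counted; ambiguous bases such as N are ignored.
    """
    sequences = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line.upper())

    gc_by_name = {}
    for record, chunks in sequences.items():
        seq = "".join(chunks)
        acgt = sum(seq.count(base) for base in "ACGT")
        if acgt:  # skip records with no unambiguous bases
            gc_by_name[record] = (seq.count("G") + seq.count("C")) / acgt
    return gc_by_name


# Toy input, invented for illustration.
toy_fasta = """>scaffold_1 toy
ATGCATGC
ATAT
>scaffold_2 toy
GGCCGGCCNN
"""

for record, gc in gc_content_per_scaffold(toy_fasta).items():
    print(f"{record}\t{gc:.3f}")
```

Scaffolds whose GC content departs strongly from the main distribution are candidates for closer inspection, although GC content alone does not establish taxonomic origin.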

Many of the functional terms found in S. crocea are commonly reported in fungal genomes [125,126,127,128,129,130], including major facilitator superfamily (MFS) and aldo-keto reductases, transmembrane transport, cytochrome P450, protein kinases, ATP binding, and chitin synthases. MFS and aldo-keto reductases are relevant in carbon metabolism [131,132]. Chitin synthases are involved in fungal cell wall formation [133], but variations in these genes have been observed between species with different lifestyles [134]. The cytochrome P450 superfamily is implicated in numerous physiological processes, including the biosynthesis of secondary metabolites [135,136]. An expansion of this superfamily has been detected in lichenized fungi belonging to Lecanoromycetes [130]. Protein kinases participate in pathways associated with stress responses in fungi [137]. It has been demonstrated that certain specific kinases are pivotal for the establishment of lichen symbiosis and are conserved across lichenized fungi [138].

Other gene families detected in high abundance in the genome of S. crocea have relevant functions in lichenized fungi. The relatively high frequency of ammonium transporters might also be associated with lichenization, as these genes have been shown to be involved in the establishment of mycobiont–photobiont contact [139]. These transporters in S. crocea could be involved in the transfer of ammonium from the cyanobacterial partners to the mycobiont [140]. Functional analysis also yielded the term “sugar (and other) transporter”, which, according to its definition in InterPro, also covers transporters of organic alcohols. Thus, besides transporters for glucose, the carbon source that cyanobacterial photobionts transfer to mycobionts [141], this term may include ribitol transporters, which are expanded in lichenized fungi [130].

The repetitive element content in fungal genomes is highly variable, ranging from 0 to 30% [142]. Within the genomes of lichenized fungi, considerable variation in repetitive element content has also been observed, ranging from approximately 1% in Gyalolechia flavorubescens to 21.26% in Umbilicaria pustulata [125,130,143,144,145]. The proportion of repetitive elements identified in the genome of S. crocea is 22.18%, which is slightly above the range previously reported for lichenized fungi but within the range identified in the genomes of the Peltigerales species annotated in this study (from 12.85 to 31.17%). Within this order, the variation in the content of repetitive elements in Lobariaceae genomes was less than in Peltigeraceae. However, the number of repetitive elements identified in S. crocea and P. leucophlebia was very similar (813 and 829, respectively). Regarding the proportion of each type of repetitive element, the most abundant in the S. crocea genome were the LTRs, which have been found to be the most abundant in fungal genomes [142,146]. In other fungal groups, the prevalence of repetitive elements has been associated with a pathogenic lifestyle [147,148]. Solorina crocea also presents a high proportion of DNA transposons. These and the other types of repetitive elements found in S. crocea present a low divergence with respect to the consensus sequences, which was similarly found in the repetitive elements of U. pustulata [145].
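The position of S. crocea among the Peltigerales genomes annotated in this study can be made explicit with the percentages reported in the Results. The sketch below is a simple illustration using those published values; the variable names are arbitrary.

```python
# Repetitive element content (% of genome) reported for other Peltigerales.
peltigerales_repeats = {
    "Sticta canariensis": 18.06,
    "Pseudocyphellaria aurata": 12.85,
    "Peltigera leucophlebia": 31.17,
    "Lobaria pulmonaria": 14.07,
    "Lobaria immixta": 15.92,
}
s_crocea_repeats = 22.18

lowest = min(peltigerales_repeats.values())
highest = max(peltigerales_repeats.values())
within_range = lowest <= s_crocea_repeats <= highest

# Rank of S. crocea when all six genomes are sorted from highest to lowest.
all_values = sorted(
    list(peltigerales_repeats.values()) + [s_crocea_repeats], reverse=True
)
rank = all_values.index(s_crocea_repeats) + 1

print(f"Peltigerales range: {lowest}-{highest}%")
print(f"S. crocea within range: {within_range}; rank {rank} of {len(all_values)}")
```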

Secondary metabolites in lichens can represent a significant proportion of the dry weight of the thallus [149,150,151]. Therefore, a high number of biosynthetic gene clusters (BGCs) was expected in the genome of S. crocea. The largest group of BGCs identified in the genome of S. crocea is associated with terpene synthesis. Numerous biological functions have been attributed to these compounds, including the establishment of symbiotic relationships [152,153]. Consequently, a high number of these clusters is to be expected in the genomes of lichenized fungi [144,154,155,156,157]. The two large terpene cluster groups (Clan 1 and Clan 2) identified by [156] in lichenized fungi have been detected in Peltigerales.

Additionally, BGCs related to polyketide pathways have been detected. Some of them are likely involved in the synthesis of anthraquinones, norsolorinic acid, and averantin, which have been previously reported in S. crocea [158]. RiPPs have been little studied in Ascomycota, and their biological function in lichenized fungi is unknown [159]. However, recent research has shown that RiPPs are very common in the genomes of lichenized fungi [159,160]. A total of four RiPP clusters have been detected in the genome of S. crocea, which is within the range found in the genomes of other Peltigerales species, although they are more abundant in other species of the order than in S. crocea [159,160]. NRPS are also widespread among a large variety of Lecanoromycetes species [26,156,159,161]. Three regions encoding PKS are present in the S. crocea genome. PKS are multifunctional enzymes involved in the synthesis of polyketides [162]. These compounds constitute the major group of secondary metabolites synthesized by lichenized fungi [162].

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jof11080596/s1, Figure S1: Characteristics of the genome assembly obtained with Strategy 1; Figure S2: Characteristics of the genome assembly obtained with Strategy 2; Figure S3: Characteristics of the genome assembly obtained with Strategy 3; Figure S4: Repeat landscapes of the genomes of S. crocea and other available Peltigerales; Figure S5: Distribution of GC content in the genomes obtained with Strategies 1, 2, and 3 and another genome of Solorina crocea previously published; Table S1: Statistics of available Peltigerales genomes; Table S2: Benchmarking of the content of repetitive elements in genomes resulting from Strategies 1, 2, and 3; Table S3: Functional annotation of the genome of Solorina crocea obtained with the different strategies; Table S4: Abundance of repetitive elements in available Peltigerales genomes.

Author Contributions

Conceptualization, A.G.-M. and R.P.-B.; methodology, R.P.-B.; formal analysis, A.G.-M.; investigation, A.G.-M. and R.P.-B.; resources, A.G.-M. and R.P.-B.; writing—original draft preparation, A.G.-M. and R.P.-B.; writing—review and editing, R.P.-B.; funding acquisition, R.P.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by Agencia Estatal de Investigación (Spain), under the program of Consolidación Investigadora, project number CNS2022-135904. R.P.-B. was supported by CAM Atracción de Talento program (2020-T1/AMB-19852). A.G.-M. was supported by the Consolidación Investigadora program (CNS2022-135904).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The genome and its structural annotation have been deposited at GenBank under the accession JBODMP000000000 (BioProject: PRJNA1262437) and will be released after publication. Code is currently stored in a GitHub repository (https://github.com/anagarciamu15/Assembly-Strategies, accessed on 29 June 2025) and will be made publicly available upon acceptance of the manuscript.

Acknowledgments

We would like to thank the Technology Support Center at the URJC, where the analyses were carried out.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BGC	Biosynthetic Gene Cluster
BUSCO	Benchmarking Universal Single-Copy Orthologs
PKS	Polyketide synthase
NRPS	Non-ribosomal peptide synthetase

References

Honegger, R. Lichen-forming fungi and their photobionts. In Plant Relationships, 2nd ed.; Deising, H.B., Ed.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5, pp. 307–333.
Hawksworth, D.L.; Grube, M. Lichens Redefined as Complex Ecosystems. New Phytol. 2020, 227, 1281–1283.
Hodkinson, B.P.; Lutzoni, F. A Microbiotic Survey of Lichen-Associated Bacteria Reveals a New Lineage from the Rhizobiales. Symbiosis 2009, 49, 163–180.
Grube, M.; Cardinale, M.; Berg, G. 17 Bacteria and the Lichen Symbiosis. In Fungal Associations; Hock, B., Ed.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 9, pp. 363–372.
U’Ren, J.M.; Lutzoni, F.; Miadlikowska, J.; Laetsch, A.D.; Arnold, A.E. Host and Geographic Structure of Endophytic and Endolichenic Fungi at a Continental Scale. Am. J. Bot. 2012, 99, 898–914.
Spribille, T.; Tuovinen, V.; Resl, P.; Vanderpool, D.; Wolinski, H.; Aime, M.C.; Schneider, K.; Stabentheiner, E.; Toome-Heller, M.; Thor, G.; et al. Basidiomycete Yeasts in the Cortex of Ascomycete Macrolichens. Science 2016, 353, 488–492.
Cometto, A.; Leavitt, S.D.; Millanes, A.M.; Wedin, M.; Grube, M.; Muggia, L. The Yeast Lichenosphere: High Diversity of Basidiomycetes from the Lichens Tephromela atra and Rhizoplaca melanophthalma. Fungal Biol. 2022, 126, 587–608.
Wang, Q.; Li, J.; Yang, J.; Zou, Y.; Zhao, X.-Q. Diversity of Endophytic Bacterial and Fungal Microbiota Associated with the Medicinal Lichen Usnea longissima at High Altitudes. Front. Microbiol. 2022, 13, 958917.
Dědková, K.; Vančurová, L.; Muggia, L.; Steinová, J. The Plurality of Photobionts within Single Lichen Thalli. Symbiosis 2025, 95, 35–63.
McDonald, T.R.; Gaya, E.; Lutzoni, F. Twenty-Five Cultures of Lichenizing Fungi Available for Experimental Studies on Symbiotic Systems. Symbiosis 2013, 59, 165–171.
Rosabal, D.; Pino-Bodas, R. A Review of Laboratory Requirements to Culture Lichen Mycobiont Species. J. Fungi 2024, 10, 621.
Greshake, B.; Zehr, S.; Dal Grande, F.; Meiser, A.; Schmitt, I.; Ebersberger, I. Potential and Pitfalls of Eukaryotic Metagenome Skimming: A Test Case for Lichens. Mol. Ecol. Resour. 2016, 16, 511–523.
Meiser, A.; Otte, J.; Schmitt, I.; Grande, F.D. Sequencing Genomes from Mixed DNA Samples—Evaluating the Metagenome Skimming Approach in Lichenized Fungi. Sci. Rep. 2017, 7, 14881.
Pizarro, D.; Divakar, P.K.; Grewe, F.; Leavitt, S.D.; Huang, J.-P.; Dal Grande, F.; Schmitt, I.; Wedin, M.; Crespo, A.; Lumbsch, H.T. Phylogenomic Analysis of 2556 Single-Copy Protein-Coding Genes Resolves Most Evolutionary Relationships for the Major Clades in the Most Diverse Group of Lichen-Forming Fungi. Fungal Divers. 2018, 92, 31–41.
McKenzie, S.K.; Walston, R.F.; Allen, J.L. Complete, High-Quality Genomes from Long-Read Metagenomic Sequencing of Two Wolf Lichen Thalli Reveals Enigmatic Genome Architecture. Genomics 2020, 112, 3150–3156.
Tagirdzhanova, G.; Saary, P.; Tingley, J.P.; Díaz-Escandón, D.; Abbott, D.W.; Finn, R.D.; Spribille, T. Predicted Input of Uncultured Fungal Symbionts to a Lichen Symbiosis from Metagenome-Assembled Genomes. Genome Biol. Evol. 2021, 13, evab047.
Resl, P.; Bujold, A.R.; Tagirdzhanova, G.; Meidl, P.; Freire Rallo, S.; Kono, M.; Fernández-Brime, S.; Guðmundsson, H.; Andrésson, Ó.S.; Muggia, L.; et al. Large Differences in Carbohydrate Degradation and Transport Potential among Lichen Fungal Symbionts. Nat. Commun. 2022, 13, 2634.
Llewellyn, T.; Mian, S.; Hill, R.; Leitch, I.J.; Gaya, E. First Whole Genome Sequence and Flow Cytometry Genome Size Data for the Lichen-Forming Fungus Ramalina farinacea (Ascomycota). Genome Biol. Evol. 2023, 15, evad074.
Llewellyn, T.; Nowell, R.W.; Aptroot, A.; Temina, M.; Prescott, T.A.K.; Barraclough, T.G.; Gaya, E. Metagenomics Shines Light on the Evolution of “Sunscreen” Pigment Metabolism in the Teloschistales (lichen-Forming Ascomycota). Genome Biol. Evol. 2023, 15, evad002.
Ghurye, J.S.; Cepeda-Espinoza, V.; Pop, M. Metagenomic Assembly: Overview, Challenges and Applications. Yale J. Biol. Med. 2016, 89, 353–362.
Uritskiy, G.V.; DiRuggiero, J.; Taylor, J. MetaWRAP-a Flexible Pipeline for Genome-Resolved Metagenomic Data Analysis. Microbiome 2018, 6, 158.
Bai, D.; Chen, T.; Xun, J.; Ma, C.; Luo, H.; Yang, H.; Cao, C.; Cao, X.; Cui, J.; Deng, Y.-P.; et al. EasyMetagenome: A User-Friendly and Flexible Pipeline for Shotgun Metagenomic Analysis in Microbiome Research. Imeta 2025, 4, e70001.
Amarasinghe, S.L.; Su, S.; Dong, X.; Zappia, L.; Ritchie, M.E.; Gouil, Q. Opportunities and Challenges in Long-Read Sequencing Data Analysis. Genome Biol. 2020, 21, 30.
Kim, C.; Pongpanich, M.; Porntaveetus, T. Unraveling Metagenomics through Long-Read Sequencing: A Comprehensive Review. J. Transl. Med. 2024, 22, 111.
Adams, J.N.; Escalona, M.; Marimuthu, M.P.A.; Fairbairn, C.W.; Beraut, E.; Seligmann, W.; Nguyen, O.; Chumchim, N.; Stajich, J.E. The Reference Genome Assembly of the Bright Cobblestone Lichen, Acarospora socialis. J. Hered. 2023, 114, 707–714.
Cho, M.; Lee, S.J.; Choi, E.; Kim, J.; Choi, S.; Lee, J.H.; Park, H. An Antarctic Lichen Isolate (Cladonia borealis) Genome Reveals Potential Adaptation to Extreme Environments. Sci. Rep. 2024, 14, 1342.
Leavitt, S.D.; DeBolt, A.; McQuhae, E.; Allen, J.L. Genomic Resources for the First Federally Endangered Lichen: The Florida Perforate Cladonia (Cladonia perforata). J. Fungi 2023, 9, 698.
Lapidus, A.L.; Korobeynikov, A.I. Metagenomic Data Assembly—The Way of Decoding Unknown Microorganisms. Front. Microbiol. 2021, 12, 613791.
Yang, C.; Chowdhury, D.; Zhang, Z.; Cheung, W.K.; Lu, A.; Bian, Z.; Zhang, L. A Review of Computational Tools for Generating Metagenome-Assembled Genomes from Metagenomic Sequencing Data. Comput. Struct. Biotechnol. J. 2021, 19, 6301–6314.
Xu, G.; Zhang, L.; Liu, X.; Guan, F.; Xu, Y.; Yue, H.; Huang, J.-Q.; Chen, J.; Wu, N.; Tian, J. Combined Assembly of Long and Short Sequencing Reads Improve the Efficiency of Exploring the Soil Metagenome. BMC Genom. 2022, 23, 37.
Eisenhofer, R.; Nesme, J.; Santos-Bay, L.; Koziol, A.; Sørensen, S.J.; Alberdi, A.; Aizpurua, O. A Comparison of Short-Read, HiFi Long-Read, and Hybrid Strategies for Genome-Resolved Metagenomics. Microbiol. Spectr. 2024, 12, e0359023.
Goldstein, S.; Beka, L.; Graf, J.; Klassen, J.L. Evaluation of Strategies for the Assembly of Diverse Bacterial Genomes Using MinION Long-Read Sequencing. BMC Genom. 2019, 20, 23.
Zhang, P.; Jiang, D.; Wang, Y.; Yao, X.; Luo, Y.; Yang, Z. Comparison of De Novo Assembly Strategies for Bacterial Genomes. Int. J. Mol. Sci. 2021, 22, 7668.
Ekblom, R.; Wolf, J.B.W. A Field Guide to Whole-Genome Sequencing, Assembly and Annotation. Evol. Appl. 2014, 7, 1026–1042.
Greshake Tzovaras, B.; Segers, F.H.I.D.; Bicker, A.; Dal Grande, F.; Otte, J.; Anvar, S.Y.; Hankeln, T.; Schmitt, I.; Ebersberger, I. What Is in Umbilicaria pustulata? A Metagenomic Approach to Reconstruct the Holo-Genome of a Lichen. Genome Biol. Evol. 2020, 12, 309–324.
Lücking, R.; Hodkinson, B.P.; Leavitt, S.D. The 2016 Classification of Lichenized Fungi in the Ascomycota and Basidiomycota—Approaching One Thousand Genera. Bryologist 2017, 119, 361.
Miadlikowska, J.; Lutzoni, F. Phylogenetic Revision of the genus Peltigera (lichen-forming Ascomycota) Based on Morphological, Chemical, and Large Subunit Nuclear Ribosomal DNA Data. Int. J. Plant Sci. 2000, 161, 925–958.
Cannon, P.; Magain, N.; Sérusiaux, E.; Yahr, R.; Coppins, B.; Sanderson, N.; Simkin, J. Peltigerales: Peltigeraceae Including the Genera Crocodia, Lobaria, Lobarina, Nephroma, Peltigera, Pseudocyphellaria, Ricasolia, Solorina and Sticta. In Revisions of British and Irish Lichens; British Lichen Society: London, UK, 2021.
Galloway, D. The Lichen Genus Solorina Ach. (Peltigeraceae, Lichenized Ascomycotina) in New Zealand. Cryptogamie. Bryol. Lichénologie 1998, 19, 137–146.
Smith, C.W. The Lichens of Great Britain and Ireland; Smith, C.W., Ed.; British Lichen Society: London, UK, 2009.
Zhurbenko, M. Clypeococcum Lenae (Dothideomycetes), a New Lichenicolous Species from the Arctic, with a Key to Species of Lichenicolous Fungi on Solorina. Opusc. Philolichenum 2020, 19, 199–207.
Zheng, T.; Wang, L.; Ai, M.; Gan, Y.; Fan, R.; Zhang, Y.; Worthy, F.R.; Jin, J.; Meng, W.; Zhang, S.; et al. Taxonomic Revision of Solorina (Peltigeraceae, Ascomycota), Reveals a New Genus and Three New Species. J. Fungi 2025, 11, 169.
Burgaz, A.R.; Martínez, I. Estudio Del Género Solorina Ach. (Ascomicetes liquenizados) En La Península Ibérica. Bot. Complut. 1998, 55, 0214–4565.
Blanco, E.; Múgica, F.; Ollero, H.S. Encuadre Geobotánico de La Sierra de Guadarrama. Ambient. Rev. Minist. Medio Ambiente 2013, 103, 50–67.
Luceño, M.; Vargas, P. Guía Botánica del Sistema Central Español; Piramide Ediciones: Madrid, Spain, 1991.
Martinez, M.I.; Aragon, R.G. Epiphytic Lichens from the North Face of Puerto de La Quesera, Macizo de Ayllon (Central Spain). Cryptogam. Bryol. Lichenol. 1996, 17, 143–156.
Ruiz-Labourdette, D.; Martínez, F.; Martín-López, B.; Montes, C.; Pineda, F.D. Equilibrium of Vegetation and Climate at the European Rear Edge. A Reference for Climate Change Planning in Mountainous Mediterranean Regions. Int. J. Biometeorol. 2011, 55, 285–301.
Orange, A.; James, P.; White, F.; Orange, G. Microchemical Methods for the Identification of Lichens; British Lichen Society: London, UK, 2001.
Doyle, J.J.; Doyle, J.L. A Rapid DNA Isolation Procedure for Small Quantities of Fresh Leaf Tissue. Phytochem. Bull. 1987, 19, 11–15.
van Burik, J.-A.H.; Schreckhise, R.W.; White, T.C.; Bowden, R.A.; Myerson, D. Comparison of Six Extraction Techniques for Isolation of DNA from Filamentous Fungi. Med. Mycol. 1998, 36, 299–303.
Pino-Bodas, R.; Ahti, T.; Stenroos, S.; Martín, M.P.; Burgaz, A.R. Multilocus Approach to Species Recognition in the Cladonia humilis Complex (Cladoniaceae, Ascomycota). Am. J. Bot. 2013, 100, 664–678.
Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 1997, 25, 3389–3402.
Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A Flexible Trimmer for Illumina Sequence Data. Bioinformatics 2014, 30, 2114–2120.
Bengtsson-Palme, J.; Ryberg, M. Improved Software Detection and Extraction of ITS1 and ITS 2 from Ribosomal ITS Sequences of Fungi and Other Eukaryotes for Analysis of Environmental Sequencing data. Methods Ecol. Evol. 2013, 4, 914–919.
Madden, T.L. The BLAST Sequence Analysis Tool. NCBI Handb. 2013, 2, 425–436.
Nurk, S.; Meleshko, D.; Korobeynikov, A.; Pevzner, P.A. metaSPAdes: A New Versatile Metagenomic Assembler. Genome Res. 2017, 27, 824–834.
Buchfink, B.; Xie, C.; Huson, D. Fast and Sensitive Protein Alignment Using DIAMOND. Nat. Methods 2014, 12, 59–60.
Suzek, B.E.; Huang, H.; McGarvey, P.; Mazumder, R.; Wu, C.H. UniRef: Comprehensive and Non-Redundant UniProt Reference Clusters. Bioinformatics 2007, 23, 1282–1288.
Laetsch, D.; Blaxter, M. BlobTools: Interrogation of Genome Assemblies. F1000Research 2017, 6, 1287.
Li, H. Minimap2: Pairwise Alignment for Nucleotide Sequences. Bioinformatics 2017, 34, 3094–3100.
Shen, W.; Le, S.; Li, Y.; Hu, F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS ONE 2016, 11, e0163962.
Koren, S.; Walenz, B.P.; Berlin, K.; Miller, J.R.; Bergman, N.H.; Phillippy, A.M. Canu: Scalable and Accurate Long-Read Assembly via Adaptive K-Mer Weighting and Repeat Separation. Genome Res. 2017, 27, 722–736.
Kolmogorov, M.; Bickhart, D.M.; Behsaz, B.; Gurevich, A.; Rayko, M.; Shin, S.B.; Kuhn, K.; Yuan, J.; Polevikov, E.; Smith, T.P.L.; et al. metaFlye: Scalable Long-Read Metagenome Assembly Using Repeat Graphs. Nat. Methods 2020, 17, 1103–1110.
Cheng, H.; Concepcion, G.T.; Feng, X.; Zhang, H.; Li, H. Haplotype-Resolved de Novo Assembly Using Phased Assembly Graphs with Hifiasm. Nat. Methods 2021, 18, 170–175.
Chakraborty, M.; Baldwin-Brown, J.G.; Long, A.D.; Emerson, J.J. Contiguous and Accurate de Novo Assembly of Metazoan Genomes with Modest Long Read Coverage. Nucleic Acids Res. 2016, 44, e147.
Gurevich, A.A.; Saveliev, V.; Vyahhi, N.; Tesler, G. QUAST: Quality Assessment Tool for Genome Assemblies. Bioinformatics 2013, 29, 1072–1075.
Seppey, M.; Manni, M.; Zdobnov, E.M. BUSCO: Assessing Genome Assembly and Annotation Completeness. Methods Mol. Biol. 2019, 1962, 227–245.
Antipov, D.; Korobeynikov, A.; McLean, J.S.; Pevzner, P.A. hybridSPAdes: An Algorithm for Hybrid Assembly of Short and Long Reads. Bioinformatics 2016, 32, 1009–1015.
Prjibelski, A.; Antipov, D.; Meleshko, D.; Lapidus, A.; Korobeynikov, A. Using SPAdes De Novo Assembler. Curr. Protoc. Bioinform. 2020, 70, e102.
Li, D.; Liu, C.-M.; Luo, R.; Sadakane, K.; Lam, T. MEGAHIT: An Ultra-Fast Single-Node Solution for Large and Complex Metagenomics Assembly via Succinct de Bruijn Graph. Bioinformatics 2014, 31, 1674–1676.
Bushnell, B. BBMap: A Fast, Accurate, Splice-Aware Aligner; Lawrence Berkeley National Laboratory: Berkeley, CA, USA, 2014.
Alneberg, J.; Bjarnason, B.S.; de Bruijn, I.; Schirmer, M.; Quick, J.; Ijaz, U.Z.; Loman, N.J.; Andersson, A.F.; Quince, C. CONCOCT: Clustering cONtigs on COverage and ComposiTion. arXiv 2013, arXiv:1312.4038.
R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2014.
Wickham, H. Getting Started with Qplot. In ggplot2; Springer: New York, NY, USA, 2009; pp. 9–26. ISBN 9780387981406.
Pryszcz, L.P.; Gabaldón, T. Redundans: An Assembly Pipeline for Highly Heterozygous Genomes. Nucleic Acids Res. 2016, 44, e113.
Vaser, R.; Sovic, I.; Nagarajan, N.; Šikić, M. Racon-Rapid Consensus Module for Raw de Novo Genome Assembly of Long Uncorrected Reads. In Proceedings of the London Calling Conference, Amsterdam, The Netherlands, 28 October 2016.
Li, H. Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM. arXiv 2013, arXiv:1303.3997.
Walker, B.J.; Abeel, T.; Shea, T.; Priest, M.; Abouelliel, A.; Sakthikumar, S.; Cuomo, C.A.; Zeng, Q.; Wortman, J.; Young, S.K.; et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE 2014, 9, e112963.
Simão, F.A.; Waterhouse, R.M.; Ioannidis, P.; Kriventseva, E.V.; Zdobnov, E.M. BUSCO: Assessing Genome Assembly and Annotation Completeness with Single-Copy Orthologs. Bioinformatics 2015, 31, 3210–3212.
Levy Karin, E.; Mirdita, M.; Söding, J. MetaEuk-Sensitive, High-Throughput Gene Discovery, and Annotation for Large-Scale Eukaryotic Metagenomics. Microbiome 2020, 8, 48.
Manni, M.; Berkeley, M.R.; Seppey, M.; Zdobnov, E.M. BUSCO: Assessing Genomic Data Quality and beyond. Curr. Protoc. 2021, 1, e323.
Flynn, J.M.; Hubley, R.; Goubert, C.; Rosen, J.; Clark, A.G.; Feschotte, C.; Smit, A.F. RepeatModeler2 for Automated Genomic Discovery of Transposable Element Families. Proc. Natl. Acad. Sci. USA 2020, 117, 9451–9457. [Google Scholar] [CrossRef] [PubMed]
Crescente, J.M.; Zavallo, D.; Helguera, M.; Vanzetti, L.S. MITE Tracker: An Accurate Approach to Identify Miniature Inverted-Repeat Transposable Elements in Large Genomes. BMC Bioinform. 2018, 19, 348. [Google Scholar] [CrossRef] [PubMed]
Su, W.; Ou, S.; Hufford, M.B.; Peterson, T. A Tutorial of EDTA: Extensive De Novo TE Annotator. Methods Mol. Biol. 2021, 2250, 55–67. [Google Scholar]
Xiong, W.; He, L.; Lai, J.; Dooner, H.K.; Du, C. HelitronScanner Uncovers a Large Overlooked Cache of Helitron Transposons in Many Plant Genomes. Proc. Natl. Acad. Sci. USA 2014, 111, 10263–10268. [Google Scholar] [CrossRef] [PubMed]
Su, W.; Gu, X.; Peterson, T. TIR-Learner, a New Ensemble Method for TIR Transposable Element Annotation, Provides Evidence for Abundant New Transposable Elements in the Maize Genome. Mol. Plant 2019, 12, 447–460. [Google Scholar] [CrossRef]
Ellinghaus, D.; Kurtz, S.; Willhoeft, U. LTRharvest, an Efficient and Flexible Software for de Novo Detection of LTR Retrotransposons. BMC Bioinform. 2008, 9, 18. [Google Scholar] [CrossRef]
Xu, Z.; Wang, H. LTR_FINDER: An Efficient Tool for the Prediction of Full-Length LTR Retrotransposons. Nucleic Acids Res. 2007, 35, W265–W268. [Google Scholar] [CrossRef]
Li, W.; Godzik, A. Cd-Hit: A Fast Program for Clustering and Comparing Large Sets of Protein or Nucleotide Sequences. Bioinformatics 2006, 22, 1658–1659. [Google Scholar] [CrossRef]
Nishimura, D. RepeatMasker. Biotech Softw. Internet Rep. 2000, 1, 36–39. [Google Scholar] [CrossRef]
Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinform. 2004, 5, 4–10. [Google Scholar] [CrossRef]
Edgar, R.C. MUSCLE: Multiple Sequence Alignment with High Accuracy and High Throughput. Nucleic Acids Res. 2004, 32, 1792–1797. [Google Scholar] [CrossRef]
Capella-Gutiérrez, S.; Silla-Martínez, J.M.; Gabaldón, T. trimAl: A Tool for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses. Bioinformatics 2009, 25, 1972–1973. [Google Scholar] [CrossRef] [PubMed]
Castresana, J. Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. Mol. Biol. Evol. 2000, 17, 540–552. [Google Scholar] [CrossRef] [PubMed]
Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16, 276–277. [Google Scholar] [CrossRef] [PubMed]
Holt, C.; Yandell, M. MAKER2: An Annotation Pipeline and Genome-Database Management Tool for Second-Generation Genome Projects. BMC Bioinform. 2011, 12, 491. [Google Scholar] [CrossRef] [PubMed]
Slater, G.S.C.; Birney, E. Automated Generation of Heuristics for Biological Sequence Comparison. BMC Bioinform. 2005, 6, 31. [Google Scholar] [CrossRef]
Campbell, M.S.; Holt, C.; Moore, B.; Yandell, M. Genome Annotation and Curation Using MAKER and MAKER-P. Curr. Protoc. Bioinform. 2014, 48, 4–11. [Google Scholar] [CrossRef]
Korf, I. Gene Finding in Novel Genomes. BMC Bioinform. 2004, 5, 59. [Google Scholar] [CrossRef]
Stanke, M.; Morgenstern, B. AUGUSTUS: A Web Server for Gene Prediction in Eukaryotes That Allows User-Defined Constraints. Nucleic Acids Res. 2005, 33, W465–W467. [Google Scholar] [CrossRef]
Palmer, J.M.; Stajich, J.E. Funannotate v1. 8.1: Eukaryotic Genome Annotation. 2020. Available online: https://github.com/nextgenusfs/funannotate (accessed on 29 June 2025).
Jones, P.; Binns, D.; Chang, H.-Y.; Fraser, M.; Li, W.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; et al. InterProScan 5: Genome-Scale Protein Function Classification. Bioinformatics 2014, 30, 1236–1240. [Google Scholar] [CrossRef]
Rawlings, N.D.; Barrett, A.J.; Bateman, A. MEROPS: The Peptidase Database. Nucleic Acids Res. 2009, 38, D227–D233. [Google Scholar] [CrossRef]
Yin, Y.; Mao, X.; Yang, J.; Chen, X.; Mao, F.; Xu, Y. dbCAN: A Web Resource for Automated Carbohydrate-Active Enzyme Annotation. Nucleic Acids Res. 2012, 40, W445–W451. [Google Scholar] [CrossRef]
Cantalapiedra, C.P.; Hernández-Plaza, A.; Letunic, I.; Bork, P.; Huerta-Cepas, J. eggNOG-Mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol. Biol. Evol. 2021, 38, 5825–5829. [Google Scholar] [CrossRef]
Huerta-Cepas, J.; Forslund, K.; Coelho, L.P.; Szklarczyk, D.; Jensen, L.J.; Von, M.C.; Bork, P. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol. Biol. Evol. 2017, 34, 2115–2122. [Google Scholar] [CrossRef]
Medema, M.; Blin, K.; Cimermancic, P.; Jager, V.D.; Zakrzewski, P.; Fischbach, M.; Weber, T.; Takano, E.; Breitling, R. antiSMASH: Rapid Identification, Annotation and Analysis of Secondary Metabolite Biosynthesis Gene Clusters in Bacterial and Fungal Genome Sequences. Nucleic Acids Res. 2011, 39, W339–W346. [Google Scholar] [CrossRef]
Cao, L.; Liao, L.; Su, C.; Mo, T.; Zhu, F.; Qin, R.; Li, R. Metagenomic Analysis Revealed the Microbiota and Metabolic Function during Co-Composting of Food Waste and Residual Sludge for Nitrogen and Phosphorus Transformation. Sci. Total Environ. 2021, 773, 145561. [Google Scholar] [CrossRef]
Mavromatis, K.; Land, M.L.; Brettin, T.S.; Quest, D.J.; Copeland, A.; Clum, A.; Goodwin, L.; Woyke, T.; Lapidus, A.; Klenk, H.P.; et al. The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation. PLoS ONE 2012, 7, e48837. [Google Scholar] [CrossRef] [PubMed]
Miller, J.R.; Zhou, P.; Mudge, J.; Gurtowski, J.; Lee, H.; Ramaraj, T.; Walenz, B.P.; Liu, J.; Stupar, R.M.; Denny, R.; et al. Hybrid Assembly with Long and Short Reads Improves Discovery of Gene Family Expansions. BMC Genom. 2017, 18, 541. [Google Scholar] [CrossRef] [PubMed]
Gorman, Z.; Chen, J.; de Leon, A.A.P.; Wallis, C.M. Comparison of Assembly Platforms for the Assembly of the Nuclear Genome of Trichoderma Harzianum Strain PAR3. BMC Genom. 2023, 24, 454. [Google Scholar] [CrossRef] [PubMed]
Zhang, X.; Liu, C.-G.; Yang, S.-H.; Wang, X.; Bai, F.-W.; Wang, Z. Benchmarking of Long-Read Sequencing, Assemblers and Polishers for Yeast Genome. Brief. Bioinform. 2022, 23, bbac146. [Google Scholar] [CrossRef]
Al Kawam, A.; Khatri, S.; Datta, A. A Survey of Software and Hardware Approaches to Performing Read Alignment in next Generation Sequencing. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 1202–1213. [Google Scholar] [CrossRef] [PubMed]
Murigneux, V.; Rai, S.K.; Furtado, A.; Bruxner, T.J.C.; Tian, W.; Harliwong, I.; Wei, H.; Yang, B.; Ye, Q.; Anderson, E.; et al. Comparison of Long-Read Methods for Sequencing and Assembly of a Plant Genome. Gigascience 2020, 9, giaa146. [Google Scholar] [CrossRef]
Jung, H.; Jeon, M.-S.; Hodgett, M.; Waterhouse, P.; Eyun, S.-I. Comparative Evaluation of Genome Assemblers from Long-Read Sequencing for Plants and Crops. J. Agric. Food Chem. 2020, 68, 7670–7677. [Google Scholar] [CrossRef]
Feng, X.; Cheng, H.; Portik, D.; Li, H. Metagenome Assembly of High-Fidelity Long Reads with Hifiasm-Meta. Nat. Methods 2022, 19, 671–674. [Google Scholar] [CrossRef]
Purayil, G.P.; Almarzooqi, A.Y.; El-Tarabily, K.A.; You, F.M.; AbuQamar, S.F. Fully Resolved Assembly of Fusarium Proliferatum DSM106835 Genome. Sci. Data 2023, 10, 705. [Google Scholar] [CrossRef]
Yao, G.; Chen, W.; Sun, J.; Wang, X.; Wang, H.; Meng, T.; Zhang, L.; Guo, L. Gapless Genome Assembly of Fusarium Verticillioides, a Filamentous Fungus Threatening Plant and Human Health. Sci. Data 2023, 10, 229. [Google Scholar] [CrossRef]
Song, Y.; Zhang, M.; Liu, Y.-Y.; Li, M.; Xie, X.; Qi, J. Haplotype-Phased Chromosome-Level Genome Assembly of Cryptoporus Qinlingensis, a Typical Traditional Chinese Medicine Fungus. J. Fungi 2025, 11, 163. [Google Scholar] [CrossRef]
Yu, W.; Luo, H.; Yang, J.; Zhang, S.; Jiang, H.; Zhao, X.; Hui, X.; Sun, D.; Li, L.; Wei, X.-Q.; et al. Comprehensive Assessment of 11 de Novo HiFi Assemblers on Complex Eukaryotic Genomes and Metagenomes. Genome Res. 2024, 34, 326–340. [Google Scholar] [CrossRef] [PubMed]
Yue, Y.; Huang, H.; Qi, Z.; Dou, H.-M.; Liu, X.-Y.; Han, T.-F.; Chen, Y.; Song, X.-J.; Zhang, Y.-H.; Tu, J. Evaluating Metagenomics Tools for Genome Binning with Real Metagenomic Datasets and CAMI Datasets. BMC Bioinform. 2020, 21, 334. [Google Scholar] [CrossRef] [PubMed]
Cornet, L.; Baurain, D. Contamination Detection in Genomic Data: More Is Not Enough. Genome Biol. 2022, 23, 60. [Google Scholar] [CrossRef]
Vecherskii, M.V.; Khayrullin, D.R.; Shadrin, A.M.; Lisov, A.V.; Zavarzina, A.G.; Zavarzin, A.A.; Leontievsky, A.A. Metagenomes of Lichens Solorina Crocea and Peltigera Canina. Microbiol. Resour. Announc. 2022, 11, e0100021. [Google Scholar] [CrossRef]
Tagirdzhanova, G.; Saary, P.; Cameron, E.S.; Allen, C.C.G.; Garber, A.I.; Escandón, D.D.; Cook, A.T.; Goyette, S.; Nogerius, V.T.; Passo, A.; et al. Microbial Occurrence and Symbiont Detection in a Global Sample of Lichen Metagenomes. PLoS Biol. 2024, 22, e3002862. [Google Scholar] [CrossRef]
Wang, Y.-Y.; Liu, B.; Zhang, X.-Y.; Zhou, Q.-M.; Zhang, T.; Li, H.; Yu, Y.-F.; Zhang, X.-L.; Hao, X.-Y.; Wang, M.; et al. Genome Characteristics Reveal the Impact of Lichenization on Lichen-Forming Fungus Endocarpon pusillum Hedwig (Verrucariales, Ascomycota). BMC Genom. 2014, 15, 34. [Google Scholar] [CrossRef] [PubMed]
King, R.; Urban, M.; Hammond-Kosack, M.C.U.; Hassani-Pak, K.; Hammond-Kosack, K.E. The Completed Genome Sequence of the Pathogenic Ascomycete Fungus Fusarium Graminearum. BMC Genom. 2015, 16, 544. [Google Scholar] [CrossRef]
Prasad, P.; Varshney, D.; Adholeya, A. Whole Genome Annotation and Comparative Genomic Analyses of Bio-Control Fungus Purpureocillium lilacinum. BMC Genom. 2015, 16, 1004. [Google Scholar] [CrossRef] [PubMed]
Yuan, Y.; Wu, F.; Si, J.; Zhao, Y.-F.; Dai, Y.-C. Whole Genome Sequence of Auricularia heimuer (Basidiomycota, Fungi), the Third Most Important Cultivated Mushroom Worldwide. Genomics 2019, 111, 50–58. [Google Scholar] [CrossRef]
Mao, Z.; Yang, P.; Liu, H.; Mao, Y.; Lei, Y.; Hou, D.; Ma, H.; Liao, X.; Jiang, W. Whole-Genome Sequencing and Analysis of the White-Rot Fungus Ceriporia lacerata Reveals Its Phylogenetic Status and the Genetic Basis of Lignocellulose Degradation and Terpenoid Synthesis. Front. Microbiol. 2022, 13, 880946. [Google Scholar] [CrossRef] [PubMed]
Song, H.; Kim, K.-T.; Park, S.-Y.; Lee, G.-W.; Choi, J.; Jeon, J.; Cheong, K.-C.; Choi, G.; Hur, J.; Lee, Y.-H. A Comparative Genomic Analysis of Lichen-Forming Fungi Reveals New Insights into Fungal Lifestyles. Sci. Rep. 2022, 12, 10724. [Google Scholar] [CrossRef]
Reddy, V.S.; Shlykov, M.A.; Castillo, R.; Sun, E.I.; Saier, M.H., Jr. The Major Facilitator Superfamily (MFS) Revisited: The Major Facilitator Family Revisited. FEBS J. 2012, 279, 2022–2035. [Google Scholar] [CrossRef]
Müller, A.; Mӓkelӓ, M.R.; de Vries, R.P. Aldo-Keto Reductases, Short Chain Dehydrogenases/reductases, and Zinc-Binding Dehydrogenases Are Key Players in Fungal Carbon Metabolism. Adv. Appl. Microbiol. 2025, 130, 123–157. [Google Scholar]
Bowman, S.M.; Free, S.J. The Structure and Synthesis of the Fungal Cell Wall. Bioessays 2006, 28, 799–808. [Google Scholar] [CrossRef]
Li, M.; Jiang, C.; Wang, Q.; Zhao, Z.; Jin, Q.; Xu, J.-R.; Liu, H. Evolution and Functional Insights of Different Ancestral Orthologous Clades of Chitin Synthase Genes in the Fungal Tree of Life. Front. Plant Sci. 2016, 7, 37. [Google Scholar] [CrossRef]
Crešnar, B.; Petrič, S. Cytochrome P450 Enzymes in the Fungal Kingdom. Biochim. Biophys. Acta 2011, 1814, 29–35. [Google Scholar] [CrossRef]
Mlambo, G.; Padayachee, T.; Nelson, D.R.; Syed, K. Genome-Wide Analysis of the Cytochrome P450 Monooxygenases in the Lichenized Fungi of the Class Lecanoromycetes. Microorganisms 2023, 11, 2590. [Google Scholar] [CrossRef]
Hagiwara, D.; Sakamoto, K.; Abe, K.; Gomi, K. Signaling Pathways for Stress Responses and Adaptation in Aspergillus Species: Stress Biology in the Post-Genomic Era. Biosci. Biotechnol. Biochem. 2016, 80, 1667–1680. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; Li, R.; Wang, D.; Qian, B.; Bian, Z.; Wei, J.; Wei, X.; Xu, J.-R. Regulation of Symbiotic Interactions and Primitive Lichen Differentiation by UMP1 MAP Kinase in Umbilicaria muhlenbergii. Nat. Commun. 2023, 14, 6972. [Google Scholar] [CrossRef]
Armaleo, D.; Müller, O.; Lutzoni, F.; Andrésson, Ó.S.; Blanc, G.; Bode, H.B.; Collart, F.R.; Dal Grande, F.; Dietrich, F.; Grigoriev, I.V.; et al. The Lichen Symbiosis Re-Viewed through the Genomes of Cladonia grayi and Its Algal Partner Asterochloris glomerata. BMC Genom. 2019, 20, 605. [Google Scholar] [CrossRef] [PubMed]
Rikkinen, J. Symbiotic Cyanobacteria in Lichens. In Algal and Cyanobacteria Symbioses; Grube, M., Seckbach, J.A., Muggia, L., Eds.; World Scientific (Europe): London, UK, 2017; pp. 147–167. [Google Scholar]
Palmqvist, K. Cyanolichens: Carbon Metabolism. In Cyanobacteria in Symbiosis; Rai, A.N., Bergman, B., Rasmussen, E., Eds.; Springer: Dordrecht, The Netherlands, 2005; pp. 73–96. [Google Scholar]
Castanera, R.; López-Varas, L.; Borgognone, A.; LaButti, K.; Lapidus, A.; Schmutz, J.; Grimwood, J.; Pérez, G.; Pisabarro, A.G.; Grigoriev, I.V.; et al. Transposable Elements versus the Fungal Genome: Impact on Whole-Genome Architecture and Transcriptional Profiles. PLoS Genet. 2016, 12, e1006108. [Google Scholar] [CrossRef]
Armstrong, E.E.; Prost, S.; Ertz, D.; Westberg, M.; Frisch, A.; Bendiksby, M. Draft Genome Sequence and Annotation of the Lichen-Forming Fungus Arthonia radiata. Genome Announc. 2018, 6, 10-1128. [Google Scholar] [CrossRef] [PubMed]
Dal Grande, F.; Meiser, A.; Greshake Tzovaras, B.; Otte, J.; Ebersberger, I.; Schmitt, I. The Draft Genome of the Lichen-Forming fungus Lasallia hispanica (Frey) Sancho & A. Crespo. Lichenologist 2018, 50, 329–340. [Google Scholar]
Grande, F.D.; Jamilloux, V.; Choisne, N.; Calchera, A.; Rolshausen, G.; Petersen, M.; Schulz, M.; Nilsson, M.A.; Schmitt, I. Transposable Elements in the Genome of the Lichen-Forming Fungus Umbilicaria pustulata and Their Distribution in Different Climate Zones along Elevation. Biology 2021, 11, 24. [Google Scholar] [CrossRef]
Santana, M.; Queiroz, M.V. Transposable Elements in Fungi: A Genomic Approach. Sci. J. Genet. Gene Ther. 2015, 1, 10–16. [Google Scholar] [CrossRef]
Castanera, R.; Borgognone, A.; Pisabarro, A.G.; Ramírez, L. Biology, Dynamics, and Applications of Transposable Elements in Basidiomycete Fungi. Appl. Microbiol. Biotechnol. 2017, 101, 1337–1350. [Google Scholar] [CrossRef] [PubMed]
Muszewska, A.; Steczkiewicz, K.; Stepniewska-Dziubinska, M.M.; Ginalski, K. Transposable Elements Contribute to Fungal Genes and Impact Fungal Lifestyle. Sci. Rep. 2019, 9, 4307. [Google Scholar] [CrossRef]
Asplund, J.; Gauslaa, Y. Content of Secondary Compounds Depends on Thallus Size in the Foliose Lichen Lobaria pulmonaria. Lichenologist 2007, 39, 273–278. [Google Scholar] [CrossRef]
Stocker-Wörgötter, E. Metabolic Diversity of Lichen-Forming Ascomycetous Fungi: Culturing, Polyketide and Shikimate Metabolite Production, and PKS Genes. Nat. Prod. Rep. 2008, 25, 188–200. [Google Scholar] [CrossRef]
Solhaug, K.A.; Lind, M.; Nybakken, L.; Gauslaa, Y. Possible Functional Roles of Cortical Depsides and Medullary Depsidones in the Foliose Lichen Hypogymnia physodes. Flora 2009, 204, 40–48. [Google Scholar] [CrossRef]
Jiang, M.; Wu, Z.; Guo, H.; Liu, L.; Chen, S. A Review of Terpenes from Marine-Derived Fungi: 2015–2019. Mar. Drugs 2020, 18, 321. [Google Scholar] [CrossRef]
Avalos, M.; Garbeva, P.; Vader, L.; van Wezel, P.G.; Dickschat, J.S.; Ulanova, D. Biosynthesis, Evolution and Ecology of Microbial Terpenoids. Nat. Prod. Rep. 2022, 39, 249–272. [Google Scholar] [CrossRef]
Abdel-Hameed, M.; Bertrand, R.L.; Piercey-Normore, M.D.; Sorensen, J.L. Putative Identification of the Usnic Acid Biosynthetic Gene Cluster by de Novo Whole-Genome Sequencing of a Lichen-Forming Fungus. Fungal Biol. 2016, 120, 306–316. [Google Scholar] [CrossRef]
Calchera, A.; Dal Grande, F.; Bode, H.B.; Schmitt, I. Biosynthetic Gene Content of the “Perfume Lichens” Evernia prunastri and Pseudevernia furfuracea. Molecules 2019, 24, 203. [Google Scholar] [CrossRef]
Singh, G.; Pasinato, A.; Yriarte, A.L.-C.; Pizarro, D.; Divakar, P.K.; Schmitt, I.; Dal Grande, F. Are There Conserved Biosynthetic Genes in Lichens? Genome-Wide Assessment of Terpene Biosynthetic Genes Suggests Ubiquitous Distribution of the Squalene Synthase Cluster. BMC Genom. 2024, 25, 936. [Google Scholar] [CrossRef]
Pasinato, A.; Singh, G. Lichens Are a Treasure Chest of Bioactive Compounds: Fact or Fake? New Phytol. 2025, 246, 389–395. [Google Scholar] [CrossRef]
Gagunashvili, A.N.; Davídsson, S.P.; Jónsson, Z.O.; Andrésson, O.S. Cloning and Heterologous Transcription of a Polyketide Synthase Gene from the Lichen Solorina Crocea. Mycol. Res. 2009, 113, 354–363. [Google Scholar] [CrossRef]
Gerasimova, J.V.; Beck, A.; Scheunert, A.; Kulkarni, O. De Novo Genome Assembly of Toniniopsis dissimilis (Ramalinaceae, Lecanoromycetes) from Long Reads Shows a Comparatively High Composition of Biosynthetic Genes Putatively Involved in Melanin Synthesis. Genes 2024, 15, 1029. [Google Scholar] [CrossRef] [PubMed]
Pasinato, A.; Singh, G. Bioinformatic Exploration of RiPP Biosynthetic Gene Clusters in Lichens. Fungal Biol. Biotechnol. 2025, 12, 6. [Google Scholar] [CrossRef] [PubMed]
Gill, H.; Sorensen, J.L.; Collemare, J. Lichen Fungal Secondary Metabolites: Progress in the Genomic Era toward Ecological Roles in the Interaction. In Plant Relationships: Fungal-Plant Interactions; Scott, B., Mesarich, C., Eds.; Springer: Cham, Switzerland, 2023; Volume 5, pp. 185–208. [Google Scholar]
Stocker-Wörgötter, E. Biochemical Diversity and Ecology of Lichen-Forming Fungi: Lichen Substances, Chemosyndromic Variation and Origin of Polyketide-Type Metabolites (Biosynthetic Pathways). In Recent Advances in Lichenology; Upreti, D., Divakar, P., Shukla, V., Bajpai, R., Eds.; Springer: New Delhi, India, 2015; pp. 161–179. [Google Scholar]

Figure 1. Bioinformatics workflow of the three strategies tested to assemble the genome of S. crocea. The figure indicates the programs used in each step. Metagenomic and filtered mycobiont data are shown in different colors, as indicated in the legend. Red crosses mark the assemblies discarded because of poorer contiguity and completeness values. Green marks indicate the selected assemblies and the assembly used as input for the following steps. Created in BioRender. Ana García-Muñoz (2025) https://app.biorender.com/citation/6899a743b180816c448a7908.

Figure 2. Functional classification of S. crocea genes based on gene ontology (GO) annotation. (a) Proportion of genes annotated in each GO category (BP: biological processes; MF: molecular function; CC: cellular components); (b) The 15 most abundant GO terms in the most represented categories.

Figure 3. Results of the functional annotation based on Pfam domains. The most abundant functional terms are shown.

Figure 4. Most frequent homology-based functional annotations retrieved from searches against the UniProtKB database.

Table 1. Comparison of the metagenome assemblies obtained with Strategies 1 and 2. The Strategy 2 metagenome selected for mycobiont contig filtering and read extraction is shaded. Assembly metrics were obtained using QUAST. BUSCO scores are given as percentages of complete (C), complete and single-copy (S), complete and duplicated (D), fragmented (F), and missing (M) BUSCOs.

Sequencing Technology	Assembly Tool	No. Contigs	Largest Contig (bp)	Genome Size (Mb)	N50 (kb)	%GC Content	%BUSCO Scores
Strategy 1: hybrid assembly
PacBio HiFi + Illumina	metaSPAdes	22,583	347,633	121.69	41.75	39.16	C: 97.8 [S: 96.2, D: 1.6], F: 0.5, M: 1.7
Strategy 2: long reads + short reads scaffolding
PacBio HiFi	Canu	4834	242,057	68.16	15.31	41.34	C: 43.0 [S: 42.1, D: 0.9], F: 5.9, M: 51.2
PacBio HiFi	Hifiasm-meta	4804	230,570	108.66	25.57	41.61	C: 67.3 [S: 65.2, D: 2.1], F: 5.3, M: 27.4
PacBio HiFi	metaFlye	6089	230,570	118.01	24.83	41.56	C: 68.6 [S: 66.8, D: 1.9], F: 5.3, M: 26.0
PacBio HiFi	Canu + Hifiasm-meta	4802	327,659	109.03	25.65	41.61	C: 67.4 [S: 65.2, D: 2.1], F: 5.3, M: 27.3
PacBio HiFi	Canu + metaFlye	3941	409,251	96.59	29.29	41.41	C: 60.8 [S: 59.2, D: 1.6], F: 5.8, M: 34.2
PacBio HiFi	Hifiasm-meta + metaFlye	4325	264,302	117.58	32.54	41.68	C: 72.6 [S: 70.3, D: 2.3], F: 4.7, M: 22.6
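
The BUSCO strings in the table above follow a fixed pattern, so they can be parsed and sanity-checked programmatically. The following is a minimal Python sketch, not part of the original workflow; the function names `parse_busco` and `is_consistent` and the tolerance of 0.2 percentage points are assumptions made for illustration. It checks that C equals S + D and that C, F, and M sum to 100%, allowing for rounding in the published values.

```python
import re

# Matches fields such as "C: 97.8" or "D: 0%" in a BUSCO summary string
# (the percent sign is optional, as the tables use both forms).
_BUSCO_FIELD = re.compile(r"([CSDFM]):\s*([0-9]+(?:\.[0-9]+)?)\s*%?")


def parse_busco(summary):
    """Return a dict mapping C, S, D, F, M to percentages (hypothetical helper)."""
    scores = {key: float(value) for key, value in _BUSCO_FIELD.findall(summary)}
    missing = set("CSDFM") - set(scores)
    if missing:
        raise ValueError(f"missing BUSCO fields: {sorted(missing)}")
    return scores


def is_consistent(scores, tol=0.2):
    """Check C = S + D and C + F + M = 100 within a rounding tolerance.

    The tolerance (in percentage points) is an assumed value chosen to
    absorb one-decimal rounding of the reported scores.
    """
    complete_ok = abs(scores["C"] - (scores["S"] + scores["D"])) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


# Strategy 1 (metaSPAdes) row from Table 1.
row = parse_busco("C: 97.8 [S: 96.2, D: 1.6], F: 0.5, M: 1.7")
print(row["C"], is_consistent(row))
```

A check of this kind is useful mainly for catching transcription errors when BUSCO summaries are copied between reports and tables.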

Table 2. Final mycobiont assemblies generated by Strategies 1, 2, and 3, after scaffolding and polishing. The table shows the parameters used to assess the continuity and completeness of the assemblies. These metrics were obtained with QUAST. BUSCO scores are given as percentages of complete (C), complete and single-copy (S), complete and duplicated (D), fragmented (F), and missing (M) BUSCOs.

Assembly Strategy	No. Scaffolds	Largest Contig (bp)	Genome Size (Mb)	N50 (kb)	%GC Content	%BUSCO Scores
Strategy 1: hybrid assembly	4157	347,633	90.81	63.23	37.46	C: 95.4 [S: 94.0, D: 1.4], F: 0.6, M: 4.0
Strategy 2: metagenome & scaffolding with short reads	519	547,588	55.50	142.84	37.24	C: 96.7 [S: 95.7, D: 0.9], F: 0.7, M: 2.6
Strategy 3: hybrid assembly of filtered long and short reads	6280	140,100	33.71	16.58	42.43	C: 37.6 [S: 37.5, D: 0.1], F: 1.0, M: 61.4
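
For readers unfamiliar with the N50 statistic reported above, the following minimal Python sketch shows how it is conventionally computed from a list of sequence lengths. It is an illustration only: the values in the tables were produced by QUAST, and the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical toy assembly (not data from this study): total size 300,
# so the N50 is reached once the cumulative length hits 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70
```

Because N50 depends only on the length distribution, a few long scaffolds can raise it substantially, which is why it is reported alongside completeness measures such as BUSCO.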

Table 3. Comparison of the annotations derived from the assemblies obtained with the three strategies. BUSCO scores are given as percentages of complete (C), complete and single-copy (S), complete and duplicated (D), fragmented (F), and missing (M) BUSCOs.

	Strategy 1	Strategy 2	Strategy 3
No. genes	7352	6151	2767
Genes < 0.1 AED score	56%	61%	53%
BUSCO (annotated gene set)	C: 89.4% [S: 88.1%, D: 1.3%], F: 2.9%, M: 7.7%	C: 91.6% [S: 90.7%, D: 0.8%], F: 2.8%, M: 5.7%	C: 33.5% [S: 33.5%, D: 0%], F: 1.6%, M: 64.8%
No. repetitive elements	10,136	813	8739
Genome covered by repetitive elements	22.53%	22.18%	22.04%
BGCs	19	18	10
Genome covered by BGCs	0.25%	1.40%	0.23%
Functional terms	30,404	52,648	14,611
Annotated genes	87.2%	91.8%	85.0%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
