High Diversity of Type I Polyketide Genes in Bacidia rubella as Revealed by the Comparative Analysis of 23 Lichen Genomes

Fungi involved in lichen symbioses produce a large array of secondary metabolites that are often diagnostic in the taxonomic delimitation of lichens. The most common lichen secondary metabolites—polyketides—are synthesized by polyketide synthases, particularly by Type I PKS (TI-PKS). Here, we present a comparative genomic analysis of the TI-PKS gene content of 23 lichen-forming fungal genomes from Ascomycota, including the de novo sequenced genome of Bacidia rubella. Firstly, we identify a putative atranorin cluster in B. rubella. Secondly, we provide an overview of TI-PKS gene diversity in lichen-forming fungi, and the most comprehensive Type I PKS phylogeny of lichen-forming fungi to date, including 624 sequences. We reveal a high number of biosynthetic gene clusters and examine their domain composition in the context of previously characterized genes, confirming that PKS genes outnumber known secondary substances. Moreover, two novel groups of reducing PKSs were identified. Although many PKSs remain without functional assignments, our findings highlight that genes from lichen-forming fungi represent an untapped source of novel polyketide compounds.


Introduction
Fungi synthesize an extensive array of chemically and functionally diverse natural products, termed secondary metabolites, with roles in defense, self-protection and development [1,2]. Based on their properties and the core enzymes and precursors involved in their biosynthesis, four major groups of fungal secondary metabolites are distinguished: polyketides, non-ribosomal peptides (NRPS), terpenoids and tryptophan derivatives [3]. Most fungal secondary metabolites are encoded by genes located adjacent to each other (i.e., "clustered") in the genome [2,4].
The presence of optional domains has led to the classification of fungal PKSs into three subgroups based on their ability to perform redox reactions. The first subgroup

DNA Isolation and Sequencing
To obtain concentrated high-molecular-weight genomic DNA, about 1 cm 2 of mycelium was taken and ground in liquid nitrogen with a pre-cooled pistil. Genomic DNA was isolated using the MagAttract HMW DNA Kit following the manufacturer's protocol and subsequent purification steps with magnetic beads (HMW genomic DNA from fresh or frozen tissue protocol), resulting in a total yield of 1 µg. The final concentration was 9.53 ng/µL, measured with a NanoDrop 1000 spectrophotometer (Peqlab Biotechnologie, GmbH, Erlangen, Germany) and Qubit 4 Fluorometer (ThermoFisher Scientific, Waltham, MA, USA) using 1X dsDNA HS Assay Kits. DNA extraction, PCR amplification and Sanger sequencing (using nrITS primers) were performed to evaluate possible contamination and confirm the cultures' identities. First, a paired-end library was constructed using Illumina DNA Prep (earlier known as Nextera DNA Flex Library Prep) and sequenced on NovaSeq 6000 (NovaSeq SP 150 bp Paired-end Flow Cell, Illumina) at the Biomedical Sequencing Facility (BSF, Vienna, Austria). A total concentration of 0.2 µg was used for Illumina library preparation. To supplement the Illumina data, we prepared several Oxford Nanopore libraries using the SQK-LSK109 kit and sequenced them on a MinION sequencer using R9.4.1 flow cells (altogether, four runs were conducted). The total yield of DNA was in the range of 0.12-0.31 µg.

Data Generation and Initial Read-Quality Assessment
To obtain a high-quality genome, we used a hybrid assembly approach using Illumina short-reads and Oxford Nanopore long reads. In total, we produced 17 Gbp of raw pairedend Illumina reads with 500× coverage. Raw data inspection with FastQC v0.11.7 [33] indicated high-quality reads (quality score (Q) above 34) and no excessive adapter contamination or read duplication. Basecalling of Nanopore raw signals was performed using Flappie v2.1.3 (git commit 4de542f; https://github.com/nanoporetech/flappie) into a total of 22 Gbp of raw sequences up to 94.5 Kb of read length.

Genome Completeness and Quality Assessment
The paired-end Illumina reads and unpaired nanopore reads were assembled using hybridSPAdes [34] with k-mer sizes of 33, 55, 77 and 127. To examine the assembly for potential non-target contigs, de novo assembly was subjected to BLASTX using DIA-MOND [35] against a custom database comprising the protein sets of the NCBI nr database (downloaded in July 2018). The results of this DIAMOND search were used as input for blobtools v1.1.1 [36]. The final results showed the absence of foreign contigs; therefore, further filtering was unnecessary. The quality of the assembly and genome statistics were assessed using QUAST v5.0.2 [37]. The completeness of the genomes was assessed using all single-copy BUSCO genes of the ascomycota_odb9 set (part of the phylociraptor pipeline; git commit 4cfd3c4; accessed on 2 July 2021; see below).

Dataset Construction
The dataset consists of the de novo sequenced genome of B. rubella and twenty-two additional representative genomes of Lecanoromycetes, the largest radiation of lichen-forming fungi. We included in our study twenty-three genomes obtained from pure fungal cultures with the exception of B. gigantensis and R. intermedia, which were obtained by sequencing the whole thallus ( Table 1). The dataset aims to identify known and possible unknown BGCs placing them together with previously characterized PKSs from other fungi and bacteria. The genomes were downloaded using phylociraptor, which automatically downloads genomes from NCBI and combines them with additionally specified genomes provided by the user [38]. We kept the original taxon names from NCBI for convenience, even though the current taxonomic status might differ.

Genome Annotation
The publicly available genomes were sequenced with different sequencing technologies at different times and are thus of varying quality. To make annotations comparable, we performed ab initio gene calling and functional annotations for all of them. The downloaded genomes were analyzed using the smsi-funannotate (https://github.com/reslp/ smsi-funannotate; git commit 398a144; accessed on 4 June 2021) pipeline based on funannotate v.1.8.7 [39]. First, we removed duplicated identical contigs (funannotate clean) in each assembly. To avoid long contig/scaffold names, we sorted our assembly by contig length and then renamed the fasta headers (funannotate sort). Afterwards, we made a soft repeat masking of the assembly using tantan (funannotate mask) [40]. The following steps included gene prediction using the gene-callers Augustus v3.3.2 [41], snap [42], GlimmerHMM v3.0.4 [43] and Genemark ES v4.68 [44]. For Augustus, we used Aspergillus nidulans as a pre-trained species. All the pipelines were run on the cluster of the University of Graz and our in-house Linux Server at LMU Munich. The genome statistics are summarized in Table 4

Annotation of Biosynthetic Gene Clusters
The identification, annotation and analysis of secondary metabolite BGCs in the studied fungal genome sequences were performed with antiSMASH v6.0 as a part of the smsi-funannotate pipeline and thus all BGC numbers are referred to antiSMASH prediction. We chose all BGCs from the annotation results that contained orthologous core TI-PKSs for our comparative genomic and phylogenetic analyses. The GBK output files with annotated BGCs and predicted genes were visualized on the antiSMASH webserver (fungal version; accessed: November 2021) [45].

Identification of Atranorin Biosynthetic Gene Cluster and Phylogeny
To identify atranorin candidate genes, we downloaded sequences from PKS16 of Cladonia grayi (GenBank: ADM79459; 2089 aa) and PKS23 of Stereocaulon alpinum (GenBank: QXF68953; 2500 aa). Both PKSs were previously reported as possible candidates involved in the biosynthesis of atranorin (see Section 3.1.1 Atranorin in Section 3 Results and Discussion for details). As atranorin synthesis involves an oxidation, the cluster must contain a cytochrome P450 as well as additional genes, O-methyltransferase (OMT), and transporter gene [29]. The presence of these genes was verified by our annotation results, showing OMT (PF08241), cytochrome P450 (PF00067), and transporter gene (PF00400) present in the putative atranorin BGC in B. rubella. The gene arrows plot, comparing two atranorin clusters in Bacidia rubella and Cladonia rangiferina, was drawn using the gggenes v0.4.1 R package (https://github. com/wilkox/gggenes/; accessed on 20 April 2022).
We used BLASTP v2.9.0+ [46] to detect the orthologs of those characterized sequences in all PKS sequences we identified on the twenty-three studied genomes. We filtered BLAST output to retain sequences with a minimum identity of 30% over the alignment length and a minimum query coverage of 50%, sorted for the highest bit score and lowest e-value. Additionally, we used the best blast subject hits with the highest percent similarity to the query gene. The best hits from the blast results (>30%) were selected for the PKS23 alignment. The taxon selection was confirmed with the clade from the large TI-PKS phylogenetic tree.

Topology Test to Confirm Atranorin Biosynthetic Genes
Our maximum-likelihood phylogeny recovered the B. rubella (Ramalinaceae) atranorin gene (bacrubpred_000804) as sister to a clade comprised of Cladonia rangiferina and S. alpinum (claranpred_005882 and QXF68953, respectively), which both belong to Cladoniaceae. Miadlikowska et al. [50] recovered Parmeliaceae as the closest relative of Cladoniaceae, with a clade of these two families being sister to Ramalinaceae. Thus, we tested if the monophyly of a clade comprising Bacidia + Cladoniaceae is significantly supported against the expected phylogenetic relationship comprising Parmeliaceae + Cladoniaceae. For this, we compared (1) an unconstrained ML tree to recover Bacidia + Cladoniaceae as monophyletic, and (2) a constrained tree with Bacidia sister to Cladoniaceae + Parmeliaceae ( Figure 1). Cladoniaceae, with a clade of these two families being sister to Ramalinaceae. Thus, we tested if the monophyly of a clade comprising Bacidia + Cladoniaceae is significantly supported against the expected phylogenetic relationship comprising Parmeliaceae + Cladoniaceae. For this, we compared (1) an unconstrained ML tree to recover Bacidia + Cladoniaceae as monophyletic, and (2) a constrained tree with Bacidia sister to Cladoniaceae + Parmeliaceae ( Figure 1). For the topology test, we used PKS23 sequences in the strict sense, i.e., from the species known as atranorin produces with Gyalolechia flavorubescens and Xanthoria elegans as outgroups. The constrained tree can be multifurcating and need not contain all species; therefore, finally, we shortened our tree to the taxa we wanted to test. First, we performed a constrained search for both topologies using the LG model for the amino acid dataset. Then, we concatenated both trees and implemented tree topology tests, based on the Kishino-Hasegawa test [51], Shimodaira-Hasegawa test [52], Expected Likelihood Weight [53], and approximately unbiased (AU) test [54] using IQ-TREE (m LG -z concate-nated_trees.treels -n 0 -zb 10,000 -au). We considered tree topology to be unlikely if its test p-value was < 0.05 (marked with a "-" sign). The trees were visualized and examined using FigTree 1.3.1 [55].

Identification of Orthologues and Orthogroups with BUSCO and OrthoFinder
Orthofinder uses a hybrid approach based on sequence similarity estimated by a blast all-vs-all search with diamond and subsequent reconstruction of gene trees and a species tree to identify (single copy) orthologs and paralogs. To identify orthologues of extracted PKSs, we inferred orthogroups with Orthofinder v2.5.4 [56] for (1) all predicted proteins and (2) all extracted TI-PKSs, independently. The independent runs for two datasets were conducted to check for the concordance of the final results. As the results were For the topology test, we used PKS23 sequences in the strict sense, i.e., from the species known as atranorin produces with Gyalolechia flavorubescens and Xanthoria elegans as outgroups. The constrained tree can be multifurcating and need not contain all species; therefore, finally, we shortened our tree to the taxa we wanted to test. First, we performed a constrained search for both topologies using the LG model for the amino acid dataset. Then, we concatenated both trees and implemented tree topology tests, based on the Kishino-Hasegawa test [51], Shimodaira-Hasegawa test [52], Expected Likelihood Weight [53], and approximately unbiased (AU) test [54] using IQ-TREE (-m LG -z concatenated_trees.treels -n 0 -zb 10,000 -au). We considered tree topology to be unlikely if its test p-value was <0.05 (marked with a "-" sign). The trees were visualized and examined using FigTree 1.3.1 [55].

Identification of Orthologues and Orthogroups with BUSCO and OrthoFinder
Orthofinder uses a hybrid approach based on sequence similarity estimated by a BLAST all-vs-all search with DIAMOND and subsequent reconstruction of gene trees and a species tree to identify (single copy) orthologs and paralogs. To identify orthologues of extracted PKSs, we inferred orthogroups with Orthofinder v2.5.4 [56] for (1) all predicted proteins and (2) all extracted TI-PKSs, independently. The independent runs for two datasets were conducted to check for the concordance of the final results. As the results were not contradictory, we discussed the result of TI-PKS only. The Markov Cluster (MCL) inflation parameter was set up by default (1.5). The trees utilized by Orthofinder were reconstructed using Fasttree v2.1.10 (-m msa, -A muscle, and -S diamond (default)) [57].

Type I Iterative PKS Alignment
We performed a phylogenetic analysis of all TI-PKSs identified in twenty-three studied genomes of Lecanoromycetes, to assess the relationship of non-reducing PKSs (NR-PKS) and reducing PKSs (R-PKS) (including partly reducing PKSs). First, we prepared a list of all identified TI-PKS based on our functional annotations. We retrieved the sequences based on the corresponding gene ID numbers included in the gff3 file of each genome using a custom python script (select_transcript.py: https://github.com/reslp/genomics/ blob/master/select_transcripts.py; accessed on 13 October 2021). We included predicted ketoacyl synthase (KS) amino acid alignment reported by Kroken et al. [14] to enhance our dataset. The inclusion of reference PKSs enables us to compare the tree topology proposed in Kroken et al. [14] to our larger sampling of putative PKS genes.

General Characteristics of De Novo Bacidia rubella Genome
We report the de novo assembled genome of the lichen-forming fungus Bacidia rubella obtained from an axenic fungal culture. Our hybrid approach to sequencing, using Illumina short-read with Oxford Nanopore long-read, resulted in a high-quality genome assembly of B. rubella. The final assembly had a size of 33.52 Mb in 51 scaffolds, an N50 of 1.77 Mb (Table 1), and was 98% BUSCO complete (fungi_odb9). Using multiple ab-initio gene-calling methods, 8773 genes were identified ( Table 1). The similarity of standard genome metrics between B. rubella and B. gigantensis [63] suggests a high-quality B. rubella genome ( Table 1).
The genome of B. rubella contains 31 BGCs (Table 2), including six non-reducing and four reducing TI-PKS sequences. This TI-PKS biosynthetic arsenal is similar to that of B. gigantensis, containing 11 identified TI-PKS genes (seven NR-PKSs and four R-PKSs, respectively; Table 2). Despite the overall similarity of TI-PKS gene numbers in B. rubella and B. gigantensis, our BLAST-based comparison of these biosynthetic clusters showed only two significant hits between the two species: one for R-PKS (bacgigpred_003963 and bacrubpred_000636), and one for NR-PKS (bacgigpred_008278 and bacrubpred_000202), with 62.8 and 63% similarity in amino-acid sequences, respectively. This result is not surprising, given that both fungal species differ in their secondary metabolite profiles. The only secondary compound of B. rubella is atranorin, which is the most widespread secondary metabolite found in the genus Bacidia s. lat. (Ekman 1996) [64]. In contrast, B. gigantensis is currently the only known Bacidia species to produce homosekikaic acid [65]. Table 2. Overview of biosynthetic gene clusters and polyketide synthase families found in the twenty-three studied fungal genomes. The occurrence of the major secondary substance of B. rubella, atranorin, is highlighted. All sequences except the de novo sequenced B. rubella genome are from NCBI, if not otherwise specified in the first column.  The two species Ramalina intermedia and R. peruviana belong to the same family as the genus Bacidia but differ in having many more BGCs compared to B. gigantensis and B. rubella. In detail, R. intermedia contains 54 BGCs, including thirteen non-reducing, seventeen reducing and one partially reducing TI-PKS sequences. Ramalina peruviana, on the other hand, contains 43 BGCs including nine non-reducing and seven reducing genes, and one partially reducing gene (Table 2).

Atranorin
Atranorin is the main secondary compound known in B. rubella [64], and its biosynthetic pathway has received attention in previous studies on lichen-forming fungi [26,29]. Our phylogenetic results-which included previously identified sequences of genes involved in atranorin production from Cladonia species and Stereocaulon alpinum-revealed a putative PKS23 homolog in B. rubella. This is consistent with the phylogenetic investigation of Kim et al. [29], where PKS23 sequences were reported to group together with sequences from atranorin-producing lichen-forming fungi (see Section 3.3 Type I PKS phylogeny). The domain configuration of B. rubella PKS23 also supports our phylogenetic results, showing the same organization of putative atranorin BGC in B. rubella as has been reported for Cladonia rangiferina (Figure 2). The B. rubella PKS23 cluster contains a cytochrome P450 domain (atr2) required for oxidation, as well as an O-methyltransferase (OMT) domain (atr3) and transporter gene (atr4), which are involved in atranorin biosynthesis [29]. A BLASTp search of the OMT domain (atr3) in B. rubella PKS23 resulted in 36% protein sequence identity to Trt5 (UniProtKB accession no. Q0C8A3). This level of similarity is consistent with previous findings by Kim et al. [29] for C. rangiferina PKS23.
sults, showing the same organization of putative atranorin BGC in B. rubella as has been reported for Cladonia rangiferina ( Figure 2). The B. rubella PKS23 cluster contains a cytochrome P450 domain (atr2) required for oxidation, as well as an O-methyltransferase (OMT) domain (atr3) and transporter gene (atr4), which are involved in atranorin biosynthesis [29]. A BLASTp search of the OMT domain (atr3) in B. rubella PKS23 resulted in 36% protein sequence identity to Trt5 (UniProtKB accession no. Q0C8A3). This level of similarity is consistent with previous findings by Kim et al. [29] for C. rangiferina PKS23.  The newly identified, putative PKS23 sequence of B. rubella was recovered as a sister to PKS23 sequences from C. rangiferina and S. alpinum, suggesting a sister-group relationship of Bacidia with Cladoniaceae (See Figure 3 in Section 3.3: Type I PKS Phylogeny), whereby this clade formed a sister-group to the PKS23 sequences from Parmeliaceae (schematically shown in Figure 1). This is in contrast to the family-level taxonomy previously established, e.g., by Peršoh et al. [66] and Miadlikowska et al. [50], where Cladoniaceae is more closely related to Parmeliaceae than to Ramalinaceae. To test the monophyly of the clade comprising Bacidia + Cladoniaceae compared to a group comprised of Parmeliaceae and Cladoniaceae, we performed a phylogenetic tree topology test on the PKS23 dataset. Specifically, we compared the unconstrained ML tree recovering Bacidia + Cladoniaceae as monophyletic (Topology 1) versus the constrained tree recovering Cladoniaceae + Parmeliaceae as monophyletic and Bacidia sister to that branch (Topology 2; Figure 1).
Our results showed that the topology constraining Cladoniaceae + Parmeliaceae as monophyletic is significantly less likely (0.8%, Figure 1, Table 3). Possible explanations are that the ancestor of Bacidia may have acquired the PKS23 gene from an ancestor of Cladoniaceae via horizontal gene transfer or that the PKS23 genes were generated by convergent evolution. It is obvious that these hypotheses require additional testing with augmented sampling of PKS23 sequences from Cladoniaceae, Parmeliaceae, and Ramalinaceae. Both scenarios may also explain the scattered occurrence of atranorin in Ramalinaceae. Even though a PKS16 gene from Cladonia grayi was first shown to be involved in grayanic acid production by Armaleo et al. [18], a homolog of this gene from C. rangiferina was later suggested to be involved in atranorin production by Elshobary et al. [26]. In our phylogeny, these genes group together with genes from several other Cladonia and lichenforming fungi (PKS16 clade, See Figure 3 in Section 3.3: Type I PKS Phylogeny), but not all of them are known as atranorin producers (Table 2). Moreover, atranorin is a B-orcinol depside, and a corresponding biosynthetic gene would require a CMeT domain to add a methyl group to the depside ring. This is not the case for PKS16, which is thought to be involved in the synthesis of orcinol depsides [25]. Our comparison of known metabolites in lichen-forming fungi containing PKS16 does not reveal a clear pattern either (See Figure 3 in section: Type I PKS Phylogeny, Group NR-I, PKS16 clade; Table 2). As such, the biosynthetic role of PKS16 genes remains elusive.

Homosekikaic Acid
Homosekikaic acid is a major secondary metabolite in Bacidia gigantensis as well as in both Ramalina intermedia and R. peruviana. However, it has not been found in B. rubella. To our knowledge, there was no putative PKS reported as being involved in the biosynthesis of homosekikaic acid, and its biosynthesis has not been characterized using gene expression or heterologous expression experiments. To identify candidate genes involved in the biosynthesis of homosekikaic acid in these three species, we used BLASTp on all predicted BGC sequences from the two Bacidia and Ramalina species. We identified three BGC candidates with BLAST similarity ranging from 53 to 93%, but none of them were a TI-PKS. Instead, we recovered Type 3-PKS homologs from B. gigantensis (BGC 1.2), R. intermedia (BGC 6.1) and R. peruviana (BGC 223.1) as potential homosekikaic biosynthetic genes; however, none of B. rubella BGCs showed a high similarity to them. Furthermore, our TI-PKS phylogeny did not reveal any clade with both Ramalina species and B. gigantensis that excluded B. rubella. A possible explanation for the observed pattern is that the BGC responsible for synthesizing homosekikaic acid is present in the genome of B. rubella, but not expressed, and therefore homosekikaic acid is not produced in detectable amount. This hypothesis requires testing with gene expression experiments and analyses of substance profiles.

Other Biosynthetic Genes Identified In Silico in the Two Bacidia Species
Using further in silico analyses with antiSMASH, we were able to identify genes encoding enzymes to synthesize secondary metabolites such as clavaric acid (100% similarity) and squalestatin S1 (40% similarity). These terpenes have been identified in both Bacidia. Moreover, we identified a monascorubrin biosynthetic gene in B. rubella confirmed by 100% BLAST identity to the monascorubrin biosynthetic gene from Talaromyces (Penicillium) marneffei (PKS3: HM070047) [67]. Apart from monascorubrin biosynthesis, PKS3 genes were suggested to be involved in the production of the well-known toxin citrinin, as well as a yellow pigment, ankaflavin [67]. Monascorubrin and its related compounds are polyketides used as natural red colorants for food [68]. In B. rubella, monascorubrin could be responsible for the characteristic orange to the orange-brown coloration of the apothecia. However, these substances have not yet been reported from B. rubella.
In the genome of B. gigantensis, we identified genes possessing high sequence similarity with genes involved in the production of pyranonigrin E (100% similarity to BGC0001124), naphthopyrone (100% similarity to BGC0000107), and melanin (100% similarity to BGC0001265). However, in most cases, only a part of the sequences showed a high percentage of similarity; therefore, our in-silico based report is provisional. Additional detailed studies are necessary to confirm these first functional assignments.

Biosynthetic Gene Composition in Twenty-Three Annotated Fungal Genomes
The basic statistics for all twenty-three studied genomes are provided in Table 4. According to BUSCO homology searches against the fungal dataset (fungi_odb9), most genomes were highly gene complete. We included only two genomes with completeness below 90%, namely Alectoria sarmentosa (75.8%) and Graphis scripta (88.6%). The number of predicted genes was in the range of 6756 to 11,072. The lowest number of predicted genes was observed for Ramalina peruviana (most likely due to issues in the quality of the assembly given the lower sequencing depth of 2×), and the highest number was observed for Evernia prunastri (Table 4).  We investigated the BGCs predicted in all twenty-three studied genomes that belong to different taxonomic groups and synthesize a plethora of secondary metabolites ( Table 2;  Table 4). Our results revealed a high number of BGCs, with an average of 48 clusters per genome. The smallest number was recovered in the genome of Lasallia pustulata (26 BGCs), and the highest in Evernia prunastri (86 BGCs), which agrees with the results reported by Calchera et al. based on 15 lichenized genomes [6]. In nearly half the genomes, NR-PKSs are more common than R-PKS (Table 2). This evidence is in contrast to previous results reporting that R-PKS gene numbers exceed the number of NR-PKS genes [6]. As such, a larger data set is necessary to confirm whether R-PKS or NR-PKSs are more numerous in the genomes of lichen-forming fungi. The total number of TI-PKSs identified across all studied genomes was 478. Of those, 44.35% were of the non-reducing (NR), 53.35% of reducing (R), and 2.3% of partly reducing (PR) PKSs. The highest number of PKS genes was found in E. prunastri (36 TI-PKS clusters) and C. rangiferina (34 clusters) and the lowest numbers were in B. rubella (10 TI-PKS clusters), B. gigantensis (11 clusters), and Cyanodermella asteris (10 clusters) ( Table 2).
Our broad genomic sampling provides phylogenetic context for the NR-PKS genes of B. rubella and B. gigantensis. The six TI-PKS genes from B. rubella are NR-PKSs (three in Subclade I and three in Subclade II), while for B. gigantensis seven TI-PKS genes are NR-PKSs (all are in Subclade I).
The discrepancy between the large number of recovered TI-PKS sequences and the few experimentally verified secondary metabolites (Table 2) raises questions about the role these genes play in the secondary metabolism, and about the products they produce (e.g., [9,24,25,29]. However, results linking PKS genes to lichen secondary metabolites beyond in silico methods are still scarce.

Type I PKS Phylogeny
Our maximum-likelihood phylogeny of TI-PKS with a total of 624 sequences recovered from twenty-three fungal genomes and supplemented with previously published sequences is the largest analysis of biosynthetic gene content of various TI-PKS genes in lichen-forming fungi to date.
The TI-PKSs phylogenetic tree is divided into six main subgroups ( Figure 3): Bacterial Type II PKS (including bacterial and mitochondrial ketoacyl-ACP-synthetases; used as outgroup), Bacterial Type I PKS (also including some fungal sequences), animal fatty acid synthase (FAS), PR-PKS, NR-PKS, and R-PKS. Lichen PKS genes are distributed across the three of these main subgroups, viz. PR-PKS, NR-PKS, and R-PKS. These groups were also recognized by Kroken et al. [14] and Calchera et al. [6], but both used the single KS domain for tree reconstruction, and the latter subsumed PR-PKS and R-PKS.  to Orthogroup 1 (green); the nine NR-PKS groups (NR-I to NR-IX) belong to Orthogroups 2, 3 and 4 (dark blue, pink and light green, respectively); the ten R-PKS groups (R-I to R-X) belong to Orthogroups 5, 6, 7, 8, and 9 (light blue, red, peach, orange, and lilac, respectively). Characteristic secondary substances for the groupings are given in the corresponding colored boxes.
• TI-PKS domain content mostly corresponds to phylogeny

•
The discrepancy of grouping PKS genes Assigning PKS genes to different groups in fungi on the account of gene domain composition and their synthesized products was introduced by Kroken et al. [14] and later refined based on DH-domain pocket sizes by Ahuja et al. [69] and Liu et al. [70]. This classification was used in several studies investigating PKS gene diversity in lichen-forming fungi, e.g., [29,71]. Despite this existing classification, it remains unclear if all groups indeed form monophyletic clades and if all genes from one group synthesize the same (or at least chemically similar) substances. In our phylogeny, previously proposed groups were not always monophyletic (e.g., NR-II; Figure 3), and experimentally characterized genes from one group have been shown to produce different substances (e.g., PKS81 and AGNPKS1 in NR-V; Figure 3). Additionally, we performed an ortholog clustering using Orthofinder on all TI-PKS sequences to provide an objective way of identifying groups of sequences. This resulted in nine orthogroups (Figure 3: colored clades), named Orthogroup 1 to Orthogroup 9, respectively. Orthogroup 1 corresponds to the PR-PKSs (Figure 3). NR-PKSs are spread across Orthogroups 2, 3, and 4 but none of them correspond explicitly to any of the nine groups identified before by Pizarro et al. [71] and Kim et al. [29]. Orthogroups 5, 6, 7, 8, and 9 contain R-PKSs, where only Orthogroup 6 (R-IV) and 7 (R-V) agree with groups defined in Kroken et al. [14] and Punya et al. [72]. However, all Orthogroups identified here form well supported clades in the PKS phylogeny ( Figure 3). The discrepancy between our results and previous studies needs to be investigated in subsequent studies. In our phylogeny, several groups of the NR-PKS and R-PKS showed differences in the PKS domains present corresponding to the supported clades; these groups do contain support from different sources and thus merit discussion (see below).

Lichen-Forming Fungi Contain Only a Few Partially Reducing PKSs in the Phylogenetic Neighbourhood to Bacterial PKSs
The PR-PKS sequences formed a well-supported clade sister to bacterial Type I PKS sequences with other fungal PKSs (PR-PKS; Figure 3). Their domain configuration is KS-AT-[DH]-KR-ACP, including only a single reductase domain (KR). In contrast to the large NR-and R-PKS groups, PR-PKS contains genes mainly from Aspergillus and Penicillium, and only a few genes from lichen-forming fungi. Genes from the studied fungal genomes have the typical PR-PKS domain composition, but in two genes from two Ramalina (ram-intpred_001715 and ramperpred_002012), the DH domain was missing. PR-PKS genes form a sister clade to Bacterial Type I PKSs (Figure 2). Bacterial T1-PKSs often possess a KR domain catalyzing the first step in the reductive modification of beta-carbonyl centers in the growing polyketide chain. This domain requires NADPH to reduce the keto-to a hydroxy group [73].
The "simple" domain configuration of Bacterial and PR-PKS genes compared to NRand R-PKSs has not escaped our attention. Our study was not designed to specifically test how PKS genes in Ascomycetes were acquired and how they diversified. However, the placement of Bacterial and NR-PKS sequences as the earliest branches in our phylogeny supports the hypothesis suggested by Kroken et al. [14] that fungal TI-PKS genes could have been acquired by an ancient horizontal gene transfer event between bacteria and fungi. Additionally, our results suggest that ancestral TI-PKS genes may have been partially reducing. Under this scenario, NR-and R-PKS evolution could have been connected to domain structure modification (such as gaining SAT in NR-PKS) and subsequent functional diversification. However, this cannot be concluded with certainty without a greatly expanded sample of genomes from other fungal groups, including genomes from early branching fungal lineages and comprehensive ancestral state reconstructions.

Fungal PKSs Producing Non-Reduced Polyketides
The Diversity of NR-PKS Sequences In previous studies, NR-PKSs were divided into nine major groups based on protein sequence similarity and PKS domain content [14,29,69,71]. Similar to previous studies, we observed characteristic domain configurations for the different clades in our phylogeny. The domain structure of NR-PKSs in the studied fungi can be generalized as SAT-KS-AT-DH-ACP-[ACP]-HTH-CMeT-[TE or ADH or NAD]-[Peptidase S9]; however, some PKS genes may deviate from this structure. The observed domain content variations are not random but rather occur in two main patterns: either through the duplication of the ACP domain, or through the addition of an N-terminal TE domain. PKS genes possessing an additional ACP domain are scattered throughout different NR-PKS clades, suggesting multiple independent gains. However, the functional significance of ACP domain duplications is still unknown [14].
Most of the NR-PKS sequences belong to a large clade containing groups NR-I to NR-V and NR-VIII, with domain configuration containing a TE domain at the N-terminal end.
The irregularly present additional TE domain is involved in a thioesterase-mediated product release, which is the most common release mechanism in TI-PKS [74]. It regularly extends to a C-C Claisen cyclization domain (TE/CLC domain), e.g., in Aspergillus parasiticus PksA [75]. Although CYC domains have previously been reported from various fungi [14,69,70], we could not identify them in our analyses and thus did not indicate them in the phylogeny.
Subclade I of NR-PKSs (Including Groups NR-I to NR-V, and NR-VIII) Group NR-I The paraphyletic group NR-I contains several clades with previously characterized sequences involved in the biosynthesis of aromatic compounds derived from orsellinic acid, such as grayanic acid (PKS16), physodic acid and olivetoric acid [24,25], lecanoric acid [76], xanthones, aflatoxin, and naphthoquinones [14]. In addition, the clade also comprises PKS13 sequences from several lichen-forming fungi and Gibberella zeae as well as a PKS15 sequence from Botryotinia fuckeliana with unknown functions.

Group NR-II
Our results show that NR-II is a diverse group, comprised of several clades containing sequences from lichen-forming fungi interspersed with characterized PKS genes proposed to be involved in the biosynthesis of various substances. The earliest branching clade contains genes involved in the 6-hydroxymellein biosynthesis (NR-II; Figure 3), a key intermediate in terrein biosynthesis [77]. Terrein, produced in large quantities by Aspergillus terreus, has phytotoxic activity and has the potential to serve as a novel antibiotic [78]. Our phylogeny revealed sequences of four lichen-forming fungi (cla-grapred_001463, clauncpred_002890, psefurpred_002946, and grascrpred_004317) clustering together with terA gene in A. terreus. A BLAST search of terA (GenBank: EAU38791) against these sequences revealed 58 to 67% similarity at the amino acid level, indicating a high degree of conservation despite an approximate 350-million-year split of Eurotiomycetes (Aspergillus) and Lecanoromycetes [79]. Terrein has not been reported from lichen-forming fungi; therefore, without further experiments and detailed substance characterization, terrein production in lichen-forming fungi remains hypothetical.
Another important pattern is the occurrence of characterized melanin genes and melanin precursors in different clades. Melanins are a diverse group of substances that play a role in virulence, morphogenesis or the response to environmental stress, and can be synthesized via different pathways [80]. PKSs produce two forms of melanin: either 1,8-dihydroxynaphtalene (DHN) melanin or Deoxybostrycoidein-melanin [81]. Melanins are insoluble and thus cannot be studied by standard biochemical methods [80]. In the PKS gene survey by Kroken et al. [14], all melanin synthesizing genes were grouped together in a single monophyletic clade. In our analysis, they are recovered in at least three clades containing sequences from several Lecanoromycetes and Endocarpon pusillum. Sister to NR-II is a clade containing sequences from Colletotrichum lagenarium (PKS1), Glarea sp., Nodulisporium sp., and C. heterostrophus (PKS18), which were also characterized as producers of melanin [14]. The high number of genes from lichen-forming fungi close to characterized melanin biosynthetic genes suggests an essential role of melanins in lichens.

Groups NR-III and NR-IV
The sister groups NR-III and NR-IV contain sequences of proteins producing large polyketide chains, such as conidial yellow pigment Alb1 (NR-III) or aflatoxin/ sterigmatocystin of A. nidulans (NR-IV) [14,71,82].
Several genes recovered in the NR-IV group are potentially involved in producing different fungal pigments. The genes from Ramalina intermedia (ramintpred_005964) and Cladonia grayi (clagrapred_006320) were assigned to an FSR1 gene, which is involved in producing highly pigmented naphthoquinones fusarubins responsible for fruiting body coloration of Fusarium fujikuroi [85]. A BLAST search of the R. intermedia gene (ram-intpred_005964) revealed 72.3% similarity to a PKS of Cladonia metacorallifera (GenBank: QIX11499) that is involved in the biosynthesis of the red compound cristazarin, characterized by antibacterial and antitumor activity [86]. Sequences of Bacidia rubella and several Cladonia species producing red pigments were recovered sister to the putative Ramalina intermedia FSR1 gene. Interestingly, R. intermedia, at the same time, lacks red pigments [87]. This close phylogenetic proximity raises the possibility that combinations of substances synthesized by different PKS genes could be responsible for red-colored pigments (see also discussion on monascorubrin above), particularly for the red fruiting bodies in B. rubella.

Group NR-V
According to the previous studies, NR-V contains PKSs without the TE domain [71]; however, in the characterized PKSs in our phylogeny, the TE or R domains were present in all clades. The genes in the NR-V group are suggested to be involved in the production of different mycotoxins, such as desertorin (Aspergillus nidulans) and atrochrysone (Aspergillus fumigatus) [71,88,89].
The crown clades in the NR-V group include the annotated PKS81 orthologs from Cladonia metacorallifera, C. macilenta, Lasallia pustulata and L. hispanica as sister to a PKS19 sequence from Cochliobolus heterostrophus and two clades with characterized Agnpks1 also from several lichen-forming fungi. Agnpks1 (Atrochrysone carboxylic acid synthase) was originally described from Penicillium divaricatum and is involved in the biosynthesis of agnestins and dihydroxy-xanthone metabolites [90]. Xanthones and related benzophenones are produced by various filamentous fungi. They exhibit insecticide, antioxidant, antibacterial, anti-inflammatory, and anticancer activities [91]. Examples include desmethylsterigmatocystin, a key intermediate of the aflatoxin group of mycotoxins produced by Aspergillus flavus, and a norlichexanthone from the lichen-forming Lecanora straminea [90]. An additional BLAST search of norlichexanthone synthase from L. straminea (GenBank: D7PI15) against our annotated fungal sequences showed at least 37% similarity to sequences from the clade containing PKS81 and Agnpks1. The relationship between these two genes remains unclear, but given our results, PKS81 might belong to or have evolved from Ag-npks1. Additional closer investigations will be necessary to show the role of PKS81 and Agnpks1 orthologs in lichen-forming fungi and their possible role in xanthone biosynthesis.
Subclade II of NR-PKSs (Including Groups NR-VI, NR-VII, and NR-IX) The second largest clade of NR-PKSs includes the previously recovered groups NR-VI, NR-VII, and the recently defined group NR-IX [29]  Although previously reported from all NR-PKS groups except group NR-V [69,71], we did not observe TE domains in groups NR-VI, NR-VII, and NR-IX. Instead, we observed NAD and ADH domains-in some cases followed by a Peptidase S9 (PFAM: PF00326) domain. Peptidase S9 belongs to proteolytic enzymes that consume serine in their catalytic activity, and they are ubiquitously found in viruses, bacteria, and eukaryotes [92]. We also observed a helix-turn-helix domain (HTH) being inserted after one or two ACP domains in several groups. The HTH domain was previously found in PKSs in fungi but it has not been shown in the arrangement of the NR-PKSs of lichen-forming fungi before. Proteins with an HTH motif are, for example, involved in DNA repair, RNA metabolism, and protein-protein interaction [93], and could thus be involved in developmental or morphogenic processes.
Group NR-VI Characterized PKSs in this group include PKS18 and PKS19 genes from Botryotinia fuckeliana. Genes from this group were suggested to be involved in usnic acid biosynthesis [71], and according to Kim et al. [29] they belong to PKS8. Our results show that this group is divided into two clades. The first clade contains genes of known usnic acid producers such as Ramalina intermedia, R. peruviana, Cladonia metacorallifera, C. macilenta, C. uncialis, Evernia prunastri, Alectoria sarmentosa, Letharia columbiana, and Usnea hakonensis, interspersed with genes from lichen-forming fungi known not to produce usnic acid. The second clade includes genes mostly from lichen-forming fungi not producing usnic acid, including a gene from B. rubella (bacrubpred_000551). However, sequences from Cladonia macilenta, C. uncialis, and Evernia prunastri were also recovered in this clade.

Group NR-VII
The only genes characterized in this group are PKS17 from Botryotinia fuckeliana and PKS21 from Cochliobolus heterostrophus. These genes have been proposed to produce citrinin or lovastatin [14,94,95]. The other identified genes from the lichen-forming fungi in this group had 50 to 70% BLAST similarity to genes involved in the biosynthesis of monascorubrin, citrinin, conidial yellow, azanigerone A, and stipitatic acid (see also discussion on monascorubrin above). As mentioned above, the biosynthetic gene from B. rubella (bacrubpred_007642) has a 100% identity to the monascorubrin biosynthetic gene.

Group NR-IX
The recently recognized group NR-IX [29] contains several predicted PKS23 sequences, involved in the biosynthesis of atranorin (see sections on 3.1.1 Atranorin and 3.1.2 Homosekikaic acid above) and methylated orsellinic acid derivatives. In contrast to the groups NR-VI and NR-VII, NR-IX sequences always contain an ADH and lack a NAD domain after CMeT, while Peptidase S9 is only sporadically present. It is necessary to note that only one ACP domain was present in the PKS23, including atranorin producers. In addition, other characterized PKSs of this group are from Cochliobolus heterostrophus (PKS22 and PKS23) and Botryotinia fuckeliana (PKS16 and PKS20).

Fungal PKSs Producing Reduced Polyketides
The Diversity of R-PKS Sequences R-PKS genes are involved in synthesizing various reduced, usually linear polyketides. Many are precursors of toxins that are active in animals (e.g., lovastatin, citrinin, and fumonisin) or plants (e.g., T-toxin and PM-toxin) [96][97][98][99]. We follow this classification in our discussion whenever possible, while also highlighting deviations. Compared to NR-PKSs, less is known about the possible roles of R-PKSs. In our phylogeny, R-PKSs formed a large clade sister to NR-PKSs with many well-supported subclades. Unlike what has been shown in previous studies, all studied R-PKSs here lacked a second ACP domain, and in some cases, ACP was missing. According to Kroken et al. [14], four major groups (R-I to R-IV) of R-PKS genes can be distinguished based on their overall domain structure and produced compounds. This classification was later extended to groups R-V to R-VIII by Punya et al. [72]. Over 70% of the studied fungal sequences could be assigned to groups designated by Kroken et al. [14] and Punya et al. [72] with the general domain content: KS-AT-DH-[CMeT]-ER-KR-[ACP]. Two clades recovered by Punya et al. [72] are missing from our tree, namely R-VI and R-VIII. Clade VI contained uncommon PKS-NRPS hybrids with four additional domains after ACP, viz. condensation (C), adenylation (A), thiolation (T), and reductase (R) [72]. As our study focused on the TI-PKSs, it did not include hybrid NRPS-PKS genes. The sequences from R-VIII were not present in our phylogeny, and are thus not feasible to identify with confidence.
Based on our phylogenetic results, we identified two R-PKS clades that did not match any of the R-PKS groups reported in previous studies. Sequences in those two clades had characteristic domain structures and were recovered as distinct orthogroups in our Orthofinder analysis. This leads us to propose these clades as new reducing PKS groups, R-IX and R-X, following the numbering scheme by Kroken et al. [14] and Punya et al. [72].

Groups of R-PKSs
Group R-I Group R-I is the largest group of R-PKSs in our tree. It contains one of the R-PKS genes synthesizing the diketide portion of lovastatin and citrinin, T-toxin, and PMtoxin [14]. Our results show that the genes of this group, recovered in a monophyletic clade by Kroken et al. [14], were nested in different clades in our analysis; therefore, R-I assignment is not complete.