Plastome Diversity and Phylogenomic Relationships in Asteraceae

Plastid genomes are generally highly conserved owing to their slow evolutionary rate, so large structural changes are unusual. However, when specific rearrangements are present, they are often phylogenetically informative. Asteraceae is a highly diverse family whose evolution has long been driven by polyploidy (up to 48x) and hybridization, two processes that usually complicate systematic inference. In this study, we generated one of the most comprehensive plastome-based phylogenies of the family Asteraceae and characterized the structure, genetic diversity and repeat composition of these sequences. By comparing the whole-plastome sequences obtained, we confirmed, for most of the species analyzed (except those of the basal subfamily Barnadesioideae), the double inversion located in the long single-copy region, a well-known feature of Asteraceae plastomes. We also showed that genome size, gene order and gene content are highly conserved across the family. However, species of the basal subfamily Barnadesioideae, as well as of the sister family Calyceraceae, lack the rps19 pseudogene located in one inverted repeat. The phylogenomic analysis conducted here, based on 63 protein-coding genes, 30 transfer RNA genes and 21 ribosomal RNA genes from 36 species of Asteraceae, was overall consistent with the general consensus on the family's phylogeny, while resolving the position of tribe Senecioneae and revealing some incongruences at the tribe level between reconstructions based on nuclear and plastid DNA data.


Introduction
The sunflower family (Asteraceae or Compositae) is probably the most diversified plant family, with about 25,000-35,000 species distributed worldwide, accounting for ca. 10% of angiosperms [1,2]. The family contains many important crops (such as lettuce, sunflower or artichoke) and many ornamentals (such as marigolds or dahlias), but also many weeds (such as dandelion or some thistles) [1]. Two defining morphological traits of the family have been crucial for the evolutionary and ecological success of Asteraceae: the characteristic capitulum inflorescence, in which many tiny flowers (florets) are packed onto a receptacle, and the cypsela, an indehiscent dry fruit derived from a compound inferior ovary, which usually shows adaptations for effective dispersal and against herbivory [3,4]. Asteraceae has long been the subject of cytological interest; it is possibly the plant family with the most chromosome counts available, with x = 9 as the most likely ancestral base number. Hybridization and polyploidy are particularly active in the family, with ploidy levels up to 48x [5]; indeed, several whole-genome duplications (WGDs) [6], together with frequent hybridization [7,8], are linked to the massive diversification of Asteraceae and, at the same time, complicate its systematics. Other cytological features, such as the exceptional linked arrangement of ribosomal RNA genes found in many of its species [9], add further interest to the study of this family.
As already supported by early molecular phylogenetic studies [10], Asteraceae is a well-defined family that originated in South America, its closest relatives being Calyceraceae (of South American origin) and Goodeniaceae (of southwestern Australian origin) [11,12]. Resolving the systematic relationships within Asteraceae has, however, been more challenging. From the first molecular approaches [13,14] to the major compilation and meta-tree analyses by [1], plastid DNA has been a preferred target for researchers interested in Asteraceae evolution. The combination of slowly evolving genic regions and fast-evolving intergenic spacers has made plastid markers classical candidates for phylogenetic reconstruction at different taxonomic levels within the family [15]. One of the latest and most comprehensive evolutionary reconstructions at the family level was based on DNA sequences of 11 plastid genes [16]. More recently, the backbone of the Asteraceae phylogeny was successfully resolved using targeted sequence capture data from 935 nuclear loci [17]. However, some phylogenetic uncertainties, as well as a few incongruences between nuclear and plastid inferences, still remain.
Structurally, angiosperm plastomes are 120-160 kb long and have a quadripartite structure with two single-copy regions (LSC, long single copy, and SSC, short single copy) separated by two inverted repeats (IRA and IRB) [18]. Every Asteraceae plastid genome investigated to date is around 150 kb long and contains ca. 80 protein-coding genes, four ribosomal RNAs (rRNAs) and 30 transfer RNAs (tRNAs) in the expected quadripartite organization [19]. Large changes in plastid DNA structure are unusual across land plants, but some families, such as Geraniaceae, Fabaceae and Ericaceae, show several plastome rearrangements (e.g., expansions, contractions, inversions or loss of an IR [20]), in many cases reported as phylogenetically informative. In Asteraceae, all plastomes except those of Barnadesioideae (a small basal subfamily with roughly 100 species) share a distinctive structural feature: a double inversion in the plastid DNA [14]. Both inversions are located in the LSC region, the larger inversion (~22.8 kb long) containing the smaller one (~3.3 kb long). These inversions have been confirmed by both Sanger and next-generation sequencing (NGS), the latter in a few species of the family (e.g., Artemisia capillaris [21]). However, further studies including a larger representation of species could provide new insights into the structural variability of Asteraceae plastomes.
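The nesting of the two LSC inversions can be illustrated with signed gene-block orders. The sketch below is a toy model only: the block labels 1-8 and the breakpoint positions are hypothetical placeholders rather than real Asteraceae gene coordinates, and the order in which the two inversions are applied is an assumption made for illustration.

```python
def invert(order: list[int], start: int, end: int) -> list[int]:
    """Reverse the blocks in order[start:end] and flip their strand signs."""
    segment = [-block for block in reversed(order[start:end])]
    return order[:start] + segment + order[end:]


# Hypothetical LSC blocks 1-8; positive = original strand, negative = inverted.
ancestral = [1, 2, 3, 4, 5, 6, 7, 8]

# Larger inversion: blocks 2-7 (slice indices 1..6).
after_large = invert(ancestral, 1, 7)
# Smaller inversion nested inside the larger one: two adjacent blocks of the new order.
after_both = invert(after_large, 2, 4)

print(after_large)  # [1, -7, -6, -5, -4, -3, -2, 8]
print(after_both)   # [1, -7, 5, 6, -4, -3, -2, 8]
```

In this toy order, blocks 5 and 6 are flipped twice and so return to their original strand at a new position, while the blocks around them stay inverted.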
The advent of NGS technologies, together with their steadily decreasing costs, has enabled the massive generation of genomic data for many species. In plants, whole-genome sequencing (WGS) datasets frequently include plastid DNA data. The development of free and relatively easy-to-use de novo assembly tools such as NOVOPlasty [22] or SOAPdenovo2 [23] has further facilitated the reconstruction of plastomes. Using these approaches, the plastid genomes of some Asteraceae species have already been sequenced and published and are now stored in public repositories. However, the available genomic data represent a scattered and uneven taxonomic sampling, and more data are therefore needed to analyze plastome diversity within the family. In this study, we combined (i) seven previously published plastome reconstructions, (ii) seventeen new plastome assemblies obtained from WGS raw data stored in repositories and (iii) twelve new plastomes assembled from Illumina sequences specifically generated for this work. This strategy allowed us to build a comprehensive dataset of whole plastid genomes for 36 species representing the most important subfamilies and tribes of Asteraceae. We used these data to thoroughly characterize plastome variability in the family. We also inferred the most complete plastid phylogenomic reconstruction carried out in Asteraceae to date and compared our results with previous phylogenetic and phylogenomic studies of the family.

Plastome Reconstruction in Asteraceae
The mixed sampling strategy, combining Illumina sequences generated for this study with raw genomic data obtained from public repositories, yielded 30 new, completely reconstructed and circularized plastomes for Asteraceae. All these plastomes have the standard angiosperm structure, comprising two copies of the IR region (24,126 to 25,245 bp) separated by the LSC (82,297 to 85,288 bp) and SSC (17,859 to 18,786 bp) regions (Table 1). Plastome lengths were similar in all species analyzed, with Achillea millefolium having the smallest plastid genome (149,113 bp) and Melampodium linearilobum the largest (153,872 bp). The GC content of the assembled plastomes ranged from 37% to 38.1%. After all quality controls, read coverage per nucleotide ranged from 10 to 6021, and the percentage of Ns in the whole-plastome reconstructions ranged from 0.00% to 3.80% (Table 1).
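Because the IR is present twice, total plastome length follows from the region lengths as LSC + SSC + 2 × IR. The sketch below (helper names are hypothetical) applies this to the region-length ranges reported above as a consistency check, and shows one possible way to compute GC and N percentages; ignoring N positions in the GC denominator is an assumption, not necessarily the convention behind Table 1.

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def gc_percent(seq: str) -> float:
    """GC percentage over unambiguous A/C/G/T bases (Ns and other codes ignored)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return 100.0 * (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def n_percent(seq: str) -> float:
    """Percentage of N (masked or unresolved) positions in a sequence."""
    return 100.0 * seq.upper().count("N") / len(seq) if seq else 0.0


# Bounds implied by the reported region-length ranges (not a real genome).
lower = plastome_length(lsc=82_297, ssc=17_859, ir=24_126)
upper = plastome_length(lsc=85_288, ssc=18_786, ir=25_245)
print(lower, upper)  # 148408 154564
# The observed extremes (149,113 and 153,872 bp) fall inside these bounds.
assert lower <= 149_113 and 153_872 <= upper
print(153_872 - 149_113)  # 4759
```

The difference between the observed extremes (4759 bp) is the maximum length variability discussed later for these plastomes.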

Phylogenetic Analysis
The tree topology resulting from the Bayesian inference (BI) and maximum likelihood (ML) analyses is shown in Figure 1. The BI and ML trees were identical in topology (Figure S1); both were highly resolved, with BI showing stronger support for a few branches. In addition, phylogenetic inferences based on different combinations of CDS, tRNA and rRNA data and of species (Figure S1), performed to validate the tree topology, were congruent with the main result. The phylogenetic positions of all early-diverging clades of Asteraceae (i.e., subfamilies Barnadesioideae, Mutisioideae and Carduoideae) are well supported (pp = 1, BS = 100 in all cases). The monophyly of subfamily Cichorioideae is also well supported, and it is placed as the sister group of subfamily Asteroideae (pp = 1, BS = 100). Subfamily Asteroideae is divided into two major clades, one representing supertribe Helianthodae and the other comprising supertribes Asterodae and Senecionodae [16,24], grouped here in a fully supported clade. Within supertribe Helianthodae, the node grouping tribes Heliantheae and Millerieae is highly supported only in the BI tree (pp = 0.99, BS = 62), and the same pattern occurs for the group formed by tribes Eupatorieae and Madieae (pp = 0.99, BS = 64). In the Asterodae + Senecionodae clade, only two nodes lack the highest support in both approaches: the split between tribe Gnaphalieae and tribes Anthemideae and Astereae (pp = 0.91, BS = 60) and the branch grouping tribes Anthemideae and Astereae (pp = 1, BS = 67).

Structural Comparison of Plastomes
All Asteraceae plastomes, together with those of the outgroups Nastanthus patagonicus (Calyceraceae), Scaevola taccada (Goodeniaceae) and Menyanthes trifoliata (Menyanthaceae), were used in a comparative analysis of plastid DNA structure across the MGCA clade and within the family Asteraceae. The results (Figures 2 and S2) indicate that most Asteraceae plastomes show a high level of sequence similarity and structural conservation. The analysis confirms the existence of a rearrangement in the LSC consisting of a double inversion: one large inversion of ~22.8 kb and one small inversion of ~3.3 kb nested within it. This rearrangement is present in all species of subfamilies Mutisioideae, Carduoideae, Cichorioideae and Asteroideae, but absent from subfamily Barnadesioideae and from the sister family Calyceraceae (Figure 2). Family Menyanthaceae has the same plastome structure as Calyceraceae and the members of subfamily Barnadesioideae. However, the plastome of the species representing Goodeniaceae, the sister family to the clade formed by Calyceraceae and Asteraceae, is full of rearrangements (not only in the LSC region but also in the SSC and in both IRs), and thus its LSC structure could not be compared with that of the other families.

Figure 1 (caption, partial). Nodes without numbers have maximum support in both the ML and BI approaches. Nodes with numbers have low support in at least one phylogenetic inference approach (posterior probability/bootstrap). Low support is defined as: posterior probability < 1; bootstrap < 70.
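The low-support criterion stated above (posterior probability < 1 or bootstrap < 70, in at least one approach) can be written as a simple predicate. The sketch below applies it to the pp/BS pairs reported in the phylogenetic results; the function name and dictionary layout are illustrative only.

```python
def is_low_support(pp: float, bs: float) -> bool:
    """Low support in at least one approach: BI pp < 1 or ML bootstrap < 70."""
    return pp < 1.0 or bs < 70


# (posterior probability, bootstrap) pairs as reported in the results text.
nodes = {
    "Heliantheae + Millerieae": (0.99, 62),
    "Eupatorieae + Madieae": (0.99, 64),
    "Gnaphalieae vs. Anthemideae + Astereae": (0.91, 60),
    "Anthemideae + Astereae": (1.0, 67),
    "Cichorioideae + Asteroideae": (1.0, 100),
}

for name, (pp, bs) in nodes.items():
    # Labelled nodes show "pp/BS"; fully supported nodes stay unlabelled.
    label = f"{pp}/{bs}" if is_low_support(pp, bs) else "unlabelled"
    print(f"{name}: {label}")
```

Note that a node with pp = 1 can still be labelled, as for Anthemideae + Astereae, because its bootstrap falls below 70.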
By visualizing the expansions and contractions of the IR boundaries (Figure 3), we also found an additional structural difference between the plastomes of family Calyceraceae and subfamily Barnadesioideae and those of the remaining Asteraceae. Most Asteraceae species harbor the same genes at the boundaries between the IR and single-copy regions (LSC/IRA: rps19; IRA/SSC: ycf1; SSC/IRB: Ψycf1; IRB/LSC: Ψrps19). However, species of subfamily Barnadesioideae and the Calyceraceae species Nastanthus patagonicus lack the Ψrps19 pseudogene at the boundary between IRB and the LSC (Figures 3 and S3).
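One common interpretation of a Ψrps19 at the IRB/LSC junction, assumed here rather than tested in this study, is that the IR extends into part of rps19 at the LSC/IRA junction, so the mirror-image IRB carries a truncated copy of that part; when rps19 lies entirely within the LSC, no such copy is expected. The toy sketch below computes the length and IRB coordinates of that expected partial copy. All coordinates and function names are hypothetical.

```python
def ir_overlap(gene_start: int, gene_end: int, ir_start: int, ir_end: int) -> int:
    """Number of gene bases (1-based, inclusive) lying inside an IR copy."""
    return max(0, min(gene_end, ir_end) - max(gene_start, ir_start) + 1)


def mirror_to_irb(pos: int, ira_start: int, irb_end: int) -> int:
    """Map an IRA coordinate to its mirror-image coordinate in IRB."""
    return irb_end - (pos - ira_start)


# Hypothetical toy layout: LSC 1-1000, IRA 1001-1300, SSC 1301-1500, IRB 1501-1800.
IRA, IRB = (1001, 1300), (1501, 1800)

cases = {"rps19 crossing LSC/IRA": (900, 1020), "rps19 inside LSC": (880, 990)}
for label, (start, end) in cases.items():
    k = ir_overlap(start, end, *IRA)
    if k:
        a = mirror_to_irb(max(start, IRA[0]), IRA[0], IRB[1])
        b = mirror_to_irb(min(end, IRA[1]), IRA[0], IRB[1])
        print(f"{label}: expected {k}-bp partial copy at IRB {min(a, b)}-{max(a, b)}")
    else:
        print(f"{label}: no partial copy expected in IRB")
```

In the first toy case, the 20-bp partial copy maps to the end of IRB (positions 1781-1800), next to the IRB/LSC junction, which is where Ψrps19 is reported.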

Characterization of Sequence Divergence, Repeats and SSRs
We compared coding genes, tRNAs, rRNAs and intergenic spacers across all Asteraceae species examined to identify nucleotide divergence hotspots. For the 225 regions analyzed (29 tRNAs, 83 CDS, four rRNAs and 109 intergenic spacers), π within Asteraceae ranged from 0 (ndhA-ndhH, trnR-ACG, rpl2-rpl23) to 0.43814 (ycf1-ndhF) (Figure 4). Considering the whole family, most intergenic spacers could be regarded as highly divergent regions, exceeding the threshold of π = 0.05, except those in the inverted repeat, where no region exceeds the threshold value.
Across the entire plastid genomes, the IR regions (IRA π = 0.00768, IRB π = 0.00791) were more conserved than both the LSC (π = 0.03611) and the SSC (π = 0.05974). Both IRs showed the same nucleotide diversity for almost all genes, as expected for mirror-image copies. Among the region types analyzed, tRNAs and rRNAs had the lowest nucleotide diversity (tRNA π = 0.01558, rRNA π = 0.00109), followed by CDS (π = 0.02238) and intergenic spacers (π = 0.04971).
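Nucleotide diversity (π) is the average number of nucleotide differences per site between two sequences drawn from the sample. The sketch below computes it over all distinct pairs of aligned sequences and applies the π > 0.05 hotspot threshold used above. It is not DnaSP's implementation: dropping every alignment column that contains a gap or ambiguous base in any sequence is an assumption about gap handling, and the three toy sequences are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise differences per site over all distinct sequence pairs.

    Alignment columns with a gap or non-ACGT symbol in any sequence are dropped.
    """
    if len(seqs) < 2:
        return 0.0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    upper = [s.upper() for s in seqs]
    cols = [i for i in range(length) if all(s[i] in VALID for s in upper)]
    if not cols:
        return 0.0
    pairs = list(combinations(upper, 2))
    diffs = sum(a[i] != b[i] for a, b in pairs for i in cols)
    return diffs / (len(pairs) * len(cols))


def is_hotspot(pi: float, threshold: float = 0.05) -> bool:
    """Flag regions whose diversity exceeds the threshold used in the text."""
    return pi > threshold


region = ["ACGTACGTAA", "ACGTACGTAT", "ACGAACGTAT"]  # hypothetical toy alignment
pi = nucleotide_diversity(region)
print(round(pi, 5), is_hotspot(pi))  # 0.13333 True
```

Here the three pairs differ at 1, 2 and 1 sites out of 10, giving π = 4 / (3 × 10).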

Discussion
In this study, we analyzed 36 complete plastid genomes from species of a single family, 30 of which were assembled de novo for the first time. This extensive sampling, covering the most relevant tribes of the family, helped us understand the structural diversity of Asteraceae plastid genomes in a phylogenetic context. Our work also constitutes the most complete phylogenomic approach based on whole plastid genomes performed in Asteraceae, complementing recent evolutionary studies of the family based on nuclear and plastid phylogenomic data.

Structural and Nucleotide Diversity of Asteraceae Plastomes
The plastid genomes of the species studied here present the typical quadripartite structure and fall within the genome size range for land plants (Table 1, Figure 1) [25]. The plastomes included in our analyses differed in length by at most 4759 bp, which is congruent with the length differences among previously published Asteraceae plastomes in GenBank. Plastid genomes usually exist in two structural haplotypes within an individual, differing in the orientation of the SSC relative to the LSC [26]. We detected only one such haplotype in all the plastid genomes assembled in this study (Figure 2), most likely as an artifact of the short-read-based reconstruction method; long reads would possibly have revealed both haplotypes [27]. Previous comparative analyses of Asteraceae plastomes show similar results regarding structure, genome size and haplotype [27][28][29]. While the structure of all chloroplast genomes analyzed was highly conserved, comparing the early-diverging Asteraceae (i.e., subfamily Barnadesioideae) with the rest of the family revealed two rearrangements in the LSC region (Figure 2): a large inversion of ~22.8 kb and a smaller ~3.3 kb inversion nested within it. This double inversion is found in all major clades of Asteraceae except the early-diverging subfamily Barnadesioideae. The phylogenetic distribution of these rearrangements is consistent with previous results based on restriction endonuclease digestion, PCR and Sanger sequencing [14,30], which first detected these inversions and estimated that they originated during the late Eocene. In our study, we report for the first time that the Asteraceae species showing the double inversion in the LSC region (i.e., all except those of Barnadesioideae) also present a pseudogenized rps19 at the end of IRB.
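The two structural haplotypes differ only in the orientation of one single-copy region relative to the other, so one can be derived from the other by reverse-complementing the SSC between the two IRs. The toy sketch below (hypothetical function names and sequences) shows that both forms share all sequence up to the IR boundaries; under this model, a read could tell them apart only if it spanned an entire IR copy plus unique flanking sequence, which short reads cannot do for IRs of ~25 kb.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_ssc(lsc: str, ira: str, ssc: str, irb: str) -> str:
    """Alternative haplotype: same LSC and IRs, SSC in the opposite orientation."""
    return lsc + ira + revcomp(ssc) + irb


# Hypothetical toy regions; IRB is the reverse complement of IRA, as in real plastomes.
lsc, ira, ssc = "AAAATTTT", "GGCCAT", "ATGCC"
irb = revcomp(ira)
hap1 = lsc + ira + ssc + irb
hap2 = flip_ssc(lsc, ira, ssc, irb)
print(hap1)
print(hap2)
```

Both toy haplotypes have the same length and identical LSC and IR segments; only the SSC orientation differs.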
In contrast, members of subfamily Barnadesioideae and of family Calyceraceae show an alternative structure, lacking the pseudogenized rps19 (Figure 3) in one of the inverted repeats. Recent studies of plastome diversity show that the presence/absence of this pseudogene is scattered across angiosperms [31][32][33]. Our results suggest that this structural feature (i.e., the presence of the pseudogenized rps19) might carry a phylogenetic signal in Asteraceae, putatively co-occurring with the double inversion event (or arising within a very short time span of it). However, this result must be taken with caution, as [34] reported the absence of the rps19 pseudogene in the Asteraceae species Artemisia annua, while finding it in all the other Asteraceae and Artemisia species analyzed there.
The representative species of family Menyanthaceae (i.e., the sister family of the clade formed by Goodeniaceae, Calyceraceae and Asteraceae) seems to have a plastid genome structure similar to that found in Calyceraceae and Barnadesioideae. However, the representative of family Goodeniaceae included in the study shows a plastome structure completely altered by rearrangements compared with the rest of the families in the MGCA clade. As recently reported by [35], Goodeniaceae plastomes have many intergenic regions with low GC content and many repeats, which may cause technical problems at the assembly stage, possibly leading to an artifactual structure. Further work using long-read sequencing (e.g., Oxford Nanopore or PacBio technologies) could help to correctly assemble and explore Goodeniaceae plastid genomes [36].
At the level of genetic diversity, we found that intergenic spacers of Asteraceae have more than twice the nucleotide diversity of tRNAs, rRNAs and CDS, most likely because intergenic regions are less subject to selective constraints, allowing higher sequence divergence [37]. Among genic regions, tRNAs and rRNAs are more conserved than CDS, possibly owing to their housekeeping function. Sequence conservation also differs among genomic regions, with IR sequences being less variable than those of the SSC and LSC, as reported previously [38][39][40]. Again, the main reason could be that the IRs harbor important housekeeping genes such as the structural ribosomal RNA genes (rrn4.5, rrn23 and rrn16), which are highly conserved even in organisms with short IRs containing only rRNA genes and a few intergenic spacers, such as some algae [41].
Owing to their combination of fast- and slow-evolving regions, uniparental inheritance and ease of amplification, plastid DNA markers are among the preferred targets of many phylogenetic and phylogeographic studies. Based on their relatively high nucleotide divergence, we identified several genic regions that could be used as molecular markers in studies of Asteraceae species beyond the well-known CDS matK [42] and ndhF [43], such as the CDS rpl22, accD, ccsA and ycf1 and the tRNAs trnK-UUU, trnE-UUC and trnT-GGU.
Repetitive sequences play an important role in plastome rearrangements and could help in understanding plant evolution and sequence divergence [44]. The two main types of repeated motifs in plastomes are microsatellites and dispersed repeats [45]. The microsatellite data reported in our study could serve as potential markers for evolutionary and population genetic analyses of Asteraceae [46,47]. Our results suggest that there are no relevant differences in the proportion of tandem repeats between tribes, and the numbers of microsatellites are within the range found in other studies, such as [45] (91 to 94 microsatellites in Chaenomeles), [39] (172 microsatellites in Hagenia) and [48] (116 microsatellites in Rubus).

Asteraceae Phylogenomics Based on Plastid DNA
The availability of NGS data for an increasing number of species has contributed enormously to our understanding of evolutionary relationships among organisms. Including data for 36 species from 19 tribes and 5 subfamilies of Asteraceae, this study represents the most comprehensive phylogenomic reconstruction of the family based on whole-plastome data to date (114 loci, 70,000 nt). Previous phylogenetic works have also addressed the evolution of this huge and complex family, the most relevant being: (1) a meta-tree analysis combining results at lower taxonomic levels from several studies, based on 10 plastid DNA markers and one nuclear DNA marker (ITS) and representing all tribes and subfamilies [1]; (2) a phylogenetic inference based on 12 plastid markers representing 13 subfamilies and 40 tribes (17,319 nt) [16]; (3) an exhaustive phylogenomic study based on targeted sequence capture of 763 nuclear loci representing 13 subfamilies and 45 tribes (269,585 nt) [17]; and (4) a phylotranscriptomic analysis with data from 243 species (13 subfamilies and 41 tribes) within the family [49].
Our phylogenomic reconstruction based on plastome data was overall consistent with the backbone topologies obtained in those previous approaches. However, unlike the inferences of [1,16,17], we recovered a different, fully supported topology at the basal nodes of subfamily Asteroideae, which could have important consequences for taxonomy below the subfamily level. Study [1] placed tribe Senecioneae (the largest in the family, with ca. 150 genera and 3000 species) in a polytomy with the clade formed by Inuleae + Athroismeae + Heliantheae Alliance (i.e., supertribe Helianthodae) and the clade containing Calenduleae + Gnaphalieae + Astereae + Anthemideae (i.e., supertribe Asterodae). Study [16] found Senecioneae (presented there as supertribe Senecionodae) as sister to the Helianthodae and Asterodae supertribes, but this position lacked statistical support. Study [17] improved the resolution of important basal nodes in Asteroideae, grouping Senecioneae with Anthemideae, Astereae, Gnaphalieae and Calenduleae (i.e., supertribe Asterodae). However, these authors found incongruences in the relationships among these five tribes across trees generated with different phylogenetic methods. While ML and BI recovered Calenduleae as the earliest-diverging of these tribes, the coalescent-based analysis (ASTRAL) strongly supported a different topology, with Senecioneae as sister to a clade of the four remaining Asterodae tribes. The same position of Senecioneae, although without full statistical support, was obtained in the reconstruction of [49], also indicating that supertribe Senecionodae would not be monophyletic.
Despite our limited sampling, our work provided full resolution and congruence among phylogenetic analyses for these basal nodes in subfamily Asteroideae, supporting the reconstructions obtained by [17] with ASTRAL and by [49] with transcriptomic data, i.e., Senecioneae as sister to the clade formed by Anthemideae, Astereae, Gnaphalieae and Calenduleae (i.e., supertribe Asterodae). Therefore, results based on plastome data, congruent with the latest phylogenomic reconstructions based on nuclear data, suggest that the supertribe system for subfamily Asteroideae originally proposed by [24] could be revived, merging tribe Senecioneae with the typical supertribe Asterodae.
As already found in previous analyses based on plastid DNA (e.g., [16]), our results support the monophyly of Cichorioideae, whereas recent phylogenomic approaches based on nuclear data reconstructed this subfamily as paraphyletic (e.g., [17]). According to [49], the different circumscriptions of Cichorioideae in phylogenies based on nuclear or chloroplast data could be explained by potential hybridization during the evolution of Cichorieae. Our phylogenetic inference also differs from previous systematic reconstructions of the family based on nuclear and plastid markers in the relationships among the tribes of the Heliantheae alliance. As already reported by other phylogenetic studies mainly based on plastid DNA markers, such as [1,16], we found the important tribe Coreopsideae (550 spp.) in an early-diverging position within the Heliantheae alliance (Figure 1). In contrast, in the phylogenomic reconstructions based on nuclear data by [17] or [49], Coreopsideae is placed in a much more derived position, as sister to Heliantheae s.s. There are other incongruences between plastid and nuclear genomes regarding the position of some tribes in the Heliantheae alliance. According to our results, as well as other studies based on plastid DNA data (e.g., [16]), tribe Tageteae is sister to tribe Bahieae. In contrast, in phylogenomic reconstructions based on nuclear DNA data, Tageteae was placed either as sister to Millerieae [17] or as sister to the weakly supported clade formed by Madieae, Chaenactideae, Bahieae, Perityleae and Eupatorieae [49]. These incongruences between plastid and nuclear genomes have been explained by potential hybridization events involving plastid capture [16,49]. However, considering that the Heliantheae alliance is thought to be one of the most rapidly radiating groups of Asteraceae [17], incomplete lineage sorting could also be a source of phylogenetic incongruence here [39].

DNA Preparation and Sequencing
The total DNA of 13 species (Table 1) was isolated from dried leaf material using either a modified CTAB protocol [55] (Table S1).

Genome Assembly and Annotation
The quality of all raw sequencing data (13 species sequenced for this study and 17 obtained from the Sequence Read Archive) was assessed with FastQC version 0.11.9 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 2 November 2020). Plastid genome reconstruction followed a mixed strategy combining de novo reconstruction of all plastomes with read-mapping assemblies. De novo reconstruction of the plastid sequences of 31 species was performed with the NOVOPlasty pipeline version 2.6.3 [22], using the whole raw Illumina read dataset after adapter trimming, as recommended by the authors. After this de novo assembly, all raw data were additionally filtered according to the following rules: (i) adapter trimming; (ii) quality control, retaining only reads in which >90% of bases have a quality score >20. These filtering steps were carried out with CLC Genomics Workbench 10.0.1 (CLC-BIO, Aarhus, Denmark). The resulting high-quality reads were then mapped to the circular reconstructions obtained with NOVOPlasty using Geneious version 2020.1.1 (Biomatters, Auckland, New Zealand) with the default mapping parameters, yielding a consensus sequence for each species. All bases with coverage below 10 were replaced by Ns.
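The two numeric rules above (keep reads in which more than 90% of bases have quality > 20; replace consensus bases with coverage < 10 by N) can be sketched as follows. This is not the CLC Genomics Workbench or Geneious implementation: Phred+33 quality encoding is assumed, the function names are hypothetical, and the per-base depths would in practice come from the read mapping.

```python
def phred_scores(qual_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual_line]


def passes_quality(quals: list[int], min_q: int = 20, min_frac: float = 0.90) -> bool:
    """Keep a read only if more than `min_frac` of its bases have quality > `min_q`."""
    if not quals:
        return False
    good = sum(q > min_q for q in quals)
    return good / len(quals) > min_frac


def mask_low_coverage(consensus: str, depth: list[int], min_depth: int = 10) -> str:
    """Replace consensus bases covered by fewer than `min_depth` reads with N."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(b if d >= min_depth else "N" for b, d in zip(consensus, depth))


print(passes_quality(phred_scores("IIIIIIIIII")))  # True  (all Q40)
print(passes_quality(phred_scores("IIIIIIIII#")))  # False (9/10 = 90%, not > 90%)
print(mask_low_coverage("ACGT", [10, 9, 50, 0]))   # ANGN
```

The strict inequalities mirror the wording ">90%" and ">20", so a read with exactly 90% good bases is discarded.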

Plastome Phylogenetic Analyses
To estimate phylogenetic relationships in Asteraceae based on plastome data, 36 species of the family were analyzed, together with one species of the sister family Calyceraceae (Nastanthus patagonicus) as outgroup; species of Goodeniaceae and Menyanthaceae were excluded to avoid misalignments. Both maximum likelihood (ML) and Bayesian inference (BI) analyses were performed on a dataset of 37 plastome sequences including a single IR (i.e., one IR copy was removed to avoid redundant information), the SSC and the LSC. All genes in the dataset were extracted separately and then concatenated by class (CDS, tRNA and rRNA); non-coding regions were excluded from the analysis to ensure that all nucleotides were aligned with their homologs. The concatenated matrix was aligned using MAFFT version 7 [57] and then manually checked in Geneious, resulting in a dataset of 77,259 nucleotide sites. The aligned dataset was partitioned by separating the coding genes from the tRNAs and rRNAs, and by assigning each CDS nucleotide to its codon position (first, second or third). For BI, MrBayes version 3.2.6 [58], available on the CIPRES web server [59] (https://www.phylo.org/, accessed on 4 May 2021), was used to run two independent Markov chain Monte Carlo (MCMC) analyses for 5,000,000 generations, sampling trees every 1000 generations. The average standard deviation of split frequencies was confirmed to be below 0.01, and the potential scale reduction factor was close to 1.0 for all parameters. For ML inference, RAxML version 8.2.10 [60] was used with 1000 bootstrap replicates and default settings for all other parameters. For both BI and ML, PartitionFinder2 [61] was used to select the best evolutionary model for each concatenated region, choosing the scheme with the lowest AICc score.
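A minimal sketch of how the codon-position partitions described above could be derived from a concatenated matrix, assuming (hypothetically) that all CDS are concatenated first, each in frame and with a length divisible by three, followed by the tRNA block and then the rRNA block. The `start-end\3` stride notation mirrors the convention used in PartitionFinder and RAxML partition files, but the exact syntax should be checked against each program's manual; the block lengths in the example are toy values.

```python
def partition_scheme(cds_lengths: list[int], trna_len: int, rrna_len: int) -> dict[str, str]:
    """Character sets (1-based, inclusive) for codon positions, tRNA and rRNA."""
    for n in cds_lengths:
        if n % 3:
            raise ValueError(f"CDS length {n} is not a multiple of 3")
    cds_end = sum(cds_lengths)
    trna_end = cds_end + trna_len
    return {
        "cds_pos1": f"1-{cds_end}\\3",
        "cds_pos2": f"2-{cds_end}\\3",
        "cds_pos3": f"3-{cds_end}\\3",
        "trna": f"{cds_end + 1}-{trna_end}",
        "rrna": f"{trna_end + 1}-{trna_end + rrna_len}",
    }


def codon_position_columns(seq: str, cds_end: int, pos: int) -> str:
    """Extract the sites at codon position `pos` (1, 2 or 3) from the CDS block."""
    return seq[pos - 1:cds_end:3]


for name, charset in partition_scheme([9, 6], trna_len=5, rrna_len=4).items():
    print(f"{name} = {charset};")
print(codon_position_columns("ATGAAATAA", cds_end=9, pos=1))  # AAT
```

Requiring every CDS length to be a multiple of three keeps the reading frame intact across gene boundaries, so a single stride of 3 covers the whole CDS block.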
For BI, the best partition scheme and substitution model were fitted for each region analyzed (Table S2). For ML inference, RAxML allows only a single evolutionary model in partitioned analyses, which was selected according to the PartitionFinder2 results (i.e., GTRGAMMA). For both inferences, the first 25% of the trees were discarded as "burn-in", and posterior probabilities/bootstrap values were estimated by constructing the 50% majority-rule consensus tree.
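The post-processing steps above (discarding the first 25% of samples as burn-in, checking the average standard deviation of split frequencies between the two runs, and keeping splits present in more than 50% of trees) can be sketched with trees represented as sets of splits. This is a simplified illustration under stated assumptions, not MrBayes code: splits are encoded as frozensets of taxon names (assumed canonicalized upstream), the population standard deviation is used, and MrBayes's filtering of rare splits is omitted. The toy runs are hypothetical.

```python
from collections import Counter
from statistics import pstdev

# A tree is a frozenset of splits; a split is a frozenset of taxon names
# (assumed canonicalized, e.g. always the side without a fixed reference taxon).
Split = frozenset
Tree = frozenset


def discard_burnin(samples: list, fraction: float = 0.25) -> list:
    """Drop the first `fraction` of sampled trees."""
    return samples[int(len(samples) * fraction):]


def split_frequencies(trees: list) -> dict:
    """Fraction of trees containing each split."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: c / len(trees) for split, c in counts.items()}


def asdsf(run_a: list, run_b: list) -> float:
    """Average standard deviation of split frequencies across two runs."""
    fa, fb = split_frequencies(run_a), split_frequencies(run_b)
    splits = set(fa) | set(fb)
    sds = [pstdev([fa.get(s, 0.0), fb.get(s, 0.0)]) for s in splits]
    return sum(sds) / len(sds) if sds else 0.0


def majority_rule(trees: list, threshold: float = 0.5) -> dict:
    """Splits found in more than `threshold` of trees, with their support."""
    return {s: f for s, f in split_frequencies(trees).items() if f > threshold}


AB, AC, ABC = Split("AB"), Split("AC"), Split("ABC")
t1, t2 = Tree({AB, ABC}), Tree({AC, ABC})
run_a = discard_burnin([t2, t1, t1, t1, t2, t1, t1, t1])  # keeps the last 6
run_b = discard_burnin([t1] * 8)
print(asdsf(run_a, run_b))
print({"".join(sorted(s)): round(f, 3) for s, f in majority_rule(run_a).items()})
```

In this toy case, the split AC appears in only 1 of 6 post-burn-in trees of the first run, so it contributes to the ASDSF but is absent from the majority-rule consensus.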
To validate the topology and the support values obtained within the family Asteraceae, additional phylogenetic analyses were performed on datasets with different combinations of genes and non-coding regions: (i) CDS + tRNA + rRNA, (ii) CDS only and (iii) the whole-plastid-genome sequences (Figure S1). Only BI was used for these datasets, with the same parameter selection options described above (Table S2).

Plastome Diversity Analyses
The complete plastomes of the Asteraceae analyzed in this study were used to estimate the structural and nucleotide diversity of plastid DNA within the family. Plastome rearrangements were compared among Asteraceae and the closely related families Calyceraceae, Goodeniaceae and Menyanthaceae (i.e., the "MGCA clade"; APG IV, 2016) to explore whether the double inversion in the plastid DNA, as well as other possible structural changes, are apomorphies found only in Asteraceae or are also present in closely related families. Structural changes across the plastid genomes of Asteraceae and related families were analyzed by whole-genome alignment in Mauve version 2.4.0 [62], using the Mauve algorithm with default parameters. The expansion and contraction of the inverted repeat (IR) boundaries were also explored to check whether these regions show different patterns in length and gene annotation across species.
Before all analyses, the dataset containing all plastomes was aligned with MAFFT version 7 [57] and then manually adjusted in Geneious. To screen genetic variability between species, nucleotide diversity (π) was estimated with DnaSP version 6 [63] for all coding DNA sequences (CDS), transfer RNAs (tRNAs), ribosomal RNAs (rRNAs) and intergenic spacers found in the LSC, SSC and a single IR region.
To characterize repeat sequences in the plastid genomes, we used REPuter [64] with a Hamming distance of 3 and a repeat size range of 30 to 90 bp, considering four repeat types: forward, reverse, palindromic and complement [65]. Tandem repeats were analyzed with Tandem Repeats Finder [66] using default parameters. Short sequence repeats (SSRs) were identified with the MISA microsatellite finder [67], using the following thresholds: eight repeat units for mononucleotide SSRs, four repeat units for di- and trinucleotide SSRs and three repeat units for tetra-, penta- and hexanucleotide SSRs.

Conclusions
In conclusion, whole-plastid-genome data have proven to be a powerful tool for understanding evolutionary trends in Asteraceae, adding support to previous systematic inferences based on different markers. Our results confirm the double inversion in plastid DNA that occurred during the early evolution of Asteraceae and reveal an additional structural change, the appearance of an rps19 pseudogene, that could be evolutionarily linked to the two inversions. Our work also provides information on gene composition, nucleotide diversity and repeat content in Asteraceae plastomes, which could be useful for designing novel molecular markers for phylogeographic and population genetic studies. Finally, the phylogenomic reconstruction based on whole-plastome data clarified previously uncertain questions in Asteraceae systematics at the tribe level, while also exposing some major incongruences between the evolutionary histories revealed by nuclear and plastid DNA data.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/plants10122699/s1: Figure S1: Phylogeny reconstructed using Bayesian inference (BI) of family Asteraceae, representing the relationships among 18 tribes, excluding the subfamily Barnadesioideae and the outgroup Nastanthus patagonicus (family Calyceraceae). Three subsets were used: (A) coding DNA sequences, (B) coding DNA sequences, transfer RNAs and ribosomal RNAs, (C) the whole plastid sequences; Figure S2: Mauve alignment of 37 plastid genomes of species from families Asteraceae and Calyceraceae, sorted phylogenetically. Within each group, local collinear blocks are represented by blocks of the same color and connected by lines. The hypothetical origin of the rearrangement is represented by a star in the phylogenetic tree; Figure S3: Comparison of the boundaries of the LSC, SSC and IR regions of a subset of 37 species from families Asteraceae and Calyceraceae. Genes suffixed with a phi (ϕ) are potential pseudogenes; Figure S4: Number of different long-repeat sequence types found in the plastid genomes of the families Asteraceae and Calyceraceae; Figure S5: Number of SSR loci analyzed and number of tandem repeats identified for each species of the families Asteraceae and Calyceraceae; Table S1: List of genes annotated in the studied plastid genomes of 37 species of Asteraceae and Calyceraceae; Table S2: Models used for phylogenetic analyses; Table S3: Microsatellite distribution, frequency and number for each species of this study.

Funding: This research was supported by grants CGL2016-75694-P and PID2020-119163GB-I00, and S.G. benefited from a Ramón y Cajal contract RYC-2014-16608, all funded by MCIN/AEI/10.13039/501100011033. We are also grateful for the "Ajuts a grups de recerca consolidats" grant 2017SGR01116 from the Generalitat de Catalunya.

Data Availability Statement:
Data are available from the National Center for Biotechnology Information (NCBI) under BioProject accession PRJNA764566, titled "Deconstructing ribosomal DNA: from the sequence to the chromosome across the Tree-of-Life (rDNA-LIFE)".