Comparative Survey of Morphological Variations and Plastid Genome Sequencing Reveals Phylogenetic Divergence between Four Endemic Ilex Species

: Holly ( Ilex L.), from the monogeneric Aquifoliaceae, is a woody dioecious genus cultivated as pharmaceutical and culinary plants, ornamentals, and industrial materials. With distinctive leaf morphology and growth habitats, but uniform reproductive organs (ﬂowers and fruits), the evolutionary relationships of Ilex remain an enigma. To date, few contrast analyses have been conducted on morphology and molecular patterns in Ilex . Here, the di ﬀ erent phenotypic traits of four endemic Ilex species ( I. latifolia , I. suaveolens , I. viridis , and I. micrococca ) on Mount Huangshan, China, were surveyed through an anatomic assay and DNA image cytometry, showing the unspeciﬁed link between the examined morphology and the estimated nuclear genome size. Concurrently, the newly-assembled plastid genomes in four Ilex have lengths ranging from 157,601 bp to 157,857 bp, containing a large single-copy (LSC, 87,020–87,255 bp), a small single-copy (SSC, 18,394–18,434 bp), and a pair of inverted repeats (IRs, 26,065–26,102 bp) regions. The plastid genome annotation suggested the presence of numerable protein-encoding genes (89–95), transfer RNA (tRNA) genes (37–40), and ribosomal RNA (rRNA) genes (8). A comprehensive comparison of plastomes within eight Ilex implicated the conserved features in coding regions, but variability in the junctions of IRs / SSC and the divergent hotspot regions potentially used as the DNA marker. The Ilex topology of phylogenies revealed the incongruence with the traditional taxonomy, whereas it informed a strong association between clades and geographic distribution. Our work herein provided novel insight into the variations in the morphology and phylogeography in Aquifoliaceae. These data contribute to the understanding of genetic diversity and conservation in the medicinal Ilex of Mount Huangshan.


Plant Materials
The respective organs (e.g., leaves and seeds) of I. latifolia, I. suaveolens, I. viridis, and I. micrococca were harvested from 10.24 ha (320 m × 320 m) forest plot (30 • 8 26" N, 118 • 6 38" E) in Mount Huangshan, Anhui, China [26]. The plot ranges altitude from 430 to 565 m with an annual average temperature of 7.8 • C and annual precipitation of 2394.5 mm. The voucher specimens of four Ilex species (accession numbers: YL20190417014, YL20190417015, YL20190417016, and YL20190417017) were preserved in the herbarium of Nanjing Forestry University, Nanjing, China.

Phenotype Quantification and Determination of DNA Content
For each species, more than 60 healthy mature leaves and 300 mature seeds of 5 independent trees of similar age were randomly sampled for the morphological analyses. In total, thirty anatomic sections of epidermis were prepared from the cut leaves (5 mm × 5 mm) macerated by H 2 O 2 -HAC solutions [27]. To quantitate the size of the upper leaf epidermal cell (LEC), the stomata aperture (STA), and the stomata density (STD), twenty visual fields were captured using the same scale during optical microscopy. The leaf area (LA) was measured from 50 randomly selected leaves by Image J v1.53c (https://imagej.nih.gov/ij/) after scanning with Expression 11000XL (EPSON, Beijing, China). The specific leaf area (SLA) was calculated based on the ratio of leaf area to leaf dry mass. We used 100 air-dried seeds to determine the seed weight (SW). The variance of three perpendicular seed dimensions (VSD) was calculated using the average of 50 seeds according to a previous report [28]. Spherical seeds have a variance of 0, and elongated or flattened seeds have a variance of up to 0.33. Flower size (FS) was measured from the average diameters of 12 female flowers. The significance of phenotypic variation between four Ilex samples was statically analyzed using SPSS 24.0 (https://www.ibm.com) based on the ANOVA (p < 0.05) and Duncan's multiple range tests.
For the determination of the DNA C-value, the young leaves were chopped to isolate the crude nuclear DNA with the addition of woody plant buffer followed by RNase digest. DNA staining of propidium iodide (PI) and FCM analysis were performed based on a previous report [29]. Together with the internal standard (Solanum lycopersicum, 2C = 2.00 pg), the resulting suspensions were analyzed with BD Influx TM cell sorter (BD, Piscataway, NJ, USA). The histograms of FCM were generated by the software BD FACSTM 1.0.0.650. The coefficients of variation (CV) of DNA peaks below 5% were considered as reliable. The chromosome numbers in four diploid Ilex species were retrieved from the IPCN database (http://legacy.tropicos.org/Project/IPCN). The genome size (DNA C-value or the haploid DNA content) was calibrated by multiplying the standard by the ratio of the mean fluorescent intensity of each sample to that of the standard [30]. The DNA-C value is represented as means ± standard error (±SE) of at least four independent biological replicates.

Plastome Divergence and Phylogenetic Analyses
Using the annotation of I. suaveolens as the reference, we compared of the entire plastid genomes of eight Ilex species in Aquifoliaceae using the program GView (https://www.gview.ca/wiki/GView/ WebHome) [39] and the mVISTA program (http://genome.lbl.gov/vista/mvista/submit.shtml) in the Shuffle-LAGAN mode [40,41]. For the inverted repeat (IR) expansion and contraction of border genes, eight Ilex plastomes were aligned to analyze the variations in the junctions of LSC, IRs, and SSC using IRscope (https://irscope.shinyapps.io/irapp/) [42]. In total, nineteen plastid genome sequences (15 Ilex species) were retrieved from the GenBank. Populus trichocarpa, Populus deltoides, Quercus acutissima, and Helwingia himalaica were used as outgroups. The multiple sequences were aligned using MAFFT v7.471 (https://mafft.cbrc.jp/alignment/server/index.html) [43]. The phylogenetic topology was constructed using the software MEGA X software by the methods of maximum likelihood (ML) and maximum parsimony (MP) using the nucleotide substitution model of Tamura-Neighbour.
The bootstrap values are shown on the branches of the phylogenetic tree based on 1000 replicates.

Variation of the Morphological Trait and Nuclear DNA Content
To investigate the variation in morphology between the four Ilex species, we initially performed a comparative analysis of phenotypic traits of the major vegetative and reproductive organs, including LA, LEC, SLA, STA, STD, SW, VSD, and FS (Table 1). Based on the anatomic analyses, the significant variation in LA was observed in all Ilex species, with I. latifolia having the highest value (79.19 cm 2 ) ( Table 1). Significant differences were also observed in SLA, ranging the values from 40.88 cm 2 /g (I. latifolia) to 165.4 cm 2 /g (I. micrococca). Regarding the LEC, the highest value (938.65 µm 2 ) was found in I. suaveolens; however, the value varied insignificantly as I. viridis and I. micrococca have a similar-sized LEC (Figure 1b). Image analyses of stomata-related traits revealed that STD is drastically different in four Ilex species, and I. micrococca has the highest density (257.64/mm 2 ; Table 1 and Figure 1c). The significantly different values in SLA appeared to be related to the STD values in the four Ilex species, as both traits showed similar patterns ( Table 1). Analyses of VSD indicated that I. suaveolens seeds are closer to spherical shape (0.1137). However, for STA, VSD, and FS, significantly statistical distinctions were not found within the four Ilex species.
To determine the DNA-C value of four Ilex species by FCM, S. lycopersicum was used as the internal standard to calculate the genome size [44]. The DNA-associated fluorescence on FCM histograms showed that the CV values for G0/G1 peaks were between 2.92% and 4.67% (Figure 1d). The low CV (< 5%) indicated a constant quality considered reliable for FCM assessments [45]. The inspection of the FCM fluorescent peak revealed that I. micrococca has the most abundant nuclear DNA content (3.053 pg), followed by I. viridis (2.519 pg). I. suaveolens (2.242 pg) and I. latifolia (1.910 pg) exhibited similar levels of DNA 2C-value. Further calculation of nuclear genome size (NG) revealed that the mean levels vary with a range from 955 Mb (I. latifolia) to 1493 Mb (I. micrococca) in the four Ilex species (Table 1).

The Plastid Genome Features and Sequence Divergence
The de novo sequencing and reference-guided assembling of the plastid genome of four Ilex species revealed a double-stranded circular DNA with a range of length from 157,601 (I. latifolia) to 157,857 bp (I. suaveolens) ( Figure 2). The typical combined quadripartite structures of the plastid genome contain two inverted repeats (IR) regions, IRA and IRB, between 26,065 and 26,102 bp, each separated by a large single-copy (LSC, 87,020-87,255 bp) and a small single-copy (SSC, 18,394-18,434 bp) sections. Except for I. latifolia, the plastomes encode 134 genes comprised of 89 putative protein-encoding (PE) genes, 37 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes in the other three Ilex species (Table 2). Among the annotated genes in Ilex plastome, eight PE genes (rps7, ndhB, ycf15, ycf2, rpl23, rpl2, rpl12, and ycf1), five tRNA genes (trnV-GAC, trnL-GAU, trnA-UGC, trnR-ACG, and trnN-GUU), and all four rRNA genes (rrn4.5, rrn5, rrn23, and rrn16) are anchored in the IR regions. Twelve PE genes and one tRNA gene (trnL-UAG) characterize the locations in the SSC sections ( Figure 2). Most of the gene sequences assembled as a single-copy, whereas 18 genes occur with duplication in IR regions, including seven PE genes, four rRNA genes, and six tRNA genes (Table 2). Interestingly, ycf1 is the only variable gene identified with an incomplete duplication in the junction of SSC and IRB regions ( Figure 2 and Table 2). In Arabidopsis, translocon complex of ycf1/Tic214 and Tic20 has been proposed as the central component for plastid proteins accumulation [46]; however, a recent report stated that ycf1/Tic214 might not be involved in the general import machinery [47]. All four Ilex plastomes uniformly contain ycf2, a duplicated gene, showing a maximum length of 6894 bp in IR regions. Another single-copy gene rpoC2 has a relatively shorter length (4155 bp) in the LSC section. Besides, a total of 19 genes, including 11 PE genes and 7 tRNA genes, were identified with two exons, while three of PE genes (clpP, ycf3, and rps12) contain three exons. Contrast analyses revealed a slight variation in GC content in various regions of four Ilex plastomes (Table S1). Additionally, analyses of the total numbers of universal genetic code for the coding genes showed a similar range from 26,729 (I. viridis) to 27,121 (I. latifolia) (Table S2). Based on the statistical analysis of the codon usage, leucine (Leu) represents the most abundant amino acid with a frequency of 10.5%, whereas cysteine (Cys) shows the lowest abundance of 1.1%. Overall, the codon usage in all identified genes exhibits similar patterns across the four Ilex species. The almost identical gene numbers, annotation, and plastid genome length prompted us to further explore the sequence variation and divergence through the analyses of SSRs and long repeat sequences.

Analyses of SSRs and Long Repeat Sequences
Using the MISA program, a total of 221 SSRs (mono-/di-/trinucleotide repeats) were characterized in the plastid genomes of I. latifolia (46/2/1), I. suaveolens (46/3/1), I. viridis (53/3/1), and I. micrococca (52/2/1). Among the SSRs, the mononucleotide repeats are the most abundant type with a location in the non-coding regions, which is related to AT richness. The proportions of A/T sequences in the mononucleotide repeats appear to be identical in the four Ilex species, varying from 92.00% (I. suaveolens) to 92.98% (I. viridis), whereas C/G only exists in the plastid genomes of I. latifolia and I. micrococca. The AT/AT sequence of dinucleotide repeats showed similar levels (1.75%-4.08%); however, the AAT/ATT sequence of trinucleotide repeats showed significant variability, ranging in proportion from 1.82% to 6.00% ( Figure S1). Other types of SSRs (e.g., tetra-and pentanucleotide repeats) were not identified in any Ilex plastid genomes. The long repeats (forward, reverse, complement, and palindrome repeats) were concurrently analyzed by REPuter. A total of 196 unique long repeats were detected in I. latifolia (24/0/0/22), I. suaveolens (38/9/3/0), I. viridis (41/8/1/0), and I. micrococca (38/9/3/0) ( Figure S2). The forward repeats appear to be the most common type in the four plastid genomes. We found no reverse and complement repeats identified in I. latifolia, but it uniquely contains the palindrome repeats ( Figure S2a). Further analyses of various types of long repeats showed that the 20-30 bp sequence length is the typical pattern (Figures S2b-e). In four plastid genomes, the sequence length in reverse and complement repeats usually has a limitation of 30 bp, whereas this length can be extended to 60 bp in the forward and palindrome repeats.

Comparative Analyses of Complete Plastomes in Ilex Species
We used I. suaveolens as a reference to conduct a BLAST comparison of eight Ilex species, showing that the entire plastid genomes are well conserved across eight selected species. The LSC and SSC regions were found to be more substantially divergent than the IR regions (Table 3, Figure 3). The non-coding regions appear to have more significant variation compared with the coding regions, in which some genes are relatively conserved. The VISTA analysis resulted in the findings of 12 hotspot regions for genome divergence and variable genes (e.g., rpoC1, rbcL, ndhF, clpP, and psbA). The highly divergent hotspot regions are particularly located in the intergenic regions than in coding regions, including trnH-psbA, matK-rps16, psbK-psbI-trnS-trnG, petN-psbM, trnE-psbD-trnT, trnS-psbZ-psaB, trnL-ycf3, rbcL-accD-ycf4, clpP-rpl33, rpl16-rpoA, rpl32-ccsA, and ycf15-rps12-rrn16 ( Figure 4). These divergent sites may facilitate the development of potential DNA markers for the species identification and reconstruction of phylogeny in the genus Ilex.

Phylogenetic Analyses of Ilex in Aquifoliaceae
To investigate the phylogenies, the entire plastid genome sequences of 15 Ilex accessions were aligned using MAFFT. The phylogenetic topology was generated by MEGA X using the ML method supported with 1000 bootstrap values. P. trichocarpa, P. deltoides, Q. acutissima, and H. himalaica were used as out groups in the phylogenetic analyses. In Figure 6, the phylogenetic tree shows that four clades were deduced from all examined Ilex species. I. latifolia and I. integra cluster together with an identical high value in clade I, which also includes three additional Ilex species (I. delavayi, I.sp. XY-2016, and I. cornuta). All of them are evergreen trees with leathery leaves and broad distribution in subtropical Asia. The endemic I. viridis and I. suaveolens show close relationships in the clade III. I. micrococca clusters around I. wilsonii and I. asprella in clade IV, and the latter two are deciduous shrubs or trees. Yerba mate I. paraguariensis and I. dumosa, originating from South America, are located in clade II. The plastid phylogeny revealed that the species might not have originated from a single ancestor in Aquifoliaceae. The full encoding sequences were used to construct the phylogenetic tree, showing a distinctive clade ( Figure S3).

Discussion
The holly Ilex mostly grows in mesic environments with global distribution. A recent report on phylogeny and biogeography suggested the origin of Ilex being in subtropical Asia [6]. Subsequently, it colonized other areas (e.g., South and North America, Australia, Europe, Africa, and some ocean islands) with divergence time from 4 to 30 million years ago [6]. Approximately 204 Ilex species (149 endemic species) have been described in China [48]. Located in a transition zone of north-south flora of eastern China, Mount Huangshan is considered as a priority spot for biodiversity conservation, in which all of 20 recorded Ilex species exhibited many medicinal properties and economic importance for garden and industry use [8,49,50]. Among these Ilex species, I. suaveolens is the most abundantly native species. The deciduous I. micrococca is locally grown as popular ornamental trees and for providing superior materials for paper and tannin production, in addition to herbal medicine use. The "Kuding" tea represented by I. latifolia, and pharmaceutical I. viridis, show potential clinical functions for scavenging heat, anti-inflammation, and detoxification, are moderately distributed in the dynamic forest plot [26].
The plasticity and diversity in leaf traits, including venation, anatomy, stomatal distribution, and stomatal conductance, were drastically affected by both cellular factors and environmental cues [51][52][53]. Hence, this extensively phenotypic survey of the vegetative and reproductive organs in plants contributes to the understanding of plant physiological and ecological adaptation. In our work, initial comparative analyses of leaf phenotypes revealed the significantly different values in LA, SLA, STD, and SW between four Ilex species, suggesting that these morphological traits could be potentially used as candidates to distinguish different Ilex species (Table 1). The persistent fruits and distinctive leave morphology in Ilex result from complicated genetic variation and ecological adaptability to various environments; therefore, the underlying molecular mechanisms require an extensive understanding of genome architecture and genome size diversity [54]. The transcriptome assembly was previously only conducted in I. paraguariensis [13]. Unfortunately, the draft of the full nuclear genome sequencing and annotation remains unavailable in Ilex. Using the tomato as the internal standard, analyses of DNA image cytometry in four Ilex species revealed that the calculated NG showed approximately 955 Mb in I. latifolia. This NG value is higher than I.cornuta (642 Mb) [55]. The examined NG in I. suaveolens and I. viridis showed similar levels to I. mucronata (1073 Mb) [56]. The most abundant NG was characterized in I. micrococca, slightly lower than I. paraguariensis (1078 Mb) [29]. The estimated DNA C-value (NG) in four Ilex species had not been previously reported. We found that I. micrococca had the largest size in NG but showed the lowest SW levels in comparison with other Ilex. A similar phenomenon was reported in a recent study, which proposed a negative association between the seed mass and nuclear genome size in the diploid Aesculus species [57]. In contrast with all detected morphological traits, the calculated NG appears to have a linear relationship with SLA and STD in four Ilex species, which is inconsistent with the associated NG/STD patterns from a large-scale comparative analysis in angiosperms [19]. We hypothesized that the statistical associations between the NG, LEC, and STD might not be evaluated adequately with small sample capacity, which could reason the discordance above. Perhaps, the relationships of NG size and various leaf traits appeared to be not conservative among angiosperms due to the diversified environmental adaptation [58][59][60].
The advances in novel biotechnologies, particularly in whole-genome sequencing, created the opportunity to explore the phylogenies, species identification, and molecular markers among taxa [23,61]. In our work, using high-throughput sequencing technologies combined with de novo and reference-guided assembly, the complete plastomes of four Ilex species were constructed, showing the conserved quadripartite circular structure, comprising of LSC, SSC, and two copies of IR regions. A total of 144 putative genes (96 PE, 40 tRNA, and 8 rRNA genes) were previously annotated in Ilex plastid genomes [1]. The number of identified PE genes and tRNA appeared to be variable in various Ilex species [62,63]. Further sequence analyses indicated that ycf68, orf42, and ycf68 were located between two exons of trnA-UGC, and orf188 sequence overlap with one exon of ndhA. trnP-GGG shows a sequence overlapping with trnP-UGG. Thus, the total numbers of genes identified were normalized into 134 (89 PE, 37 tRNA, and 8 rRNA genes) in the plastomes of I. suaveolens, I. viridis, and I. micrococca. The most distinctive length in plastome is 370 bp between various Ilex species, whereas the maximum sequence difference of LSC is 331 bp, suggesting that the divergence in the LSC region may cause the varied plastome lengths, which might depend on the IR contraction and expansion [64]. Plastid genome sequences exhibit extensive variations in the length, number, and distribution of SSRs. These variations have potential importance for the outcome of genomic diversity [65]. Plastid SSRs have been used extensively in the taxonomy, phylogenies, and the maternal structures in the community, diversity, and differentiation [66]. In total, 221 SSRs and 196 unique long repeats were characterized in four Ilex plastomes. The mononucleotide repeats (A/T) in all SSRs represent the dominant type distributed in the non-coding regions, which is related to the AT abundance of the nucleotide composition. This typical pattern was also identified in previous reports [67][68][69]. The forward repeat and 20-30bp in sequence length were the most general features within the different types of long repeats.
In our study, both GView and mVISTA analyses of eight plastid genomes illustrated numerous divergent hotspots that are primarily located in the SSC regions, suggesting more variable sites in intergenic non-coding sequences than in coding genes. These variable sequences could potentially be used to develop new molecular markers for the identification and taxonomy in Ilex. The findings of these hotspot regions are compatible with a previous study that reported the presence of at least 11 divergent regions [1]. Although several divergent genes (rbcL, atpB, and matK) helped inform the reconstruction of the phylogenetic tree among distantly-related species, they might provide suitable resolution for studies within Ilex due to the relative lower divergence than the hotspot regions [2,6,22,70]. Recently, trnH-psbA, rbcL-accD, and trnS-trnG have been developed as genetic markers in other species [71,72]. The plastome-divergent hotspot regions were considered powerful for species-level identification. However, whether these divergence hotspot regions could be extensively used for classification and taxonomy in Ilex remains to be assessed experimentally.
The comparative detection of sequence variability in the junctions of IRs/SSC and IRs/LSC showed relative fluid patterns in eight Ilex species. Mostly, gene rps19 in LSC regions is nearest to the JLB; however, it was found to span the JLB into IRB with a 4 bp extension in I. paraguariensis, I. viridis, and I. suaveolens. The typical variability in JSB was also observed for ycf1 in the IRB region and ndhF in the SSC region. A similar phenomenon of the extent in IR regions has been recognized in other Ilex species [1]. The sequence variations in four junctions of SC/IRs appeared to frequently occur during genome evolution, which resulted in alteration of the plastome size [73]. Hence, the contraction and expansion at the boundaries of IR regions could explain the variability in sequence length between different plastid genomes [74].
Although the Aquifoliaceae has good fossil records, several systematic studies revealed a high incongruity of phylogenies between the nuclear trees and plastid trees, suggesting that the evolutionary and phylogenetic patterns of Aquifoliaceae remain to be elucidated [1,6,22]. Some molecular genetic factors, including inter-lineage hybridization, lineage sorting, introgression, and gene duplication and loss, were reported to significantly influence the phylogenetic incongruence, which broadly occurred in Angiosperms [75]. In contrast to the nuclear trees, the plastid trees strongly reflect the biogeographic distribution of extant species [22]. In our work, the phylogenetic topology revealed the presence of four clades within Ilex, in agreement with the fossil record [6], and also fit well with recent reports on plastid phylogenetic analysis [1,32,63]. All five Ilex species in the clade I belong to section Ilex, showing leathery morphology in the leaf, which is consistent with the traditional classification [48]. I. paraguariensis and I. dumosa are both located in clade II, suggesting a similar geographic distribution in Southern America. However, I. suaveolens in section Lioprinus clusters together with I. szechwanensis and I. viridis in clade III, while the latter two species are categorized into section Paltoria. The findings that evergreen species surprisingly have a close relationship with deciduous species in clade IV, further reflects the incongruence between the plastid phylogeny and traditional taxonomy [1,3,67]. Nevertheless, three endemic Ilex species in clade III show a relative similarity of the leaf morphology and growth pattern (600-1600 m altitude). Both subtropical deciduous I. micrococca and I. asprella in clade IV have very similar leaf margins, growing at an altitude of 400-1000 m. Additionally, the topology constructed using the entire plastomes significantly improved the quality of phylogenies compared to the use of full coding sequences ( Figure S3). Therefore, the plastid topology provides more reliable clues to infer the species' geographic patterns and origins. Overall, when more complete plastome sequences of Ilex species become accessible, plastid trees combined with the nuclear markers will contribute to the resolution of the deeper branches of the biogeography and phylogeny in Aquifoliaceae.

Conclusions
The unspecific pollination and weakness of reproductive isolation resulted in the frequently intraspecific genetic exchange, confusing the phylogenies and biogeography in the Ilex genus. Dioecious Ilex species have similar flowers and fruits, but the variable morphology in leaves is commonly affected by climate and season, creating an obstacle to the identification and classification of the Ilex specimens [22,75]. Hitherto, the anatomical features and plastid genomes of four endemic Ilex species in Mount Huangshan were undocumented. In this work, we performed comparative analyses of a variety of phenotypic traits (e.g., leaves, flowers, and seeds) and the molecular profiles, including nuclear genome size, plastid genome structures, variable patterns, and phylogenetic relationships, in four Ilex species. The reconstruction of the plastid phylogenies verified the significant usage of the complete plastomes in identifying the phylogeography and phylogenetic evolution of Ilex. However, the evolved phenotypic variation and variable molecular patterns might be tightly related to the adapted ecology and environments. The limited accessions in the plastid genome restrained the extensive exploration of the phylogenies in Ilex, which necessitates additional sequencing samples. In summary, the morphological and molecular data in the present study provided informative resources for the in-depth mining of the phylogenies, biogeography, and genetic diversity in the family Aquifoliaceae.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/9/964/s1, Figure S1: Analysis of SSRs in four Ilex plastid genomes, Figure S2: Analysis of the long repeated sequences in four Ilex plastid genomes, Figure S3: The phylogenetic analysis of 15 Ilex species using the coding sequences, Table S1: Base composition and proportions of the four Ilex plastid genomes, Table S2