Complete Chloroplast Genome Sequences of Four Species in the Caladium Genus: Comparative and Phylogenetic Analyses

Caladiums are promising colorful foliage plants, valued for the dazzling colors of their leaves, veins, stripes, and patches, and are often cultivated in pots or gardens as decorations. In this study, the chloroplast (cp) genomes of four wild species, C. bicolor, C. humboldtii, C. praetermissum, and C. lindenii, were sequenced by high-throughput sequencing, assembled, and annotated. The whole cp genome size ranged from 162,776 bp to 168,888 bp, and the GC contents ranged from 35.09% to 35.91%. Compared with the large single copy (LSC) and small single copy (SSC) regions, the inverted repeat (IR) regions were more conserved. We further analyzed the region borders of nine species of Araceae and found that the expansion or contraction of the IR/SSC regions might account for the cp genome size variation. In total, 131 genes were annotated in the cp genomes, including 86 protein-coding genes (PCGs), 37 tRNAs, and eight rRNAs. The effective number of codons (ENC) values and neutrality plot analyses indicated that natural selection pressure could greatly affect codon preference. The GC3 content was significantly lower than that of GC1 and GC2, and codons ending with A/U had higher usage preferences. Finally, we conducted a phylogenetic analysis based on the chloroplast genomes of twelve species of Araceae, in which C. bicolor and C. humboldtii were grouped together, and C. lindenii was furthest from the other three Caladium species, occupying a separate branch. These results will provide a basis for the identification, development, and utilization of Caladium germplasm.


Introduction
The genus Caladium Vent. (family Araceae) includes perennial herbs native to the tropical regions of South and Central America [1][2][3]. Most Caladium species are distributed in the Amazon rainforest, either in open areas or beside streams [4,5]. Caladium spp. are regarded as among the most promising colorful foliage plants, with great variation in leaf color [6,7]. Because of the dazzling colors of the leaves, veins, stripes, and patches, as well as the long ornamental period, they are often cultivated in pots or gardens as decorations [8]. Their ornamental effect in urban areas is excellent, and hence this genus is collectively known as the "Queen of Foliage Plants" [9,10]. In many countries of Europe and America, Caladium is grown as a replacement for traditional grasses and flowers for arranging flower beds and flower borders or for creating a unique landscape effect of "no flowers is better than flowers" [11][12][13]. In recent years, Caladium has become a new favorite foliage plant due to its colorful leaves, short production cycle, and high selling profit, which have made it popular in domestic and foreign markets [14,15].
Over nearly 150 years of selection and breeding, more than 2000 varieties have been cultivated [16][17][18]. At present, there are more than 90 varieties on the market, among which

The Structure of the Chloroplast Genomes of the Four Caladium Species
The chloroplast genome sizes of C. bicolor and C. humboldtii were highly similar (162,933 bp and 162,776 bp, respectively), differing by only 157 bp, whereas those of C. praetermissum and C. lindenii were slightly larger (165,286 bp and 168,888 bp, respectively) (Figure 1). The GC contents of the four cp genomes were 35.87%, 35.91%, 35.64%, and 35.09%, respectively (Table 1). The cpDNA of the four samples showed the typical quadripartite circular structure, consisting of a pair of inverted repeat regions (IRa and IRb; 26,277, 26,277, 26,484, and 26,472 bp in length, respectively), a large single copy region (LSC; 89,209, 88,986, 91,168, and 93,162 bp in length, respectively), and a small single copy region (SSC; 21,170, 21,236, 21,150, and 22,782 bp in length, respectively). Table 1 also shows that the LSC region of C. humboldtii was shorter than that of the other species, whereas that of C. lindenii was the longest. Zamioculcas zamiifolia had the shortest SSC, and C. lindenii had the longest. For IR size, Z. amazonica had the shortest IR, while Z. zamiifolia had the longest. Moreover, we compared the chloroplast genome sequences of ON707031 and NC_060474 (Table S1). There were only a few variations in the noncoding regions and one SNP in the coding region, none of which affected the gene structure.
A total of 131 genes were identified in the chloroplast genome of Caladium, including 86 protein-coding genes (PCGs), 37 tRNAs, and eight rRNAs (Table S2). Most genes appeared in the LSC or SSC region in single-copy form. Among them, 12 genes were assigned to the SSC region, including 11 PCGs (ndhF, rpl32, ccsA, ndhD, psaC, ndhE, ndhG, ndhI, ndhA, ndhH, and rps15) and one tRNA (trnL-UAG). There were 83 genes in the LSC region, including 61 PCGs and 22 tRNAs. Only 16 genes were detected in the IR region, including five PCGs (rpl2, rpl23, ycf2, ndhB, and rps7), seven tRNAs (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnI-ACG, and trnA-GUU), and four rRNAs (rrn16, rrn23, rrn4.5, and rrn5). The ycf1 gene spanned the boundary between the SSC region and the IR region. The rps12 gene had two copies, each with three exons; the two copies shared the first exon, which was located in the LSC region, while the other two exons were located in the IR region.
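Because a quadripartite genome contains one LSC, one SSC, and two IR copies, the reported region lengths can be checked against the reported genome sizes. The following is a minimal sketch of that arithmetic in Python, using only the figures quoted above; the dictionary layout and the function name `quadripartite_length` are illustrative choices and not part of the original analysis.

```python
# Region lengths (bp) and genome sizes (bp) as quoted in the text above.
REGIONS = {
    "C. bicolor":       {"LSC": 89_209, "SSC": 21_170, "IR": 26_277, "total": 162_933},
    "C. humboldtii":    {"LSC": 88_986, "SSC": 21_236, "IR": 26_277, "total": 162_776},
    "C. praetermissum": {"LSC": 91_168, "SSC": 21_150, "IR": 26_484, "total": 165_286},
    "C. lindenii":      {"LSC": 93_162, "SSC": 22_782, "IR": 26_472, "total": 168_888},
}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a circular genome made of one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


for species, r in REGIONS.items():
    computed = quadripartite_length(r["LSC"], r["SSC"], r["IR"])
    status = "consistent" if computed == r["total"] else "MISMATCH"
    print(f"{species:<17} LSC + SSC + 2*IR = {computed} bp ({status})")
```

For all four species the sum reproduces the stated genome size, so the size differences among the genomes can be read directly from the per-region differences.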

Analysis of Contraction and Expansion of the IR Region
By comparing the gene distribution of IR/LSC and IR/SSC border regions in the chloroplast genomes of four Caladium species and those of related species, the expansion or contraction of IR/SSC boundary regions was assessed. As shown in Figure 2, the nine sequences had similar gene structures and sequences and the genes distributed near the boundaries of IR/LSC and IR/SSC were rps19, rpl22, rpl2, ycf1, trnH, and psbA. Among them, the IR boundaries of C. bicolor and C. humboldtii were identical, whereas C. praetermissum differed only in the SSC/IRa boundary. In C. bicolor and C. humboldtii, the sizes of the ycf1 gene were 5241 and 422 bp in the SSC and IRa regions, respectively, while these sizes in C. praetermissum were 5292 bp and 422 bp, respectively. As for C. lindenii, it was quite different from the other three Caladium species, and its boundaries were different, indicating that this species may have undergone a unique evolutionary process.

GView Analysis
To gain a deeper understanding of the phylogenetic relationships among the different species of Caladium and the differences from other closely related species, we used the GView tool to create a circle map of the chloroplast genomes with the assembled C. bicolor genome as a reference. The characteristics and structural variation of all chloroplast genomes were evaluated. Figure 3 shows that the nine genomes had similar structures. Compared with the IR region, the LSC and SSC regions varied greatly among different species. Even within the same genus, the chloroplast genomes of the studied four species of Caladium showed inconsistencies. Particularly, there existed tiny variations among the intergenic regions. In terms of the genome structures, C. bicolor and C. humboldtii were identical. However, the cp genome of C. praetermissum was similar to that of Xanthosoma sagittifolium. Moreover, the cp genome of C. lindenii was similar to that of Syngonium angustatum.

Analysis of Chloroplast Microsatellites and Repeat Sequences
In this study, we analyzed the distribution of SSRs in the cp genomes of four Caladium species. As shown in Figure 4, there were only three types of SSRs: single-base, two-base, and three-base repeats. Three-base repeats were present only in C. lindenii. Most SSRs were single-base repeats, accounting for 63.0-81.7% of the total SSRs, which was more than all other repeat types combined. The total numbers of SSRs in C. bicolor and C. humboldtii were similar (91 and 89, respectively). C. praetermissum had the fewest SSRs, whereas C. lindenii had the most. The SSRs of the four Caladium genomes were mostly distributed in the LSC region and in noncoding regions. In terms of repeat units, A/T repeats were far more numerous than C/G repeats. All two-base repeats were AT/AT and all three-base repeats were AAT/ATT; no other motifs were found.

Analysis of Selection Pressure and Codon Bias
Relative synonymous codon usage (RSCU) was used to evaluate the use of synonymous codons in coding regions, where a larger RSCU value indicates a stronger bias. Our data showed that leucine was the most abundant amino acid in the chloroplast genomes, followed by serine and arginine, whereas tryptophan had the fewest codons. As shown in Figure 5, all amino acids except tryptophan used two or more synonymous codons. For example, isoleucine was encoded by three synonymous codons; alanine, glycine, proline, threonine, and valine were each encoded by four; and leucine, serine, and arginine were each encoded by six. There were 32 codons with RSCU values greater than one, of which 13 ended with A and 16 ended with U. These findings were consistent with previous studies, which showed that codons ending with A/U in plants had higher usage preference. As shown in Table S3, the codon preferences of the different Caladium species were highly conserved: two codons had the same RSCU value in all four species (AUG, 1.997; GUG, 0.003), and these were the extreme values. However, there were still some discrepancies among the species, mainly in the number of codons. For most codons, the codon preferences of C. bicolor and C. humboldtii were almost identical and differed significantly from those of the other two species, especially C. lindenii.
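The RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, so a value above one marks a preferred codon. The sketch below illustrates this calculation in Python under stated assumptions: the function name `rscu`, the compact codon-table construction, and the toy coding sequence are hypothetical and are not the tool or data used in the study, and start codons are treated like any other codon.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard amino-acid assignments (also used by the plastid code, NCBI table 11),
# listed in TCAG order for the first, second and third codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over the in-frame coding sequences given."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing N or other symbols
                counts[codon] += 1

    # Group sense codons into synonymous families; stop codons are excluded.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)  # count under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy example (hypothetical sequence): alanine is encoded three times by GCT, once by GCA.
toy = rscu(["ATGGCTGCTGCTGCATAA"])
preferred = sorted(c for c, v in toy.items() if v > 1)
print(preferred)
```

Within each amino acid the RSCU values sum to the number of synonymous codons, which is why single-codon amino acids such as tryptophan always score exactly one in this formulation.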

To analyze the trend of codon usage bias in the cp genomes of the nine species, the effective number of codons (ENC) values were investigated. As shown in Table S4, the ENC values varied from 17.158 to 61.000, showing different extents of codon preference among the species and indicating that the codon preference was weak. To explore this further, the distribution of the ENC values of the coding genes in the genomes is shown in Figure S1. GC1, GC2, and GC3 denote the GC content at the first, second, and third positions of each codon, respectively. The overall GC content of the cp genomes varied among the nine species and ranged from 29.06% to 46.21% (Table S5). As expected, the GC1, GC2, and GC3 contents varied significantly across species and also among genes within the genomes. The average value of GC3 in the cp genomes was 28.63%, and those of GC1 and GC2 were 46.07% and 40%, respectively. The GC3 content was significantly lower than that of GC1 and GC2. We also found that the greatest difference in GC content existed in GC3, which is widely used to illustrate codon usage variation. The neutrality plot is shown in Figure S2 and revealed little correlation between GC3 and GC12 (the mean of GC1 and GC2). Because mutation pressure alone would be expected to shift all three codon positions together, these results indicate that natural selection pressure could greatly affect codon preference.
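The position-specific GC contents and the reference curve of an ENC plot are both simple to compute. The following Python sketch is an illustration only: `gc_by_position` and `expected_enc` are hypothetical helper names, the example sequence is made up, and GC3 is computed over all codons rather than over a filtered codon set, which is a simplifying assumption. The curve uses Wright's (1990) approximation ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3 as a fraction.

```python
def gc_by_position(cds: str):
    """Return (GC1, GC2, GC3) as fractions for one in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    fractions = []
    for pos in range(3):
        bases = seq[pos::3]  # every third base, starting at this codon position
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


def expected_enc(gc3: float) -> float:
    """Expected ENC when codon usage is shaped only by GC3 composition (Wright 1990)."""
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


# Hypothetical three-codon example: ATG GCC AAT
gc1, gc2, gc3 = gc_by_position("ATGGCCAAT")
gc12 = (gc1 + gc2) / 2  # y-axis value of a neutrality plot; GC3 is the x-axis value
print(f"GC1={gc1:.3f} GC2={gc2:.3f} GC3={gc3:.3f} GC12={gc12:.3f}")
print(f"expected ENC at this GC3: {expected_enc(gc3):.2f}")
```

In an ENC plot, genes whose observed ENC falls well below `expected_enc(gc3)` are the ones whose codon usage is not explained by base composition alone; in a neutrality plot, a regression slope of GC12 on GC3 close to zero points in the same direction.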

Phylogenetic Relationship Analysis
We selected the chloroplast genomes of twelve species of Araceae to explore the genetic relationship between Caladium and its relatives. The phylogenetic tree was constructed from the complete chloroplast genome sequences using maximum likelihood (ML) and Bayesian inference (BI) methods. The topologies of the ML and BI trees were identical, and most clades had high posterior probabilities and bootstrap values. As shown in Figure 6, the four Caladium species were divided into different branches: the two C. bicolor accessions (ON707031 and NC_060474) and C. humboldtii were grouped together, with Z. amazonica being relatively close. Furthermore, C. praetermissum and X. sagittifolium were placed in one branch, while C. lindenii was furthest from the other three Caladium species, occupying a separate branch and being relatively closely related to Syngonium angustatum. These findings are consistent with previous reports, in which C. lindenii was only later assigned to the genus Caladium, and its appearance and resistance were more similar to those of S. angustatum.

Discussion
The cp genomes of the four Caladium species all had the typical circular quadripartite structure of angiosperm cp genomes and were quite different in length. In terms of the number of annotated genes, the four Caladium species and the related species all had 131 genes, except for Z. amazonica, which had 130, indicating that the cp genome of Caladium is conserved to a certain degree. The GC content has been reported as an important indicator for judging the genetic relationship among species [45,46]. In our study, the gene types, numbers, and order of genes encoded by the genomes of the four Caladium species were identical, with highly similar G + C content. Ranked by G + C content from high to low, the regions were IR, LSC, and SSC, which is a ubiquitous phenomenon in many plant species [47][48][49]. The IR boundary differed among species, and the fluctuation of the IR boundary was the main reason for the difference [50]. The cp genomes of C. bicolor and C. humboldtii showed the smallest difference in the IR regions among the four Caladium species, and there was no significant difference in the contraction and expansion of the IR regions between the two. Therefore, the GC content and the IR region boundaries largely indicate that C. bicolor and C. humboldtii are very closely related but are relatively distinct from C. praetermissum and C. lindenii.
Codon bias affects translation initiation, elongation, and accuracy, as well as mRNA splicing and protein folding [51][52][53]. Therefore, codon preference can also reflect kinship to a certain extent. Among the chloroplast genes of the four Caladium species, leucine had the most codons, tryptophan had the fewest, and codons ending in A/U were preferred. This feature is consistent with that of most plant species [54,55]. The codon preferences of C. bicolor and C. humboldtii showed almost no difference, suggesting that the two species are closely related, while the codon preferences of C. praetermissum and C. lindenii were markedly different, indicating that they may have unique evolutionary positions.
Microsatellite sequences (microsatellite DNA), also known as SSRs, are 1-6 bp repeats that are widely distributed in the cp genome. SSRs are highly polymorphic and specific and are valuable markers for studying gene flow, population genetics, and genetic mapping [56]. Except for C. lindenii, the repeat types and numbers were roughly the same among the Caladium species. Identification of these SSR loci can provide candidate molecular markers for research on the genetic diversity and conservation genetics of Caladium. The nodes in the LSC, SSC, and IR regions of the four Caladium species were highly conserved, indicating that the cp genome structure of Caladium is highly conserved. Phylogenetic analysis placed C. lindenii in a single clade that was far from the other three Caladium species, which is consistent with previous reports [23][24][25]. Research on the phylogeny of Caladium has been limited, and AFLP and SSR markers have mostly been used to identify kinship among Caladium cultivars. In this paper, we provide more accurate findings on the phylogenetic relationships among different species of Caladieae. Based on the above results, we conclude that C. lindenii is distinctive in cp genome structure, IR region contraction and expansion, SSR distribution, and codon preference, and is relatively close to S. angustatum. Therefore, this species is suggested to be classified into the genus Synaptocarpus.

Plant Materials
Plants of four Caladium species were collected from the Environmental Horticulture Research Institute of Guangdong Academy of Agricultural Sciences (23° 23' N, 113° 26' E), namely, C. bicolor, C. humboldtii 'Mini White,' C. praetermissum 'Hilo Beauty,' and C. lindenii. The phenotypic characteristics are shown in Figure 7. Young leaves were collected and rinsed thoroughly with tap water. Subsequently, they were washed several times with sterile water and dried quickly in a sampling bag containing silica gel. The samples were then stored at −80 °C until used.

DNA Extraction and High-Throughput Sequencing
Total DNA was extracted from frozen leaf samples by the modified cetyltrimethylammonium bromide (CTAB) method, and the quality of the DNA was assessed by 1.5% agarose gel electrophoresis. The DNA was fragmented by mechanical interruption (ultrasound), purified, and end-repaired. PolyA tails were added to the 3′ ends, and the fragments were ligated with sequencing adapters. The required fragment size was selected by agarose gel electrophoresis. PCR amplification was performed to form a sequencing library, and the qualified library was sequenced on the BGISEQ-500 platform with PE150 read lengths according to the manufacturer's instructions. DNA extraction and sequencing were performed by Guangzhou Bio&Data Biotechnologies Co., Ltd. (Guangzhou, China).

Chloroplast Genome Assembly and Annotation
At least 5 G of raw data were obtained for each species. Adapter sequences and low-quality reads were removed by data filtering to obtain high-quality clean data. First, NOVOPlasty software (k-mer = 39) was used for assembly, with the insert size set to 250 bp [57]. Subsequently, the online program GeSeq was employed to annotate the chloroplast genome sequences, and Geneious v9.0.2 was used to visualize the annotated sequences and make manual corrections [58,59]. The annotated sequences for the four species were deposited in the NCBI database under accession numbers ON707030, ON707031, ON707032, and ON707033, respectively. Finally, the genome maps of the Caladium chloroplast genomes were drawn with the online program Organellar Genome DRAW [60].

Comparative Analysis of Chloroplast Genomes
Relative synonymous codon usage (RSCU) analysis for every codon in each genome was conducted to determine codon bias [61]. The expansion and contraction of the IR borders of the four chloroplast genomes of Caladium were mapped with the aid of IRscope [62]. Chloroplast genome similarity was assessed using BLAST Atlas on the GView server (http://server.gview.ca/, accessed on 26 October 2022) with 50 kbp connection windows and the C. bicolor genome as a reference [63]. The Perl program provided by the MIcroSAtellite Identification Tool (MISA) was used to identify simple sequence repeat (SSR) sites. For mononucleotides, dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides, the minimum numbers of repeats were set to 10, 5, 4, 3, 3, and 3, respectively.
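The repeat thresholds above can be illustrated with a small scanner. The Python sketch below is not MISA and does not reproduce its output format or its handling of compound SSRs; it is a simplified, hypothetical stand-in (`find_ssrs`, `is_primitive`, and the test sequence are all illustrative) that applies the same minimum repeat numbers of 10, 5, 4, 3, 3, and 3 for motif lengths of 1 to 6 bp.

```python
# Minimum number of tandem copies per motif length, as stated in the Methods.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit: str) -> bool:
    """True if the motif is not itself a tandem repeat of a shorter motif."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return a list of (start, motif, copies) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for unit_len in sorted(min_repeats):  # try the shortest motif first
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len or set(unit) - set("ACGT"):
                continue  # ran off the end, or ambiguous base in motif
            if not is_primitive(unit):
                continue  # e.g. "AA" or "ATAT" are covered by "A" and "AT"
            copies = 1
            while seq[i + copies * unit_len: i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_repeats[unit_len]:
                hits.append((i, unit, copies))
                i += copies * unit_len  # continue after the reported repeat
                break
        else:
            i += 1  # no SSR starts here
    return hits


# Hypothetical sequence containing (A)12, (AT)6 and (AAT)4.
example = "GC" + "A" * 12 + "CG" + "AT" * 6 + "GG" + "AAT" * 4 + "C"
for start, motif, copies in find_ssrs(example):
    print(f"start={start} motif={motif} copies={copies}")
```

Counting the hits by motif length gives the kind of per-type summary reported for the four genomes; a mononucleotide run of nine bases or a dinucleotide repeated four times would fall below the thresholds and go unreported.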

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13122180/s1, Figure S1: Analysis of ENC-plot in the cp genomes of nine species of Araceae; Figure S2: Neutrality analysis performed by plotting GC12 values against GC3 values for the cp genomes of nine species of Araceae. The diagonal line on the neutrality plot shows that the value of GC12 is equal to GC3; Table S1: The comparison between ON707031 and NC_060474; Table S2: Functional classification of Caladium chloroplast genome; Table S3: Determination of optimal codons in the chloroplast genomes of Caladium; Table S4

Data Availability Statement:
The four chloroplast genome sequences generated in this study are available in GenBank of the National Center for Biotechnology Information (NCBI) under the accession numbers ON707030-ON707033.

Conflicts of Interest:
The authors declare no conflict of interest.