Complete Chloroplast Genome of Gladiolus gandavensis (Gladiolus) and Genetic Evolutionary Analysis

Gladiolus is an important ornamental plant and one of the world’s four most-grown cut flowers. Gladiolus gandavensis has been found only in Cangnan County (Zhejiang Province), China, as recorded in the “Botanical”. To explore the origin of G. gandavensis, its chloroplast genome was sequenced, yielding a circular DNA molecule of 151,654 bp. The chloroplast genome of G. gandavensis has a quadripartite structure comprising a large single-copy (LSC) region (81,547 bp), a small single-copy (SSC) region (17,895 bp), and two inverted repeats (IRa and IRb; 52,212 bp combined), similar to that of other species. The genome contains 84 protein-coding genes, 8 rRNA genes, and 38 tRNA genes. To further study the structural characteristics of the G. gandavensis chloroplast genome, a comparative analysis of eight species of the Iridaceae family was conducted, which revealed higher similarity in the IR regions than in the LSC and SSC regions. In addition, 265 simple sequence repeats (SSRs) were detected. Phylogenetic analysis indicated that the chloroplast genome of G. gandavensis has high homology with those of Crocus cartwrightianus and Crocus sativus. Genetic analysis based on the rbcL sequence of 49 Gladiolus samples showed that samples 42, 49, 50, and 54 had high homology with the three samples from China (64, 65, and 66), which might be caused by chance similarity in genotypes. These results suggest that G. gandavensis may have originated from South Africa.


Introduction
Gladiolus comprises approximately 265 species and is one of the largest genera in the family Iridaceae [1]. It is a valuable ornamental plant with beautiful colors and is one of the world's four most-grown cut flowers [2]. Gladiolus is native to Africa and southern Europe. Currently, G. gandavensis is found in Cangnan County, Zhejiang Province, China, where it is distributed in Xiaguan town, Mazhan town, and Beiguang Island [3]. G. gandavensis prefers warm, sunny, and well-ventilated environments, and it bears red and yellow flowers and sword-shaped leaves [4]. G. gandavensis therefore has a high ornamental value and is mainly used for flower arrangements, bouquets, and baskets, as well as in flower beds and as a potted plant [5].
The nuclear genome is biparentally inherited, while the chloroplast genome is maternally inherited [6]. In addition, the nuclear genome can be spread by both pollen and seeds, whereas the chloroplast genome is spread only by seeds in most angiosperm species [7]. Because of its small size and conserved sequence, the chloroplast genome is well suited to plant identification, and it now plays an irreplaceable role in studies of evolution [8], migration [9], and identification [10].
The chloroplast is an important organelle for photosynthesis and metabolic activities in higher plants and algae [11]. Chloroplasts also play important roles in other aspects of plant physiology and development, including plant responses to light [12,13], heat [14], drought [15,16], salt [17], and other stresses [18].
With the popularization and development of next-generation sequencing (NGS) technology, chloroplast genome databases are becoming increasingly rich. The chloroplast genomes of a growing number of species have been sequenced, including Nicotiana tabacum [19], Oryza sativa [20], Zea mays [21], Pinus massoniana [22], and many others. In general, the chloroplast genome of higher plants has a typical double-stranded circular structure, with a size between 120 and 180 kb. It usually contains a small single-copy (SSC) region, a large single-copy (LSC) region, and a pair of inverted repeat (IR) sequences [23,24].
G. gandavensis has been found only in Cangnan County, China, and its origin is unclear. In this study, the chloroplast genome was sequenced, and the cpDNA rbcL sequence was then used to identify 46 samples of Gladiolus in order to explore the origin of G. gandavensis.

Sequencing, Assembly, and Annotation
G. gandavensis flowers are red and yellow, and the species is found in Cangnan County, Zhejiang Province, China, where it is distributed in Xiaguan town, Mazhan town, and Beiguang Island (Figure 1). G. gandavensis leaves were collected from Mazhan town (Zhejiang, China; 27.29° N, 120.43° E) for sequencing. An improved extraction method was used to isolate cpDNA from the fresh leaves [25]. A library was constructed with 1 µg of DNA, and a Covaris M220 ultrasonic instrument was used to shear the DNA into 300~5500 bp fragments. Afterward, the 3′ ends were polyadenylated and ligated to index adapters (TruSeq™ Nano DNA Sample Prep Kit). The library was enriched by 8 cycles of PCR amplification, and the target band was recovered from a 2% agarose gel (Certified Low Range Ultra Agarose). TBS380 (PicoGreen) was then used for quantitation, and the libraries were mixed according to the required data ratio. Clusters were generated by bridge PCR amplification on a cBot solid phase, and 2 × 150 bp sequencing was performed on an Illumina HiSeq platform [26].

Sequence Analysis
MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html, accessed on 1 May 2022) was used to identify microsatellite motifs [29]. MAFFT v7.310 (https://mafft.cbrc.jp/alignment/software/, accessed on 1 May 2022), a multiple sequence alignment program, was used to align the IR sequences among some species in Gladiolus [30]. MAUVE was used to locate structural differences among whole-genome alignments [31].
Original reads were filtered before assembly. SOAPdenovo (version: 2.04, http://soap.genomics.org.cn/soapdenovo.html, accessed on 1 May 2022) was used to assemble the clean data, and the optimal assembly was obtained after multiple rounds of parameter adjustment [27]. The resulting contigs were locally assembled and optimized according to the paired-end and overlap relationships of the reads. Then, GapCloser (version: 1.12, http://soap.genomics.org.cn/soapdenovo.html, accessed on 1 May 2022) was used to repair the internal gaps in the sequences, and redundant sequences were removed to obtain the final assembly [28].

Genetic Evolutionary Analysis
The total DNA of 49 Gladiolus samples (Table 1) was extracted using a Plant Genprep DNA Kit (Tiangen, Beijing, China) and quantified using a NanoDrop 2000c instrument (ThermoFisher Scientific, Wilmington, DE, USA) [35,36]. The DNA templates were checked by 1% agarose gel electrophoresis. An ABI-2720 PCR instrument (Applied Biosystems, Waltham, MA, USA) was used for PCR amplification, which was carried out according to the manufacturer's protocol. The primers for the chloroplast rbcL sequence were ATGTCACCACAAACAGAAAC (forward primer) and TCGCATGTACCTGCAGTAGC (reverse primer). The PCR products were sent to Shanghai Suny Biotechnology Co., Ltd., Shanghai, China, for sequencing. The original sequence data were obtained with Sequencing Analysis 5.2 software. MAFFT was used to align all the sequences. The unweighted pair group method with arithmetic mean (UPGMA) was used to construct the phylogenetic tree [37].
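As an illustration of the clustering step, the following is a minimal sketch of UPGMA on an aligned set of sequences. It is not the software used in this study, which is not named here. The uncorrected p-distance, the function names, and the toy sequences are assumptions made only for this example, and branch lengths are omitted, so the output is a topology only.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites; columns with a gap or ambiguous base are skipped."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """aligned: dict mapping sample name -> aligned sequence.

    Returns a Newick string describing the UPGMA topology (no branch lengths).
    """
    # Each cluster is keyed by its Newick label; the value is its leaf count.
    sizes = {name: 1 for name in aligned}
    dist = {frozenset((a, b)): p_distance(aligned[a], aligned[b])
            for a, b in combinations(aligned, 2)}
    while len(sizes) > 1:
        # Merge the closest pair; sorting the labels makes ties deterministic.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: dist[frozenset(pair)])
        merged = "(%s,%s)" % (a, b)
        for c in sizes:
            if c in (a, b):
                continue
            # Size-weighted mean keeps every leaf equally weighted (the "U" in UPGMA).
            dist[frozenset((merged, c))] = (
                sizes[a] * dist[frozenset((a, c))]
                + sizes[b] * dist[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Hypothetical toy alignment, not data from this study.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACGTTCGTGT",
    "s4": "TTGTTCGAGT",
}
print(upgma(toy))
```

In this toy alignment the two most similar sequences (s1 and s2) are joined first, and the remaining sequences are attached in order of increasing average distance.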

Bias of Codon Usage
Codon usage bias, the unequal use of synonymous codons, differs among species and among genes within the same species, and analyzing it helps in better understanding the environmental adaptability and molecular evolution of organisms. In this study, 26,108 codons were identified in all protein-coding sequences. Ile was the most abundant amino acid (2276), while Met was the least abundant (85). Sixty-eight codons were identified with a relative synonymous codon usage (RSCU) > 1 (Figure 3).
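To make the RSCU criterion concrete, the sketch below computes RSCU as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The standard genetic code, the exclusion of stop codons, the function name, and the toy coding sequence are assumptions for illustration; the tool and settings used in this study are not stated here.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (an assumption;
# the translation table used in the study is not stated).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for sense codons of amino acids present in cds_list."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        # Read complete in-frame codons only; skip any codon with an ambiguous base.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are left out of this sketch
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Hypothetical toy CDS: Met, Ala, Ala, Ala, Ala
toy_rscu = rscu(["ATGGCTGCTGCCGCA"])
preferred = sorted(c for c, v in toy_rscu.items() if v > 1)
print(preferred)
```

A codon with RSCU > 1 is used more often than expected under equal usage, which is the criterion applied to the codons counted in Figure 3.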

Microsatellite Polymorphisms
A total of 265 microsatellites (i.e., simple sequence repeats (SSRs)) were identified in the chloroplast genome of G. gandavensis, distributed across the LSC, SSC, and IR regions. There were 171 SSRs located in the LSC region (64.6%), while 47 (17.7%) and 44 (16.6%) SSRs were located in the SSC and IR regions, respectively. Among the 265 SSRs, 164 were mononucleotides, 14 were dinucleotides, 78 were trinucleotides, 8 were tetranucleotides, and 1 was a pentanucleotide (Figure 4).
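The classification of SSRs by motif length can be illustrated with a small regular-expression scan. This is a simplified stand-in for MISA, not its implementation, and the minimum repeat numbers below are hypothetical thresholds chosen for the example, since the settings used in this study are not given here. The function name and the toy sequence are also assumptions.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeats) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT");
            # those tracts belong to the shorter motif class.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            repeats = (m.end() - m.start()) // k
            hits.append((m.start() + 1, m.end(), motif, repeats))
    return sorted(hits)


# Hypothetical toy sequence: an A(12) tract and an (AT)6 tract.
toy_seq = "GC" + "A" * 12 + "CG" + "AT" * 6 + "GG"
for start, end, motif, repeats in find_ssrs(toy_seq):
    print(start, end, motif, repeats)
```

Because matches are scanned greedily and independently per motif length, compound or interrupted repeats are not merged here, which a dedicated tool would normally handle.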

IR Expansion and Contraction
The IRs are integral to maintaining the stability of the chloroplast genome, as loss of the IRs can result in changes to the genome [38]. Previous reports have indicated that IR expansion and contraction occur in many plant species [39]. In this study, the IR regions and their junctions with the LSC and SSC regions were analyzed in the chloroplast genomes of eight Iridaceae family members, including G. gandavensis (Figure 5). The results showed that the chloroplast genomes ranged from 150,819 bp in C. sativus to 153,735 bp in I. domestica. The ycf1 gene was located at the SSC/IRa junction in all eight chloroplast genomes. At the SSC/IRb junction, ycf1 was missing in Iris missouriensis and I. domestica but was present in the other chloroplast genomes. Notably, the coding region of rpl22 was located at the LSC/IRb junction of all the chloroplast genomes, with 7, 69, 69, 63, 86, 70, 63, and 63 bp at the LSC/IRb border, respectively.
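The junction positions discussed above follow directly from the region lengths. The sketch below derives them for G. gandavensis from the reported lengths (LSC 81,547 bp, SSC 17,895 bp, IRs 52,212 bp). It assumes that the 52,212 bp figure is the combined length of the two equal IR copies, that the regions are ordered LSC, IRb, SSC, IRa, and that coordinate 1 is the first base of the LSC; the function name is hypothetical.

```python
def quadripartite_boundaries(lsc, ssc, ir_total):
    """Return {region: (start, end)} with 1-based inclusive coordinates.

    Assumes the order LSC -> IRb -> SSC -> IRa and two IR copies of equal length.
    """
    if ir_total % 2:
        raise ValueError("combined IR length must be even for two equal copies")
    ir = ir_total // 2
    coords = {}
    start = 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        coords[name] = (start, start + length - 1)
        start += length
    return coords


coords = quadripartite_boundaries(lsc=81_547, ssc=17_895, ir_total=52_212)
for name, (start, end) in coords.items():
    print(name, start, end)
# The end of IRa equals the reported genome size of 151,654 bp.
```

Under these assumptions each IR copy is 26,106 bp, the LSC/IRb junction lies between positions 81,547 and 81,548, and the SSC/IRa junction lies between positions 125,548 and 125,549.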

Phylogenetic Analysis
A phylogenetic tree of eight Iridaceae species was constructed under the GTRGAMMA model. The results showed that G. gandavensis had high homology with C. cartwrightianus and C. sativus, followed by I. domestica and other Iris species. Therefore, we speculate that Gladiolus has high homology with Crocus (Figure 6).

Genetic Relationship Analysis of Gladiolus
For further study, we amplified and sequenced the rbcL segment (a cpDNA region) of the 49 Gladiolus samples to analyze the genetic relationships within Gladiolus, and we then obtained a phylogenetic tree by using the neighbor-joining (NJ) approach (Figure 7). The results showed that the three Chinese samples (64, 65, and 66) all clustered into Group II. Specifically, sample 66 had high homology with samples 50 and 54; sample 65 had high homology with the cluster containing samples 50 and 54, and with sample 49; and sample 64 had high homology with samples 42 and 53. These results indicate that these samples may be closely related to the three Chinese samples.

The Chloroplast Genome of G. gandavensis
The complete chloroplast genome of G. gandavensis was assembled, and the sequence data were submitted to the NCBI database under GenBank accession number OM304631. The structure and characteristics of the G. gandavensis chloroplast genome were analyzed in this study, and the results were consistent with the traits of most angiosperms. In this study, only the clpP and rps12 genes included two introns. Research on the clpP gene has indicated that it plays an important role in plant chloroplasts, as it encodes the proteolytic subunit of the ATP-dependent Clp protease [40–42]. The rps12 gene is the most unusual of all the chloroplast genes and is composed of two parts that are far apart in the genome [43]. Therefore, the study of these two genes would help in understanding the evolutionary process of genes and the genomic characteristics of G. gandavensis.

The Sequence Analysis among G. gandavensis and Iridaceae Species
Codon usage bias could avert transcriptional errors in the chloroplast genome by affecting amino acid functions [44–46]. This research showed that A and T bases were the most frequent in mononucleotide SSRs in the chloroplast genome of G. gandavensis, with a content of 96.95% on average. This result is consistent with previous reports for most angiosperm chloroplast genomes [47]. Notably, repetitive sequences can be used in studies of phylogeny and genome rearrangement [48]. A comparative analysis of eight Iridaceae plants was conducted. The analysis of IR expansion and contraction showed that the ycf1 gene was located at the SSC/IRa and SSC/IRb junctions, and this result will help in exploring the evolution of the chloroplast genome in G. gandavensis. Additionally, the phylogenetic tree of eight Iridaceae species indicated that G. gandavensis had higher homology with Crocus than with Iris species. These results show that G. gandavensis is closely related to Crocus, which may reflect similar molecular evolutionary mechanisms.

The Evolutionary Genetics of G. gandavensis
In addition, we obtained the rbcL sequence from the 49 Gladiolus samples to analyze the evolution of Gladiolus and then constructed a phylogenetic tree by the NJ approach. The results showed that samples 42, 49, 50, and 54 had high homology with the three Chinese samples (64, 65, and 66), which may indicate a close evolutionary relationship. In a previous study, AFLP markers were used to analyze the genetic relationship of 54 Gladiolus cultivars, and the results showed that most of the exotic cultivars as well as the indigenous cultivars were closely related to each other, which might be due to a chance similarity in their genotypes [49]. Our results are similar to those of the previously mentioned study, and we therefore speculate that G. gandavensis may have originated from South Africa; however, the specific relationship still needs further exploration. In the future, studies on the population genetics, species identification, and conservation biology of Gladiolus may be conducted.

Conclusions
The chloroplast genome of G. gandavensis was sequenced with Illumina technology and assembled. The sequence information was deposited in the NCBI database under GenBank accession number OM304631. By comparing its structure with that of other Iridaceae species, we found that Gladiolus had a higher homology with Crocus than with Iris. The analysis of relationships among the Gladiolus samples based on the chloroplast rbcL sequence provides reference information for the study of genetic relationships, germplasm resource preservation, and sustainable use of these Gladiolus species.

Conflicts of Interest:
The authors declare no conflict of interest.