Exploring the History of Chloroplast Capture in Arabis Using Whole Chloroplast Genome Sequencing

Chloroplast capture occurs when the chloroplast of one plant species is introgressed into another plant species. The phylogenies of nuclear and chloroplast markers from East Asian Arabis species are incongruent, which suggests a hybrid origin involving chloroplast capture. In the present study, the complete chloroplast genomes of A. hirsuta, A. nipponica, and A. flagellosa were sequenced in order to analyze their divergence and relationships. The chloroplast genomes of A. nipponica and A. flagellosa were similar, which indicates chloroplast replacement. If the hybridization that caused chloroplast capture had occurred only once, the recipient species would be expected to differ less from each other than from the donor species. However, the chloroplast genomes of the species with possible hybrid origins, A. nipponica and A. stelleri, differ from each other about as much as each differs from the possible maternal donor species A. flagellosa, which suggests that multiple hybridization events have occurred in their respective histories. The mitochondrial genomes exhibited a similar pattern, in which A. nipponica and A. flagellosa were more similar to each other than to A. hirsuta. This suggests that the two organellar genomes were co-transferred during the hybridization history of the East Asian Arabis species.


Introduction
The genus Arabis includes about 70 species that are distributed throughout the northern hemisphere. The genus previously included many more species, but a large number of these were reclassified into other genera, including Arabidopsis, Turritis, Boechera, Crucihimalaya, Scapiarabis, and Sinoarabis [1][2][3][4][5][6]. Because of their highly variable morphology and life histories, Arabis species have been used for ecological and evolutionary studies of morphological and phenotypic traits [7][8][9][10][11]. The whole genome of Arabis alpina has been sequenced, providing genomic information for evolutionary analyses [12,13].
Molecular phylogenetic studies of Arabis species have been conducted to clarify species classification and its correlation with morphological evolution [10,14,15]. Despite their similar morphologies, plants identified as A. hirsuta in Europe, North America, and East Asia occupy different phylogenetic positions and are now considered distinct species. For example, East Asian A. hirsuta, which was previously classified as A. hirsuta var. nipponica, is now designated A. nipponica [16]. Nuclear ITS sequences indicated that A. nipponica, A. stelleri, and A. takeshimana are closely related to European A. hirsuta, whereas chloroplast trnLF sequences indicated that the same species are closely related to East Asian Arabis species [14,16]. Such incongruence between nuclear and organellar phylogenies has been reported in other plant species and is generally known as "chloroplast capture" [17,18], a process that involves hybridization followed by many successive backcrosses [17]. When chloroplast capture happens, the chloroplast genome of one species is replaced by that of another species. A. nipponica may have originated from hybridization between A. hirsuta or A. sagittata, acting as the paternal parent, and East Asian Arabis species (similar to A. serrata, A. paniculata, and A. flagellosa), acting as the maternal parent [14,16]. However, the evolutionary history and hybridization processes of A. nipponica and other East Asian Arabis species remain to be clarified. Because the evidence for incongruence between the nuclear and chloroplast phylogenies came from a small number of short sequences, the identity of the hybrid species, their level of divergence, and their classification remain somewhat ambiguous. In the present study, the whole chloroplast genomes of three Arabis species were sequenced in order to analyze their divergence and evolutionary history. The whole chloroplast genome sequences also provide a basis for future marker development.

Int. J. Mol. Sci. 2018, 19, 602 2 of 12

Chloroplast Genome Structure of Arabis Species
The structures of the whole chloroplast genomes are summarized in Table 1, which also includes previously reported Arabis chloroplast genomes and the chloroplast genome of the closely related species Draba nemorosa. The chloroplast genome structure identified in the present study is shown as a circular map (see Figure 1). The complete chloroplast genomes of the Arabis species ranged from 152,866 to 153,758 base pairs in total length and comprised a large single copy (LSC) region of 82,338 to 82,811 base pairs and a small single copy (SSC) region of 17,938 to 18,156 base pairs, separated by a pair of inverted repeat (IR) regions of 26,421 to 26,933 base pairs. This structure and length are conserved and similar to those of other Brassicaceae chloroplast genomes [19][20][21][22]. The complete genomes contain 86 protein-coding genes, 37 tRNA genes, and eight rRNA genes. Of these, seven protein-coding genes, seven tRNA genes, and four rRNA genes are located in the IR regions and are therefore duplicated. The rps16 gene is a pseudogene in A. flagellosa, A. hirsuta, and A. nipponica strain Midori, as previously reported for related species [23]. In addition, the rps16 sequences of D. nemorosa, A. stelleri, A. flagellosa, A. hirsuta, and A. nipponica shared a 10 base pair deletion in the first exon; A. stelleri, A. flagellosa, A. hirsuta, and A. nipponica also shared a 1 base pair deletion in the second exon, whereas D. nemorosa lacked the second exon entirely. The rps16 sequence of A. alpina also lacked part of the second exon and had mutations in the start and stop codons. Therefore, different patterns of rps16 pseudogenization were observed in A. alpina and the other Arabis species, as previously suggested [23], indicating that the A. alpina lineage acquired its loss-of-function mutation(s) independently. The patterns observed in European A. hirsuta suggest that rps16 pseudogenization in the other Arabis species may not have occurred independently in each species, but instead occurred after their lineage split from A. alpina and before D. nemorosa diverged from the other Arabis species.
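The length and gene-count figures above follow from the quadripartite layout (LSC, IR, SSC, IR). The minimal sketch below makes that bookkeeping explicit. It assumes that total length equals LSC + SSC + 2 × IR and that the reported gene totals count both copies of each IR-duplicated gene; under that reading, the genomes carry 79 distinct protein-coding, 30 tRNA, and 4 rRNA genes. The function names and the region lengths in the demo are hypothetical. Note that the length ranges given above are minima and maxima taken across different genomes, so summing the region ranges does not reproduce the total-length range.

```python
"""Bookkeeping for a quadripartite chloroplast genome (illustrative sketch)."""


def total_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of an LSC-IRb-SSC-IRa circle: the IR is present twice."""
    return lsc + ssc + 2 * ir


def distinct_genes(total: int, duplicated_in_ir: int) -> int:
    """Distinct genes, assuming `total` counts both copies of IR-duplicated genes."""
    return total - duplicated_in_ir


if __name__ == "__main__":
    # Gene totals and IR-duplicated counts as reported in the text.
    counts = {"protein-coding": (86, 7), "tRNA": (37, 7), "rRNA": (8, 4)}
    for kind, (total, dup) in counts.items():
        print(f"{kind}: {distinct_genes(total, dup)} distinct genes")

    # Hypothetical region lengths, not a genome from this study.
    print(total_length(lsc=82_500, ssc=18_000, ir=26_500))
```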

Chloroplast Genome Divergence
Phylogenetic trees were generated from whole chloroplast genome sequences and from concatenated coding sequence (CDS) regions (see Figure 2). Including other Brassicaceae members showed that D. nemorosa falls within Arabis, as previously reported [24]. In both trees, the two A. nipponica strains were grouped with A. flagellosa and A. stelleri. Although several nodes had high bootstrap support, the chloroplast genomes of the four East Asian Arabis plants were nearly identical, so their relationships could not be resolved.
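Bootstrap support of the kind reported here is conventionally obtained by resampling alignment columns with replacement, inferring a tree from each pseudo-replicate, and recording how often each clade recurs. The minimal sketch below shows only the resampling step and a count of parsimony-informative columns on a hypothetical toy alignment; tree inference is deliberately left out, and the function names are illustrative. It hints at why nearly identical genomes stay unresolved: when almost every column is invariant, replicates contain few or no informative sites.

```python
import random
from typing import List


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """Return one pseudo-replicate: alignment columns drawn with replacement."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Rebuild every sequence from the same sampled columns so rows stay aligned.
    return ["".join(seq[i] for i in cols) for seq in alignment]


def informative_sites(alignment: List[str]) -> int:
    """Count parsimony-informative columns: >= 2 states, each in >= 2 sequences."""
    count = 0
    for column in zip(*alignment):
        freqs = [column.count(base) for base in set(column)]
        if sum(1 for f in freqs if f >= 2) >= 2:
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical toy alignment; not sequences from this study.
    aln = ["ACGTACGTAA", "ACGTACGTAT", "ACGAACGTAT", "TCGAACGTAT"]
    rng = random.Random(1)
    for _ in range(3):
        rep = bootstrap_replicate(aln, rng)
        print(rep, informative_sites(rep))
```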
Divergence among the Arabis chloroplast genomes is shown as a VISTA plot (see Figure 3) and summarized in Table 2. The genome sequences of the two Japanese A. nipponica strains differed by only 55 nucleotide substitutions (0.036% per site), whereas those of A. hirsuta and A. nipponica differed at about 3500 sites (2.4% per site). The chloroplast genomes of A. nipponica and the other two East Asian Arabis species were also very similar (~100 nucleotide differences, <0.1% per site). In addition, 35 CDS regions, 29 tRNA genes, and four rRNA genes were identical among the four East Asian Arabis plants; of these, three CDS regions, 27 tRNA genes, and all four rRNA genes were also identical in A. hirsuta. The divergence among the East Asian Arabis species was comparable to the variation previously reported within a local A. alpina population, in which 130 SNPs were identified among 24 individuals (Watterson's θ = 0.02%) [25]. If a single hybridization event had led to chloroplast capture in both A. stelleri and A. nipponica, their chloroplast genomes should differ less from each other than from A. flagellosa. However, the divergence between these potential hybrid-origin species (0.068% to 0.085% per site) was similar to their divergence from A. flagellosa (0.056% to 0.086% per site). Although the divergence was too low for reliable comparisons, A. stelleri and A. nipponica may have originated from independent hybridization events, or the introgression process may still be ongoing.
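The per-site figures above are uncorrected p-distances (substitutions divided by sites compared), and Watterson's θ divides the number of segregating sites S by the harmonic number a_n = Σ_{i=1}^{n-1} 1/i and by the sequence length L. The minimal sketch below computes both, skipping gapped columns. Assuming, for illustration, an alignment of roughly 153,000 sites (the exact lengths used here and in [25] are not stated), 55 substitutions give about 0.036%, and 130 SNPs among 24 sequences give θ ≈ 0.023% per site, in line with the ~0.02% cited above. The function names are hypothetical.

```python
from typing import Tuple


def p_distance(a: str, b: str) -> Tuple[int, float]:
    """Uncorrected p-distance between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    Returns (number of differences, differences per compared site).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs, diffs / compared


def watterson_theta(segregating_sites: int, n: int, length: int) -> float:
    """Watterson's estimator per site: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / (a_n * length)


if __name__ == "__main__":
    print(p_distance("ACGT-A", "ACTT-A"))  # (1, 0.2)
    assumed_length = 153_000  # assumption: approximate alignment length
    print(f"{55 / assumed_length:.3%}")
    print(f"{watterson_theta(130, 24, assumed_length):.3%}")
```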

Distribution of Simple Sequence Repeats in the Chloroplast Genomes
Because the extremely low divergence among the East Asian Arabis species made their evolutionary relationships difficult to resolve, more variable markers were needed. Therefore, simple sequence repeat (SSR) regions throughout the chloroplast genome were assessed for their ability to discriminate among species at high resolution. A total of 74 mononucleotide, 22 dinucleotide, and two trinucleotide repeat regions of ≥10 base pairs were identified (see Table 3). However, even these repeat regions could not completely resolve the relationships among the East Asian Arabis species. Fifty of the 98 SSRs showed no variation among the East Asian Arabis species, and only 29 showed species-specific variation: nine in A. flagellosa, 15 in A. stelleri, four in A. nipponica strain JO23, and one in A. nipponica strain Midori. Five other SSRs carried variants shared by the two A. nipponica strains, which suggests that these are also species-specific. Although the two A. nipponica strains were similar to each other, A. flagellosa, A. stelleri, and A. nipponica differed to a similar degree in their variable SSRs. Like the pattern of nucleotide substitutions, this suggests that chloroplast capture occurred independently in each species or is still ongoing.

Table 3. Cont.
Start    End      Motif  Repeat number (one column per genome; column labels not recovered)
?        ?        T      16  15  15  T5GT10  16  7
64,629   64,639   T      11  11  11  11      11  T6GT3G
65,636   65,645   C      10  13  11  13      8   C2TCTGC7
66,253   66,262   AT     5   5   5   5       4   7
66,851   66,864   A      14  14  14  19      17  12
68,965   68,977   T      13  13  13  13      11  11
69,965   69,975   T      11  11  12  11      11  8
75,328   75,340   A      13  14  14  13      19  14
?        ?        T      13  13  13  13      13  13
78,154   78,162   TTG    3   5   3   3       4   2
80,484   80,493   A      10  11  10  10      10  9
81,019   81,035   T      17  17  17  17      17  17
81,178   81,191   T      14  14  14  14      18  8
82,568   82,578   A      11  10  9   10      9   10
83,489   83,498   TA     5   5   5   5       5   4
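The SSR survey described above can be approximated with a simple scan for perfect tandem repeats. The sketch below is not the tool used in the study (the text does not name one); it reports mono-, di-, and trinucleotide runs spanning at least 10 bp with 1-based inclusive coordinates, comparable to the Start and End columns of Table 3. It treats the genome as linear (no wrap-around at the origin) and ignores interrupted repeats, such as a T run broken by a single G. The function names and the demo sequence are hypothetical.

```python
from typing import List, Tuple


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, min_len: int = 10, max_unit: int = 3) -> List[Tuple[int, int, str, int]]:
    """Find perfect tandem repeats of 1..max_unit bp units spanning >= min_len bp.

    Returns (start, end, motif, copies) with 1-based inclusive coordinates.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit in range(1, max_unit + 1):
        i = 0
        while i + unit <= n:
            motif = seq[i:i + unit]
            # Skip ambiguous bases and motifs like 'AA' that belong to a shorter unit.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            # Extend the run one whole unit at a time.
            j = i + unit
            while j + unit <= n and seq[j:j + unit] == motif:
                j += unit
            if j - i >= min_len and (j - i) // unit >= 2:
                hits.append((i + 1, j, motif, (j - i) // unit))
                i = j  # continue scanning after the reported run
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical fragment, not a sequence from this study.
    demo = "GG" + "T" * 11 + "CC" + "AT" * 5 + "G"
    for hit in find_ssrs(demo):
        print(hit)
```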

Mitochondrial Genome Analysis
Chloroplast capture could result from hybridization events that also affected the other cytoplasmic genome. Therefore, variation in the mitochondrial genome sequences was also analyzed. Mapping next-generation sequencing (NGS) reads to the Eruca vesicaria mitochondrial genome revealed 29 variable sites, each covered by five or more mapped reads, among A. nipponica strain Midori, A. flagellosa, and A. hirsuta (see Table 4). At 28 of these sites, A. nipponica and A. flagellosa shared the same base. The remaining site carried a variant specific to A. nipponica, so no site grouped A. nipponica with A. hirsuta, giving 100% support for the relationship between A. nipponica and A. flagellosa. When the threshold was relaxed to two or more reads, which reduces reliability, 123 of 125 sites (98.4%) still supported the similarity of the A. nipponica and A. flagellosa mitochondrial genomes. These findings suggest that the hybridization history of these species affected the chloroplast and mitochondrial genomes similarly.
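The support counts above can be expressed as a small tally over variable sites. The sketch below encodes one reading of the counting rule, which is an assumption rather than the authors' stated procedure: a site supports A. flagellosa as the closer relative when A. nipponica and A. flagellosa share a base that A. hirsuta lacks, supports A. hirsuta in the mirror case, and is neutral when A. nipponica carries a unique variant. Sites below the read-depth threshold are dropped first. The `Site` type, the function names, and the demo sites are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Site:
    focal: str      # base in the focal sample (e.g. A. nipponica)
    cand_a: str     # base in candidate relative A (e.g. A. flagellosa)
    cand_b: str     # base in candidate relative B (e.g. A. hirsuta)
    min_depth: int  # lowest read depth among the three samples at this site


def tally_support(sites: Iterable[Site], min_depth: int = 5) -> Dict[str, int]:
    """Count sites grouping the focal sample with A, with B, or with neither."""
    counts = {"a": 0, "b": 0, "neutral": 0}
    for s in sites:
        if s.min_depth < min_depth:
            continue  # too few reads to trust the base calls
        if s.focal == s.cand_a != s.cand_b:
            counts["a"] += 1
        elif s.focal == s.cand_b != s.cand_a:
            counts["b"] += 1
        else:
            counts["neutral"] += 1  # e.g. a variant unique to the focal sample
    return counts


def percent_support_a(counts: Dict[str, int]) -> float:
    """Share of informative (non-neutral) sites that group the focal sample with A."""
    informative = counts["a"] + counts["b"]
    if informative == 0:
        raise ValueError("no informative sites")
    return 100.0 * counts["a"] / informative


if __name__ == "__main__":
    demo = [  # hypothetical sites
        Site("A", "A", "G", 6),
        Site("C", "C", "T", 8),
        Site("G", "A", "A", 10),  # unique to the focal sample -> neutral
        Site("T", "T", "C", 3),   # dropped at min_depth=5
    ]
    counts = tally_support(demo)
    print(counts, percent_support_a(counts))
```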
East Asian Arabis species have previously been reported to show evidence of chloroplast capture [14,16]. More specifically, detailed phylogenetic analyses of nuclear and chloroplast marker genes have suggested that A. nipponica, A. stelleri, and A. takeshimana originated from hybridization between A. hirsuta (or A. sagittata), acting as the paternal parent, and East Asian Arabis species (close to A. serrata, A. paniculata, and A. flagellosa), acting as the maternal parent [14,16]. In the present study, a comparison of the whole chloroplast genomes of four plants from three East Asian Arabis species (two A. nipponica and one each of A. stelleri and A. flagellosa) revealed genome-wide similarities that indicate chloroplast capture by A. nipponica and A. stelleri. The partial mitochondrial genomes also indicated that A. nipponica is more closely related to A. flagellosa than to European A. hirsuta, which suggests that A. nipponica also has a history of mitochondrial capture. This is not surprising, because hybridization and backcrossing could have similar effects on both organellar genomes. In addition, cyto-nuclear incompatibility involving the mitochondrial genome could drive replacement of the whole cytoplasm, which would be observed as chloroplast capture [17,41,42]. The pattern of variation in the mitochondrial genomes suggests that the chloroplast and mitochondrial genomes were co-transmitted during the evolutionary history of East Asian Arabis species. Future research should focus on the process of chloroplast (organellar) capture. Simple backcrossing could explain the mechanism of cytoplasmic replacement, which may be completed in as few as a hundred generations under certain conditions [42]. In the present study, the divergence between the organellar genomes of the hybrid-origin species and the putative maternal donor species was similar to the divergence observed within species, which suggests that the hybridization event was relatively recent.
Nuclear genome markers are needed to estimate the proportion of parental genome fragments in the current nuclear genome of A. nipponica.
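As a baseline for such nuclear-marker estimates, the textbook expectation for repeated backcrossing is simple: an F1 carries half of its nuclear genome from each parent, each further backcross to the recurrent (pollen) parent halves the remaining share from the other parent, and the maternally inherited organelles are unchanged throughout. The minimal sketch below computes this expected share, (1/2)^(k+1) after k backcrosses, and the number of backcrosses needed to fall below a threshold. It ignores selection, linkage drag, and sampling variance, so real introgressed genomes may deviate from it; the function names are hypothetical.

```python
def nonrecurrent_fraction(backcrosses: int) -> float:
    """Expected nuclear share from the non-recurrent (maternal) parent after
    `backcrosses` rounds of backcrossing an F1 to the recurrent parent."""
    if backcrosses < 0:
        raise ValueError("backcrosses must be >= 0")
    return 0.5 ** (backcrosses + 1)


def backcrosses_below(threshold: float) -> int:
    """Fewest backcrosses after which the expected share drops below threshold."""
    if not 0 < threshold <= 0.5:
        raise ValueError("threshold must be in (0, 0.5]")
    k = 0
    while nonrecurrent_fraction(k) >= threshold:
        k += 1
    return k


if __name__ == "__main__":
    for k in range(5):
        print(k, nonrecurrent_fraction(k))
    print(backcrosses_below(0.01))  # 6
```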

DNA Isolation, NGS Sequencing, and Genome Assembly
Chloroplasts were isolated from A. hirsuta and A. nipponica as described by Okegawa and Motohashi [43]. DNA was isolated from the chloroplasts using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA), whereas total DNA was isolated from leaves of A. flagellosa. NGS libraries were constructed using the Nextera DNA Sample Preparation Kit (Illumina, San Diego, CA, USA) and sequenced as single-end reads on the NextSeq500 platform (Illumina). About 2 Gb of sequence (1.4 Gb and 12 M reads after cleaning) was obtained for A. flagellosa (43 Mb of mapped reads, 282.69× coverage). In addition, about 400 Mb (300 Mb and 2.5 M reads after cleaning) was obtained for each of A. hirsuta (64 Mb of mapped reads, 417.17× coverage) and A. nipponica (72 Mb of mapped reads, 455.87× coverage). The reads were assembled using velvet 1.2.10 [44], and the resulting contigs were assembled into complete chloroplast genomes by mapping them to previously published whole chloroplast genome sequences. Sequence gaps were resolved by Sanger sequencing. Genes were annotated using DOGMA [45] and BLAST. The new chloroplast genomes were deposited in the DDBJ database under accession numbers LC361349-51. Finally, circular chloroplast genome maps were drawn using OGDRAW [46].
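The coverage figures above correspond to mapped bases divided by genome length. A minimal sketch follows; the ~153 kb genome length is an assumed round value, and because the mapped-read totals are rounded to the nearest megabase, the result only approximates the reported coverage. For example, 43 Mb over 153 kb gives about 281×, close to the 282.69× reported for A. flagellosa. The function name is hypothetical.

```python
def mean_coverage(mapped_bases: float, genome_length: int) -> float:
    """Average sequencing depth: total mapped bases / reference length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return mapped_bases / genome_length


if __name__ == "__main__":
    assumed_length = 153_000  # assumption: approximate chloroplast genome size
    print(round(mean_coverage(43e6, assumed_length)))  # 281
```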

Mapping NGS Reads to Mitochondrial Genome Sequences
Because the chloroplast isolation method used in the present study did not completely exclude mitochondria, about 1% of the sequence reads were derived from mitochondrial genomes. Although this proportion is too low to be useful for assembling whole mitochondrial genomes, the reads were nevertheless mapped to the mitochondrial genome of Eruca vesicaria (KF442616) [54] in order to measure mitochondrial genome divergence. Regions with at least five mapped reads were used for the analysis.
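The five-read threshold above is a per-position depth filter. As a sketch of how such a filter can be computed, the code below derives per-base depth from aligned read spans with a difference array and keeps positions that reach a minimum depth. It assumes reads are given as 0-based, end-exclusive reference intervals and ignores deletions, clipping, and base quality; in practice a tool such as samtools depth would typically be used. The function names and demo reads are hypothetical.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def per_base_depth(spans: Iterable[Tuple[int, int]], ref_len: int) -> List[int]:
    """Read depth at each reference position from 0-based, end-exclusive spans."""
    diff = [0] * (ref_len + 1)
    for start, end in spans:
        start, end = max(start, 0), min(end, ref_len)
        if start < end:
            diff[start] += 1  # read starts covering here
            diff[end] -= 1    # and stops covering here
    # Prefix sums of the difference array give the depth at each position.
    return list(accumulate(diff[:ref_len]))


def positions_with_depth(depth: List[int], min_depth: int = 5) -> List[int]:
    """0-based positions whose depth is at least min_depth."""
    return [i for i, d in enumerate(depth) if d >= min_depth]


if __name__ == "__main__":
    reads = [(0, 4), (2, 6), (3, 5)]  # hypothetical read spans
    depth = per_base_depth(reads, ref_len=6)
    print(depth)                           # [1, 1, 2, 3, 2, 1]
    print(positions_with_depth(depth, 2))  # [2, 3, 4]
```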