Comparison of the Whole-Plastome Sequence between the Bonin Islands Endemic Rubus boninensis and Its Close Relative, Rubus trifidus (Rosaceae), in the Southern Korean Peninsula

Rubus boninensis is a rare endemic species found on the Bonin Islands with a very restricted distribution. It is morphologically most closely related to Rubus trifidus, occurring widely in the southern Korean peninsula and Japan. This species pair provides a good example of anagenetic speciation on an oceanic island in the northwestern Pacific Ocean—R. trifidus as a continental progenitor and R. boninensis as an insular derivative species. In this study, we firstly characterized the complete plastome of R. boninensis and R. trifidus and compared this species pair to another anagenetically derived species pair (R. takesimensis–R. crataegifolius). The complete plastome of R. trifidus was 155,823 base pairs (bp) long, slightly longer (16 bp) than that of R. boninensis (155,807 bp). No structural or content rearrangements were found between the species pair. Eleven hotspot regions, including trnH/psbA, were identified between R. trifidus and R. boninensis. Phylogenetic analysis of 19 representative plastomes within the family Rosaceae suggested sister relationships between R. trifidus and R. boninensis, and between R. crataegifolius and R. takesimensis. The plastome resources generated by the present study will help elucidate plastome evolution and resolve phylogenetic relationships within highly complex and reticulated lineages of the genus Rubus.


Introduction
The Bonin Islands, also known as the Ogasawara Islands, consist of 25 small islands (>0.1 km 2 ) and many islets scattered in the region of 24 • 14'-27 • 44' N and 140 • 52'-142 • 15' E, and are located approximately 1000 km directly south of the Japanese archipelago [1,2]. The Bonin Islands consist of two island groups, i.e., Ogasawara Group (Hahajima, Chichijima, and Mukojima) and Volcano Group (Kitaiwojima, Iwojima, and Minamiiwojima). The Ogasawara Group is a group of three aggregated islands, and are aligned from south to north, i.e., Hahajima, Chichijima, and Mukojima. Although they were formed during the Paleogene period, their uplift started in the Pleistocene epoch, exposing the landmass above the sea level before the middle Pleistocene epoch [3,4]. Therefore, organisms presumably started colonization from the late Pliocene to early Pleistocene epochs [5][6][7]. Of the 369 indigenous vascular plant species, approximately 40% of them are endemic to the Bonin Islands, Compared to its continental progenitor, R. crataegifolius, which occurs rather widely in northeastern Asia (China, Japan, Korea, and Russian Far East), R. takesimensis is characterized by a lack of prominent prickles (i.e., loss of defense mechanism) and an overall large status of plants (i.e., insular gigantism) as a response to release from selection pressure of herbivores and due to the fact of its moderate insular climatic setting, respectively. Ulleung Island is known for unusually high levels of anagenetic speciation (at least 88% of vascular endemic species), mainly driven by a lack of vegetation heterogeneity, younger island age, and low elevation [16]. Recently, we demonstrated a sister relationship between this continental progenitor (R. crataegifolius) and insular derivative (R. takesimensis) species pair and investigated the population genetic structure among them [20,21]. In addition, we compared the complete plastome sequences of R. crataegifolius and R. takesimensis and characterized their molecular evolution, identifying mutational hotspot regions [22]. The example of anagenetic speciation of Rubus found in East Sea/Sea of Japan, R. crataegifolius-R. takesimensis, and another example found in the northwestern Pacific Ocean, R. trifidus-R. boninensis, could be an ideal system to investigate genome evolution of organelles during anagenetic speciation on ocean islands.
In this study, we determined two complete plastome sequences of the insular derivative, R. boninensis, and the continental progenitor, R. trifidus, in the northwestern Pacific Ocean, and compared them to two previously reported plastomes of an anagenetically derived species pair in the East Sea/Sea of Japan. This allowed us to characterize the plastome sequences of two anagenetically derived species in different oceanic islands and to reveal any molecular changes occurring during anagenetic speciation. In addition, we hoped to identify mutation hotspots in the plastomes of R. boninensis and R. trifidus belonging to subgenus Anoplabatus. Such plastome hotspot regions could then be utilized as efficient maternally inherited molecular markers for phylogeographic and population genetic study of the Rubus species belonging to subgenus Anoplabatus. Lastly, this study aimed to develop simple sequence repeat (SSR) markers based on R. boninensis to discriminate closely related congeneric species of Rubus. Taken together, the results of this comparative plastome study will shed new light on chloroplast genome structure and evolution of insular endemic species pairs during anagenetic speciation and contribute to the development of chloroplast markers based on mutation hotspot regions, thereby facilitating resolution of phylogenetic relationships among closely congeneric species of Rubus.

Plastome Sequencing and Annotation
Fresh leaves of a single plant of R. boninensis were collected from the Volcano Islands group in the Bonin Islands (i.e., Minamiiwojima), Japan (voucher specimen: KYO_Takayama17062202). Similarly, leaves of R. trifidus were collected from Yigidae, Busan, southern part of Korea peninsula (voucher specimen: KNU_Yigidae180513) and dried with silica gel before DNA extraction. Total DNA was isolated by using the DNeasy Plant Mini Kit (Qiagen, Carlsbad, CA, USA) and sequenced with an Illumina HiSeq 4000 (Illumina, Inc., San Diego, CA, USA), yielding 150 bp paired-end read length, at Macrogen Corporation (Seoul, Korea). A total of 22,273,138 and 43,891,068 paired-end reads were obtained for R. boninensis and R. trifidus, respectively, and assembled de novo using Velvet v. 1.2.10 with multiple k-mers [23]. The tRNAs were confirmed using with tRNAscan-SE [24]. Annotation was conducted using Geneious R10 [25] and the annotated plastome sequences were submitted to GenBank (accession numbers MH734123 and MK465682 for R. boninensis and R. trifidus, respectively). The annotated GenBank format sequence file was used to draw a circular map with OGDRAW program v1.2 [26].

Comparative Plastome Analysis
The complete plastomes of R. boninensis and R. trifidus were compared to those of two other Rubus species, R. crataegifolius (MG189543) and R. takesimensis (MH734123), using mVISTA [27] in Shuffle-LAGAN mode [28]. The four Rubus plastome sequences were aligned with MAFFT v. 7 [29] and adjusted manually with Geneious [25]. By using DnaSP v. 6.10 software [30], a sliding window analysis with a step size of 200 bp and window length of 800 bp was carried out to determine the nucleotide diversity (Pi) of the plastome. The codon usage frequency was calculated using MEGA7 [31] with relative synonymous codon usage (RSCU) value [32], which is a simple measure of non-uniform usage of synonymous codons in a coding sequence. The DNA code used by bacteria, archaea, prokaryotic viruses, and chloroplast proteins was used [33].

Tandem Repeat and Microsatellite Analysis
Microsatellite or SSR markers were identified in the plastome sequences by using MISA [34] with minimum repeat thresholds of ten for mononucleotide repeats, four for dinucleotide repeats, four for trinucleotide repeats, four for tetranucleotide repeats, four for pentanucleotide repeats, and three for hexanucleotide repeats [22].

Phylogenetic Analysis
For the phylogenetic analysis, the complete plastome sequences of 18 representative species from the family Rosaceae (seven species from Rubus, including R. corchorifolius (KY419958), R. niveus (KY419961), and R. fockeanus (KY420018); six species from Fragaria; two species from Rosa; one species from Prunus; two species from Pyrus; and one species from Prinsepia) were aligned with MAFFT v. 7 [29] in Geneious [25]. Maximum likelihood (ML) analysis based on the best-fit model of TVM+F+R2 was conducted with IQ-TREE v. 1.4.2 [35]. Prinsepia utilis was used as an outgroup, and non-parametric bootstrap analysis was performed with 1000 replicates.

Genome Size and Features
The complete plastome sequence of R. boninensis was 155,807 bp long, with a large single copy (LSC) region of 85,438 bp, small single copy (SSC) region of 18,783 bp, and two inverted repeat (IR) regions of 25,793 bp. The R. trifidus plastome was 155,823 bp long, with a large single copy (LSC) region of 85,466 bp, small single copy (SSC) region of 18,759 bp, and two inverted repeat (IR) regions of 25,799 bp ( Figure 1 and Table 1). The two plastomes of R. boninensis and R. trifidus contained 131 genes, including 84 protein-coding, 8 ribosomal RNA, and 37 transfer RNA genes. The overall guanine-cytosine (GC) content of both R. boninensis and R. trifidus was 37.1%.  In both the species, 17 genes were duplicated in the IR regions, including seven tRNA, four rRNA, and six protein-coding genes. Fifteen genes (ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, rps12, rps16, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) contained one intron, whereas clpP and ycf3 each contained two introns. Interestingly, the highly conserved group II intron of atpF was lost, as we have demonstrated in the case of R. crataegifolius and R. takesimensis [22]. It remains to be determined if loss of the atpF intron, which occurs frequently in the two genera, Rosa and Rubus, has also occurred in the other major lineages of Rosaceae and related Rosid families [22,36]. A partial ycf1 gene (1221 bp in both the species, R. boninensis and R. trifidus) was located at the IR b /SSC junction region, whereas the complete ycf1 gene was located in the IR region at the SSC/IR a junction. To reveal a hybrid origin of some endemic taxa of Rubus (subgenus Idaeobatus) in the Hawaiian Islands, Howarth et al., [37] successfully used the ndhF gene, which is known to have frameshift mutations and alterations on transcription termination due to the higher substitution rates, a wide range of insertion and deletion (indel) variations, and a high AT content [38]. Although the closely related Rosa section Synstylae showed frameshift mutations on the 3' end of the ndhF gene [39], only nucleotide substitution and alteration on transcription were found in the four species of Rubus, without size variation (a total CDS length of 2244, which is the same as that of Rosa section Synstylae). The infA gene, which was located in the LSC region, became a pseudogene. The plastome sequence of the insular derived species, R. boninensis, was highly similar to that of the continental progenitor species, R. trifidus (99.6% sequence similarity; 155,355 bp identical sites), and the R. trifidus plastome sequence was just 16 bp longer than that of R. boninensis (Table 1). In case of the species pair in East Sea/Sea of Japan, the complete plastome sequences of R. takesimensis and R. crataegifolius were 99.8% similar (i.e., 155,537 bp identical sites); the R. takesimensis plastome was 46 bp longer than the R. crataegifolius plastome (a 28 bp extension in the LSC and a 18 bp extension in the SSC) [22].
The frequency of codon usage in R. boninensis and R. trifidus was calculated for their plastomes based on protein-coding genes and tRNA genes ( Table 2). The codon usage bias (CUB) refers to differences in the frequency of occurrence of synonymous codons in coding DNA, and it has been demonstrated that CUBs could be manifested by maintaining a balance between mutational bias and natural selective forces [40,41]. Therefore, analysis and characterization of CUBs at the genomic scale can help elucidate molecular evolution and environmental adaptation [42]. Overall, we detected similar patterns in codon usage between R. boninensis and R. trifidus. Some exceptions included AUG codon usage of trnI-CAU, trnfM-CAU, and CAA codon usage of trnK-UUG in R. boninensis; and UAG and UCA codon usage of trnI-CAT and trnS-UGA, respectively, in R. trifidus. The frequency of codon usage of R. crataegifolius and R. takesimensis is also summarized in Table 3. When compared with the pair of R. boninensis-R. trifidus, AUG (trnI-CAU, trnfM-CAU, and trnM-CAU), UCA (trnS-UGA), UAG (no usage), and CAA codon usage (trnQ-UUG) showed different patterns. The codon usage of two pairs of Rubus species (R. boninensis-R. trifidus and R. crataegifolius-R. takesimensis) was biased toward a high RSCU values of U and A at the third codon usage, a similar phenomenon found in other angiosperm [43] and algal lineages [40].

Analysis of Microsatellites
We found a nearly identical number of potential SSRs between the continental progenitor, R. trifidus (a total of 112 SSRs), and insular derived, R. boninensis, in the Bonin Islands (a total of 111 SSRs). Of a total of 86 unique consensus sequences (out of 111 copies) identified in R. boninensis, 57 (66.3%) were located in the LSC region, 11 in the SSC region (12.8%), and 18 (20.9%) in the two IR regions (Supplementary Table S1). Of a total of 91 unique consensus sequences (out of 112 copies) identified in R. trifidus, 63 (69.2%) were located in the LSC region, 12 in the SSC region (13.2%), and 16 (17.6%) in the two IR regions (Supplementary Table S2). Therefore, we found slight differences in the number of SSRs obtained between R. boninensis and R. trifidus. In addition, mononucleotide repeats were detected in 46 (41.4%) and 46 (41.1%) SSRs in R. boninensis and R. trifidus, respectively, while very low frequencies of 1 (0.9%) and 3 (2.7%) for trinucleotide repeats were found in R. boninensis and R. trifidus, respectively (Figure 2A).

Phylogenetic Analysis
Maximum likelihood analysis of complete cp genome sequences, which included 19 representative members of the family Rosaceae, was carried out based on the best-fit model of TVM+F+R2. Of a total of 170,274 aligned nucleotide bases, 149,000 (87.5%) were constant and 21,274 (12.5%) were variable sites, with 8273 (4.9%) parsimony-informative sites. The ML tree supported the monophyly of Rubus and the sister relationship between the continental progenitor, R. trifidus, and R. boninensis in the Bonin Islands of the northwestern Pacific Ocean in subgenus Anoplobatus, and also between the continental progenitor, R. crataegifolius, and the insular derivative, R. takesimensis in the East Sea/Sea of Japan in subgenus Idaeobatus ( Figure 6). The complete chloroplast genome sequences provided full resolutions within Rubus, with high support values (all but one 100% BS support). The ML tree also provided evidence that the currently delimited subgenus Idaeobatus is not monophyletic. Given the lack of sufficient resolution and insufficient support for relationships of interest within Rubus [57][58][59][60][61], phylogenomic study (or phylogenetic study based on hotspot regions identified in this study) within the genus will shed new light on the disentangling of complex evolutionary events within the genus. Furthermore, inferences based on large-scale phylogenetic frameworks and our understanding of trait evolution within Rosaceae should benefit from phylogenomic approaches based upon whole-plastome sequencing [62,63].

Conclusions
The complete plastome sequences of the insular derived R. boninensis, a rare plant endemic to the Bonin Islands in the northwestern Pacific Ocean and R. trifidus, a continental progenitor, were determined. Their plastome sequences were compared to those of R. crataegifolius, the continental progenitor and R. takesimensis, the insular derivative on Ulleung Island in East Sea/Sea of Japan. These species pairs represent parallel plastome systems in two different subgenera of Rubus, Anoplobatus and Idaeobatus, providing insights into plastid genome evolution during anagenetic speciation. Both infA pseudogenization and atpF intron loss were observed in the two species pairs. The relative synonymous codon usage of genus Rubus was biased toward high RSCU values of U and A at the third codon. Several mutation hotspot regions for the R. boninensis-R. trifidus species pair included trnH/psbA, clpP intron1, clpP/psbB, trnP/psaJ, ndhA intron, trnK/rps16, trnT/trnL, trnF/ndhJ, psaJ/rpl33, rps15/ycf1, and ycf1. Based on four complete plastome sequences of Rubus, ten highly variable regions, including trnT/trnL, trnF/ndhJ, clpP intron2, trnK/rps16, psbE/petL, rpl32/trnL, trnH/psbA, rps4/trnT, rps12/clpP, and ycf1 were detected. The application of these markers will be a powerful tool for barcoding and for cultivar and germplasm identifications for economically important Rubus and related genera in Rosaceae. In addition to the identification of hotspot regions, we also identified cpSSRs for our four species of interest. We found a higher number of SSRs for the continental progenitor species R. crataegifolius and R. trifidus than for the insular derived species R. boninensis and R. takesimensis. The most common SSR motifs in the four Rubus species were dinucleotide repeats; however, such repeats were not found in other Rosaceae genera (e.g., Malus and Rosa). The location of SSRs motifs and A/T abundance detected from four Rubus species were consistent with other members of Rosaceae. The phylogenetic analysis confirmed the evolution of two anagenetically derived insular species from their continental progenitors. Additional studies using multiple samples from continental and island species based on highly variable plastome markers found in this study can ultimately confirm such progenitor-derivative relationships and help us better understand the anagenetic speciation of island endemics. In addition, the phylogenomic analysis of complete plastome sequences will be an effective tool to infer phylogenetic relationships within Rubus and to establish infra-familial classifications within Rosaceae.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/10/774/s1, Table S1: Distribution, length, and location of repeat sequences in the plastome sequence of Rubus boninensis in the Bonin Islands, Table S2: Distribution, length, and location of repeat sequences in the complete plastome sequence of Rubus trifidus from the southern Korean peninsula, Table S3: Distribution, length, and location of repeat sequences in the Rubus crataegifolius plastome sequence.