The Complete Mitochondrial Genome of Bactrocera carambolae (Diptera: Tephritidae): Genome Description and Phylogenetic Implications

Bactrocera carambolae is one of the approximately 100 sibling species of the Bactrocera dorsalis complex and considered to be very closely related to B. dorsalis. Due to their high morphological similarity and overlapping distribution, as well as to their economic impact and quarantine status, the development of reliable markers for species delimitation between the two taxa is of great importance. Here we present the complete mitochondrial genome of B. carambolae sourced from its native range in Malaysia and its invaded territory in Suriname. The mitogenome of B. carambolae presents the typical organization of an insect mitochondrion. Comparisons of the analyzed B. carambolae sequences to all available complete mitochondrial sequences of B. dorsalis revealed several species-specific polymorphic sites. Phylogenetic analysis based on Bactrocera mitogenomes supports that B. carambolae is a differentiated taxon though closely related to B. dorsalis. The present complete mitochondrial sequences of B. carambolae could be used, in the frame of Integrative Taxonomy, for species discrimination and resolution of the phylogenetic relationships within this taxonomically challenging complex, which would facilitate the application of species-specific population suppression strategies, such as the sterile insect technique.


Introduction
The Bactrocera dorsalis species complex consists of approximately 100 morphologically similar taxa distributed mainly in South-East Asia and Australasia [1]. Although most members within the complex present no economic interest, a small number of them are serious pests infesting many commercial throughout the mitogenome. Furthermore, a phylogenetic analysis within Bactrocera was performed focusing on the placing of the B. carambolae complete mitogenomes in comparison to B. dorsalis.

Insects
The B. carambolae specimens used in this study originated from Malaysia and Suriname. The Malaysian strains were collected from infested wax apples (Syzygium spp.) from the forest fringe in Raub, Pahang state, Malaysia. The emerged adults from the infested fruits were subjected to morphological identification based on three key morphological characteristics: (a) the presence of a recurved pattern on the wing costal band beyond apex R 4+5 , (b) the presence of a fore femoral dark spot and (c) the presence of bar-shaped bands at terga III-V [3]. Close morphological identification confirmed all flies emerged from this source was B. carambolae. Flies were laboratory reared for 2-3 generations on carambola (Averrhoa carambola) fruits (27 ± 2 • C, 85% ± 5% RH, 12 h L: 12 h D) in order to confirm the species status of the offspring and to raise a sufficient number of flies. The sample from Suriname came from a laboratory colony initiated by insects collected from carambola fruits from the districts of Paramaribo and Saramacca, Suriname, and reared on carambola fruits for 21 generations (24 • C, 85% RH, 12 h L: 12 h D) in the Carambola fruit fly unit, Department of Agricultural Research, Ministry of Agriculture, Animal Husbandry and Fisheries, Paramaribo. Pupae from the above strains were sent to the Insect Pest Control Laboratory (IPCL) of the Joint FAO/IAEA Division of Nuclear Techniques in Food and Agriculture (Seibersdorf, Austria) and adults emerging from these pupae were used in the present study. In addition, B. dorsalis specimens from laboratory colonies maintained on artificial diet (25 ± 2 • C, 60% ± 5% RH, 14 h L: 10 h D) at the IPCL were used. The above colonies represented three populations originating from Saraburi (Thailand), Philippines (B. 'syn. philippinensis') and Kenya (B. 'syn. invadens'). Their status has been verified by taxonomists and the insect materials have been used in several research projects [11,[17][18][19][20]69].

DNA Isolation, Amplification and Sequencing
Total genomic DNA was extracted from single flies, using either the CTAB protocol [70] or the DNeasy Blood and Tissue kit (Qiagen, Hilden, Germany) following manufacturer's instructions for total DNA purification from animal tissues. Negative controls were included in DNA extraction. DNA quality and quantity were measured using the NanoDrop 1000 Spectrophotometer (Thermo Fischer Scientific, Waltham, MA, USA).
Each mitogenome sequence was obtained from a single specimen by standard PCR amplifications using primers that were designed based on the mitochondrial sequence of Bactrocera dorsalis (accession no NC_008748; Supplementary Table S1). Twenty-seven pairs of primers targeting overlapping fragments were designed by the Oligoexplorer and Oligoanalyzer programs (Supplementary Table S2). Approximately 30 ng of template DNA was used in each reaction of 25 µL (1× PCR buffer, 1.5 mM MgCl 2 , 0.2 mM of each dNTP, 0.5 µM of the appropriate primers and 1 U Taq polymerase). The BIOTAQ (BIOLINE, UK) or the One Taq (NEB, Ipswich, MA, USA) DNA polymerases were used. Amplification was performed in a SensoQuest thermocycler by the following program: initial denaturation at 94 • C for 3 min, 40 cycles of 45 section denaturation at 94 • C, 30 section primer annealing at 46-59 • C and 1 min DNA chain extension at 72 • C, and final extension at 72 for 7 min. PCR products were purified by the Nucleospin Gel and PCR Clean up kit (Macherey Nagel, Düren, Germany) or by Exonuclease I and Shrimp Alkaline Phosphatase (NEB, USA).
Sequencing reactions were performed by Macrogen Europe (Amsterdam, The Netherlands) or Eurofins Genomics (Ebersberg, Germany). Each fragment was sequenced in both directions and the sequences obtained by the forward and the reverse reactions were merged using EMBOSS Merger [71] after careful manual inspection. In cases of inconsistencies reactions were repeated. The mitogenome sequences were assembled using EMBOSS Merger [71] and submitted to GenBank (accession nos.: KT343905, MG916998, MN104217-20, Supplementary Table S1).

Phylogenetic Analysis
Phylogenetic analysis based on concatenated COI and ND4 partial gene sequences was performed using B. dorsalis and B. carambolae sequences from different locations previously analyzed by Boykin et al. [27] (Supplementary Table S3) together with the corresponding gene fragments from the complete sequences generated in the present study (Supplementary Table S1) (dataset 1). Phylogenetic analysis based on alignments of complete mtDNA sequences was performed using all Bactrocera complete mitogenomes available (Supplementary Table S1) (dataset 2). Multiple sequence alignments were constructed by ClustalW using default parameters. Phylogenetic trees were inferred by the Maximum Likelihood (ML) method based on the Hasegawa-Kishino-Yano model (dataset 1) or the General Time Reversible (GTR) (dataset 2) model with 1000 bootstrap replicates. All the above analyses, alignments, model selection and phylogeny reconstruction, were performed in MEGA 7.0 [75][76][77]. Dataset 2 was also analyzed by maximum likelihood (ML) inference using IQ-TREE 1.4.2 [78] and, in particular, the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at) [79]. The best-fit substitution model was determined by IQ-TREE ("Auto" option in the field Substitution model) including FreeRate heterogeneity in the model selection process ("Yes [+R]" option in the field FreeRate heterogeneity). To assess nodal support, 1000 ultrafast (UFBoot) [80] bootstrap replicates were performed.

Results and Discussion
The mitogenomes of three B. carambolae specimens, two from Malaysia (M5 and M8) and one from Suriname (S2) were analyzed. In order to further substantiate the species characterization of the specimens, we performed a phylogenetic analysis based on COI + ND4 partial mitochondrial sequences from the B. dorsalis complex previously used in several studies [10,12,27] together with the ones generated in the present study (Supplementary Tables S1 and S3). The above analysis clustered the sequences of our specimens together with the other B. carambolae and separately from all B. dorsalis sequences (Supplementary Figure S1), which, although in a context of low nodal support, supports their identification as true representatives of B. carambolae.
The overlaps of seven nucleotides between ATP8 and ATP6 and ND4 and ND4L genes are the longest observed between protein-coding genes of B. carambolae. Overlaps restricted to one or two nucleotides can also be observed between ATP6 and COIII, ND3 and tRNA Ala , ND6 and CYTB and CYTB and tRNA Ser ( Table 1). Overlaps that are similar in size and position are common among tephritids [52,55,[58][59][60][61][62][63][64].

RNA Genes
The 16S rRNAs of B. carambolae M5 and S2 individuals consist of 1332 nucleotides (positions: 12,776-14,107 and 12,773-14,104, respectively), while that of the M8 appears to be one nucleotide shorter (positions: 12,773-14,103) (Table 1) (Table 1). In accordance with other insect mitogenomes, these genes are located in the L strand between the gene for tRNA Leu (CUA) and the control region, and are separated by the tRNA Val gene ( Table 1). The 22 tRNA genes, predicted to fold into the expected cloverleaf secondary structures, are dispersed among the protein-coding and the rRNA genes; 14 of them lie on the H and 8 on the L strand of the mtDNA (Table 1). Their positions and sizes (63-72 nucleotides) follow the typical organization for insect mtDNA.

Non-Coding Regions
Similarly, to all tephritids, the mitogenome of B. carambolae contains only one long non-coding region, i.e., the control region (D-loop), located between the 12S rRNA and the tRNA Ile genes (Figure 1). Its length was found to be 949 and 948 nucleotides and its A+T content was found to be 87.14% and 87.87% for M8 and S2 individuals, respectively (Table 1).
A stretch of 22 thymidines resides at the 5 end of the D-loop (near to the tRNA Ile gene), a feature that is common among tephritid and other insect mitogenomes, and is believed to play a role in the control of transcription and/or replication [54,59,60,81,90]. The 13 nucleotide long motifs TTTAATTTTTTAA and TTAATTTTATTAA were found to be tandemly repeated four times at the same position (D-loop position 212-262) of the M8 and S2 individuals, respectively. Tandem repeats have been identified in the control regions of Bactrocera as well as in other tephritid species [54,60,81].
The longest intergenic spacer (IGS) region in the analyzed B. carambolae mitogenomes was found between the tRNA Gln and tRNA Met genes with a size of 66 nucleotides for both M5 and M8 individuals and 67 nucleotides for S2 (Table 1). Although the position of the longest IGS seems to be conserved among several species of the Bactrocera subgenus [53,55,56,62,63], the sequence presents no significant similarity except within the B. dorsalis complex (the sequence identity between B. carambolae and B. dorsalis was about 97%). The second longest IGS is located between the tRNA Cys and tRNA Tyr genes and is 46 nucleotides long in all three B. carambolae specimens analyzed (Table 1). This IGS folded into secondary structures and its first 33 nucleotides could be found repeated in the D-loop region of B. carambolae, which is similar to other Bactrocera species as well as members of the B. dorsalis complex suggesting recombination events [53,81]. Yu et al. [53] reported an 11 bp insertion at the end of this spacer in a B. carambolae specimen and suggested that it could be used as a specific marker for species discrimination between B. carambolae and the other members of the complex. However, the above insertion was not observed in any of the three B. carambolae specimens analyzed. Furthermore, a short TA repeat making this IGS longer (53 compared to 46 bp) was also found in one of the B. dorsalis sequences generated in the present study (accession no KT343905). The above findings suggest that small insertions in the spacer lying between the tRNA Cys and tRNA Tyr genes are more likely to represent individual-or population-rather than species-specific polymorphisms.

Sequence Comparisons and Phylogenetic Analysis
The three B. carambolae mitogenomes analyzed here were compared to the complete mitochondrial sequences of B. carambolae (one) and B. dorsalis (six) found in GenBank and to the three additional sequences of the B. dorsalis complex generated in the present study (Supplementary Table S1). The identity scores obtained between the complete mitogenome sequences from B. carambolae and B. dorsalis ranged from 98.45% to 98.98% being imperceptibly lower than the ones observed among the B. dorsalis sequences (98.88-99.49%). The identity scores were extremely high even for the D-loop region, which is considered the most variable region of the mitogenome (95.68-98.52% between B. carambolae and B. dorsalis; 97.99-99.16% among B. dorsalis).
However, alignment of the above sequences revealed a small number (12) of positions that consistently differed between the B. carambolae and the B. dorsalis sequences ( Table 2). Almost all of the above polymorphisms were found within the PCG sequences and could be potential markers for discriminating the two very closely related taxa analyzed. Nonetheless, additional data at population level is required to assess whether these polymorphisms are fixed and species-specific. Table 2. The interspecies nucleotide polymorphisms observed in the complete mitochondrial sequences of B. dorsalis and B. carambolae used in the present study. Gene abbreviation as in Table 1 Position refers to nucleotide position within respective gene.

Gene
Position Nucleotide in B. dorsalis Nucleotide in B. carambolae The ML phylogenetic analysis with the complete Bactrocera mitochondrial sequences (six generated in the present study and 20 from GenBank) ( Supplementary Table S1) conducted with MEGA software (Figure 2) resulted in almost identical tree topology to the one inferred by the IQ-Tree ML algorithm (data not shown). Topologies were very similar to other recent analyses also using data of complete mitogenomes to explore phylogenetic relationships within the Bactrocera genus [54,58,63,82,83,91,92] and confirmed the very close relationship of the B. dorsalis complex members. Within the complex ( Figure 2B), the B. dorsalis sequences formed a highly supported clade, while all B. carambolae sequences, though not forming a single clade, were placed outside the B. dorsalis clade. The above results suggest the differentiation of the B. carambolae mitosequence and could provide some support to the retention of B. carambolae as a separate taxon from B. dorsalis [8]. However, additional data and analyses would be required to clarify the issue of species limits between the above two taxa.
In summary, the complete mitochondrial sequence of three B. carambolae specimens is presented. These are the first published B. carambolae mitogenomes described in detail, though not the first appearing in databases. The structure and the organization of the B. carambolae mitogenomes analyzed follow the typical pattern of an insect mitochondrion. The availability of several complete B. carambolae mitogenomes allowed, through sequence alignments against all available B. dorsalis mitogenomes, the identification of potentially species-specific nucleotide polymorphisms. Phylogenetic analyses within the Bactrocera genus supported the differentiation of B. carambolae in comparison to B. dorsalis. The future disposal of additional complete mitosequences from other members of the B. dorsalis complex could enable more extensive comparative analyses, to aim for a better resolution of their evolutionary relationships and for identification of the most informative polymorphic mitochondrial regions. Nevertheless, multidisciplinary approaches, combining mitochondrial and nuclear genetic information together with data on different aspects of species biology in the frame of Integrative Taxonomy, are considered necessary for reliable identification of species boundaries within this speciose complex of destructive pests.  Supplementary Table S1. In summary, the complete mitochondrial sequence of three B. carambolae specimens is presented. These are the first published B. carambolae mitogenomes described in detail, though not the first appearing in databases. The structure and the organization of the B. carambolae mitogenomes analyzed follow the typical pattern of an insect mitochondrion. The availability of several complete B. carambolae mitogenomes allowed, through sequence alignments against all available B. dorsalis mitogenomes, the identification of potentially species-specific nucleotide polymorphisms. Phylogenetic analyses within the Bactrocera genus supported the differentiation of B. carambolae in comparison to B. dorsalis. The future disposal of additional complete mitosequences from other members of the B. dorsalis complex could enable more extensive comparative analyses, to aim for a better resolution of their evolutionary relationships and for identification of the most informative polymorphic mitochondrial regions. Nevertheless,  Supplementary Table S1.

Sequences' accession numbers in
Supplementary Materials: The following are available online at http://www.mdpi.com/2075-4450/10/12/429/s1, Figure S1. Molecular phylogenetic analysis by Maximum Likelihood method based on concatenated partial sequences of the COI and ND4 genes of 38 B. carambolae and B. dorsalis specimens, Table S1. List of the mitogenome sequences used in the present study, Table S2. List of the primers used for the amplification of the mitogenomes of the Bactrocera carambolae and Bactrocera dorsalis specimens, Table S3. List of the GenBank accession numbers of the COI and ND4 partial sequences from B. dorsalis and B. carambolae used in the present study.