Hemipteran Mitochondrial Genomes: Features, Structures and Implications for Phylogeny

The study of Hemipteran mitochondrial genomes (mitogenomes) began with the Chagas disease vector, Triatoma dimidiata, in 2001. At present, 90 complete Hemipteran mitogenomes have been sequenced and annotated. This review examines the history of Hemipteran mitogenomes research and summarizes the main features of them including genome organization, nucleotide composition, protein-coding genes, tRNAs and rRNAs, and non-coding regions. Special attention is given to the comparative analysis of repeat regions. Gene rearrangements are an additional data type for a few families, and most mitogenomes are arranged in the same order to the proposed ancestral insect. We also discuss and provide insights on the phylogenetic analyses of a variety of taxonomic levels. This review is expected to further expand our understanding of research in this field and serve as a valuable reference resource.


Introduction
Entomologists first suggested that Hemiptera (true bugs) and Homoptera (planthoppers, leafhoppers, cicadas, spittlebugs, aphids, psyllids, scales, and whiteflies) are two orders according to OPEN ACCESS features of the wing [1]. In 1810, Latreille suggested combining Heteroptera and Homoptera as one order called Hemiptera (s.l.) [2]. The concept of Hemiptera (s.l.) has been widely accepted since 1969 to the present [3][4][5]; therefore, in this review, Hemiptera refers to Hemiptera (s.l.). As one major order of insects, Hemiptera is the largest group of the hemimetabolous insects [6], including more than 50,000 described species [7]. They are small sap-sucking insects with body-sizes from 1 mm (0.04 in) to approximately 15 cm (6 in).
There is great variety within the order Hemiptera, more commonly known as bugs. Hemipterans have evolved an extraordinary range of body forms and lifestyles: some live on land, some live in water, some feed on plants and others are voracious carnivores or scavengers. Therefore, many species of Hemiptera are significant pests of crops and gardens. Some, as many species of aphid, cause direct damage to crop hosts and often kill the entire plants. Additionally, some delphacids cause considerable damage to grain production and have been identified as one cause of rice famine in several Asian countries [8]. Moreover, many species of Hemiptera are vectors of viruses and diseases. For example, Triatoma dimidiata is the vector of Chagas disease, a predominantly chronic disease affecting millions of people [9].
Based on the history of Hemipteran phylogeny research, we propose two controversial questions. First, how many suborders does Hemiptera include? Traditionally, Hemiptera comprised three major groups (including four suborders): Sternorrhyncha (aphids, scale bugs, whiteflies, and psyllids), Auchenorrhyncha (planthoppers, leafhoppers, spittlebugs, and cicadas), and Heteroptera (true bugs, including Coleorrhyncha) [10]. Previous morphological studies suggested that Fulgoromorpha and Cicadomorpha formed Auchenorrhyncha, and that Auchenorrhyncha is more closely related to Coleorrhyncha and Sternorrhyncha than to Heteroptera [11]. However, additional molecular and morphological evidence has challenged the monophyly of Auchenorrhyncha (summarized by [12]). The second question, what are the relationships of these suborders that have confused entomologists for many years? Cobben suggested that both Heteroptera and Fulgoromorpha form the sister clade to (Sternorrhyncha, Cicadomorpha) according to a cladistic study of morphological traits [13]. Hamilton examined the phylogenetic affiliations using features of the head and mouthparts and suggested (Fulgoromorpha, (Sternorrhyncha, Cicadomorpha)) was the sister group to the clade (Coleorrhyncha, Heteroptera) [14]. However, it has been argued that Coleorrhyncha and Heteroptera do not have an immediate common ancestor and have descended independently from separate lineages [15]. Hence, the phylogenetic relationships among the higher-level hemipteran lineages remain unclear.
Since the first insect mitogenome was published in 1985 [16], there has been a rapid accumulation of sequenced insect genomes, with representatives from all orders now available [17]. Insect mitogenomes are small, double stranded, circular DNA molecules, ranging in size from 14 to 19 kb. The mitogenome is composed of thirty-seven genes (13 protein-coding, 22 transfer RNA, and 2 ribosomal RNA genes), and contains a control region (A + T-rich region) that is thought to play a role in the initiation of transcription and replication, and is a source of length variation in the genome [18]. Particularly, mitogenome sequences can provide even more genetic information and are increasingly being utilized in insect identification, biogeographic and phylogenetic studies [19][20][21].
Here, we utilize all the mitogenomes of Hemiptera to analyze their features on the genome level and summarize the rearrangement events for the first time. In addition, all available complete mitogenomes of Hemiptera were used to reconstruct and discuss the phylogeny relationships of this order.

Mitogenomes of Hemiptera
Triatoma dimidiata, the vector of Chagas disease, was the first published mitogenome of Hemipterain 2001 [9]. The sequencing history of hemipteran mitogenomes was shown (Figure 1a). There are two peaks during the past 14 years. Three years after the publication of the mitogenome of Triatoma dimidiata, Thao et al. [22] reported the complete nucleotide sequence of the mitogenomes of six species of whiteflies, one psyllid and one aphid from the suborder Sternorrhyncha. Four species of whiteflies had variations in gene order that were very different from the proposed insect ancestor (Drosophila yakuba) [16]. Subsequently, a number of studies have already proved that the rearrangements were more likely to happen in the mitogenomes of whiteflies than other insects of Hemiptera [22][23][24][25]. In 2008, Bu's group obtained 10 complete and five nearly complete mitogenomes of Heteroptera [23] and they reported the first comparative mitogenome analysis of one suborder of Hemiptera and the phylogenetic relationships of Heteroptera [23]. With the development of PCR technology and the use of next-generation sequencing strategies [26][27][28], many complete mitogenome sequences of Hemiptera have been obtained and more will be sequenced ( Figure 1a).   Table 1 summarize the mitogenomes of Hemiptera from the first report to the present. The total 90 complete mitogenomes can be divided into five parts by different suborders (Figure 1b) (according to the five suborder system) [12]. Heteroptera has the highest species richness of Hemiptera [29], and more than a half of the 90 complete mitogenomes are from this suborder. Coleorrhyncha, small bugs with a cryptic lifestyle, possess a mixture of cicadomorphan and bug-like characters [30], and represent a separate suborder within Hemiptera. This suborder includes a single extant family, Peloridiidae, which is currently distributed only in Patagonia and on the Australian continent [31]. Only two complete mitogenomes of Peloridiidae have been reported to date [12,32], as representatives of Coleorrhyncha. Meanwhile, the provenances and GenBank numbers of these mitogenomes were detailed set out ( Table 1). Most of them as the representatives of different taxa in Hemiptera were published for the first time [12,22,23,33]. Legend: "-" refer to direct submission; "*" refers to submitted the data and not a published paper.

Genome Organization
The mitogenome sizes of Hemiptera range from 14,371 bp (Nilaparvata muiri) to 18,414 bp (Trialeurodes vaporariorum) and have an average value of 15,733 bp ( Figure 2). The size changes of five suborders are also shown ( Figure 2). The size variation is mainly attributed to the non-coding regions, especially the control regions and repeat regions in some groups (such as the control regions of the true water bugs [23] and the repeat regions of aphids [62]). Most of the mitogenomes (76/90) resemble that of the known ancestral species (D. yakuba [16]) in structural organization and composition with 13 protein coding genes (PCGs), 22 transfer RNAs (tRNAs), and 2 ribosomal RNAs (rRNAs). The remaining mitogenomes differ only in the number of tRNAs, most likely due to gene deletion events. For example, Neomaskellia andropogonis (Sternorrhyncha) contains only 18 tRNAs [22].

Nucleotide Composition
The A%, T%, C% and G% values and the AT and GC skews were calculated for all available complete mitogenomes of Hemiptera species (Figure 3). Interestingly, the lowest and the highest A + T contents of the hemipteran mitogenomes were found in the suborder Sternorrhyncha (65.67% in Bemisia afer and 86.33% in Aleurodicus dugesii). Species from the suborders Fulgoromorpha, Coleorrhyncha and Heteroptera were all A and C skewed. This was also the case for the species of Cicadomorpha, except for Empoasca vitis. For the suborder Sternorrhyncha, nine species were A and C skewed, including all aphid species. This discovery of all aphid species forming a cluster is similar to the results of previous studies (cycle in Figure 3 [62,63]). In contrast, the seven other Sternorrhynchan species (whiteflies), which had highly rearranged gene orders [22,25], were G and T skewed.

Protein-Coding Genes
All PCGs in the majority of hemipteran mitogenomes were initiated with familiar triplet initiation codons (as shown in the invertebrate mitochondrial genetic code table), including the commonly used ATN and some special couplet codons. For instance, in the suborder Coleorrhyncha, cox1 starts with CGA in Xenophyes cascus and with TCG in Hackeriella veitchi [12]. Furthermore, the tetranucleotide initiation codons were also found in hemipteran mitogenomes; such as in Cydnidae where nad2 was supposed to be initiated with an atypical initiation codon, ATCA [23]. In fact, atypical initiation codons are not rare in other insects; for example, the tetranucleotide TTAG is the initiation codon for cox1 of Bombyx mori (Lepidoptera: Bombycidae) [66]. Most PCGs stopped with TAA/TAG termination codons or truncated termination codons (TA or T) that are presumed to be completed via posttranscriptional polyadenylation [67].
In view of the evolutionary forces acting on the mitochondrial PCGs of hemipteran species, the average rate of non-synonymous substitutions (Ka), the average rate of synonymous substitutions (Ks), the average ratio of Ka/Ks, and the Jukes-Cantor adjusted Ka/Ks (JKa/JKs) were calculated for each PCG, respectively [68]. The results showed that atp8 had the highest evolutionary rate, followed by nad2, while cox1 appeared to be the lowest ( Figure 4). Notably, the ratio of Ka/Ks for each PCG was below 1, indicating that these genes are evolving under purifying selection. The uniformly low values of the Ka/Ks and JKa/JKs ratios for cox1-3 and cob indicate strong evolutionary constraints in cytochrome c oxidase [69] and also suggest a strong purifying selection in the species of Hemiptera. Therefore, a DNA barcoding approach based on cox1 sequence diversity has been utilized for identification of closely related species [70]. Similarly, cob and cox2 with relatively slow rates may also be candidate barcoding markers [24,62]. By contrast, due to the highest divergence, atp8 and nad2 can be used as an effective molecular marker to analyze intraspecific relationships and reveal relationships between populations within the same hemipteran species. This result is highly consistent with previous findings in most metazoans [71].

tRNAs and rRNAs
All 22 tRNA coding genes usually were found in the mitogenomes of Hemiptera and the tRNAs were between 60 and 75 bp in length. The anticodon nucleotides for the corresponding tRNA genes are identical to those of other available arthropod mitogenomes [66,72]. All tRNA genes had the typical clover-leaf structure with one exception: trnS(AGN), in which the dihydrouridine arm formed a simple loop (as in some other metazoan species, including most insects [66,72,73]. The arrangements of both rrnL and rrnS in the hemipteran mitogenomes are commonly conserved, and are generally located between trnL(CUN) and trnV, and between trnV and the control region. The lengths of rrnL and rrnS are determined to be 1192-1260 and 711-766 bp, respectively. These lengths are similar to those of other orders of Insecta [16,66,72,73].

Non-Coding Regions
There are some non-coding (NC) regions interspersed throughout the hemipteran mitogenomes, thus the mitogenomes of Hemiptera displayed a moderate size variation. Four distinct large NC regions were identified in the following gene pairs of hemipteran mitogenomes: trnI-trnQ, trnS-nad1, trnE-trnF and rrnS-trnI. The region located between rrnS and trnI, coincided with the A + T-rich region, also called the control region, including the origin of replication and promoters for transcription initiation [16,74,75]. Tandem repeats were detected in the remaining three regions, and named repeat regions.

Control Region
Most control regions of hemipteran mitogenomes were longer than 1 kb, with high rates of nucleotide substitution and indels, and a variable number of tandem repeats. Generally, one control region of the hemipteran mitogenome includes four parts without order: tandem repeat sequences, sequences of poly(T) stretch, a subregion with high A + T content, and stem-loop structures (for example, Chauliops fallax Figure 5a). This feature of the control region was summarized by Cook for arthropods [76]. There are some interesting exceptions in the hemipteran mitogenomes. For example, in some species of Cicadomorpha (Philaenus spumarius) [34], Fulgoromorpha (Geisha distinctissima, Sivaloka damnosa, Laodelphax striatella and Laodelphax striatellus) [27,38,39,41] and Heteroptera (Alloeorhynchus bakeri) [51], two fragments of tandem repeat sequences insert into the control region separately (for example, Philaenus spumarius Figure 5b). A few of the control regions of hemipteran species did not contain all four parts (for example, Schizaphis graminum Figure 5c) [22]. The conserved sequences, stem-loop structures and tandem repeat sequences found in the present study can provide useful information for research of the phylogeny of specific groups [34,35,45,47,62]. For example, in the systematic research of Aphidinae, the phylogenetic tree based on PCGs is similar to the clusters of the stem-loop structures [62]. Another interesting question is how functionality is retained under such great variations in both length and sequence. Considering the high nucleotide substitution rate, both the secondary structures and the conserved segments might be key clues in determining the function of the control region.

Repeat Region
In general, the NC regions of an insect mitogenome consist of a control region and short intergenic spacers. However, some special species of Hemiptera include one repeat region ( Figure 6). These repeat regions mainly are located into different positions (trnE-trnF, trnI-trnQ and trnS-nad1) in three families (Aphididae, Nabidae and Reduviidae), and differ in repeat unit sequence and copy number, suggesting that they are highly species-specific (Table 2). These repeat regions are not similar to any known sequences in GenBank. We speculate that this region, full of tandem repeats, has a function similar to the intergenic spacer in Apis mellifera that is thought to be another origin of replication [77].

Phylogenetic Inferences by Hemipteran Mitogenomes
As mentioned in the introduction, the phylogenic relationship of the Hemiptera has been controversial for many years and two questions remain unanswered. Here, we reviewed the research history of hemipteran phylogenetic relationships based on mitogenomes and combine our phylogenetic analyses to discuss the most reliable results. In 2009, a study clarified the relationships of the three phylogenetically controversial suborders, Auchenorrhyncha, Sternorrhyncha, and Heteroptera [53]. Heteroptera constituted a monophyletic group, and a sister relationship was proposed for Auchenorrhyncha and Sternorrhyncha [53]. However, only one species (Cicadomorpha: Philaenus spumarius) was chosen representing Auchenorrhyncha, and no taxa of Fulgoromorpha were discussed. Therefore, in the same year, Song and Liang [38] increased the samplings of taxa and proposed the inferred genealogical proximities of hemipteran lineages of (Heteroptera + (Cicadomorpha + (Fulgoromorpha + Sternorrhyncha))). In their research, Auchenorrhyncha was clearly separated into two parts, and Fulgoromorpha and Cicadomorpha were not a monophyletic group [38]. In fact, in their reports (in 2010 and 2012), the paraphyly of Auchenorrhyncha was also supported [40,41], and their phylogenetic reconstruction supported a sister relationship between Fulgoromorpha and Sternorrhyncha [40]. The suborder Coleorrhyncha (Hemiptera) has only one extant family, Peloridiidae, comprising 36 species in 17 genera [79]. Species of this group live in the wet mosses of South America (Chile, Argentina), New Zealand, New Caledonia and eastern Australia (from North Queensland to Tasmania) [80]. Complete or nearly complete mitogenomes of Peloridiidae were not obtained until 2013 [12]. Cui's research was the first phylogenomic study of hemipterans with complete suborder samplings. Their results supported the paraphyly of Auchenorrhyncha and proposed the close relationship between Cicadomorpha and Heteroptera [12]. Meanwhile, our result displayed the similar result ( Figure 7): Sternorrhyncha located as the basal suborder and Cicadomorpha and Heteroptera clustered as sister-group. Summarizing all these viewpoints, we can make three conclusions. First, the phylogenetic relationships among suborder-level hemipteran linages remain unclear by using mitogenome inference. Most viewpoints supported that Auchenorrhyncha is not a monophyletic group; Second, whether a monophyletic group or a sister-group to Cicadomorpha, the suborder Heteroptera is the most evolved group of Hemiptera; Third, the positions of other suborders remain confused and require further investigation. Under the suborder taxa, the phylogenetic research also was involved. We summarized all the research results of these years (Table 4). All the phylogenetic issues of every taxonomic category were considered. For example, in the relationships among the intraorders of Heteroptera, Enicocephalomorpha was the most basal sister-group of the majority of Heteroptera [45]; the position of Cimicomorpha was unclear and it is possible that it is not a monophyletic group [45,47]; and Pentatomomorpha was the most evolved group of Heteroptera [45,47,49,54,60]. Regarding interfamily relationships, Hua et al. conducted phylogenomic studies on the mitogenomes of Pentatomomorpha [23] and Nepomorpha [19], and resolved some superfamily phylogenetic problems (Table 4). In Sternorrhyncha, the mitochondrial gene rearrangements among whiteflies corresponded to the phylogenetic tree [22]. The intrasubfamily relationships of Aphids (a group with special regions [33,63]), were also discussed [62] (Table 4). the sister relationships within Nepomorpha and Gerromorpha. [49] In conclusion, the present study shows that mitogenomes may be good molecular markers for phylogenetic inference at a variety of taxonomic levels of Hemiptera (such as suborders, intraorders and families). However, some relationships have not been resolved based solely on mitogenomes. Nuclear genes evolve more slowly, and are effective for the analysis of deeper phylogenetic relationships. Moreover, some endosymbionts co-evolve with their hosts, and symbiont-derived data, in principle, could be used to reconstruct the evolutionary history of hosts [81]. So, with the development of sequencing technology, more available genetic resources are expected to provide more effective information of phylogenetic trees.

Analysis of Sequence Data
The nucleotide sequences of PCGs were translated based on the invertebrate mtDNA genetic code. A + T content were calculated using MEGA version 6.0 [82]. Strand asymmetry was calculated using the formulae AT skew = [A − T]/[A + T] and GC skew = [G − C]/[G + C], for the strand encoding the majority of the protein-coding genes. The software packages DnaSP 5.0 [83] was used to calculate the synonymous substitution rate (Ks) and the nonsynonymous substitution rate (Ka) for each PCG as well as Jukes-Cantor adjusted Ka/Ks (JKa/JKs).

Phylogenetic Analysis
Each of the 13 PCGs of all 92 species were aligned individually using MEGA v6.0 [82] with default parameters. Before alignments, the stop codons were all removed from those sequences. Maximum likelihood (ML) and Bayesian inference (BI) analyses were implemented by PHYML 3.0 [84] and MrBayes version 3.1.2 [85], respectively. Model selection was based on jModeltest v0.1.1 [86]. According to the AIC, the GTR + I + G model was optimal for analysis with nucleotide alignments. MrBayes version 3.1.2 and PHYML were employed to reconstruct the phylogenetic trees. In the ML analysis, the parameters were estimated during analysis and the node support values were assessed by bootstrap re-sampling (BP) calculated using 100 replicates. In Bayesian inference, runs of ten million generations were conducted. Trees were sampled every 1000 generations with a burn-in of 25%.

Conclusions and Perspectives
Generally, the complete mitogenomes of Hemiptera were 14-17K bp in size and encoded all 37 genes typical for insects. These genes were arranged in the same order as the proposed ancestral insect, except in a few particular species. Notably, the mitogenomes of three families possessed a large repeat region located at three different positions. We speculate that this region, full of tandem repeats, is another origin of replication. The mitogenomes have been successfully used to reconstruct the phylogenetic relationships within a variety of taxonomic levels of Hemiptera.
Future work should focus on four goals. First, the comparative genomics of different categories need more taxon samplings and more mitogenome sequences to further describe the comprehensive characteristics of Hemiptera mitogenomes; Second, the research of various populations and phylogeographic structures of hemipteran species based on mitogenomes require more mitogenome sequences about the same species or similar species; Third, the functional and evolutionary significance of different rearrangement types should be examined to open the view of the evolutionary dynamics of Hemiptera mitogenomes; Finally, phylogenetic inference with more resource data will provide greater insight into the evolution of Hemiptera.