Characterization and Phylogenetic Implications of the Complete Mitochondrial Genome of Syrphidae

In this study, the complete mitochondrial genomes (mitogenomes) of two hoverfly species, Korinchia angustiabdomena (Huo, Ren, and Zheng) and Volucella nigricans Coquillett (Diptera: Syrphidae), were determined and analyzed. The circular mitogenomes were 16,473 bp in K. angustiabdomena (GenBank No. MK870078) and 15,724 bp in V. nigricans (GenBank No. MK870079). Both newly sequenced mitogenomes contained 37 genes, and the gene order was similar to that of other syrphine species. All protein-coding genes (PCGs) started with standard ATN codons, and most PCGs terminated with a TAA stop codon, whereas ND1 in K. angustiabdomena ended with a TAG codon and ND5 terminated with a truncated T stop codon in both species. The phylogenetic relationships of K. angustiabdomena and V. nigricans with related lineages were reconstructed using Bayesian inference and maximum-likelihood analyses. The monophyly of each family considered within Muscomorpha was supported by the clades in the phylogenetic tree, and the superfamily Oestroidea (Calliphoridae, Sarcophagidae, and Oestridae) was unexpectedly found to be paraphyletic based on our selected data. This mitogenome information for K. angustiabdomena and V. nigricans could facilitate future studies of evolutionarily related insects.


Introduction
Syrphidae is a family of true flies in the very large insect order Diptera. Its members are well known by the common names flower flies or hoverflies. They can be found hovering around flowers and are often mistaken for bees or wasps because of their black and yellow coloring. Syrphidae can be recognized by the vena spuria on the wing. The family is cosmopolitan and includes more than 6,000 known species worldwide [1][2][3].
While adult Syrphidae typically visit flowers, their larvae are often polyphagous and occupy very diverse habitats. The larvae of the subfamily Syrphinae are largely predaceous, usually on soft-bodied Hemiptera [4], typically aphids. There are also some non-predaceous Syrphinae that develop as miners in plants [4][5][6]. Eristalinae are generally saprophagous in dead wood, coprophagous, phytophagous, aquatic filter feeders, or inquilines in social insect nests [2,7]. Microdontinae are inquilines in ant nests.
Traditionally, Syrphidae have been classified into three subfamilies (Microdontinae, Eristalinae, and Syrphinae) based on adult morphology [2,8]. There is some controversy over the classification of Pipizinae, which share a larval feeding mode with Syrphinae but some external morphological characters with Eristalinae. Recently, Mengual et al. proposed that Pipizinae be raised to the level of subfamily based on morphological and genomic data [4].
Recent studies based on integrated datasets almost unanimously agree that the Microdontinae was a sister group to the remaining syrphines; Pipizinae and Syrphinae formed one clade, and Eristalinae

Sample Collection and DNA Extraction
Syrphine specimens used in this study were collected by sweep net and immediately preserved in absolute ethyl alcohol for subsequent experiments. Their collection information is provided in Table S1. Voucher specimens were deposited in the Institute of Entomology, Guizhou University, Guiyang, China. The samples were washed twice by vortexing in absolute ethanol and then dried at room temperature before DNA extraction. Genomic DNA was extracted using a DNeasy© Tissue Kit (Qiagen, Hilden, Germany). Specimens were incubated at 56 °C for 6 h to lyse cells completely, and the total genomic DNA was eluted with 100 µL of double-distilled water (ddH2O); the remaining steps were conducted in accordance with the manufacturer's protocol. The genomic DNA concentration was quantified with a NanoDrop 1000 system, and the DNA was then stored at −20 °C.

PCR Amplification, Cloning, and Sequencing
Mitogenomes were sequenced by next-generation sequencing (Illumina HiSeq 2500, paired-end strategy and 2 Gb raw data; Berry Genomic, Beijing, China) and PCR amplification. Two sequence fragments, COX1 (700 bp) and 12S rRNA (550 bp), were amplified by PCR as "reference sequences" using universal primers (Table 1). PCR amplifications were conducted using PCR MasterMix (Tiangen Biotech Co., Ltd., Beijing, China) according to the manufacturer's manual. The PCR cycling conditions comprised pre-denaturation at 94 °C for 3 min; 30 cycles of denaturation at 94 °C for 30 s, annealing at a suitable temperature for 30 s, and elongation at 70 °C for 1 min; and a final elongation step at 70 °C for 10 min. The sequencing results obtained from PCR amplification and TA cloning were assembled using the SeqMan program package (DNAStar Inc.; Madison, WI, USA) (Figure S1). Clean next-generation sequencing results were assembled using Geneious R9 [24] based on the COX1 and 12S fragments of mitochondrial DNA. A total of

Mitochondrial Genome Annotation
The mitochondrial genome (mitogenome) was initially annotated using the MITOS web server [27]. The base composition was analyzed with MEGA 6.0 [28], and PCGs were identified in GenBank [29]. The locations and secondary structures of the 22 tRNA genes were determined using tRNAscan-SE version 1.21 [30] and ARWEN version 1.2 [31]. The rRNA genes were determined based on the locations of adjacent tRNA genes and by comparison with other Syrphoidea. DNASIS version 2.5 (Hitachi Engineering, Tokyo, Japan) and RNAstructure version 5.2 [32] were used to predict helical elements in variable regions. Strand asymmetry was calculated using the formulas AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) [33], where A, T, G, and C are the counts of the corresponding bases.
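The skew formulas above can be illustrated with a minimal Python sketch. This is not the MEGA workflow used in the study; the function names and the toy sequence are assumptions made for illustration only, and ambiguous bases are simply ignored.

```python
from collections import Counter


def skews(sequence):
    """Return (AT skew, GC skew) computed from base counts of one strand."""
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    # Guard against division by zero for sequences lacking A/T or G/C.
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


def at_content(sequence):
    """Return the A + T fraction among unambiguous bases (A, C, G, T)."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    return (counts["A"] + counts["T"]) / total if total else 0.0


# Hypothetical toy sequence: 4 A, 3 T, 1 G, 2 C.
toy = "AAAATTTGCC"
toy_at_skew, toy_gc_skew = skews(toy)
toy_at_content = at_content(toy)
```

A positive AT skew means more A than T on the strand analyzed, and a negative GC skew means more C than G, which is the pattern reported for the whole mitogenomes in this study.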

Sequence Alignment and Phylogenetic Analysis
A total of 42 insect species were used in the phylogenetic analysis, including 40 ingroup species and 2 outgroup species (Cydistomyia duplonotata and Trichophthalma punctate [20]). The ingroup covered 11 dipteran families, all of which have representative mitogenome sequences (Table S2). The 13 PCG sequences without stop codons were used in the phylogenetic analysis. Each PCG was aligned individually with codon-based multiple alignment using the MAFFT algorithm in the TranslatorX online server [34]; gaps and ambiguous sites were removed from the protein alignment using Gblocks under the default settings before back-translation to nucleotides. All alignments were then checked and corrected manually in MEGA 6.0 [28].
Three datasets were generated: (1) amino acid sequences of the 13 PCGs with 3,681 amino acids (AA); (2) the 13 PCGs and 2 rRNAs with 13,091 nucleotides (PCGRNA); and (3) the first and second codon positions of the 13 PCGs plus the 2 rRNAs with 9,410 nucleotides (PCG12RNA). The optimal partition scheme for each dataset and the best model for each partition were selected under the corrected Bayesian Information Criterion using PartitionFinder 2 (Table S3) [35]. Maximum-likelihood (ML) phylogenetic trees were constructed with IQ-TREE using the ultrafast bootstrap approximation with 10,000 replicates [36]. Bayesian analyses were carried out with the site-heterogeneous model CAT + GTR implemented in PhyloBayes (PB) MPI on XSEDE [37,38].
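The relationship between the three dataset sizes can be checked with a short Python sketch. It assumes that the back-translated PCG alignment has exactly three nucleotide columns per amino-acid site, which is how codon-based alignment normally works but is not stated explicitly in the text; the function name is hypothetical.

```python
def first_second_positions(cds):
    """Drop every third codon position from an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Keep the first two bases of each codon.
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


# Alignment lengths reported in the text.
AA_SITES = 3681        # dataset (1), amino acids
PCGRNA_SITES = 13091   # dataset (2), nucleotides

# Assumption: three nucleotide columns per amino-acid column.
pcg_sites = AA_SITES * 3
rrna_sites = PCGRNA_SITES - pcg_sites
# Removing third positions keeps two thirds of the PCG columns.
pcg12rna_sites = pcg_sites * 2 // 3 + rrna_sites
```

Under that assumption the PCG block has 11,043 columns, the two rRNAs contribute 2,048 columns, and the third dataset comes to 9,410 columns, which matches the reported figure.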

Genome Organization and Base Composition
In this study, two complete mitochondrial genomes (mitogenomes) of Syrphidae were sequenced and annotated for the first time (Figure 1). Each newly sequenced mitogenome is circular and double stranded, containing 37 mitochondrial genes (13 PCGs, 22 tRNAs, and 2 rRNAs) and one control region (Tables S4 and S5). The sizes of the two mitogenomes were 16,473 bp in K. angustiabdomena (GenBank No. MK870078) and 15,724 bp in V. nigricans (GenBank No. MK870079). Thus, Syrphidae mitogenomes range from 15,326 bp (E. corollae; NC_036482 [14]) to 16,473 bp (K. angustiabdomena, MK870078, this study). Within syrphine mitogenomes, length variation is limited in the PCGs and RNAs, but there is remarkable variation in the size of the control region (Figure 2). The gene arrangement in the newly sequenced species is the same as that found in all previously sequenced Syrphidae, as well as the gene order inferred for most insect mitogenomes [16]. A total of 22 genes (9 PCGs and 13 tRNAs) were encoded on the majority strand (J strand), whereas the remaining 15 genes (4 PCGs, 9 tRNAs, and 2 rRNAs) were located on the minority strand (N strand) (Figure 1, Tables S4 and S5).
The nucleotide composition of the seven sequenced Syrphidae mitogenomes was biased toward A and T, with the overall A + T content ranging from 79.9% (V. nigricans) to 80.9% (S. grandicornis [20]). The A + T content of the control region (mean value = 93.18%) was always higher than that of other regions, while the PCGs showed the lowest A + T content (mean value = 78.44%). All mitogenomes showed zero to slightly positive AT skews (0.00 in S. grandicornis and E. balteatus to 0.05 in V. nigricans) and negative GC skews (−0.21 in V. nigricans to −0.13 in E. balteatus and E. corollae).

Protein-Coding Genes and Codon Usage
The total length of the 13 PCGs is 11,188 bp in K. angustiabdomena and 11,170 bp in V. nigricans, and their locations and orientations are similar to those in other syrphines. The overall A + T content of the 13 PCGs in the seven species was between 77.6% (K. angustiabdomena) and 79% (O. sativus [23]). The AT skews were slightly negative, from −0.15 (K. angustiabdomena) to −0.12 (V. nigricans); the AT skews of the other five species were −0.14. The GC skews were slightly positive, from 0.01 (V. nigricans) to 0.05 (O. sativus), except in K. angustiabdomena (−0.01) (Table 2). In the two newly sequenced mitogenomes, all PCGs started with standard ATN codons. Most PCGs terminated with a TAA stop codon, whereas ND1 in K. angustiabdomena ended with a TAG codon and ND5 terminated with a truncated T stop codon in both species. In the other Syrphidae mitogenomes, most PCGs also use canonical start and stop codons. The exception is E. tenax, in which ND4L begins with TTG and ND5 terminates with a truncated T stop codon, while all other PCGs start with standard ATN codons and terminate with a TAN codon.
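The start and stop codon categories described above can be expressed as a small Python sketch. It assumes each gene is supplied 5' to 3' on its coding strand and that an incomplete stop (T or TA) shows up as a length that is not a multiple of three; the function names and return labels are hypothetical and are not taken from the annotation tools used in the study.

```python
def classify_start(gene):
    """Label the start codon as 'ATN' or return the non-ATN codon itself."""
    codon = gene[:3].upper()
    if len(codon) == 3 and codon.startswith("AT"):
        return "ATN"
    return codon


def classify_stop(gene):
    """Label the stop as TAA, TAG, a truncated T/TA, or 'unrecognized'."""
    gene = gene.upper()
    remainder = len(gene) % 3
    if remainder == 0 and gene[-3:] in ("TAA", "TAG"):
        return gene[-3:]
    # Incomplete stops leave one or two extra bases after the last codon.
    if remainder == 1 and gene.endswith("T"):
        return "T--"
    if remainder == 2 and gene.endswith("TA"):
        return "TA-"
    return "unrecognized"


# Hypothetical toy genes, not real ND1/ND5 sequences.
complete_gene = "ATTAAATAG"   # ATN start, complete TAG stop
truncated_gene = "TTGAAAT"    # TTG start, truncated T stop
```

With these toy inputs, `classify_start(complete_gene)` gives "ATN" and `classify_stop(truncated_gene)` gives "T--", mirroring the ND5 case reported for both species.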
The A + T bias was also reflected in the relative codon usage of the PCGs. After excluding the stop codons, the relative synonymous codon usage (RSCU) was calculated and is summarized in Figure 3. We examined the behavior of the codon families in the PCGs (Figure 4), which showed that codon usage was very similar across the Syrphidae. The four most frequently used codon families were Leu (598 in E. balteatus to 607 in K. angustiabdomena), Ile (352 in E. balteatus to 374 in K. angustiabdomena), Met (254 in K. angustiabdomena and E. tenax to 291 in E. corollae), and Phe (332 in V. nigricans and E. corollae to 342 in K. angustiabdomena).
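As a rough illustration of how RSCU is obtained, the sketch below divides each codon's observed count by the mean count of its synonymous family. The two families listed are a partial, illustrative subset of the invertebrate mitochondrial genetic code (NCBI translation table 5); the function names and the toy sequence are assumptions, and this is not the software used to produce Figure 3.

```python
from collections import Counter

# Partial, illustrative subset of NCBI translation table 5
# (invertebrate mitochondrial code): Phe = TTT/TTC, Met = ATA/ATG.
SYNONYMOUS_FAMILIES = {
    "Phe": ["TTT", "TTC"],
    "Met": ["ATA", "ATG"],
}


def count_codons(cds):
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts, families=SYNONYMOUS_FAMILIES):
    """RSCU = observed count / mean count over the synonymous family."""
    values = {}
    for codons in families.values():
        total = sum(counts.get(codon, 0) for codon in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so skip it
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts.get(codon, 0) / expected
    return values


# Hypothetical toy coding sequence: 3 x TTT, 1 x TTC, 1 x ATA, 1 x ATG.
toy_cds = "TTT" * 3 + "TTC" + "ATA" + "ATG"
toy_rscu = rscu(count_codons(toy_cds))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage within its family; in an A + T rich genome, codons ending in A or T would typically show this.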

tRNAs and rRNAs
All tRNA sequences in the two newly sequenced Syrphidae mitogenomes were determined using tRNAscan-SE or ARWEN. Most tRNAs could be folded into the typical clover-leaf structure (Figure 5), while tRNA-Ser (AGN) lacked a DHU arm, as has been observed in other metazoan mitogenomes [39]. The combined length of all tRNAs was 1471 bp in K. angustiabdomena and 1475 bp in V. nigricans, which is within the range reported for other Syrphidae mitogenomes, where total tRNA size ranges from 1471 bp to 1479 bp (E. corollae and E. tenax [14]). Besides the classic A-U and C-G pairs in the secondary structures, there are 18 and 15 non-classic base pairings in K. angustiabdomena and V. nigricans, respectively. Four and nine other mismatched base pairs (U-U and C-U) were also found in the arms.
There were two rRNA genes, a 1338-bp 16S rRNA gene and a 12S rRNA gene of 787 bp/186 bp in K. angustiabdomena and V. nigricans, respectively. Among the seven Syrphidae mitogenomes, the length of the 16S rRNA gene ranges from 1,314 bp (O. sativus [23]) to 1,340 bp (E. tenax), and the length of the 12S rRNA gene ranges from 778 bp (O. sativus [23]) to 804 bp (S. grandicornis and E. balteatus [14,20]), with mean A + T contents of 84.5% and 83%, respectively (Table S6). Both rRNA genes were located on the N strand. Unlike PCGs, which have functional annotation features, rRNA genes have boundaries that are difficult to determine [40,41]. Therefore, the boundaries of flanking genes were used, assuming no overlaps or gaps between adjacent genes, as in the inferred insect mitogenome pattern. The 16S rRNA gene was located between tRNA-L2 (CUN) and tRNA-V, while the 12S rRNA gene was between tRNA-V and the control region.
[14], but it is necessary to examine this further as more molecular data (more species, more mitogenomes, or longer sequences) become available. The phylogenetic relationship of Muscomorpha was recovered as (((((Oestroidea + Muscoidea) + Ephydroidea) + (Tephritoidea + Sciomyzoidea)) + Lauxanioidea) + Syrphoidea) + Platypezoidea.

Non-Coding Region
Both newly sequenced mitogenomes had gene overlaps, each ranging from 1 to 9 bp. K. angustiabdomena had a total of 34 bp of overlap across nine gene junctions, while V. nigricans had 38 bp of overlap across 11 gene junctions (Tables S4 and S5). Excluding the control region, there were 19 and 13 intergenic spacers totaling 179 and 143 bp of non-coding bases in K. angustiabdomena and V. nigricans, respectively. The longest intergenic spacer was between tRNA-E and tRNA-F (20 bp) in K. angustiabdomena and between tRNA-Y and COX1 (34 bp) in V. nigricans.
The putative control region, located between 12S rRNA and tRNA-I, was the most variable region in the whole mitogenome. In K. angustiabdomena, the full control region was 1526 bp long with an A + T content of 94.4% and contained a 119 bp unit repeated four times. In V. nigricans, the control region was 843 bp long with an A + T content of 94.4% and contained a 74 bp unit repeated twice.
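Overlap and spacer totals of this kind can be derived from annotation coordinates, as in the following Python sketch. The gene names and coordinates are hypothetical placeholders rather than values from Tables S4 and S5, and the sketch assumes 1-based inclusive coordinates on a linearized genome, sorted by start position, with the control region left out of the input.

```python
def junction_gaps(genes):
    """Return (upstream, downstream, gap) for each adjacent gene pair.

    gap > 0 is an intergenic spacer, gap < 0 is an overlap, 0 is abutting.
    genes: list of (name, start, end), 1-based inclusive, sorted by start.
    """
    gaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(genes, genes[1:]):
        gaps.append((name_a, name_b, start_b - end_a - 1))
    return gaps


def summarize(gaps):
    """Count overlapping junctions and spacers and total their lengths."""
    overlaps = [-gap for _, _, gap in gaps if gap < 0]
    spacers = [gap for _, _, gap in gaps if gap > 0]
    return {
        "overlap_junctions": len(overlaps),
        "overlap_bp": sum(overlaps),
        "spacers": len(spacers),
        "spacer_bp": sum(spacers),
    }


# Hypothetical annotation: geneA/geneB overlap by 2 bp, geneB/geneC are
# separated by a 20 bp spacer, and geneC/geneD abut directly.
toy_genes = [
    ("geneA", 1, 100),
    ("geneB", 99, 200),
    ("geneC", 221, 300),
    ("geneD", 301, 400),
]
toy_summary = summarize(junction_gaps(toy_genes))
```

The junction that wraps around the circular genome is not handled here; in practice it would be the control region, which the text excludes from the spacer count.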
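To make the repeat description concrete, the sketch below counts consecutive exact copies of a unit at a given position and works out how much of each control region the reported repeats would span. How the repeats were actually detected is not stated in the text; exact matching, the function name, and the toy sequence are assumptions, and real repeat copies may differ by a few substitutions.

```python
def tandem_copies(sequence, start, unit_length):
    """Count consecutive exact copies of the unit beginning at `start`."""
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    unit = sequence[start:start + unit_length]
    if len(unit) < unit_length:
        return 0
    copies = 0
    position = start
    # Step through the sequence one unit at a time while copies match.
    while sequence[position:position + unit_length] == unit:
        copies += 1
        position += unit_length
    return copies


# Hypothetical toy sequence: a 4 bp unit repeated three times.
toy_region = "GG" + "ATTA" * 3 + "CC"
toy_copies = tandem_copies(toy_region, 2, 4)

# Span of the reported repeats within each control region (values from the text).
k_ang_repeat_bp = 119 * 4   # K. angustiabdomena, control region of 1526 bp
v_nig_repeat_bp = 74 * 2    # V. nigricans, control region of 843 bp
k_ang_share = k_ang_repeat_bp / 1526
v_nig_share = v_nig_repeat_bp / 843
```

If the copies are contiguous and full length, the repeats account for 476 bp in K. angustiabdomena and 148 bp in V. nigricans, so the larger repeat array explains only part of the size difference between the two control regions.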

Phylogenetic Relationship
The phylogeny of Syrphidae and related groups was reconstructed from the three datasets described above, which contain 40 Muscomorpha species and 2 outgroup species, using maximum likelihood (ML) and PhyloBayes (PB). In total, six trees with highly similar topologies were generated (Figure 6). Among these six trees, the bootstrap probabilities and Bayesian posterior probabilities based on the PCG12RNA dataset were higher than those based on the other two datasets.

Conclusions
Consistent with previous observations of Syrphidae species, the mitogenome sequences of K. angustiabdomena and V. nigricans were highly conserved in gene order, gene content, gene size, base composition, codon usage of PCGs, and tRNA secondary structures. Variation in the length of complete mitogenomes is mostly due to the length of the control region, which ranges from 320 bp (E. corollae [14]) to 1526 bp (K. angustiabdomena, this study).

Author Contributions: H.L. conceived and designed the study, and wrote the entire manuscript. The author read and approved the final manuscript.

Conflicts of Interest:
The author declares no conflict of interest.