New Mitogenomes of the Polypedilum Generic Complex (Diptera: Chironomidae): Characterization and Phylogenetic Implications

Simple Summary Many species from the Polypedilum generic complex are renowned for their roles in aquatic ecosystems. This genus complex, as one of the most species-rich groups, has been persistently contentious. The current lack of sequence data for the Polypedilum complex limits our comprehension of their species evolution. Here, 14 mitogenomes of the Polypedilum generic complex were sequenced. Combined with three recently published mitochondrial genomes, we analyzed the features of the mitogenomes among genera. Furthermore, the phylogenetic analysis among genera within the Polypedilum complex showed that the Endochironomus + Synendotendipes is the sister group of Phaenopsectra + Sergentia. Our study provides a vital foundation for further study on the evolutionary biology of Chironomidae. Abstract Mitochondrial genomics, as a useful marker for phylogenetics and systematics of organisms, are important for molecular biology studies. The phylogenetic relationships of the Polypedilum generic complex remains controversial, due to lack taxonomy and molecular information. In this study, we newly sequenced mitogenomes of 14 species of the Polypedilum generic complex. Coupled with three recently published sequences, we analyzed the nucleotide composition, sequence length, and evolutionary rate of this generic complex. The control region showed the highest AT content. The evolution rate of protein coding genes was as follows: ATP8 > ND6 > ND5 > ND3 > ND2 > ND4L > ND4 > COX1 > ND1 > CYTB > APT6 > COX2 > COX3. We reconstructed the phylogenetic relationships among the genera within the Polypedilum generic complex based on 19 mitochondrial genomes (seventeen ingroups and two outgroups), using Bayesian Inference (BI) and Maximum Likelihood (ML) methods for all databases. Phylogenetic analysis of 19 mitochondrial genomes demonstrated that the Endochironomus + Synendotendipes was sister to Phaenopsectra + Sergentia.


Introduction
Mitochondrial genomes are important molecular markers that have usually been used for studies on phylogeny, evolutionary history, speciation and phylogeography in insect groups [1][2][3], benefit by the maternal inheritance, high substitution, and easy availability [4,5]. Mitogenomes are known as the second genetic information system, because of its special genetic characteristics [6]. The mitochondrial genome length of insects ranged from 14,000 to 20,000 bp mostly [1], containing two ribosomal RNAs (rRNAs), thirteen of the Polypedilum generic group, including Endochironomus, Phaenopsectra, Polypedilum, Sergentia, Stictochironomus, and Synendotendipes (detailed information showed in Table 1 and Table S1). All samples were immersed in 85% to 95% ethanol at −20 • C before DNA extraction and morphological examination. Specimen identifications were made by X.L.L. All vouchers were deposited in the College of Fisheries and Life, Shanghai Ocean University, Shanghai, China.
Whole genome DNA was extracted with Qiagen DNeasy Blood & Tissue Kit, and the concentration of the DNA was measured with a Qubit ® DNA Assay Kit in Qubit ® 2.0 Flurometer (ThermoFisher, USA). All whole genomes were sent to company for sequencing (Berry Genomics, Beijing, China). Truseq Nano DNA HT sample preparation Kit (Illumina, USA) was used to generate sequencing libraries. Raw reads were sequenced on the Illumina NovaSeq 6000 platform with 150 bp paired-end reads and were generated with an insert size around 350 bp. Adapters, short, and low-quality reads of raw data were removed using Trimmomatic v0.32 (Jülich, Germany) [26].

Assembly, Annotation and Composition Analyses
To ensure accuracy, we used two methods for de novo assembly: (1) NOVOPlasty v3.8.3 (Brussel, Belgium) [27] was implemented for mitogenome assembly with COI gene of Polypedilum heberti (GenBank accession: MK505566) as seed sequences and k-mer sizes of 23-39 bp; and (2) IDBA-UD v1.1.3 (Boston, MA, USA) [28] was used to assemble with parameter "-mink 40 -maxk 120". Geneious 2020.2.1 [29] was used to compare mitogenome sequences which were obtained by the above two methods, and combine them into a single sequence. The secondary structure of tRNAs was implemented in tRNAscan SE 2.0 [30] and MITOS WebServer. The rRNAs and PCGs were annotated manually with the Polypedilum vanderplanki (GenBank accession: KT251040) as a reference using Clustal Omega in Geneious. Clustal W function in MEGA 7 was used to proofread the boundaries of rRNAs and PCGs [31]. Bias of the nucleotide composition and nucleotide composition of each gene was performed via SeqKit v0. 16.0 (Chongqing, China) [32].

Phylogenetic Analyses
A total of 2 rRNAs and 13 PCGs genes of 19 mitogenomes were used for phylogenetic analyses. Nucleotide and protein sequences were aligned via MAFFT v7.450 (Osaka, Japan) [34] with the L-INS-I method. Trimming was performed by Trimal v1.4.1 (Barcelona, Spain) [35] with "-automated1" strategy, and then we concatenated five matrices via FASconCAT-G v1.04 (Santa Cruz, CA, USA) [36] for phylogeny analysis: (1) cds_faa matrix contained all PCGs amino acid reads; (2) cds_fna matrix included all PCGs nucleotide reads; (3) cds_rrna matrix included all PCGs and two rRNA nucleotide reads; (4) cds12_fna matrix contained all PCGs nucleotide reads except the third codon positions; and; (5) cds12_rrna matrix contained PCGs nucleotide reads which removed the third codon positions and two rRNA gene. AliGROOVE v1.06 (Bonn, Germany) [37] was used to calculate the heterogeneity among matrices. For all matrices, we used ML and BI approaches to infer the phylogenetic relationships of the Polypedilum generic complex. For the ML analysis, we used ModelFinder [38] to select the best-fitting substitution models implemented in IQ-TREE 2 (Canberra, ACT, Australia) [39]. The posterior mean site frequency (PMSF) [40] model was used to minimize long-branch attraction artifacts for matrix cds_faa, with the command '−m − mtART + C60 + FO + R' in IQ-TREE. Phylobayes-MPI (Montréal, QC, Canada) [41] was implemented to generate the BI tree, with the site-heterogeneous mixture model (−m CAT + GTR). We performed two Markov chain Monte Carlo chains (MCMC) with 10,000,000 generations, and stopped when we achieved satisfactory convergence (maxdiff < 0.3). A total of 25% initial trees of each run were discarded as burn-in, and we generated a consensus tree using the remaining trees combined. iTOL, an available online website, was used for tree beautified (https://itol.embl.de/upload.cgi, accessed on 15 December 2022).

Mitogenomic Organization
We sequenced about three Gb raw reads for each sample. A total of 14 mitogenomes of Chironomidae were obtained in this study, of which five were complete mitogenomes and nine were linear mitogenomes, and all of them were submitted in GenBank with accession number OP950216-OP950228, OK513041 (Table 2 and Table S1). The whole length of newly obtained sequences ranged from 15,582 (Polypedilum masudai) to 17,810 bp (Sergentia baueri), the variation in which was mainly caused by the unstable size of the CR ( Table 2). All newly assembled mitogenomes contained the typeset of one CR and 37 genes which included 13 PCGs, 22 tRNAs, and two rRNAs ( Figure 1). Most of the newly assembled mitogenomes were similar to those previously published for Chironomidae in length [3,[21][22][23][24]. Sequence features of the represented species are given in Figure 1.

Protein-Coding Genes, Codon Usage, and Evolutionary Rates
There was no remarkable difference in the size of tRNA, PCGs, and rRNAs among each species. A total of 13 PCGs of obtained mitogenomes ranged from 11,196 (Polypedilum unifascium) to 11,241 bp (Endochironomus pekanus) in length. Combined and compared with published Chironomidae data, we found that the AT content of the third codon positions was significantly higher than the first and the second positions in PCGs (Figure 2). Most of the 14 mitogenomes exhibited negative GC-skew in PCGs, while E. tendens and E. pekanus were positive; and each of them had negative AT-skew of PCGs, ranging from −0.21 (P. masudai) to −0.17 (E. tendens). The AT content (%) ranged from 71.53 (S. akizukii) to 76.93 (E. albipennis); the GC content (%) ranged from 23.07 (E. albipennis) to 28.47 (S. akizukii) (for detailed information, see Table 2).  the highest AT content, ranging from 91.95% (S. akizukii) to 98.18% (Endochironomus albipennis). The AT content in tRNA and PCGs was lower than that in rRNAs ( Table 2). All newly assembled mitogenomes had a negative GC-skew, while the AT-skew for most of them was positive, except Endochironomus tendens which showed negative AT-skew. The GC-skew ranged from −32.02 (S. akizukii) to −16.30 (Synendotendipes sp.1), and the AT-skew ranged from −0.002 (Polypedilum masudai) to 0.075 (S. akizukii); the GC content (%) ranged from 20.06 (P. flavipes) to 24.02 (S. akizukii) (detailed information is given in Table 2).

Protein-Coding Genes, Codon Usage, and Evolutionary Rates
There was no remarkable difference in the size of tRNA, PCGs, and rRNAs among each species. A total of 13 PCGs of obtained mitogenomes ranged from 11,196 (Polypedilum unifascium) to 11,241 bp (Endochironomus pekanus) in length. Combined and compared with published Chironomidae data, we found that the AT content of the third codon positions was significantly higher than the first and the second positions in PCGs ( Figure 2). Most of the 14 mitogenomes exhibited negative GC-skew in PCGs, while E. tendens and E. pekanus were positive; and each of them had negative AT-skew of PCGs, ranging from −0.21 (P. masudai) to −0.17 (E. tendens). The AT content (%) ranged from 71.53 (S. akizukii) to 76.93 (E. albipennis); the GC content (%) ranged from 23.07 (E. albipennis) to 28.47 (S. akizukii) (for detailed information, see Table 2). All 13 PCGs of newly obtained mitogenomes had the standard start codon ATN, which was most similar to insect mitochondrial. However, several different start codons were found; the start codon of COI gene in 16 species was TTG, in two species was ATG, and in one species was ATA; ATP8 gene in five species was ATA, in nine species was ATT, and in five species was ATC; ND2 was ATT in all species, the ND5 gene in 12 species was GTG, in six species was ATG, in one species was ATC and so on, and detailed statistical information as shown in Figure S1. The codon size ranged from 110 (Endochironomus albipennis) to 1180 bp (Polypedilum heberti) ( Table 1).
The Ka/Ks value (ω) was usually used to measure the rate of sequence evolution by natural selection [45,46]. The result of Ka/Ks ratio of all 13 PCGs was less than one, ranging from 0.0958 (COX3) to 0.7251 (ATP8) (Figure 3), which was same as other insects [43,44]. The evolution rate of 13 PCGs was as follows: This result indicated that, in most cases, selection eliminated deleterious mutation, and all of them evolved under purifying selection pressure (Figure 3). In PCGs, each gene was under different purifying selection. ATP8, ND6, and ND5 showed higher ω value, indicating that they exhibited relatively relaxed purifying selection pressure. Meanwhile, COX2 and COX3 were under the hardly purifying selection, which were similar to previous research results regarding chironomids [2,3,10,43]. selection pressure (Figure 3). In PCGs, each gene was under different purifying selection. ATP8, ND6, and ND5 showed higher ω value, indicating that they exhibited relatively relaxed purifying selection pressure. Meanwhile, COX2 and COX3 were under the hardly purifying selection, which were similar to previous research results regarding chironomids [2,3,10,43].  Table 2). ; the AT-skew in most mitogenomes were negative (−0.048 to −0.002), while five species were negative (P. heberti, Polypedilum yongsanensis, E. pekanus, S. akizukii, and P. nubifer) for more detailed information, see Table 2).

Phylogenetic Relationships
The heterogeneous divergence analysis indicated that the matrix cds12_rrna and cds_rrna exhibited higher heterogeneity than cds_faa, cds12_fna, and cds_fna ( Figure 4). Because of high heterogeneity, third codon positions were rejected to reconstruct the phylogenetic relationship of the Polypedilum generic complex. Furthermore, the genera Endochironomus and Synendotendipes exhibited lower heterogeneity than Polypedilum, Stictochironomus, and Sergentia ( Figure 4).
The heterogeneous divergence analysis indicated that the matrix cds12_rrna and cds_rrna exhibited higher heterogeneity than cds_faa, cds12_fna, and cds_fna ( Figure 4). Because of high heterogeneity, third codon positions were rejected to reconstruct the phylogenetic relationship of the Polypedilum generic complex. Furthermore, the genera Endochironomus and Synendotendipes exhibited lower heterogeneity than Polypedilum, Stictochironomus, and Sergentia ( Figure 4). We used supermatrix cds_faa (3715 sites), cds_fna (10,833 sites), cds_rrna (13,368 sites), cds12_fna (7430 sites), and cds12_rrna (9653 sites) to reconstruct phylogenetic relationships within the Polypedilum generic complex by two different methods. BI and ML approaches based on five matrices yielded five and six trees, respectively (Figures 5 and S2-S9), representing three different topologies. Our result was consistent with phylogenetic tree inferred from previous studies based on four genetic markers [20]. The monophyly of the genera Endochironomus, Polypedilum, Stictochironomus, and Synendotendipes was well supported in all topologies. In BI trees, the phylogenetic analysis of the matrices cds_fna, cds_faa, and cds12_rrna resulted in ((Endochironomus + Synendotendipes) + (Phaenopsectra + Sergentia) + (Polypedilum + Stictochironomus)) ( Figures 5A, S7 and S9), while the matrices cds_rrna and cds12_fna yielded (Stictochironomus + ((Endochironomus + Synendotendipes) + (Sergentia + Phaenopsectra)) + Polypedilum) (Figures 5B and S8). All ML trees Due to the lack of Phaenopsectra and Synendotendipes species, the relationships of the Polypedilum generic complex were unclear in a recent dated molecular phylogeny [20]. Our analyses included a wider range of samples, recovering a new insight for the phylogenetic relationships within the Polypedilum generic complex. According to our data, Endochironomus + Synendotendipes is sister to Phaenopsectra + Sergentia ( Figure 5). Although the BI and ML topologies differed, the phylogenetic relationships of the genera Synendotendipes, Endochironomus, Phaenopsectra, and Sergentia were stably supported in all trees.
Different topologies between BI and ML trees indicated that the phylogenetic relationships based on mitogenomes among this group were still erratic, i.e., the systematic position of Stictochironomus, and the trees which were inferred by the heterogeneity model (CAT + GTR) were also not well supported. Therefore, we need to await further taxonomic and phylogenomic studies with more taxon sampling and availably molecular markers, such as ultra-conserved elements and single-copy orthologous genes, which have been successfully used in other insect groups [47][48][49]

Conclusions
Fourteen mitogenomes of six genera within the Polypedilum generic complex were obtained, including six complete mitogenomes and eight linear mitogenomes. All newly sequenced mitogenomes had similar structural characters and nucleotide compositions to previously published Chironomidae data. In adding Phaenopsectra and Synendotendipes, we could also reconstruct the phylogenetic relationships among the genera within the Polypedilum generic complex. Our results showed that Endochironomus + Synendotendipes are sister to Phaenopsectra + Sergentia, which is a new systematic finding for Chironomidae.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/insects14030238/s1, Table S1. Sample resources used in this study. Figure S1. Start codons of protein-coding genes among Polypedilum generic complex mitogenomes. Figure S2. Maximum likelihood phylogenetic tree of Polypedilum generic complex based on the analysis cds_faa with Partition model in IQTREE. Support values on nodes indicate SH-aLRT/UFBoot2, respectively. Figure