Phylogenetic Analysis of Some Species of the Anopheles hyrcanus Group (Diptera: Culicidae) in China Based on Complete Mitochondrial Genomes

Some species of the Hyrcanus group are vectors of malaria in China. However, the member species are difficult to identify accurately by morphology. The development of sequencing technologies offers the possibility of further studies based on the complete mitochondrial genome. In this study, samples of mosquitoes of the Hyrcanus group were collected in China between 1997 and 2015. The mitochondrial genomes of ten species of the Hyrcanus group were analyzed, including the structure and base composition, codon usage, secondary structure of tRNA, and base difference sites in protein coding regions. Phylogenetic analyses using maximum-likelihood and Bayesian inference were performed based on mitochondrial genes and complete mitochondrial genomes The mitochondrial genome of 10 Hyrcanus group members ranged from 15,403 bp to 15,475 bp, with an average 78.23% (A + T) content, comprising of 13 PCGs (protein coding genes), 22 tRNAs, and 2 rRNAs. Site differences between some closely related species in the PCGs were small. There were only 36 variable sites between Anopheles sinensis and Anopheles belenrae for a variation ratio of 0.32% in all PCGs. The pairwise interspecies distance based on 13 PCGs was low, with an average of 0.04. A phylogenetic tree constructed with the 13 PCGs was consistent with the known evolutionary relationships. Some phylogenetic trees constructed by single coding regions (such as COI or ND4) or combined coding regions (COI + ND2 + ND4 + ND5 or ND2 + ND4) were consistent with the phylogenetic tree constructed using the 13 PCGs. The phylogenetic trees constructed using some coding genes (COII, ND5, tRNAs, 12S rRNA, and 16S rRNA) differed from the phylogenetic tree constructed using PCGs. The difference in mitochondrial genome sequences between An. sinensis and An. belenrae was very small, corresponding to intraspecies difference, suggesting that the species was in the process of differentiation. The combination of all 13 PCG sequences was demonstrated to be optimal for phylogenetic analysis in closely related species.


Introduction
Malaria is an infectious disease transmitted by Anopheles mosquitoes that seriously threatens human health. According to the World Malaria Report by the World Health Organization, 241 million cases and 627,000 deaths were reported globally in 2020; both numbers increased compared with 2019. In China, malaria once presented a serious risk to people's health, causing a large number of illnesses and deaths. After years of effective prevention and control, China was certified malaria-free by the World Health Organization in 2021. However, there is still a large number of imported malaria cases were brought back to the laboratory and identified based on morphological characteristics as Hyrcanus group species and stored at −80 • C until genome extraction. After extraction of genomic DNA, the species identifications of all samples were confirmed by amplifying the ITS2 sequence using PCR [26]. The reaction mixture is 20 µL containing genomic DNA as templates 10 µL premix Taq (Aidlab, Beijing, China), 0.5 µL forward and reverse primers each, 1 µL genomic DNA and 8 µL ddH 2 O. The PCR reaction was conducted in a ProFlex PCR System (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA), and the cycling conditions were 94 • C for 2 min, followed by 30 cycles of 94 • C/30 s, 45 • C/30 s, 72 • C/30 s, with a final extension at 72 • C for 5 min.

Genomic DNA Extraction, Library Construction, and Next-Generation Sequencing
A single mosquito was placed in a 1.5 mL centrifuge tube and was thoroughly ground after adding liquid nitrogen. Then, the adult mosquito genomic DNA was extracted using a DNeasy Blood Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The genomic DNA concentration of the extracted samples was measured by the Qubit method, and OD 260 /OD 280 was detected by Nanodrop (Thermo Scientific, USA). Genomic DNA samples were stored at −80 • C for further procedures. The DNA was broken into 300-500 bp fragments by the instrument Covaris M220 (Covaris, Woburn, MA, USA). Libraries were then constructed using a TruSeq RNA sample Prep Kit (Illumina, San Diego, CA, USA) and quantified using TBS380 Picogreen (Invitrogen, Carlsbad, CA, USA) reagent. Subsequently, the library was recovered using Certified Low Range Ultra Agarose (Bio-Rad, Richmond, VA, USA). The products were amplified and enriched using a cBot TruSeq PE Cluster Kit v3-cBot-HS (Illumina, San Diego, USA) reagent through PCR reaction, and the cycling conditions were 95 • C for 3 min, followed by 8 cycles of 98 • C/20 s, 60 • C/15 s, 72 • C/30 s, with a final extension at 72 • C for 5 min. Finally, the products obtained were sequenced using a TruSeq SBS Kit (Illumina, San Diego, USA) and in a run of 300 cycles in Illumina Novaseq platform (Illumina, San Diego, USA).

Genomic Assembly and Annotation
The original image data obtained by Illumina sequencing were converted to FASTQ format sequence data through base calling to obtain the original sequencing data file. In total, the raw data reads ranging from 57,068,184 (An. lesteri) to 147,214,292 (An. belenrae).
To ensure the accuracy of subsequent bioinformatics analysis, the Fastp v.19.4 software (https://github.com/OpenGene/fastp/, accessed on 16 May 2022) with default parameters was used to filter the original sequencing data to obtain high-quality data [27]. The specific process is as follows: The adapter sequence was removed from reads. Then, the bases containing non-AGCT at the 5 end were removed. The ends of reads with lower sequencing quality (sequencing quality value less than Q20) were trimmed. Reads with 10% N content were also removed. Small segments less than 50bp in length after adapter and trimming were discarded. After data filtering, statistics of the sequence reads were counted by cutadapt v1.16 software (http://cutadapt.readthedocs.io/, accessed on 16 May 2022). Using FastQC v.0.11.4 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 20 May 2022) with the default parameters, the raw sequencing data of each sample were evaluated for quality, including base content distribution and base quality distribution statistics and to further obtain quality control and validation data. The mitochondrial genome reads were extracted with An. sinensis mitochondrial genome sequence as reference. The de novo method was used to splice the sequenced read sequences for multiple iterations using the NOVO Plasty (https://github.com/ndierckx/NOVOPlasty, accessed on 20 May 2022) online software [28]. The sequencing depth ranged from 83× (An. junlianensis) to 7656× (An. sinensis) ( Table 1). Based on the de novo assembled mitochondrial genome sequences, gene annotation was performed by the annotate module of mitoZ (https://github.com/linzhi2013/MitoZ) and visualized by the visualize module [29]. Finally, all mitochondrial genome sequences were manually verified by Geneious v.11.0 software.

Overall Analysis of Mitochondrial Genome Sequences
The nucleotide composition of each mitochondrial genome was evaluated by MEGA 11 software [30]. According to the nucleotide composition of each sample genome, the A-T skew and G-C skew of each gene were calculated using the following formulas: AT-skew = (A% − T%)/(A% + T%), and GC-skew = (G% − C%)/(G% + C%). The secondary structures of tRNAs were predicted by the online tool MITOS webserver (http://mitos. bioinf.uni-leipzig.de/index.py, accessed on 8 September 2022) [31]. The relative uniform codon usage (RSCU) of each mitochondrial genome coding region was calculated using MEGA 11 software.

Analysis of Variable Sites in PCGs
Based on previous research reports, six pairs of closely related species in the Hyrcanus group were selected to analyze the variation among sites; these were An. sinensis and An. belenrae, An. sinensis and An. kleini, An. belenrae and An. kleini, An. kunmingensis and An. liangshanensis, An. junlianensis and An. yatsushiroensis, and An. crawfordi and An. peditaeniatus. Mitochondrial genome sequences of another two An. lesteri were also obtained to analyze intraspecific variation. All three samples were collected from Sichuan Province, China. (Table S1). The 13 coding regions of the mitochondrial genome were aligned in pairs using the MEGA 11 software, and the variable sites in nucleotide bases and amino acids were calculated.

Phylogenetic Analysis
The mitochondrial genome PCGs, tRNA, and rRNA coding regions were extracted and summarized using Geneious v.11.0. The gene analysis combination was aligned using the Clustal w algorithm in MEGA 11 software and merged into separate files. Pairwise p-distances of species were calculated based on 13 PCGs using MEGA 11 software. The following two methods were used to construct phylogenetic trees based on mitochondrial genes and complete mitochondrial genomes. Under the Akaike information criterion (AIC), JModel Test v2.1.10 was used to select the optimal nucleotide substitution model [32]. Afterward, a phylogenetic tree was constructed using MEGA 11 software based on the maximum-likelihood method with bootstrapping values defined from 1000 repetitions. In addition, Bayesian inference was performed using MrBayes v.3.2.7 in two independent and simultaneous runs established for 1,000,000 generations, with four chains including one cold chain and three hot chains sampled every 500 generations, with a relative burn-in of 25% [33]. The topological structures obtained by the two methods were viewed and visualized with MEGA 11 software, and the graphs were edited with the online tool ITOL (https://itol.embl.de/itol.cgi, accessed on 17 October 2022) [34].

Overall Composition and Structure of the Mitochondrial Genome
The full-length mitochondrial genomes of ten species members in the Hyrcanus group were obtained by NGS. Among these ten sequences, those of seven species (An. belenrae (GenBank accession no. OP311320), An. crawfordi (GenBank accession no. OP311321), An. junlianensis (GenBank accession no. OP311322), An. kleini (GenBank accession no. OP311323), An. kunmingensis (GenBank accession no. OP311324), An. liangshanensis (GenBank accession no. OP311326) and An. yatsushiroensis (GenBank accession no. OP311329)) are reported here for the first time. These mitochondrial genomic DNAs were tightly arranged double-stranded circular molecules with lengths ranging from 15,403 bp (A. kunmingensis) to 15,475 bp (An. liangshanensis). All mitochondrial genomes contained 13 protein-coding regions, 22 tRNA coding regions, and two rRNA coding regions arranged on the J chain (forward) and N chain (reverse) (Figures 1 and S1).   Figure S1. The grey inner ring represents the GC content pattern.
The percentages of (A + T) content in the mitochondrial genomes of 10 species in An. hyrcanus group ranged from 78.14% (An. junlianensis) to 78.41% (An. crawfordi), and the average (A + T) content was 78.23%. The average (A + T) contents of PCGs, tRNA, and rRNA coding regions were 76.78%, 78.25%, and 81.43%, respectively. The mitochondrial genomes of the ten species in Hyrcanus group had positive A-T skew values, with a mean value of 0.028, and G-C skew values were all negative, with a mean value of −0.156 (Figure 2a-c, Table S2), close to values for other mosquitoes [35][36][37].

Composition and Structure of PCGs
The total length of the 13 protein-coding region genes was 11,224 bp; these were ATP6 (681 bp), ATP8 (162 bp), COI (537 bp), COII (685 bp), COIII (787 bp), CYTB (1137 bp), ND1 and ND6 (525 bp). The percentages of (A + T) content of the protein-coding region genes were from 76.52% (An. liangshanensis) to 76.92% (An. kunmingensis). The average AT content of the ND6 gene was the highest (85.83%) in PCGs, while the lowest AT content was 70.09% for the COI gene. All A-T skew values of the protein-coding regions were negative. Most of the protein-coding regions had negative G-C skew values, except in the following cases: all G-C skew values of ND1, ND5, ND4, ND4L genes were positive; the COI gene G-C skew values of An. crawfordi, An. kleini, An. kunmingensis, An. lesteri, An. peditaeniatus and An. sinensis were positive, while the COI gene G-C skew value of An. yatsushiroensis was 0; and the CYTB gene G-C skew values of An. junlianensis and An. yatsushiroensis were positive. It is worth noting that all of the COI genes G-C skew values of the protein-coding regions were relatively small, and the average absolute value of the G-C skew was only 0.006. These compositional characteristics have been reported for Anophelinae and other insects [38,39].

Codon Usage in the Protein-Coding Regions
Among the 13 protein-coding regions, nine genes (ATP6, ATP8, COI, COII, COIII, CYTB, ND2, ND3, and ND6) were located on the J chain (forward), and the remaining four genes (ND1, ND4, ND4L, and ND5) were located on the N chain (reverse). The initiation and stop codons of the protein-coding regions of the ten species in Hyrcanus group were the same. Specifically, ATG was used as the initiation codon of ATP6, COII, COIII, CYTB, ND4, and ND4L; ATT was used as the initiation codon of ATP8, ND1, ND2, and ND6. In addition, TCG, ATA, and GTG were the initiation codons in COI, ND3, and ND5, respectively. For the stop codons, TAA was used as the stop codon of ATP6, ATP8, ND1, ND2, ND3, ND4L, ND5, ND6, and CYTB. However, the single-nucleotide T was used as the stop codon of COI, COII, COIII, and ND4. The recording of incomplete stop codons (T/TA) is common in insects, where the TAA stop codon may be completed by the addition of 3 A residues to the mRNA after transcription [40,41].
Except for stop codons, all ten species had 3731 codons in the mitochondrial genome PCGs. The relative synonymous codon usage (RSCU) analysis showed that almost all codons were used, except for CUG (L) and AGG (S) codons, which were not used in any of the coding regions. The frequency of codon usage was different among mosquito species. The CUC (L) codon was only used in An. kleini, An. lesteri, and An. liangshanensis with an average RSCU value of 0.01; CCG (P) codons were only used in An. sinensis, An. belenrae, An. crawfordi, and An. peditaeniatus with an average RSCU value of 0.03; and CGC(R) codons were only used in An. liangshanensis, An. kunmingensis and An. peditaeniatus with an average RSCU value of 0.07. The GGC(G) codon was unused only in An. belenrae and An. peditaeniatus, while the GCG(A) codon was unused only in An. sinensis.
The use of the third base in synonymous codons favored adenine and thymine (uracil) over guanine and cytosine. For example, the most frequently used codon for leucine was UUA with an average RSCU value of 5.27, while the average RSCU value of UUG, which also codes for leucine, was only 0.32, and another CUG codon was not used at all. In terms of codon usage frequency, the highest was UUA (L, average RSCU 5.27), followed by CGA (R, 3.28), UCA (S, 2.55), UCU (S, 2.54), and GCU (A, 2.21), while the lowest were ACG (T, 0.03), ACU (I, 0.03), and GUC (V, 0.06) (Figure 3). The RSCU value was greater than 1 in all NNA and NNU types. This fixed distribution pattern was similar to those reported in other mosquito species [42].

tRNA Structure Analysis
The full-length tRNAs in the mitochondrial genomes of all ten species of Anopheles mosquitoes ranged from 1474 bp to 1476 bp, and the average length of individual tRNA genes ranged from 64 bp to 72 bp. Among the 22 tRNA gene coding regions in ten mitochondrial genomes, 21 predicted tRNA secondary structures that were typical cloverleaf stem-loop structures with four arms and one loop. Only the predicted secondary structure of the tRNA ser1 gene was atypical, lacking a DHU arm and instead forming a DHU loop ( Figure S2). This pattern has been reported in other insects [36,37].

tRNA Structure Analysis
The full-length tRNAs in the mitochondrial genomes of all ten species of Anopheles mosquitoes ranged from 1474 bp to 1476 bp, and the average length of individual tRNA genes ranged from 64 bp to 72 bp. Among the 22 tRNA gene coding regions in ten mitochondrial genomes, 21 predicted tRNA secondary structures that were typical cloverleaf stem-loop structures with four arms and one loop. Only the predicted secondary structure of the tRNA ser1 gene was atypical, lacking a DHU arm and instead forming a DHU loop ( Figure S2). This pattern has been reported in other insects [36,37].

Differential Site Analysis
Differential site analysis was performed on all mitochondrial genome PCGs between six pairs of cryptic species in the An. hyrcanus group, An. sinensis and An. belenrae, An. sinensis and An. kleini, An. belenrae and An. kleini, An. liangshanensis and An. kunmingensis, An. yatsushiroensis and An. junlianensis, and An. crawfordi and An. peditaeniatus. The difference in sites between An. sinensis and An. belenrae was the least (36, 0.32%). The ratio of transitions to transversions was 33:3. This was followed by An. junlianensis and An 21:20) between An. crawfordi and An. peditaeniatus. Differential sites were mostly on the third base of the codon encoding the amino acid, thus not changing the encoded amino acid, while non-synonymous variation in the encoded amino acids was mostly due to the variation in the first bases of the codons (Table 2; Figure S3).  In order to understand the level of intraspecific variation in the mitochondrial genome of Anopheles, the PCGs of three mitochondrial genomes of An. lesteri were compared. There were total 16 different sites (0.14%) among three samples of An. lesteri, showing less variation compared to interspecific variation. The variation sites were distributed in coding gene with five in COI, four in ND4, three in ND2 and ND5, two in ND1, and one in ATP8, COII, and CYTB. No variation site was detected in ATP6, COIII, ND3, ND4L, and ND6.

Genetic Distance
In addition to the ten species in the Hyrcanus group mentioned above, the mitochondrial genomes of four other species of Anopheles were also included in the phylogenetic analysis, An. lindesayi, An. barbirostris, An. dirus, and An. minimus. The complete mitochondrial genomes of An. lindesayi (GenBank accession no. OP324636) and An. barbirostris (GenBank accession no. OP324637) were sequenced in our lab. The An. dirus (GenBank accession no. JX219731) and An. minimus (GenBank accession no. KT895423) belonging to the subgenus Cellia were downloaded from the GenBank database. All of the PCGs of 14 Anopheles were aligned using MEGA 11 software. Nucleotide genetic distances between species were calculated based on the aligned data. The results showed that the smallest interspecies distance existed between An. sinensis and An. belenrae with a p-distance of 0.003. Other small distances existed between An. junlianensis and An. yatsushiroensis (p = 0.007), An. kunmingensis and An. liangshanensis (p = 0.011), and An. sinensis and An. kleini (p = 0.017). In addition, the pairwise p-distances between the subgenus Cellia and subgenus Anopheles were all larger than the p-distances within the subgenera (Table 3).

Phylogenetic Analysis Based on Sequences of 13 PCGs
The J model test was used to analyze the data to select the most suitable nucleotide substitution model; this was determined to be GTR + I + G by the AIC standard test. Based on the sequences of 13 PCGs of the mitochondrial genome, phylogenetic trees were constructed using the maximum-likelihood method (Figure 4a) and the Bayesian method (Figure 4b). The topological shapes of the two phylogenetic trees were completely consistent. The 14 species of Anopheles were divided into two clades. The An. dirus and An. minimus belonging to subgenus Cellia together formed a clade. The other clade comprised the subgenus Anopheles, including the 10 species in the An. hyrcanus group, and An. lindesayi and An. barbirostris. In the clade of subgenus Anopheles, the ten species in the An. hyrcanus group clustered together, then clustered with An. barbirostris, and finally clustered with An. lindesayi. The clustering of the An. hyrcanus group was divided into two subgroups. One subgroup contained An. crawfordi and An. peditaeniatus, and the other contained the other eight species. Among the latter, An. sinensis and An. belenrae, An. liangshanensis and An. kunmingensis, and An. yatsushiroensis and An. junlianensis were the most closely related pairs (Figure 4).

Phylogenetic Analysis Based on Single or Combined Coding Regions
The phylogenetic trees based on single protein-coding regions were constructed using the maximum-likelihood method as above. The results showed that the topological structures of the phylogenetic trees constructed based on only COI and ND4 genes were consistent with the phylogenetic tree constructed using the 13 PCGs, but the bootstrap values were relatively low ( Figure S4a, b). Moreover, phylogenetic trees constructed based on other genes were different in topology, for example, COII and ND5 ( Figure S4c, d). In addition to the sequence of a single protein-coding region, we also selected some highly discriminative coding region combinations (using COI, ND2, ND4, ND5, and ND6) to construct the phylogenetic tree. The results showed that the topological structure of the phylogenetic trees constructed based on the combination of COI + ND2 + ND4 + ND5

Phylogenetic Analysis Based on Single or Combined Coding Regions
The phylogenetic trees based on single protein-coding regions were constructed using the maximum-likelihood method as above. The results showed that the topological structures of the phylogenetic trees constructed based on only COI and ND4 genes were consistent with the phylogenetic tree constructed using the 13 PCGs, but the bootstrap values were relatively low ( Figure S4a,b). Moreover, phylogenetic trees constructed based on other genes were different in topology, for example, COII and ND5 ( Figure S4c,d). In addition to the sequence of a single protein-coding region, we also selected some highly discriminative coding region combinations (using COI, ND2, ND4, ND5, and ND6) to construct the phylogenetic tree. The results showed that the topological structure of the phylogenetic trees constructed based on the combination of COI + ND2 + ND4 + ND5 genes and ND2 + ND4 genes were consistent with the topological structure of the phylogenetic tree constructed using the 13 PCGs, but the bootstrap values of some branches were low ( Figure S5). The phylogenetic trees were also constructed based on the full-length sequence of the tRNA and rRNA coding regions, and the topology was quite different from the phylogenetic tree constructed using the 13 PCGs ( Figure S6). These sequences were clearly not suitable for analyzing the phylogenetic relationships among the members of the An. hyrcanus group.

Phylogenetic Analysis Based on the Complete Mitochondrial Genome Sequence
Based on the full-length mitochondrial genome sequence, including all coding and noncoding regions, we also constructed a phylogenetic tree, and the topology was consistent with the phylogenetic tree constructed using the 13 PCGs ( Figures S7 and S8).

Discussion
In this study, we collected samples of the An. hyrcanus group distributed in different regions of China, and the species were identified by morphological and molecular characteristics. After extracting the genomic DNA of single mosquitoes, the complete mitochondrial genome sequences of ten species in the An. hyrcanus group were obtained using next-generation sequencing. The complete mitochondrial genomes of seven species were reported for the first time. The results showed that the composition and structure of mitochondrial genomes of ten species in the An. hyrcanus group were the same, and their total length and nucleotide composition characteristics were highly similar. In addition, RSCU analysis of protein coding regions showed similar features. These results were consistent with the reported characteristics of mitochondrial genomes in Anopheles mosquitoes, confirming the reliability of the sequences determined in this study.
Owing to the high morphological similarity, some species in the Hyrcanus group are taxonomically controversial [43]. Therefore, molecular methods have become an important basis for accurate identification of similar species [44]. At present, the molecular identification markers of species in the Hyrcanus group include internal transcribed spacer 2 (ITS2) and COI and COII genes of the mitochondrial genome. Ma et al. analyzed the phylogenetic relationships of 12 species in the Hyrcanus group based on ITS2 sequences [45]. Zhu et al. discussed the phylogenetic relationships of six species of the Hyrcanus group based on mitochondrial genes COI + tRNA + COII, ATP6 + COIII, ND1, and 16S rRNA genes [46]. Fang et al. studied the phylogenetic relationships of 18 species of the Hyrcanus group based on COI sequences [47]. Zhang et al. compared ITS2 and COII fragments in constructing phylogenetic trees and suggested that ITS2 sequences could be a better molecular marker to distinguish closely related mosquito species [48]. However, the lengths of ITS2 sequence fragments are different, and thus it is difficult to compare them in the reconstruction of phylogenetic relationships. It has also been suggested that the analysis of phylogenetic relationships may lose some information by using fragment reconstruction. The ITS2 sequence can be used to identify some species in the Hyrcanus group. However, the inconsistent length of sequences in various species, large intraspecific variations especially in Anopheles, and limited evolutionary information provided by a single gene sequence make it not effective for phylogenetic relationship analysis. Overall, ITS2 is suitable for molecular identification of similar species, while the 13 PCGs of the mitochondrial genome are effective for phylogenetic analysis.
Pairwise nucleotide differences in the mitochondrial genome between An. sinensis and An. belenrae were very small. In fact, An. sinensis and An. belenrae were also very similar in ITS2 sequences. Considering their high similarity in morphological and molecular characteristics and overlapping distribution ranges, we believe that An. belenrae and An. Sinensis were in the process of differentiation. However, An. kleini is different in molecular characteristics from An. sinensis and An. belenrae, and the distribution range is more inclined to high latitudes, supporting it being an independent species.
The differences between An. junlianensis and An. yatsushiroensis, and An. kunmingensis and An. liangshanensis were 0.72% and 1.10%, respectively. Some researchers believe that the two groups of Anopheles are synonyms [13]. The results of this study partially support this view, as their level of differentiation was between interspecific and intraspecific. Therefore, the two may still be in the process of species differentiation, and more samples should be collected to further clarify the differentiation level. However, the difference between An. crawfordi and An. peditaeniatus was relatively high at 4.57%. They are clearly two different species.
From the multiple phylogenetic trees constructed above, the topological structure of the phylogenetic tree constructed based on the 13 protein-coding regions was stable and consistent with the known evolution of the Hyrcanus group. Most phylogenetic trees based on single genes of mitochondrial genomes cannot distinguish well the evolutionary relationships between Hyrcanus group species, and are very different from morphological classification results. Therefore, phylogenetic relationship analysis based on single gene should be treated with caution. In addition, tRNA and rRNA coding regions were shown to be unsuitable for phylogenetic analysis. The full-length sequence contains PCGs, tRNA, and rRNA coding regions as well as non-coding regions, leading to the reduction in the discrimination compared with the 13 PCGs evolutionary tree. Certain gene combinations can be selected to construct a phylogenetic tree that is very close to the 13 PCGs phylogenetic tree. However, considering that much basic work is needed to verify the best fragment combination, we believe that 13 PCGs should be the best choice for phylogenetic analysis of the species in the Hyrcanus group [49,50]. However, we were not able to collect all species in the Hyrcanus group distributed in China. For some species, the quality of the extracted genomic DNA was insufficient for next-generation sequencing, partly because of the long storage time. In recent years, the rapid progress of urbanization in China has also led to dramatic changes in the habitat environment of Anopheles mosquitoes, making it difficult to collect some species. Moreover, there are gene polymorphisms in the mitochondrial genome, and the variation among An. hyrcanus group species is common. Differential loci analysis of three An. lesteri samples showed only 16 different sites in the whole mitochondrial genome, which indicated a low intraspecific difference. However, due to sample limitations, one sample per mosquito species was sequenced in this study, except for An. lesteri, and intraspecific differences were not considered. In the future, we will strive to collect more species and expand the sample size to further improve the mitochondrial genome data of the species in the Hyrcanus group.
In conclusion, the complete mitochondrial genome sequences of ten species in the Hyrcanus group distributed in China were obtained, seven of which were reported for the first time. The difference in mitochondrial genome sequences between An. sinensis and An. belenrae was very small, attributed to intraspecies difference, suggesting that the species was in the process of differentiation. The combination of all 13 PCG sequence was shown to be optimal for phylogenetic analysis of closely related species. The results of this study contribute to the accurate identification of the vector species of Anopheles and provide scientific basis for the prevention and control of mosquito-borne diseases.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes14071453/s1, Table S1: Collection and sequencing information of Anopheles lesteri in this study; Table S2: Composition of mitochondrial genome obtained in this study; Figure S1: Schematic map of collection sites for Hyrcanus group in China; Figure S2: Structures of mitochondrial DNA of the Hyrcanus group obtained in this study. The genes arranged in the inner circle are located on the N strand (Reverse), and the genes arranged in the outer circle are located on the J strand (Forward); Figure S3: Secondary structures of the Hyrcanus group tRNAs obtained in this study; Figure S4: Loci differences among 13PCGs of 10 Hyrcanus group species obtained in this study; Figure S5: Phylogenetic reconstruction by maximum likelihood based on single subunits concatenated of the species sequenced in this study and two other taxa with data available from GenBank. (a) COI subunit, (b) ND4 subunit, (c) COII subunit, (d) ND5 subunit. The values of bootstrapping support are shown to the left of the branch point; Figure S6: Phylogenetic reconstruction by maximum likelihood based on combined protein-coding subunits concatenated of the species sequenced in this study and two other taxa with data available from GenBank. (a) The combination of ND2 and ND4 subunits. (b) The combination of COI, ND2, ND4 and ND5 subunits. The values of bootstrapping support are shown to the left of the branch point; Figure S7: Phylogenetic reconstruction by maximum likelihood based on tRNA and rRNA coding region concatenated of the species sequenced in this study and two other taxa with data available from GenBank.