Complete Mitochondrial Genome and Phylogenetic Analysis of the Blue Whistling Thrush (Myophonus caeruleus)

The blue whistling thrush (Myophonus caeruleus) is a bird belonging to the order Passeriformes and family Muscicapidae. M. caeruleus is widely distributed in China, Pakistan, India, and Myanmar; in China it is a resident south of the Yangtze River and a summer migrant north of it. The classification of M. caeruleus remains controversial. We used complete mitochondrial genomes to provide insights into the phylogenetic position of M. caeruleus and its relationships within Muscicapidae. The mitochondrial genome (GenBank: MN564936) is 16,815 bp long and contains 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a non-coding control region (D-loop). The 13 PCGs start with ATG or GTG and end with five types of stop codons. The nucleotide composition is 30.06% A, 23.71% T, 31.45% C, and 14.78% G. The secondary structures of the 22 tRNAs were predicted, and all could form typical cloverleaf structures. There were 24 mismatches, mainly G–U mismatches. Phylogenetic tree reconstruction showed that Saxicola, Monticola, Oenanthe, and Phoenicurus clustered into one clade, which was sister to Myophonus.


Introduction
The mitochondrial genome is the genetic material within the mitochondria; in most cases it follows strict maternal inheritance, is highly conserved, and is almost unaffected by recombination [1]. Avian mitochondrial genomes are easy to extract and amplify, making them ideal markers for molecular-level evolutionary analysis [2,3]. The avian mitochondrial genome is about 16.3 kb to 20 kb long and consists of one heavy (H) strand and one light (L) strand forming a closed circular double-stranded molecule [4]. It is usually composed of 37 genes, including 13 protein-coding genes (PCGs), 22 transfer RNAs (tRNAs), and 2 ribosomal RNAs (rRNAs), plus a large non-coding D-loop region [5,6]; in addition, many species retain a pseudo-D-loop region, such as Sinosuthora conspicillata and Falco rusticolus [7,8]. Avian mitochondrial genomes are characterized by a rearrangement near the control region that makes their gene order differ from that found in most vertebrates. In addition, most mitochondrial DNA mutations are point mutations, with few insertions or deletions [1]. Moreover, whereas a single gene can provide only limited information, the complete mitochondrial genome sequence contains more abundant and accurate information. Because of these advantages, avian mitochondrial genomes have been widely used in phylogenetic reconstruction and interspecific genetic diversity analysis [9].
The blue whistling thrush (Myophonus caeruleus) is a bird belonging to the order Passeriformes and family Muscicapidae, and it occupies mountains and stream sides [10]. M. caeruleus has a dark bluish-purple body, with pale bluish-purple spots and markings all over the body except for the wings and tail, and a yellow beak (Figure 1). There are nine species of Myophonus in the world: the Taiwan whistling thrush (M. insularis), blue whistling thrush (M. caeruleus), Ceylon whistling thrush (M. blighi), shiny whistling thrush (M. melanurus), Sunda whistling thrush (M. glaucinus), Bornean whistling thrush (M. borneensis), brown-winged whistling thrush (M. castaneus), Malayan whistling thrush (M. robinsoni), and Malabar whistling thrush (M. horsfieldii) [10]. Among them, the first two species are distributed in China and the last seven species are found in other countries. M. insularis is distributed only in Taiwan Province. M. caeruleus is found in Central Asia, East Asia, and Southeast Asia. It is widely distributed in China, including Xizang and the central and southern provinces.
At present, phylogenetic relationships of Muscicapidae have been studied using mtDNA and nDNA [11], but the complete mitochondrial genome sequence for M. caeruleus is lacking. To better understand the mitochondrial genome characteristics and phylogenetic relationships of M. caeruleus, we sequenced the M. caeruleus mitochondrial genome using next-generation sequencing and reconstructed its phylogenetic relationships in combination with published data.

Samples and DNA Extraction
The sample of M. caeruleus was an individual that died of natural causes in the wild in Zhaotong City, Yunnan Province, China. All tissues used in this study were preserved in absolute ethanol and stored at −20 °C until DNA extraction. DNA was extracted from the muscle using the TIANamp Genomic DNA Kit (DP304, TIANGEN, Beijing, China) according to the manufacturer's instructions. DNA integrity was assessed by electrophoresis on a 1% agarose gel, and DNA concentration and purity were measured on a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).

Genome Sequencing, Assembly, and Annotation
The DNA extract was sent to Shanghai Paisenor Biotechnology Co., Ltd. (Shanghai, China), where a library was constructed using the whole-genome shotgun (WGS) strategy and sequenced with next-generation sequencing (NGS) technology. The libraries were sequenced in paired-end (PE) mode on the Illumina MiSeq platform. The mitogenome sequence is deposited in the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/, accessed on 19 June 2024) under accession number MN564936. High-quality second-generation sequencing data were assembled de novo using A5-miseq v20150522 [12] and SPAdes v3.9.0 [13]. The results were corrected using Pilon v1.18 [14] to obtain the final mitochondrial sequence. The tRNA genes were validated with the MITOS WebServer [15] (http://mitos.bioinf.uni-leipzig.de, accessed on 19 June 2024); tRNAscan-SE 2.0 [16], with the genetic code set to vertebrate mitochondrial, was used to predict the secondary structures of the tRNAs. The open reading frame (ORF) finder [17] on NCBI, set to the vertebrate mitochondrial genetic code, was used to identify the protein-coding regions, which were translated into proteins using GenBank. The base composition was calculated and the relative synonymous codon usage (RSCU) was analyzed using MEGA 7.0 [18]. Composition skew was calculated using the formulas AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C). The CGView Server was used to map the genome (http://stothard.afns.ualberta.ca/cgview_server/index.html, accessed on 19 June 2024) [19].
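The skew formulas above can be illustrated with a minimal Python sketch. It is not the MEGA 7.0 workflow used in the study; the function names `composition_skew` and `base_counts` are hypothetical, and the input values are the whole-genome base percentages reported in this paper.

```python
def composition_skew(a: float, t: float, g: float, c: float) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) from base counts or percentages.

    AT-skew = (A - T) / (A + T); GC-skew = (G - C) / (G + C).
    """
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


def base_counts(seq: str) -> dict[str, int]:
    """Count A, T, G and C in a sequence, ignoring case and ambiguous bases."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ATGC"}


# Whole-genome base composition reported for M. caeruleus (percent).
at_skew, gc_skew = composition_skew(a=30.06, t=23.71, g=14.78, c=31.45)
print(f"AT-skew = {at_skew:.3f}, GC-skew = {gc_skew:.3f}")
```

With the reported percentages, the sketch gives an AT-skew of about 0.118 and a GC-skew of about −0.36. For a raw sequence, the counts from `base_counts` can be passed to `composition_skew` in the same way.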

The Phylogenetic Position of M. caeruleus
In addition to the species targeted in this study, 16 complete mitochondrial genomes were downloaded from the GenBank database. The phylogenetic position of M. caeruleus in Muscicapidae was determined through a comparison of 17 complete mitochondrial genomes (Table 1). The outgroup selected was Paradoxornis heudei (NC_046943), which belongs to the family Paradoxornithidae. In addition, HQ896033 was published as Cyanoptila cyanomelana and is still listed on GenBank as that species but is actually a misidentified Cyornis hainanus or C. rubeculoides. We used these sequences with the correct species labels. MEGA 7.0 [18] was used to align 20 sequences and remove positions with missing bases. Bayesian inference (BI) was used for the phylogenetic analysis. The Bayesian information criterion (BIC) in jModelTest v.0.1.1 was used to select the optimal nucleotide substitution model, GTR + G + I [20]. Using MrBayes [21], we constructed a BI phylogenetic tree based on the 13 PCGs. Four Markov chains were run simultaneously for a total of 400,000 generations. Trees were sampled every 100 generations, and the first 25% of samples were discarded as burn-in [22]. The phylogenetic trees were visualized with FigTree v.1.2.2.
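As a rough check on the sampling scheme described above, the following Python sketch computes how many samples a run of this length yields and how many remain after burn-in. The function name `retained_samples` is hypothetical, and the count is approximate: MCMC software may also record the starting state, which would shift the totals by one.

```python
def retained_samples(generations: int, sample_every: int,
                     burnin_fraction: float) -> tuple[int, int]:
    """Return (total samples, samples kept after burn-in) for one MCMC run."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, total - discarded


# Settings reported in the text: 400,000 generations, sampled every 100,
# with the first 25% of samples discarded as burn-in.
total, kept = retained_samples(400_000, 100, 0.25)
print(f"{total} samples drawn, {kept} kept after burn-in")
```

Under these assumptions, roughly 4000 samples are drawn and about 3000 contribute to the posterior summary.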

Organization and Structure of M. caeruleus Mitochondrial Genome
The mitochondrial genome of M. caeruleus (GenBank: MN564936) is 16,815 bp in length, of which the coding region is 15,568 bp, and contains 37 genes, including 13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes, plus a non-coding region (Figure 2). They accounted for 62.30%, 15.50%, 8.94%, and 5.75% of the mtDNA's total length, respectively. There are nine genes on the L strand, including eight tRNA genes and one protein-coding gene (ND6); the remaining genes are encoded on the H strand. The arrangement of genes in the mtDNA of M. caeruleus is relatively compact, with gene overlaps totaling 32 bp at nine locations among the 37 genes. There are a total of 22 intergenic spacers, with a combined length of 402 bp, accounting for only 2.43% of the total length of the mitochondrial genome. The spacer length ranged from 1 to 296 bp. At seven locations the genes are directly adjacent, without overlap or spacing. The mitochondrial genome analysis of M. caeruleus is shown in Table 2. MEGA 7.0 was used to calculate the base composition of the complete mitochondrial genome: A = 30.06%, T = 23.71%, C = 31.45%, and G = 14.78%. The A + T content (53.77%) was slightly higher than the G + C content (46.23%); the AT-skew was 0.118, while the GC-skew was −0.36 (Table 3).
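The overlap and spacer counts above follow from the annotated gene coordinates. The sketch below shows one way such a tally could be made in Python; it is not the procedure used in the study. The `Gene` type, the function names, and the toy coordinates are all hypothetical (the real coordinates are in Table 2), and the wrap-around junction of the circular genome is ignored for simplicity.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def junctions(genes: list[Gene]) -> list[tuple[str, str, int]]:
    """For each adjacent gene pair, return (left, right, gap).

    gap > 0 is an intergenic spacer, gap < 0 an overlap,
    and gap == 0 means the genes abut directly.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    return [
        (left.name, right.name, right.start - left.end - 1)
        for left, right in zip(ordered, ordered[1:])
    ]


def summarize(genes: list[Gene]) -> dict[str, int]:
    """Tally overlap, spacer, and abutting junctions and their total lengths."""
    gaps = [gap for _, _, gap in junctions(genes)]
    return {
        "overlap_sites": sum(g < 0 for g in gaps),
        "overlap_bp": -sum(g for g in gaps if g < 0),
        "spacer_sites": sum(g > 0 for g in gaps),
        "spacer_bp": sum(g for g in gaps if g > 0),
        "abutting_sites": sum(g == 0 for g in gaps),
    }


# Hypothetical toy coordinates for illustration only.
toy = [
    Gene("geneA", 1, 70),
    Gene("geneB", 71, 1050),     # abuts geneA
    Gene("geneC", 1046, 1120),   # overlaps geneB by 5 bp
    Gene("geneD", 1130, 1200),   # 9 bp spacer after geneC
]
print(summarize(toy))
```

Applied to a full annotation, the same tally would yield the number of overlap sites, spacer sites, and directly adjacent gene pairs, along with their total lengths.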

rRNA and tRNA Genes
Both the s-rRNA and l-rRNA genes of M. caeruleus are on the H strand. The s-rRNA is 980 bp long and is located between trnF and trnV; the l-rRNA is 1579 bp long and lies between trnV and trnL2. The A + T content of the s-rRNA is 51.94% with an AT-skew of 0.202, and that of the l-rRNA is 55.67% with an AT-skew of 0.24.
There are 22 tRNA genes distributed between protein-coding genes and rRNA genes in the mitochondrial genome of M. caeruleus. The total length of the tRNA genes is 1546 bp, and the length of individual genes ranges from 66 bp (trnS1) to 75 bp (trnL2 and trnS2). The predicted secondary structures of all tRNAs form a cloverleaf structure (Figure 3). There are 24 mismatches in the 22 tRNAs, including 6 pairs on the DHU arm, 3 pairs on the anticodon arm, 7 pairs on the TΨC arm, and 8 pairs on the amino acid acceptor arm, all of which are G–U mismatches.
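To make the notion of a G–U mismatch concrete, the following Python sketch classifies the base pairs of a stem as Watson-Crick, G–U wobble, or other mismatch. It is an illustration only and is not how tRNAscan-SE reports structures; the function names and the toy stem are hypothetical. The last lines simply re-add the per-arm counts reported above.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(five_prime: str, three_prime: str) -> str:
    """Classify one base pair; DNA 'T' is treated as RNA 'U'."""
    pair = (
        five_prime.upper().replace("T", "U"),
        three_prime.upper().replace("T", "U"),
    )
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "GU"
    return "mismatch"


def stem_pairs(five_arm: str, three_arm: str) -> list[str]:
    """Pair two stem strands, both written 5'->3'.

    The 3' strand is reversed so that position i of the 5' strand
    faces its antiparallel partner.
    """
    if len(five_arm) != len(three_arm):
        raise ValueError("stem strands must have equal length")
    return [classify_pair(x, y) for x, y in zip(five_arm, reversed(three_arm))]


# Hypothetical 4-bp stem: three Watson-Crick pairs and one G-U wobble pair.
print(stem_pairs("GCGU", "GCGC"))

# Per-arm mismatch counts reported in the text sum to the stated total.
mismatches_by_arm = {"DHU": 6, "anticodon": 3, "TΨC": 7, "acceptor": 8}
print(sum(mismatches_by_arm.values()))
```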

Protein-Coding Genes Composition
The total length of the 13 protein-coding genes is 10,246 bp, accounting for 60.93% of the total mitochondrial genome length. Among the 13 PCGs, only ND6 is encoded on the L strand, and the other 12 genes are encoded on the H strand. ND5 is the longest gene (1818 bp), and Atp8 is the shortest (168 bp). The start codon of the M. caeruleus PCGs is usually ATG, but that of COI is GTG. The stop codons are dominated by TAA: ND2, COII, Atp8, Atp6, ND3, ND4l, and cob terminate with TAA; COI terminates with AGG; ND5 with AGA; and ND6 with TAG. ND1, COIII, and ND4 have incomplete stop codons: TA- in ND1 and ND4 and T- in COIII (Table 2). The codon with the highest RSCU among the PCGs is CUA (2.35). The least-used codon is GCG, with a frequency of 0.25 (Table 4, Figure 4).
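For readers unfamiliar with RSCU, the following Python sketch shows the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the MEGA 7.0 implementation; the function names and the toy sequence are hypothetical, and only the four-fold degenerate alanine family (GCN) is shown.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence (trailing partial codon ignored)."""
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts: Counter, synonymous_family: list[str]) -> dict[str, float]:
    """RSCU for each codon in one synonymous family.

    RSCU = observed count / (family total / number of synonymous codons).
    """
    total = sum(counts[c] for c in synonymous_family)
    if total == 0:
        return {c: 0.0 for c in synonymous_family}
    expected = total / len(synonymous_family)
    return {c: counts[c] / expected for c in synonymous_family}


# Alanine codons (GCN), four-fold degenerate.
ALA = ["GCU", "GCC", "GCA", "GCG"]

# Hypothetical toy sequence: GCC x3, GCA x3, GCT x1, GCG x1.
toy_cds = "GCCGCCGCAGCAGCCGCTGCAGCG"
print(rscu(count_codons(toy_cds), ALA))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, and a value below 1 indicates a codon used less often.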

Non-Coding Sequence
M. caeruleus has one control region. The mtDNA control region is 950 bp long and is located between trnE and trnF. The base composition of the D-loop region is A: 22.32%; T: 30.63%; G: 17.68%; and C: 29.37%. The A + T content (52.95%) is slightly higher than the G + C content (47.05%), indicating a slight A + T bias.

Phylogenetic Analysis
To construct the phylogenetic tree, the complete mitochondrial genomes of 16 species of passerine birds from 12 genera in two families were searched for and downloaded from the NCBI. These comprised one Myophonus, three Ficedula, one Niltava, two Muscicapa, one Copsychus, one Cercotrichas, one Saxicola, one Monticola, two Oenanthe, one Cyornis, and two Phoenicurus species. Paradoxornis heudei was used as the outgroup, and the tree was constructed based on the 13 PCGs. A BI phylogeny reconstructed using RAxML-GUI is shown in Figure 5.

Our results show that in the phylogenetic trees, all species are divided at the genus level into two large clades, with Muscicapa, Copsychus, and Cercotrichas clustered into one large clade and Oenanthe, Monticola, Saxicola, Phoenicurus, Myophonus, Ficedula, Niltava, and Cyornis clustered into another. Within the latter clade, Saxicola and Monticola are grouped together and are sister to Oenanthe. Together, these three genera are sister to Phoenicurus (BI posterior probability 0.97). M. caeruleus is grouped with Saxicola, Monticola, Oenanthe, and Phoenicurus (BI posterior probability 0.94); these four genera form a single branch that is sister to Myophonus. Furthermore, Ficedula is the sister clade of (Niltava + Cyornis).

Mitogenome Characteristics
The Muscicapidae is the largest group of Passeriformes, with 312 species and 709 subspecies in 49 genera. Like that of other birds, the mitochondrial genome of M. caeruleus is a covalently closed circular double-stranded molecule comprising 37 genes (13 PCGs, 2 rRNA genes (rrnL and rrnS), and 22 tRNA genes) and 1 non-coding control region (CR) [2,5]. According to the literature, the mtDNA length of birds is generally between 15 and 20 kb. The mitochondrial genome of M. caeruleus reported here is within this range.
Similar to other birds [32], gene overlaps and intergenic spacers exist in the mitochondrial genome of M. caeruleus. The gene overlap was 1-10 bp, and the gene interval was 1-17 bp. The gene distribution of this species was the same as that of most vertebrates [33]. Except for NAD6 and eight tRNAs (trnQ, trnA, trnN, trnC, trnY, trnS2, trnP, and trnE), all genes were located on the H strand. The A + T content of the whole genome was 53.77%, which was consistent with the typical base bias of vertebrates described by Huang L. [34]. The average length of the tRNAs was 70 bp; the longest were trnL2 and trnS2 (75 bp), and the shortest was trnS1 (66 bp). All tRNAs can be folded into a standard cloverleaf model. The two rRNAs were rrnS and rrnL, with lengths of 980 bp and 1579 bp, respectively; they were located between trnF and trnL2 and were separated by trnV, as in most vertebrates. The start codon of the PCGs was usually ATG, but it was GTG in COI, consistent with previous findings [35]. Four complete stop codons were identified, namely TAA (NAD2, COII, Atp8, Atp6, NAD3, NAD4l, cob), AGG (COI), AGA (NAD5), and TAG (NAD6). The stop codons of the remaining three PCGs are incomplete: TA* (NAD1, NAD4) and T** (COIII). We speculate that these incomplete stop codons are completed by polyadenylation of the mRNA after transcription, which is common in vertebrate mitochondrial genomes.

Phylogenetic Analyses
Mitochondrial sequences have been widely used to infer phylogenetic relationships between bird species [1]. In this study, the phylogeny of M. caeruleus was investigated based on 13 PCGs. The results show that Saxicola is sister to Monticola, and together these are sister to Oenanthe (BI posterior probability 0.97), which is similar to other studies [36]. The grouping of Saxicola, Monticola, Oenanthe, and Phoenicurus as sister to Myophonus supports the findings of Fengjun Li and Min Zhao et al. [11,35]. Moreover, in the phylogenetic trees, Muscicapa was a deep branch, which differs from the results of other studies [24].

Conclusions
The mitochondrial genome structure of M. caeruleus is similar to that of other passerine birds, including thirteen PCGs, two rRNA genes, twenty-two tRNA genes, and a D-loop. Except for trnQ, trnA, trnN, trnC, trnY, trnS2, trnP, trnE, and ND6, which are located on the L strand, all genes are located on the H strand. There were 24 mismatches in the 22 tRNA secondary structures, all of which were G-U mismatches. There were five types of stop codons. There was a control region between trnE and trnF. In the phylogenetic analysis, the clade (((Monticola + Saxicola) + Oenanthe) + Phoenicurus) was sister to M. caeruleus.

Figure 1. A picture of Myophonus caeruleus. The photo was taken by Jun Liu on 15 January 2020 at Baihualing of Gaoligong Mountain in Baoshan City, Yunnan Province, China.

Figure 3. Predicted secondary structures of the twenty-two tRNAs of M. caeruleus.

Figure 4. The relative codon usage frequency of the 13 PCGs.

Figure 5. Phylogeny of M. caeruleus based on mitochondrial genome sequences. Topology of the Bayesian inference (BI) analysis inferred from the protein-coding genes.

Table 1. List of the 17 Muscicapidae species and one outgroup used in this paper with their GenBank accession numbers.

Table 3. Base composition of the complete mitochondrial genome, protein-coding genes, and rRNA genes of M. caeruleus.