Description of the Three Complete Mitochondrial Genomes of Sitta (S. himalayensis, S. nagaensis, and S. yunnanensis) and Phylogenetic Relationship (Aves: Sittidae)

Nuthatches (genus Sitta; family Sittidae) are a passerine genus with a predominantly Nearctic and Eurasian distribution. To understand the phylogenetic position of Sitta and phylogenetic relations within this genus, we sequenced the complete mitochondrial genomes of three Sitta species (S. himalayensis, S. nagaensis, and S. yunnanensis), which were 16,822–16,830 bp in length and consisted of 37 genes and a control region. This study recovered the same gene arrangement found in the mitogenomes of Gallus gallus, which is considered the typical ancestral avian gene order. All tRNAs were predicted to form the typical cloverleaf secondary structures. Bayesian inference and maximum likelihood phylogenetic analyses of sequences of 18 species obtained a well-supported topology. The family Sittidae is the sister group of Troglodytidae, and the genus Sitta can be divided into three major clades. We demonstrated the phylogenetic relationships within the genus Sitta (S. carolinensis + ((S. villosa + S. yunnanensis) + (S. himalayensis + (S. europaea + S. nagaensis)))).


Introduction
The avian mitochondrial genome is characterized by a small size, a relatively fast rate of evolution, and strict maternal inheritance. Avian mitogenomes are easy to extract and amplify, which makes them an ideal marker for the evolutionary analyses at the molecular level [1][2][3]. The avian mitogenome is a double-stranded circular molecule composed of a light strand (L-strand) and a heavy strand (H-strand) ranging from 16.3-23.1 kb in length [4] and consisting of 1 or 2 control region(s), 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs) and 2 ribosomal RNA genes (rRNAs) [5][6][7]. The mitogenome plays an important role in revealing the phylogeny of birds [8].
The genus Sitta, which belongs to the subfamily Sittinae (Passeriformes, Sittidae), includes 29 species of genus Sitta known around the world, 11 of which are distributed in China [9]. Although many molecular data have been published, the phylogenetic relationship of the family Sittidae remains controversial. Ericson and Johansson placed the family Sittidae within Sylvioidea [10]. In recent years, it has also been proposed that Sittidae belongs to Certhioidea [11,12]. Johansson et al. supported that Sittidae was sister to Polioptilidae and Troglodytidae [11], while Treplin et al. proposed that Sittidae was sister to Certhiidae and Troglodytidae [13]. Subsequent research results showed that Sittidae and Troglodytidae were closely related [14]. These controversies are mainly generated by limited mitochondrial genome data. Therefore, more molecular data are necessary to reconstruct a precise phylogeny [15].
Currently, complete mitogenome sequences from the genus Sitta are very scarce. In order to better understand the relationships among the species of Sitta, we sequenced three Genes 2023, 14, 589 2 of 14 mitogenomes of the genus Sitta and compared them with related species in terms of mitochondrial structure and gene rearrangement. In this study, we report the properties of these three Sitta species and infer their phylogenetic relationships with available mitogenome sequences using all PCGs.

Samples and DNA Extraction
A frozen sample of Sitta himalayensis was provided by the Gaoligong Mountain Nature Park of Yunnan Province, China. Frozen samples of Sitta nagaensis and Sitta yunnanensis were provided by the Zixi Mountain Provincial Nature Reserve of Yunnan Province, China. Voucher samples of three Sitta species were deposited in the Department of Biodiversity Conservation, Southwest Forestry University, Kunming. Samples used in this study were preserved in ethanol absolute and stored at −20 • C. The total genomic DNA was extracted from blood using the TIANamp Genomic DNA Kit (DP304, TIANGEN, Beijing, China) according to the manufacturer's protocol.

Phylogenetic Analyses
The sequences of 15 published mitochondrial genomes were obtained from NCBI. These sequences-along with three new mitogenome sequences obtained in this studywere used to reconstruct the phylogenetic relationships within the genus Sitta using Regulus regulus (Accession No. NC_029837) as an outgroup [25,26]. In addition, an unpublished mitogenome sequence of Cyanoptila cyanomelana available in GenBank (HQ896033) was here named Cyornis hainanus or C. rubeculoides following clarification of its identity by Sangster and Luksenburg [8]. The sequence information is listed in Table 1. The mitogenome sequences of the 13 PCGs were aligned using Clustal X in MEGA v.7.0 with the default parameters [34]. The length of the final alignment was 11,380 nucleotides. The substituted saturation of nucleotide sequences was analyzed with DAMBE 5.2.63 [35]. If Iss was significantly lower than Iss.c (p < 0.05), all of the PCG nucleotide sequences entered the next step. The Bayesian information criterion (BIC) in jModelTest v.0.1.1 [36] was used to determine the optimal nucleotide substitution model, which was GTR+G+I. Bayesian inference (BI) and maximum likelihood (ML) analyses were performed using MrBayes v.3.2.1 [37,38] and RAxMLGUI v.1.5b3 [39], respectively. BI analyses initiated from a random tree with four Markov chains running simultaneously for 200,000 generations and sampling every 100 generations and discarding the first 25% as burn-in. The average standard deviation of split frequencies was set below 0.01 to ensure that stationarity was reached [40]. The confidence values of the BI tree were shown as Bayesian posterior probabilities. In the ML analyses, a total of 1000 replicates were performed with the GTR+GAMMA substitution model. Finally, FigTree v.1.2.2 was used to visualize the phylogenetic trees [41].

Genome Organization
The complete mitogenomes of the three newly sequenced Sitta species were very similar to each other. All three mitogenomes were closed circular molecules ranging from 16,822 to 16,830 bp in length and consisting of 13 PCGs, 22 tRNAs, 2 rRNAs, and a control region ( Figure 1). The gene order of the mitogenomes of all three species analyzed were highly conserved (Figure 1), which was also identical to the gene order found in the mitogenome of Gallus gallus ( Figure 2). For the complete mitogenomes of the three species, one PCG (nad6) and eight tRNAs (trnQ, trnA, trnN, trnC, trnY, trnS2(UCN), trnP, and trnE) were encoded by the L-strand, while all the other genes were encoded by the H-strand. The comparison of the three Sitta species showed that the longest overlap (10 bp) was between atp8 and atp6. The longest intergenic spacer (21 bp) was located between trnV and rrnL in the mitogenome of S. himalayensis ( Table 2). The mitogenomes of the three Sitta species showed a significant bias toward A and T, with the nucleotide composition of A and T ranging from 53.05% to 55.68%. The AT-skew and the GC-skew of the complete mitogenomes of three Sitta species were 0.10 to 0.13 and −0.39 to −0.35, respectively (Table 3).
The comparison of the three Sitta species showed that the longest overlap (10 bp) was between atp8 and atp6. The longest intergenic spacer (21 bp) was located between trnV and rrnL in the mitogenome of S. himalayensis ( Table 2). The mitogenomes of the three Sitta species showed a significant bias toward A and T, with the nucleotide composition of A and T ranging from 53.05% to 55.68%. The AT-skew and the GC-skew of the complete mitogenomes of three Sitta species were 0.10 to 0.13 and −0.39 to −0.35, respectively (Table  3).

Protein-Coding Genes and Codon Usage
In the three Sitta species, the A + T content in PCGs was between 51.97% and 55.17% ( Table 3). The start codons and stop codons of the 13 PCGs were mostly the same among the three species; cox1 used GTG as the start codon, while the remaining 12 PCGs initiated strictly with the standard start codon ATG (Table 2). There were six kinds of stop codons (TAA, AGG, AGA, TAG, TA *, and T **) included in the mitogenome of the three Sitta species. The PCG nad1 in the mitochondrial genome of S. nagaensis contained the incomplete stop codon TA *, while TAN (N represents A and G) occurred in the other two species. Except for nad1, all of the other PCGs used the same stop codon across the three Sitta species.
The relative synonymous codon usage (RSCU) of 13 PCGs in the three newly sequenced mitogenomes was calculated. As shown in Figure 3, CGA (Arg) and CUA (Leu1) were the most commonly used in all three Sitta species. The highest value of RSCU was 3.86 of CGA in S. yunnanensis and the lowest value of RSCU was 0.03 of UAG in S. himalayensis and S. nagaensis. In addition, an analysis of the RSCU values for the 13 PCGs indicated an AT bias. The A + T content in PCGs of S. yunnanensis was slightly higher than that of the other two species, and the use of NNA and NNU codons was also more common in S. yunnanensis. Genes 2023, 14, x FOR PEER REVIEW 7 of 14

tRNA Genes and rRNA Genes
The 22 tRNAs of the three Sitta species were typical and included all 20 types of amino acids ranging from 66 to 75 bp in size. The total length of the 22 mitogenome tRNAs of S. nagaensis was 1542 bp, which was the same as that of S. yunnanensis and only one base lower than that of S. himalayensis. The A + T content of the total mitogenome tRNAs of S. nagaensis was 58.04%, which was lower than that of S. himalayensis (58.26%) and S. yunnanensis (58.31%) ( Table 3). The tRNAs of the three Sitta species were all predicted to fold into typical cloverleaf secondary structures. Furthermore, mismatched base pairs were identified in the stems of 22 different tRNAs, most of which were G-U pairs.
In the mitogenome of the three Sitta species, the 16S rRNA was located between trnV and trnL2(UUR) and ranged from 1575 to 1592 bp in length, while the 12S rRNA was located between trnF and trnV and ranged from 977 to 980 bp. The longest 16S rRNA was found in S. nagaensis and the shortest in S. himalayensis, while the longest 12S rRNA was discovered in S. yunnanensis and the shortest in S. himalayensis. The A + T contents of 16S rRNA and 12S rRNA ranged from 55.59% to 56.32% and from 51.17% to 52.25%, respectively (Table 3).

Control Region
The control region of the three species was located between the trnE and trnF genes ( Figure 1). The size of the control region of S. yunnanensis was 975 bp, which was longer than that of S. himalayensis (945 bp) and S. nagaensis (945 bp). The A + T content of the control region ranged from 53.34% to 55.49% (Table 3). The AT-skew was −0.15 to −0.12, the GC-skew was −0.22, and the A + C content was higher than the T + G content. In this study, we analyzed the control region of three Sitta species, and the predicted structures are shown in Figure 4. The entire control region contained three structural domains, namely Domain I, Domain II, and Domain III. Domain II was relatively conservative, while Domain I and Domain III were heterogeneous across species in terms of nucleotide composition and size [42].
Domain I included extended termination-associated sequences such as ETAS1 and ETAS2 and CSB1-like sequences. Domain II was the central conserved domain in the control region and included six conserved sequence blocks (F-box, E-box, D-box, C-box, b-box, and B-box). Domain III included the CSB1 sequence and light/heavy strand promoter (LSP/HSP), which were located at 911-929 bp in the control region ( Figure 4).

Phylogenetic Analyses
All 18 species from Passeriformes (including the three newly sequenced Sitta mitogenomes and mitogenomes available from NCBI from three additional Sitta species, three Trogloytidae species, five Muscicapidae species, three Turdidae species, and one Regulidae species) were subjected to a phylogenetic analysis based on the concatenated nucleotide sequences of 13 PCGs (Table 1). We used a species of Regulidae as the outgroup based on previously published phylogenies [25,26]. The topologies of phylogenetic analysis in both the BI and ML analyses were identical. The BI posterior probabilities (PPs) and ML bootstrap values (BPs) are labeled in Figure 5.
The present phylogenetic analyses in this study indicated that most species grouped together according to different families. Furthermore, the phylogenetic relationships among four families of the Passeriformes used in this study were: Sittidae and Troglodytidae were clustered into one branch with high nodal support value (PP = 1.00; BP = 95); Muscicapidae and Turdidae were herein corroborated to be sister groups (PP = 1.00; BP = 100); and six species of genus Sitta formed a monophyletic group with high support values (PP = 1.00; BP = 100). Within Sitta, the phylogenetic topologies revealed that S. nagaensis, S. europaea, and S. himalayensis were closely related and that S. villosa and S. yunnanensis were closely related. All datasets supported a monophyletic group of S. carolinensis, which was placed as a sister to all other species of Sitta (PP = 1.00; BP = 100).  genomes and mitogenomes available from NCBI from three additional Sitta species, three Trogloytidae species, five Muscicapidae species, three Turdidae species, and one Regulidae species) were subjected to a phylogenetic analysis based on the concatenated nucleotide sequences of 13 PCGs (Table 1). We used a species of Regulidae as the outgroup based on previously published phylogenies [25,26]. The topologies of phylogenetic analysis in both the BI and ML analyses were identical. The BI posterior probabilities (PPs) and ML bootstrap values (BPs) are labeled in Figure 5. The present phylogenetic analyses in this study indicated that most species grouped together according to different families. Furthermore, the phylogenetic relationships among four families of the Passeriformes used in this study were: Sittidae and Troglodytidae were clustered into one branch with high nodal support value (PP = 1.00; BP = 95); Muscicapidae and Turdidae were herein corroborated to be sister groups (PP = 1.00; BP = 100); and six species of genus Sitta formed a monophyletic group with high support values (PP = 1.00; BP = 100). Within Sitta, the phylogenetic topologies revealed that S. nagaensis, S. europaea, and S. himalayensis were closely related and that S. villosa and S. yunnanensis were closely related. All datasets supported a monophyletic group of S. carolinensis, which was placed as a sister to all other species of Sitta (PP = 1.00; BP = 100).

Mitogenome Characteristics
The mitochondrial genomes of the three birds sequenced in this study were similar to those of Sitta species published in NCBI and ranged from 16,816 to 16,830 bp (S. villosa, 16,816 bp; S. yunnanensis, 16,830 bp). Variations among mitogenomes in this study were

Mitogenome Characteristics
The mitochondrial genomes of the three birds sequenced in this study were similar to those of Sitta species published in NCBI and ranged from 16,816 to 16,830 bp (S. villosa, 16,816 bp; S. yunnanensis, 16,830 bp). Variations among mitogenomes in this study were mainly related to the repetition and length of the control regions [43]. The mitogenomes of the three Sitta species were similar to those of Sitta species published in NCBI: a doublestranded circular molecule composed of 37 typical genes and a control region. The gene order of mitogenomes analyzed was highly conserved, which was also identical to the gene order found in the mitogenome of Gallus gallus [44]. Similar to other typical vertebrates, the complete mitogenome of the three Sitta species had a high content of A + T and a similar AT/GC-skew [43,45]. The lowest value of the A + T content was 53.05% in S. nagaensis, and the highest value of the A+T content was 55.68% in S. yunnanensis.
Among the Sitta mitogenomes, most PCGs had complete stop codons, while the cox3 and nad4 genes had incomplete stop codons (TA *, T **). For the incomplete stop codons, the missing nucleotides may be the result of post-transcriptional polyadenylation [46], which is common in animal mitogenomes and could produce functional stop codons via polycistronic transcription cleavages and polyadenylation mechanisms [46,47]. The patterns of codon usage among the three Sitta mitogenomes were nearly the same. The most frequent used codons were NNA and NNU for each amino acid, and the analysis of the RSCU values for the 13 PCGs indicated an AT bias. As for PCGs, the AT bias can be attributed to the frequent use of NNA and NNU codons [48].
All three Sitta mitogenomes contained the 22 typical tRNAs, which were all predicted to fold into typical cloverleaf secondary structures, and secondary structures across species were similar. In all published Sitta species, the length of 12S rRNA ranges from 931 bp (S. europaea) to 980 bp (S. yunnanensis), and the A + T content ranges from 51.17% (S. himalayensis) to 52.50% (S. villosa). The length of 16S rRNA ranges from 1575 bp (S. himalayensis) to 1597 bp (S. villosa), and the A + T content ranges from 55.40% (S. europaea) to 56.32% (S. yunnanensis). The length and the content of 12S rRNA and 16S rRNA genes of the three species involved in this study were within this range. The control region had a rapid evolution rate and was the region with the largest change in the length and sequence of the complete mitochondrial gene [45]. All three species had a control region that showed an obvious AT bias due to the presence of regulatory elements of mitochondrial genome transcription and replication [49].

Phylogenetic Analyses
In this study, a total of 18 species of Passeriformes were selected for phylogenetic analysis to understand the genetic relationship among the genus Sitta. The phylogenetic trees generated via BI and ML methods were fully resolved with identical topologies ( Figure 5).
The taxonomic status of the Sittidae family is controversial. Cracraft et al. grouped Sittidae and Certhiidae together with Polioptidae and Troglodytidae in the superfamily Certhioidea based on phylogenetic hypotheses [50]. However, the phylogenetic relationship between these four families has not been solved so far within Certhioidea [12]. Treplin et al. placed nuthatches (Sittidae) as a sister of treecreepers (Certhiidae) and the wrens (Troglodytidae) [13]. In this study, we supported the phylogenetic relationships of ((Sittidae + Troglodytidae) + (Muscicapidae + Turdidae)). These phylogenetic relationships were in good accordance with traditional classifications of Sittidae and Troglodytidae [14], whereas recent studies based on morphological data also supported the phylogenetic relationship of ((Polioptidae + Troglodytidae) + (Sittidae + Certhiidae)) [50]. Meanwhile, Muscicapidae and Turdidae as a sister group is well supported [51]. As the mitochondrial genome data of Certhiidae species were not involved in our study, we need to further explore the taxonomic status of nuthatches (Sittidae) by combining the species sequences of multiple families in future studies to confirm the results.
In the phylogenetic relationships within the genus Sitta, Päckert et al. estimated divergence times of nuthatches based on the fossil calibration and mean rate estimates for mitochondrial markers and confirmed that S. carolinensis was a representative of the first divergent clade among the six Sitta species [52]. Meanwhile, S. nagaensis and S. europaea were found to be the sister to S. himalayensis, and S. villosa was the sister to S. yunnanensis. The same results were obtained in this study, and we also demonstrated the phylogenetic relationships within the genus Sitta (S. carolinensis + ((S. villosa + S. yunnanensis) + (S. himalayensis + (S. europaea + S. nagaensis)))). All datasets supported a distant position of S. carolinensis. These results were generally identical to those of Pasquet et al. based on molecular markers, biogeography, and life history data [53]. Meanwhile, Pasquet's work also supported the hypothesis of Asia being the center of diversification for nuthatches (with several independent dispersal events to North America) [53]. Currently, the published mitochondrial genome data of Sitta species are very limited, so mitochondrial genomes of more Sitta species should be sequenced to better elucidate these phylogenetic relationships.

Conclusions
In this study, we sequenced three mitochondrial genomes of three Sitta species and compared them with related species in terms of the mitochondrial structure and gene rearrangement. The study showed that the mitogenomes of the three species were conserved in genome organization, gene order, and base composition. This study revealed the properties of the mitochondrial genomes of the genus Sitta for the first time. Meanwhile, the phylogenetic relationships of the selected species using all PCGs showed that the genus Sitta can be divided into three major clades.