Next-Generation Sequencing of the Complete Huaibei Grey Donkey Mitogenome and Mitogenomic Phylogeny of the Equidae Family

Simple Summary The phylogenetic status of the Huaibei grey donkey based on its complete mitochondrial genome, phylogeny, and maternal origin has not been fully established. This study reports the mitochondrial DNA diversity of the Huaibei grey donkey breed and presents its origin and genetic characterization. The Huaibei grey donkey’s complete mitogenome was 16,680 bp in length, containing 22 tRNAs, 2 rRNAs, 13 PCGs, and 1 D-loop region. The median-joining network and phylogenetic tree indicated two possible maternal lineages, with the Somali lineage as the most probable domestication center of Huaibei grey donkey. These results provide novel information on the origin and phylogeny of the Huaibei grey donkey and can be used as a reference for breeding and conservation management. Abstract The Huaibei grey donkey (HGD) is an endangered species and a vital native breed in Anhui Province, China. However, its complete mitogenome, phylogeny, and maternal origin remain unclear. The objectives of this study were to detect the genetic diversity of the HGD and investigate its phylogenetic relationship with other breeds to inform conservation management. The complete mitogenome of the HGD was sequenced through next-generation sequencing, and the most variable region in the mitochondrial DNA displacement-loop (D-loop) was amplified via a polymerase chain reaction (PCR). Next, we used the median-joining network (MJN) to calculate the genetic relationships among populations and the neighbor-jointing method to build a phylogenetic tree and speculate as to its origin. The results showed that the mitogenome contains 22 tRNAs, 2 rRNAs, 13 PCGs, and 1 D-loop region. Analyzing the D-loop region of the HGDs, we identified 23 polymorphic sites and 11 haplotypes. The haplotype and nucleotide diversity were 0.87000 (Hd) and 0.02115 (Pi), respectively. The MJN analysis indicated that the HGD potentially has two maternal lineages, and phylogenetic analysis indicated that the Somali lineage could be the most probable domestication center for this breed. Therefore, our mitogenome analysis highlights the high genetic diversity of the HGD, which may have originated from the Somali wild ass, as opposed to the Asian wild ass. This study will provide a useful resource for HGD conservation and breeding.


Introduction
Mitochondrial DNA (mtDNA) is an extranuclear genetic material with a specific structure that does not undergo recombination during generational transmission and is maternally derived [1,2]. As a genetic marker, mtDNA is highly significant in genetic and evolutionary livestock studies [3,4]. mtDNA contains exons and a non-coding region-the control region (CR)-also known as the displacement-loop region (D-loop) [5]. In the mammalian mtDNA D-loop, the most variable region exists between tRNA Pro and the

Specimen Collection and DNA Isolation
Samples were collected from Anhui Domestic Donkey Conservation and Breeding Co. Ltd., Anhui, China ( Figure 1). The total genomic DNA was isolated using a blood DNA extraction kit (Tiangen Biotech Co, Ltd., Beijing, China) and stored at −20 °C. DNA integrity was evaluated via 1.5% agarose gel electrophoresis. The most variable region in the mtDNA D-loop of 60 samples was amplified using PCR. One female HGD was selected at random to determine the complete mitochondrial genome sequence.

Amplification of Part of the mtDNA D-Loop and Sequencing
The proximal sequences of the D-loop region (418 bp, between 15,419 bp to 15,836 bp, GenBank accession number NC_001788.1 [17]) of the trnP gene and the central conserved sequence block [18] were amplified by PCR using appropriate primers (F: 5′-AC-CACTCGCAAGCACCA-3′; R: 5′-CACAGCATCCCCAAATA-3′). The primers were designed in Primer 3 [19] and synthesized and purified by TSINGKE Biotechnology Co., Ltd. (Nanjing, China). PCR amplification was performed in a 50 μL system containing 2 μL of template DNA, 25 μL 2 × of Taq Master Mix, 2 μL of each primer, and 19 μL of ddH2O. The amplification conditions were as follows: 94 °C for 5 min, 35 cycles at 94 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s, and a final extension step at 72 °C for 10 min. The complete mitogenome was obtained via whole-genome shotgun sequencing using an Illumina No-vaSeq 6000 platform (Illumina, Inc., San Diego, CA, USA) with paired-end read lengths of 150 bp.

Sequence Assembly Annotation and Analysis
After quality control, Bowtie2 was used to align clean the data from the complete mtDNA of the HGDs against a reference genome (GenBank, NC_001788.1). The reads on the alignment were retained, and SPAdes 3.13.0 (parameters: K 127) was used for genome assembly. The splicing results were compared with the circular reference genome using BLASTN, with the assembly results determined accordingly. tRNAs were recognized and their secondary structures predicted using tRNAscan-SE v2.0 [20]. We derived a circular map of the mitogenome using ORDRAW [21], and the base composition skew was calculated using the formulas AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/ (G + C) [22]. The most variable region was identified through sequence alignment against the reference genome. The 418 bp fragments were edited manually using Contig Express software (Contig Express LLC, New York, NY, USA) and aligned with the reference sequence using CLUSTALX 2.0 [23]. DNAsp v6 [24] was used to calculate the nucleotide diversity, number of polymorphic sites, and haplotype diversity of the most variable region in the HGD mtDNA D-loop region.

Structure and Organization of the Complete Mitochondrial Genome
We deposited the complete mtDNA-a circular 16,680 bp molecule-in GenBank under accession no.MZ911746, displaying a characteristic circular structure of the genetic map ( Figure 2). There are 13 protein coding genes (PCGs), including those encoding seven NADH dehydrogenase complex subunits (ND1-6 and ND4L), three cytochrome oxidase subunits (COX1-3), two ATPase subunits (ATP6 and ATP8), and cytochrome b (CYTB). Additionally, the mitogenome contained 2 rRNA genes (rRNAL and rRNAS), 22 tRNA genes, and 1 control region (D-loop). Regarding their location, 14 tRNA genes, 12 PCGs and 2 rRNA genes were located on the heavy strand, and the remaining 8 tRNA genes and ND6 were on the light strand (Table 1). A bias in nucleotide composition was observed toward A (32.31%) and T (25.59%), with respect to C (28.88%) and G (13.22%). We observed a negative GC-skew (-0.3720) and a positive AT-skew (+0.1161), indicating that A and C were marginally more numerous than T and G (

Codon Usage and Protein Coding Genes
The 13 PCGs had a full length of 9 911 bp comprising 57.3% AT nucleotides, with a positive AT-skew (+0.0750) and a negative GC-skew (-0.4239). Codons encoding Trp were

Codon Usage and Protein Coding Genes
The 13 PCGs had a full length of 9 911 bp comprising 57.3% AT nucleotides, with a positive AT-skew (+0.0750) and a negative GC-skew (-0.4239). Codons encoding Trp were infrequent, whereas those encoding Leu and Ser occurred most frequently ( Figure 3). The PCG regions comprised a total of 5 559 codons. The initiation and termination signals, in addition to the gene lengths, are listed in Table 2. Eleven PCGs have ATG as their start codon, except for ND2 and ND3, which have ATA as their start codon. Eight PCGs (ND1, COX1, COX2, ATPase8, ATPase6, ND4L, ND5, and ND6) had TAA as their stop codon; ND2, COX3, and ND3 had TAG as their stop codon; and ND4 and CYTB had AGA as their stop codon. The most frequently used amino acids in the PCGs of the HGD mitogenomes were Leu and Ser (12.4% and 12.4%, respectively); among the 64 available codons, the

Transfer and Ribosomal RNA Genes
The 22 tRNAs had a full length of 1 516 bp and 61.7% AT nucleotides, with a positive AT-skew (+0.1183) and a negative GC-skew (-0.1854). Most tRNA genes had the typical cloverleaf secondary structure, except for tRNA Ser (GCT) (Figure 4). The full length of the two rRNAs was 2 555 bp, the AT content was 60.1%, the AT-skew was positive (+0.2164), and the GC-skew was negative (-0.1508). With respect to their location, rrnL was located between trnV and trnL1, and rrnS was between trnV and trnF. The lengths of rrnL and rrnS were 1 580 bp and 975 bp, respectively.

Transfer and Ribosomal RNA Genes
The 22 tRNAs had a full length of 1 516 bp and 61.7% AT nucleotides, with a positive AT-skew (+0.1183) and a negative GC-skew (-0.1854). Most tRNA genes had the typical cloverleaf secondary structure, except for tRNA Ser (GCT) (Figure 4). The full length of the two rRNAs was 2 555 bp, the AT content was 60.1%, the AT-skew was positive (+0.2164), and the GC-skew was negative (-0.1508). With respect to their location, rrnL was located between trnV and trnL1, and rrnS was between trnV and trnF. The lengths of rrnL and rrnS were 1 580 bp and 975 bp, respectively.

Genetic Variation and Genetic Diversity of Complete D-Loop Region
There were 23 polymorphic sites and 11 haplotypes in the mtDNA D-loop sequences (H1−H11, GenBank accession numbers: OP095367~OP095377). The AT content of the Dloop region was 52%, with a positive AT-skew (+0.1692) and a negative GC-skew (-0.4542) that was identified between trnP and trnF. The haplotype and nucleotide diversity values were 0.87000 (Hd) and 0.02115 (Pi), respectively. There are 5 singleton variable sites and 18 parsimony informative sites. All of the polymorphisms were A/G and T/C transitions, except at the site of 297. Our molecular studies of donkey mitochondrial sequences have defined two distinct matriarchal lineages (Clade I and II) related to domestication. We  Table 4.  66 72 85 151 162 172 174 180 181 202 203 226 227 234 244 280 297 352 383 388 402 403 404

Genetic Variation and Genetic Diversity of Complete D-Loop Region
There were 23 polymorphic sites and 11 haplotypes in the mtDNA D-loop sequences (H1-H11, GenBank accession numbers: OP095367~OP095377). The AT content of the Dloop region was 52%, with a positive AT-skew (+0.1692) and a negative GC-skew (-0.4542) that was identified between trnP and trnF. The haplotype and nucleotide diversity values were 0.87000 (Hd) and 0.02115 (Pi), respectively. There are 5 singleton variable sites and 18 parsimony informative sites. All of the polymorphisms were A/G and T/C transitions, except at the site of 297. Our molecular studies of donkey mitochondrial sequences have defined two distinct matriarchal lineages (Clade I and II) related to domestication. We  Table 4.

Maternal Origin of Huaibei Grey Donkey
The 60 reference mtDNA D-loop sequences (Table 5) and HGD population sequences of 418 bp were compared to determine the relationships among the haplotypes and the population structure of HGDs. The MJN was constructed for the identified haplotypes. There were two distinct lineages (Clade I and Clade II) revealed by the MJN, and most HGDs were classified into Clade I (31 individuals (51.67%) and 4 haplotypes). Clade II included 29 individuals (48.33%) from 7 haplotypes ( Figure 5).

Maternal Origin of Huaibei Grey Donkey
The 60 reference mtDNA D-loop sequences (Table 5) and HGD population sequences of 418 bp were compared to determine the relationships among the haplotypes and the population structure of HGDs. The MJN was constructed for the identified haplotypes. There were two distinct lineages (Clade I and Clade II) revealed by the MJN, and most HGDs were classified into Clade I (31 individuals (51.67%) and 4 haplotypes). Clade II included 29 individuals (48.33%) from 7 haplotypes ( Figure 5).
For additional clarification of the HGD's origin, we performed comparisons of the HGD mtDNA sequence with that of the Nubian wild ass (E. africanus africanus), Somali wild ass (E. africanus somaliensis), Asian wild ass (E. hemionus), and European and Chinese domestic donkeys. The HGDs were markedly clustered with the Somali wild ass sequences. Further, the phylogenetic tree showed that the HGD was clustered separately from the Asian wild ass clade. Therefore, these results indicate Africa as the most probable location for HGD domestication (Figure 6).  Figure 5. Median-joining network constructed from 11 haplotypes obtained from Huaibei grey donkey and 60 reference sequences obtained from 5 different populations. Note: The circle areas correspond to the haplotype frequency. Yellow circles indicate the haplotypes in this study. The green circles are 30 previously identified haplotypes [26]. The reference haplotypes Hap1~Hap30 were downloaded from NCBI [15], and the purple circles represent haplotypes belonging to Clade I (Hap4, Hap6, Hap7, Hap10, Hap12, Hap16, Hap18, Hap20, Hap22, Hap23, Hap24, Hap25, Hap26,  Hap27and Hap29), while the red circles represent Clade II haplotypes.

Figure 5.
Median-joining network constructed from 11 haplotypes obtained from Huaibei grey donkey and 60 reference sequences obtained from 5 different populations. Note: The circle areas correspond to the haplotype frequency. Yellow circles indicate the haplotypes in this study. The green circles are 30 previously identified haplotypes [26]. The reference haplotypes Hap1~Hap30 were downloaded from NCBI [15], and the purple circles represent haplotypes belonging to Clade I (Hap4 For additional clarification of the HGD's origin, we performed comparisons of the HGD mtDNA sequence with that of the Nubian wild ass (E. africanus africanus), Somali wild ass (E. africanus somaliensis), Asian wild ass (E. hemionus), and European and Chinese domestic donkeys. The HGDs were markedly clustered with the Somali wild ass sequences. Further, the phylogenetic tree showed that the HGD was clustered separately from the Asian wild ass clade. Therefore, these results indicate Africa as the most probable location for HGD domestication ( Figure 6).

Discussion
Mitochondrial DNA is a powerful and widely used molecular marker for the estimation of a population's phylogenetic relationships. It plays an important role in phylogenetic studies, comparative and evolutionary genomics, and molecular evolutionary analyses owing to its various advantageous characteristics, such as maternal inheritance, lack of recombination, and accelerated nucleotide substitution rates compared with those of the nuclear DNA [32,33]. The mtDNA D-loop is an extremely variable region characterized by fast evolution and a non-coding function, with the largest mutation rate in the whole mtDNA [34]. The degree of genetic diversity reflects the strength of biological evolution and evolutionary adaptation potential. To date, some studies have reported the

Discussion
Mitochondrial DNA is a powerful and widely used molecular marker for the estimation of a population's phylogenetic relationships. It plays an important role in phylogenetic studies, comparative and evolutionary genomics, and molecular evolutionary analyses owing to its various advantageous characteristics, such as maternal inheritance, lack of recombination, and accelerated nucleotide substitution rates compared with those of the nuclear DNA [32,33]. The mtDNA D-loop is an extremely variable region characterized by fast evolution and a non-coding function, with the largest mutation rate in the whole mtDNA [34]. The degree of genetic diversity reflects the strength of biological evolution and evolutionary adaptation potential. To date, some studies have reported the complete mitochondrial genome, phylogeny, and maternal origin of the donkey using the full mi-togenome and D-loop region [11,15,16,35]. To elucidate the genetic resource, we reported the sequence of the whole mitogenome, genetic diversity, and maternal origin of the HGD.
In this study, we used next-generation sequencing to reconstruct the complete mitochondrial genome of the HGD, and PCR to amplify the most variable region in the mtDNA D-loop. The results showed that the length of the full HGD mitogenome is 16,680 bp, of which the D-loop region is 1216 bp. As reported in previous genetic studies on donkeys [15,36], the D-loop region of ancestral mtDNA supplies adequate information to assess the genetic variation, evolutionary relationships, and matrilineal genetic origins. We detected 23 nucleotide polymorphic sites in the D-loop region sequences of the HGD mtDNA. Among the D-loop 418 bp fragments, polymorphic nucleotide sites accounted for 9.21%, (of which 94.44% were found to be transformed or transposed) and two insertion sites accounting for 5.56% [27]. Therefore, the frequency of transposition was much higher than that of transversion, which is consistent with our results. We found that the haplotype and nucleotide diversity were 0.87000 (Hd) and 0.02115 (Pi), respectively, which indicated the high genetic diversity of HGDs.
The traditional view on the origin and evolution of the Chinese donkey is that it originated from the African wild ass following domestication. Consistent with our study, Lei et al. [12] found that there are two mitochondrial origins for the African wild ass, lineage Somali (Clade II) and lineage Nubian (Clade I), in Chinese domestic donkeys, and that Clade II was prevalent in Chinese domestic donkey breeds. Our findings provide valuable insight into the evolutionary relationships within Equidae and can be useful for improving our understanding of equid phylogenetics.

Conclusions
The present study revealed a high mtDNA diversity in the D-loop region of HGDs, as indicated by the detection of 11 haplotypes. We confirmed that the Chinese donkey originated from the African wild ass following domestication, consistent with previous research. Moreover, we excluded the Asian wild ass as an ancestor of HGDs. These results can be used to inform future HGD phylogenetic studies and provide insights into the evolution of genomes.
Author Contributions: Conceptualization, writing-original draft and writing-review and editing, S.J., Z.G. and J.X.; data analysis, J.X., L.C. and S.J.; sampling and data analysis, Y.J., Y.D., C.C. and D.X.; methodology and funding acquisition, S.J. and Z.G. All authors have read and agreed to the published version of the manuscript. Data Availability Statement: They will be provided during review.