Morphological Structure Identification, Comparative Mitochondrial Genomics and Population Genetic Analysis toward Exploring Interspecific Variations and Phylogenetic Implications of Malus baccata ‘ZA’ and Other Species

Malus baccata, a valuable germplasm resource in the genus Malus, is indigenous to China and widely distributed. However, little is known about the lineage composition and genetic basis of ‘ZA’, a mutant type of M. baccata. In this study, we compared ‘ZA’ with the wild type in terms of morphology and ultrastructure and analyzed their chloroplast pigment content using biochemical methods. Further, the complete mitogenome of M. baccata ‘ZA’ was assembled from next-generation sequencing data. Subsequently, its molecular characteristics were analyzed using the Geneious, MISA-web, and CodonW toolkits. Furthermore, by examining 106 Malus germplasms and 42 Rosaceae species, we inferred the evolutionary position of M. baccata ‘ZA’, as well as interspecific variations among different individuals. The total length of the ‘ZA’ mitogenome (GC content: 45.4%) is 374,023 bp, which is approximately 2.33 times the size (160,202 bp) of the plastome (GC content: 36.5%). The collinearity analysis revealed abundant repeats and genome rearrangements among different Malus species. Additionally, we identified 14 plastid-driven fragment transfer events. A total of 54 genes were annotated in the ‘ZA’ mitogenome, including 35 protein-coding genes, 16 tRNAs, and three rRNAs. By calculating nucleotide polymorphisms and selection pressure for 24 shared core mitochondrial CDSs from 42 Rosaceae species (including ‘ZA’), we observed that the nad3 gene exhibited minimal variation, while nad4L appeared to be evolving rapidly. Population genetics analysis detected a total of 1578 high-quality variants (1424 SNPs, 60 insertions, and 94 deletions; variation rate: 1/237) among samples from 106 Malus individuals. Furthermore, phylogenetic trees constructed from both the Malus and Rosaceae taxa datasets preliminarily demonstrated that ‘ZA’ is closely related to M. baccata, M. sieversii, and other proximate species. The sequencing data obtained in this study, along with our findings, expand the mitogenomic resources available for Rosaceae research. They also provide a reference for molecular identification studies as well as for conservation and breeding efforts focused on excellent germplasms.


Introduction
Malus baccata (L.) Borkh., commonly known as 'shanjingzi', is a deciduous fruit tree indigenous to China. It belongs to the Malus genus (Rosaceae, Maloideae) and is extensively distributed throughout various regions of China, including Northeast China (Heilongjiang, Jilin, and Liaoning province), North China (Nei Mongol, Hebei, and Shanxi), and Northwest China (Shaanxi and Gansu province). This wide distribution can be attributed to its preference for sunlight, tolerance to cold temperatures, and adaptability characteristics [1][2][3]. Apart from China, this species can also be found in countries such as Russia and North Korea in North and East Asia [4]. The branches and leaves of M. baccata are lush, with a flowering period typically occurring from April to June. The fruits mature between September and October. Its tree posture, leaf shape, and flower coloration, as well as its fruit coloration, contribute to its exceptional ornamental value. Furthermore, it holds significant economic importance within the apple industry, where it is utilized for rootstock or variety enhancement purposes [5].
Due to the wide distribution of M. baccata and its diverse adaptability to different living environments and ecological conditions, a multitude of varieties and variants have been discovered in various regions of China [6,7], thereby further enriching the germplasm diversity of M. baccata and Malus. For instance, common variations such as M. baccata f. gracilis Rehd., var. latifolia Skv., and f. villosa Skv. have been reported [8,9]. In 1976, Chinese scientists identified a dwarf mutation type called M. baccata 'ZA' from 'shanjingzi' in Hulunbuir City, Nei Mongol Autonomous Region [10][11][12]. This germplasm exhibits exceptional cold resistance, with stable dwarfish genetic traits controlled by a dominant major gene. Consequently, this valuable mutation resource of M. baccata holds significant advantages for cross-breeding Malus and apple cultivars with dwarfism and enhanced resistance [10]. However, its evolutionary origins within the genus Malus and its biological role within the family Rosaceae remain poorly understood, impeding research progress on M. baccata 'ZA'.
In the study of molecular phylogeny and population inheritance, the mitogenome possesses unique advantages [13][14][15]. As a crucial component of maternal inheritance, the mitogenome exhibits a relatively short length and gene conservation, rendering it an exceptional molecular dataset [16]. However, due to its intricate structure and abundance of exogenous sequences and repetitive fragments, obtaining the complete sequence is challenging [17]. With advancements in sequencing technology and assembly tools, numerous plant mitogenomes have been released in recent years [17][18][19], providing vital support for species traceability and genetic breeding. Currently, there are over 100 Rosaceae mitogenomes available in the NCBI database, with approximately ten belonging to Malus species (including M. domestica (Suckow) Borkh., M. sieversii (Ledeb.) M. Roem., M. sylvestris (L.) Mill., M. hupehensis (Pamp.) Rehder, and M. baccata). It should be noted that, apart from the aforementioned species, the Malus genus encompasses more than thirty other species [8,20,21,22,23], such as M. asiatica Nakai, M. prunifolia (Willd.) Borkh., M. micromalus Makino, M. sieboldii Rehder, and M. yunnanensis (Franch.) C. K. Schneid. The complex interspecific relationships of Malus cannot be elucidated with such limited genomic data. Decoding the mitogenome of the valuable Malus germplasm 'ZA' can not only clarify its identity and increase the resources available in the database but also hold far-reaching importance for elucidating the evolution of Malus and Rosaceae.
In this study, the complete mitogenome of M. baccata 'ZA' was assembled and annotated based on next-generation sequencing and reference datasets. Furthermore, its genome composition, intraspecific and interspecific collinearity, distribution of repeat sequences, and sequence migration events were analyzed. Additionally, the population evolution of Malus and the molecular phylogeny of Rosaceae were discussed by integrating resequencing data with other mitogenome maps. Consequently, a detailed comparison of these datasets establishes a reliable foundation for the conservation and utilization of the 'ZA' dwarf mutant.

Material Collection, Sample Extraction, and DNA Sequencing
Malus baccata 'ZA' for morphological identification and mitogenome assembly was cultivated at Shandong Agricultural University, National Apple Engineering Technology Research Center (36.162410° N, 117.157452° E, Taian, Shandong, China), and subjected to standard agronomic measures for daily management during growth. For ultrastructure detection, scanning electron microscopy (Regulus 8100, Hitachi, Tokyo, Japan) was used for imaging, for which the plant leaves were fixed with a glutaraldehyde solution. The content of photosynthetic pigments (chlorophylls and carotenoids) was determined by a spectrophotometric method: pigments were extracted with ethanol, 95% ethanol was used as the blank, and absorbance was recorded at wavelengths of 665 nm, 649 nm, and 470 nm. In addition, young leaves free from pests were collected in the morning on a clear day and immediately frozen in a liquid nitrogen storage tank. They were then temporarily stored in an ultra-low-temperature refrigerator at −80 °C for subsequent experiments. The cetyl trimethyl ammonium bromide (CTAB) method was employed to extract tissue DNA from the samples, and DNA quality was assessed by agarose gel electrophoresis. The whole-genome sequencing of M. baccata 'ZA' was completed using the Hiseq-Xten PE150 platform (Illumina Inc., San Diego, CA, USA) and was supported by the Novogene Bioinformatic Technology Co., Ltd. (Tianjin, China). For Illumina sequencing, the paired-end library (2 × 150 bp) was constructed with an insert size of 350 bp. Additionally, 105 germplasm materials of Malus spp., including M. domestica, M. sieversii, M. sylvestris, M. hupehensis, M. baccata, M. sieboldii, M. yunnanensis, M. toringoides (Rehder) Hughes, M. tschonoskii (Maxim.) C. K. Schneid., and M. ioensis (Alph. Wood) Britton, were used for population evolution analysis in this study (Table S1). These species were planted at Qingdao Academy of Agricultural Sciences, Qingdao Apple Rootstock Research and Development Center (36.238269° N, 120.539478° E, Qingdao, Shandong, China). The sampling methods, as well as the extraction and sequencing procedures, were the same as those used for 'ZA'.

Sequencing Data Processing and Mitochondrial Genome Assembly
The Illumina reads in the raw data were filtered based on the following criteria: (1) removal of reads containing sequencing adapters; (2) removal of reads with an unknown base ratio greater than 10%; (3) removal of reads containing more than 20% low-quality bases. For the M. baccata 'ZA' mitogenome, de novo assembly was accomplished using Unicycler software (v0.5.0) [24]. In Unicycler, the initial SPAdes assembly was performed based on K-mer values (27, 53, 71, 87, 99, 111, 119, and 127), followed by SPAdes contig creation, loop-unrolling bridge formation, and bridge application for assembly graph construction and structure simplification (Figure S1) [25]. The assembly results were visualized using Bandage v0.8.1 [26]. Additionally, the integrity of the mitogenome (Figure S2) was confirmed through read coverage analysis (BWA 0.7.17, SAMtools 1.16, and BAMStats 0.3.5) [27,28]. To further assess the quality of the 'ZA' mitogenome assembly, the core protein-coding genes (PCGs) were annotated as described below, which also serves as evidence of the reliability of the result (no missing PCGs were found). Finally, the complete mitogenome sequence of M. baccata 'ZA' was submitted to the NCBI database under accession number PP826182. The raw data used for assembly are stored in the Genome Sequence Archive (GSA) at the National Genomics Data Center (CRA016093, https://ngdc.cncb.ac.cn/, accessed on 15 March 2024) [29,30]. Furthermore, using Canu [31], we assembled the mitogenome (OR876282) of an apple cultivar (M. domestica 'Honeycrisp') from the SRA sequencing dataset available in the NCBI database [32,33]. For subsequent comparative analysis of mitogenomes, both newly obtained and previously published sequences were utilized (Table S2).
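The three read-filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the adapter list, and in particular the Phred cutoff that defines a "low-quality" base (`lowq_phred`) are assumptions introduced here for illustration, since the text gives only the 10% and 20% ratios.

```python
from typing import Sequence


def keep_read(seq: str, quals: Sequence[int], adapters: Sequence[str],
              max_n_frac: float = 0.10, max_lowq_frac: float = 0.20,
              lowq_phred: int = 5) -> bool:
    """Return True if a single read passes the three filtering criteria.

    lowq_phred is a hypothetical cutoff: the source does not state which
    Phred score counts as "low quality".
    """
    if not seq:
        return False
    seq = seq.upper()
    # (1) discard reads that contain an adapter sequence
    if any(adapter.upper() in seq for adapter in adapters):
        return False
    # (2) discard reads whose unknown-base (N) ratio exceeds 10%
    if seq.count("N") / len(seq) > max_n_frac:
        return False
    # (3) discard reads with more than 20% low-quality bases
    low_quality = sum(1 for q in quals if q <= lowq_phred)
    if low_quality / len(seq) > max_lowq_frac:
        return False
    return True


def keep_pair(read1, read2, adapters) -> bool:
    """A read pair is kept only if both mates pass (an assumption of this sketch)."""
    return keep_read(*read1, adapters) and keep_read(*read2, adapters)
```

Here each read is a `(sequence, quality_scores)` tuple; treating the pair as a unit is a common convention for paired-end data but is not stated in the text.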

Mitogenome Composition, Codon Usage Bias, and Collinearity Analysis
The mitogenome comprises tandem repeats (short tandem repeats, STRs; long tandem repeats, LTRs) and scattered repeats (dispersed repeats, DRs). Among them, STRs were obtained using MISA-web (https://webblast.ipk-gatersleben.de/misa/, accessed on 5 April 2024), LTRs were calculated on the TRF website, and REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer, accessed on 5 April 2024) was utilized for analyzing dispersed repeats with a minimal repeat size of 30 and a Hamming distance of 3. The minimum numbers of motif repetitions in MISA-web were specified as follows: 10 repetitions for 1 bp, 5 repetitions for 2 bp, 4 repetitions for 3 bp, and 3 repetitions each for 4 bp, 5 bp, and 6 bp motifs; all other options were left at their defaults. Additionally, the characteristics of mitogenomes also include GC statistics, such as GC content ([nG + nC]/[nA + nT + nG + nC]) and GC skew ([nG − nC]/[nG + nC]), which were determined using CGView 1.0.2 software (https://stothardresearch.ca/cgview/, accessed on 5 April 2024) [38,39]. A sliding window algorithm was employed for these calculations with a window size of 1000 and a step value of 10. According to the annotations, the coding sequences were extracted from the mitogenomes using Geneious R9 software [40]. Subsequently, their codon usage was characterized using the CodonW program, and the analysis indexes primarily encompassed the codon adaptation index (CAI), codon bias index (CBI), effective number of codons (ENC), frequency of optimal codons (FOP), and relative synonymous codon usage (RSCU). The similarity analysis of the M. baccata 'ZA' and M. baccata mitogenomes was conducted using Geneious R9. Mitogenome collinearity and rearrangement among Malus species were analyzed in Geneious using the genome-wide comparison model (Mauve, progressive algorithm) [41,42].
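The GC content and GC skew formulas above, evaluated over sliding windows, can be sketched in a few lines of Python. This is an illustration of the formulas only, not a reimplementation of CGView: the function name is hypothetical, and the sketch treats the sequence as linear, whereas a circular genome viewer may wrap windows around the origin.

```python
def gc_windows(seq: str, window: int = 1000, step: int = 10):
    """Yield (start, gc_content, gc_skew) for each full window of a linear sequence.

    gc_content = (nG + nC) / (nA + nT + nG + nC)
    gc_skew    = (nG - nC) / (nG + nC)
    Windows with no countable bases (or no G/C) report 0.0 to avoid division by zero.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        n_g, n_c = chunk.count("G"), chunk.count("C")
        n_a, n_t = chunk.count("A"), chunk.count("T")
        total = n_a + n_t + n_g + n_c
        gc_content = (n_g + n_c) / total if total else 0.0
        gc_skew = (n_g - n_c) / (n_g + n_c) if (n_g + n_c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results
```

With the defaults (`window=1000`, `step=10`) the function mirrors the window and step values stated in the text; smaller values are convenient for checking the arithmetic by hand.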

Phylogenetic Relationship and Interspecific Variation of Rosaceae
In order to elucidate more detailed species clustering and phylogenetic relationships, in addition to the mitogenome sequences of Malus (including M. domestica, M. baccata, M. sieversii, and M. sylvestris obtained from NCBI RefSeq: NC_018554.1, NC_065224.1, NC_065225.1, and NC_065226.1, respectively; as well as the reference sequence for M. baccata 'ZA' provided in this study, PP826182), we queried and downloaded the mitogenomes of other genera and species within Rosaceae from the NCBI RefSeq database (Table S2). Firstly, 24 shared protein-coding genes were extracted from these mitogenomes using Geneious R9. Then, their haplotype diversity (Hd), nucleotide diversity (Pi), and selection pressure (nonsynonymous, Ka, and synonymous, Ks, substitution rates) were calculated using DnaSP software (v6) [51]; this allowed for a preliminary comparison of interspecific variations among 42 Rosaceae species, including M. baccata 'ZA'. Data statistics and visualization were performed using WPS Office 2024 and ChiPlot v1. Next, through sequence alignment in codon mode, followed by pruning and concatenation steps conducted with PhyloSuite v1.2.2, MAFFT v7, and Gblocks 0.91b [52], a dataset suitable for evolutionary analysis was generated. Finally, the topology of this sequence set was inferred using two phylogenetic methods: maximum likelihood (ML) and Bayesian inference (BI). The ML tree analysis was performed using IQ-TREE 2.2.6 with the following options: -m MFP for model selection, -b 1000 for bootstrap support estimation, and -alrt 1000 for SH-aLRT support estimation. The resulting tree was validated using both bootstrap and SH-aLRT support values. The unrooted tree was drawn with the first species in the multiple sequence alignment (Geum urbanum) as the outgroup. Bayesian inference was conducted using MrBayes 3.2.6 with the following settings: (lset nst = 6 rates = gamma; mcmc ngen = 10,000,000 printfreq = 1000 samplefreq = 1000 nchains = 4 nruns = 2 burninfrac = 0.25; sumt contype = allcompat). Convergence of the Markov chain Monte Carlo (MCMC) process was assessed based on the average standard deviation of split frequencies (ASDSF < 0.01), effective sample size (ESS > 200), and potential scale reduction factor (PSRF ≈ 1). Finally, Adobe Illustrator CS6 and FigTree version 1.4.4 were used to further refine and annotate the phylogenetic trees.
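To make the two diversity statistics concrete, the following Python sketch computes nucleotide diversity (Pi, the mean number of pairwise differences per site) and haplotype diversity (Hd) from an alignment. It is a simplified illustration under stated assumptions, not DnaSP's implementation: function names are hypothetical, and any alignment column containing a gap or ambiguous base in any sequence is simply discarded.

```python
from collections import Counter
from itertools import combinations


def _clean_alignment(aligned):
    """Upper-case the alignment and keep only columns where every sequence has A/C/G/T."""
    seqs = [s.upper() for s in aligned]
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have the same aligned length")
    cols = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    if not cols:
        raise ValueError("no gap-free columns in the alignment")
    return ["".join(s[i] for i in cols) for s in seqs]


def nucleotide_diversity(aligned):
    """Pi: average pairwise differences divided by the number of retained sites."""
    seqs = _clean_alignment(aligned)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def haplotype_diversity(aligned):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    seqs = _clean_alignment(aligned)
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))
```

For example, three sequences of which two are identical and one differs at a single site out of four give Pi = 2/12 and Hd = 2/3.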

Morphological and Physiological Characteristics of M. baccata 'ZA'
In order to more clearly define the morphological differences between the 'ZA' mutation type and the wild type (M. baccata, MB), their plant heights and leaf tissues were compared (Figure 1). As shown in Figure 1B, the height of 'ZA' seedlings was significantly lower than that of the wild type (WT), at about one-third of the WT height. Further observation showed that the leaves of 'ZA' were folded and curved (Figure 1A-C), which was significantly different from WT (Figure 1C). Based on scanning electron microscopy, the ultrastructures of these two kinds of leaves ('ZA' and MB) were analyzed (Figure 1D-I). The results showed that in three different visual fields, the cuticle of the 'ZA' leaf was significantly thickened (Figure 1G-I). It should be noted that 'ZA' has more epidermal wax than MB, a difference that can be easily distinguished at 3500× magnification (Figure 1F,I).
Since the histomorphology of 'ZA' and WT leaves showed significant differences, chloroplast photosynthetic pigments were also compared (Figure 2). As can be seen from Figure 2A, the photosynthetic pigment content in 'ZA' leaves is higher, and the chlorophyll content in mature leaves of both 'ZA' and MB is higher than that in young leaves. Among the four experimental groups tested, the content of chlorophyll a and b in mature leaves of 'ZA' is much higher than that in the other three groups (Figure 2A). In addition, the content of chlorophyll b in the mature leaves of 'ZA' is higher than that of chlorophyll a, which is the opposite of the other groups, and this can also be observed in the chlorophyll a/b ratio (Figure 2B). However, although the chlorophyll content of M. baccata 'ZA' is high, it is difficult and time-consuming to extract (Figure 2C), reflecting its unique biological characteristics.

Basic Characteristics and Annotations of Malus baccata 'ZA' Mitogenome
To explore the lineage composition and genetic clues of M. baccata 'ZA' and other Malus plants, we assembled the complete mitogenome of 'ZA'. The size is 374,023 bp (master circle structure), which is the smallest among the compared Malus species (Figure S1 and Table 1). By aligning the original reads to the mitogenome, the sequencing depth was calculated (mean 604.43×), which can be used for subsequent analysis (Figure S2 and Table S3). Additionally, we assembled the mitogenome of another cultivated apple variety, M.
Table 2. Annotated genes in the mitochondrial genome of M. baccata 'ZA'.

Gene Category Gene Function Gene Name
Core protein-coding genes Subunit of NADH dehydrogenase (complex I)
Note: Genes with introns or multiple copies are indicated with a lowercase letter, where a is one intron, b is three introns, c is four introns, d is two copies, and e is three copies.

Repeat Sequences in Mitochondrial Genomes of M. baccata 'ZA' and Other Malus Species
In the comparative analysis conducted in this study, three types of repeat sequences were identified: simple sequence repeats (SSRs), LTRs, and DRs. The findings revealed that DRs were the most abundant in the 10 Malus mitogenomes, followed by SSRs (Figure 4 and Table S6). By comparing the number of repetitions and repetition units, we can observe the diversity of SSRs within the M. baccata 'ZA' mitogenome (Figure 4A,B and Table S7). Specifically, a total of 115 SSRs were identified in the 'ZA' mitogenome, with tetra-nucleotide and mono-nucleotide types being predominant at 40 (34.7826%) and 39 (33.9130%), respectively (Figure 4C). Similarly, another sample from M. baccata exhibited a total of 121 SSRs, with tetra-nucleotide (41) and mono-nucleotide SSRs (41) also being the most abundant types observed. This trend was consistent across the other eight species as well (Table S6). Among all Malus mitogenomes analyzed, M. hupehensis var. mengshanensis displayed the highest number of SSRs at 125, while M. domestica (NC_018554) had the lowest count at 114; meanwhile, M. domestica 'Yantai fuji 8', M. domestica 'Gala', M. domestica 'Honeycrisp', and M. sylvestris all possessed 116 SSRs each. Finally, it is worth noting that no hexa-nucleotide repeat SSR was found among these 10 Malus mitogenomes.
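The SSR counts above depend on the minimum-repeat thresholds given in the methods (10, 5, 4, 3, 3, and 3 repeats for 1 to 6 bp motifs). The following Python sketch shows how such thresholds translate into a search; it is a simplified regex-based illustration, not MISA itself. The function names are hypothetical, compound SSRs are not handled, and edge cases where adjacent repeats overlap may differ from MISA's output.

```python
import re

# Minimum repeat counts per motif length, as stated in the methods.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_reducible(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in sorted(min_repeats.items()):
        # A motif of length `unit` followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # e.g. 'AA' x 5 is a mono-nucleotide run, not a di-nucleotide SSR
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)
```

Skipping reducible motifs keeps a run such as ten consecutive A bases from also being reported as a di-nucleotide repeat.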
Through the analysis of LTRs (Figure 4D and Table S8), it can be observed that both M. baccata 'ZA' and M. baccata exhibit a higher number (22) compared to the other Malus species (16, 17, and 18), except for M. hupehensis var. mengshanensis (23). Dispersed repeats can be categorized into four groups based on their match direction: forward/direct (F), reverse (R), complement (C), and palindromic (P). While each surveyed species possesses only one 'R' and one 'C', there are significant variations in the abundance of 'F' and 'P' elements they harbor (Figure 4E,F). For instance, M. baccata 'ZA' contains 181 'F' repeats (43.614%) and 232 'P' repeats (55.904%) (Figure 4E), whereas M. baccata contains 245 and 249, respectively (Figure 4F). Furthermore, the sequence lengths of DRs predominantly range from 30 to 40 bp (Figure 4G and Table S9).

Codon Preference Analysis of Mitochondrial Coding Genes in M. baccata 'ZA'
Codon usage bias is closely associated with the long-term evolution of species and can characterize the specificity of both species and genes. By comparing the RSCU values of mitochondrial coding sequences between M. baccata 'ZA' and four Malus species, it was observed that they exhibit consistent patterns in terms of codon type and frequency, as well as similar bias patterns (Figure 5). For M. baccata 'ZA' and other species, GCU codons are preferred for encoding Alanine (Ala), while Arginine (Arg) tends to utilize AGA and CGA types. Valine (Val) encoding favors GUA and GUU codons, whereas UAA is more commonly used as the stop codon (Figure 5). Additional calculations were performed to determine other characteristics related to codon usage, including four indexes: CAI, CBI, ENC, and FOP. The results presented in Table S10 indicate that M. baccata 'ZA' has the lowest CAI value among all analyzed samples at 0.166; however, both 'ZA' and M. domestica (NC_018554) display consistent CBI, ENC, and FOP values. In terms of GC3s statistics analysis, M. baccata exhibits the lowest value at 0.355, while M. domestica displays the highest value at 0.357 (Table S10).
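RSCU compares the observed count of a codon with the count expected if all synonymous codons for that amino acid were used equally, so values above 1 indicate preferred codons. The Python sketch below illustrates this calculation; it is not CodonW. The function name is hypothetical, and the sketch assumes the standard genetic code and treats the three stop codons as one synonymous family.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (assumed here), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for every codon of each amino acid observed in the CDSs.

    RSCU = observed count * (number of synonymous codons) / (total count for that amino acid)
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")  # accept RNA or DNA notation
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        families[amino_acid].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

For instance, a toy input with three GCU and one GCA codon gives RSCU values of 3.0 and 1.0 for those codons and 0.0 for GCC and GCG (reported here in DNA notation, GCT and so on).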

Interspecific and Intraspecific Collinearity of Mitogenomes
Firstly, collinearity was detected in M. baccata 'ZA' using the BLAST algorithm (Figure S6A), revealing numerous local alignments within its mitogenome. A comparison with M. baccata in the database (NC_065224) showed smaller alignment blocks but confirmed the homology of the two mitogenomes (Figure S6B). Interestingly, some positions were reversed, indicating a change in the direction of sequences within the mitogenome (Figure S6B). A further global comparison revealed a significant number of collinear blocks among all 10 Malus samples (Figure 6), including M. baccata 'ZA', while genome rearrangements (changes in the order of collinear blocks) were also common. For example, the connections around the larger purple and red blocks are more complex (Figure 6). As shown in Figure 6, molecular rearrangement led to a more dispersed distribution of collinear regions and hinted at instability and frequent recombination of Malus mitogenomes.

Assembly of Plastid Genome in M. baccata 'ZA' and Identification of MTPTs
The chloroplast genome (GenBank accession: OR876281) of M. baccata 'ZA' was successfully assembled using the same materials as those used for assembling the mitogenome. In this study, the complete plastid genome (M. baccata 'ZA') had a total length of 160,202 bp (base coverage = 4483.5; GC content: 36.5%). It consisted of a large single-copy region (LSC, 88,318 bp), a small single-copy region (SSC, 19,176 bp), and two inverted repeats (IRs, each spanning 26,354 bp) (Figure 7). In terms of sequence length, it accounted for approximately 42.83% of the total length of the mitogenome. Plastid genome annotation revealed a total of 129 genes, including 84 CDSs, 8 rRNAs, and 37 tRNAs (Figures 7A, S7 and S8 and Table S11). Additionally, a significant number of repeat sequences (Figure 7A, Tables S12-S14) were identified in the cp genome of 'ZA', mainly including 50 DRs, 93 LTRs, and 71 SSRs. Intracellular DNA transfer leads to a significant number of foreign sequences in the mitogenome, including partial fragments derived from the nuclear and chloroplast genomes. Through homology analysis of the plastome and mitogenome, followed by manual filtering, a total of 14 instances of plastid-driven fragment transfer were identified in the M. baccata 'ZA' mitogenome (Figure 7B and Table 3). Furthermore, statistical analysis revealed that the respective proportions of transferred fragments in their corresponding genomes were 0.517% (mtDNA) and 1.835% (cpDNA). Across all migration events (Table 3), sequence identity ranged from 73.933% (MTPT2) to 100% (MTPT14), with most being gene fragments rather than complete genes; the longest transfer reached an alignment length of 890 bp (Table 3).
Figure caption (partial): For ease of representation, species names are reduced to two characters (see Figure 4).

Population Evolution Analysis Based on Mitochondrial Genome Polymorphisms in Malus
The study of population genetics based on molecular variation holds theoretical significance in species identification and variety tracing. Firstly, utilizing the high-quality mitogenome (M. baccata 'ZA': PP826182) constructed in this study, we detected variations among different Malus species (Table S1). Subsequently, high-quality variations (1424 SNPs and 154 INDELs) were obtained by filtering on missing rate and minor allele frequency (Table S15). Notably, the number of mutations differed across positions within the mitogenome (Figure 8A), with more variants at 50, 70, 120, 200, 320, 330, 360, and 380 Kbp. For the SNPs among these high-quality variations, base changes and the transition/transversion ratio (Ts/Tv) were calculated; transversions (26,394) occurred more frequently than transitions (18,852), giving a Ts/Tv value of 0.7143, as shown in Table S16. Additionally, when summarizing the key locations affected by these variations, it becomes evident that most occur within upstream regions, downstream regions, or introns of genes (Figure 8B). Furthermore, distance matrix calculations and phylogenetic tree construction allowed us to obtain the topological relationships of 'ZA' and the other 105 Malus individuals (Figure 9). In terms of the SNP tree (Figure 9A), M. baccata 'ZA', M. domestica, M. baccata, M. sieversii, M. robusta, and M. prunifolia are closely related, indicating a maternal inheritance relationship among them. Although some branches appear more dispersed in the INDEL tree, the same pattern exists (Figure 9B).
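The Ts/Tv value and the variation rate quoted above follow directly from the reported counts, as the short Python sketch below shows. The classification helper is a generic illustration (purine-to-purine or pyrimidine-to-pyrimidine changes are transitions, all others are transversions); its name and interface are hypothetical and do not represent the variant-calling workflow used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(changes) -> float:
    """Compute Ts/Tv from an iterable of (ref, alt) base changes."""
    kinds = [classify_substitution(ref, alt) for ref, alt in changes]
    transversions = kinds.count("transversion")
    if transversions == 0:
        raise ValueError("no transversions; ratio undefined")
    return kinds.count("transition") / transversions


# Arithmetic check against the counts reported in the text.
reported_ts_tv = round(18852 / 26394, 4)   # transitions / transversions
sites_per_variant = round(374023 / 1578)   # mitogenome length / high-quality variants
```

With the reported counts, `reported_ts_tv` evaluates to 0.7143 and `sites_per_variant` to 237, matching the Ts/Tv value given here and the 1/237 variation rate stated in the abstract.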

Figure 9 caption (partial): The location of M. baccata 'ZA' is highlighted in solid red circles (two trees constructed from filtered SNP/INDEL data), and the various details of other species are listed in Table S1. In phylogenetic analysis, the transformation of the distance matrix to tree construction is calculated using the TaxAdd_BalME algorithm, and the corresponding scale represents the genetic distance.

Phylogenetic Relationship between M. baccata 'ZA' and Other Species of Rosaceae
The comparison of differentiation relationships in Malus using the mitogenome of M. baccata 'ZA' provides valuable insights into cytoplasmic inheritance within the genus. Subsequently, through extensive sample collection (NCBI RefSeq, Tables 1 and S2), we characterized the evolutionary patterns of M. baccata 'ZA' within the Rosaceae family. Our analysis of 24 conserved and shared mtDNA coding genes from 42 species (belonging to 11 genera: Malus, Sorbus, Rubus, Rosa, Pyrus, Prunus, Potentilla, Photinia, Geum, Fragaria, and Eriobotrya/Rhaphiolepis) revealed nucleotide polymorphisms (π) ranging from 0.00314 (nad3) to 0.03087 (nad4L), as well as haplotype diversity ranging from 0.502 (nad3) to 0.954 (atp6 and ccmFN) (Figure 10). These findings highlight significant differences and associations among these species (Figure 10). Furthermore, the selection pressure between gene pairs was individually calculated, revealing a substantial proportion of genes with a Ka/Ks ratio less than 1 (Figure 11). This indicates that these genes of the 42 Rosaceae species (including M. baccata 'ZA') are subject to purifying selection. However, the nad4L gene exhibited instances where Ka > Ks in certain species, suggesting rapid evolution and positive selection (Figure 11).
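For readers unfamiliar with the two diversity statistics reported above, the following Python sketch shows how nucleotide diversity (π, the average proportion of differing sites over all sequence pairs) and haplotype diversity (Hd, using Nei's sample-size correction n/(n − 1)) can be computed from an alignment. The toy sequences are invented for illustration and are not data from this study. This simplified version treats every alignment column equally, including gap characters, which dedicated population genetics software handles more carefully.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all sequence pairs (pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(seqs, 2))
    # Proportion of mismatched columns for each pair, then the mean over pairs.
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs
    )
    return total / len(pairs)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Hypothetical toy alignment: four sequences, ten sites, three haplotypes.
toy_alignment = [
    "ATGCATGCAT",
    "ATGCATGCAT",
    "ATGCTTGCAT",
    "ATGCTTGAAT",
]

pi = nucleotide_diversity(toy_alignment)
hd = haplotype_diversity(toy_alignment)
print(f"pi = {pi:.5f}, Hd = {hd:.3f}")
```

On this scale, a gene such as nad3 (π = 0.00314) differs at about three sites per thousand between an average pair of species, whereas nad4L (π = 0.03087) differs at about thirty.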
Based on the nucleotide variation loci mentioned above, we reconstructed molecular evolutionary trees (ML and BI) of the 42 species (including M. baccata 'ZA'). As depicted in Figure 12A,B, both methods yielded consistent branch structures, and the reliability of the phylogenetic trees was supported by bootstrap percentage (BP) and posterior probability (PP) values (Figure 12). In general, five genera, namely Geum, Rubus, Rosa, Potentilla, and Fragaria, formed one large clade, while the remaining species constituted another main clade belonging to Amygdaloideae (Figure 12). Specifically, Malus, Pyrus, Sorbus, Eriobotrya, and Photinia were grouped together. Within the genus Malus, M. baccata 'ZA' formed a clade with M. domestica, M. baccata, M. sieversii, and M. sylvestris (Figure 12).

Discussion
Mitochondria are referred to as semi-autonomous organelles because of their limited genetic material, and they play crucial roles in energy metabolism in plant cells [59]. The mitochondrial, chloroplast, and nuclear genomes collectively constitute the complete genetic system of plants [60]. Mitochondrial DNA is influenced by chloroplast and nuclear DNA sequences through intracellular gene transfer [61]. Compared to the other two genomes, the plant mitogenome exhibits a lower evolutionary rate and has numerous applications in studying plant evolution, classification, and genetic diversity [49,62–65]. An analysis of mitogenome and genome-wide variation revealed convergent evolution during maize domestication and improvement [66]. By sequencing and assembling mitogenomes, researchers described the evolutionary relationships and adaptation strategies of four Hevea species [67]. To assess population structure and variation in Asian rice and wild rice, statistics such as the fixation index (Fst) were calculated using mitogenome data; the results suggested that indica rice may be genetically distant from japonica rice [49]. However, due to the complexity of structural variations and transferred fragments within plant mitogenomes, assembly remains a challenging task [61,68]. For the Malus genus, approximately ten mitogenome records are available in NCBI, encompassing only seven species, which significantly limits research progress on Malus speciation.
Due to extensive outcrossing and natural mutation in Malus species, the resulting hybrids and mutants not only expand the ecological range and genetic diversity of the genus but also pose challenges for species traceability and germplasm identification [10,69]. M. baccata 'ZA', a dwarf mutant, is a clear example. 'ZA' was initially reported as a mutant type of M. baccata. Morphological and physiological comparisons in this study confirmed that the plant height and leaf shape of 'ZA' were significantly changed compared with WT (Figure 1). Despite its mention in previous studies, there is limited research on the taxonomic and genetic aspects of 'ZA'. In a study investigating the origin of cultivated apples, SNPs were identified by integrating resequencing and transcriptome data, including those of M. baccata 'ZA'. Population structure analysis and gene flow assessment revealed distinct ancestors for Chinese and European cultivated apples, with contributions from M. baccata and M. hupehensis through gene introgressions [7]. Through annotation of differentially expressed genes (DEGs) and hormone assays, it was speculated that the down-regulation of the MbIAA19 gene in 'ZA' plays a crucial role in plant dwarfing and auxin regulation, a conclusion confirmed by subsequent genetic transformation experiments [70]. However, despite these references to 'ZA', its maternal origin and evolutionary history remain unknown. To address this issue, we decoded the complete mitogenome of 'ZA' using high-throughput sequencing and described its organelle inheritance and variation patterns. The reference sequence length of the 'ZA' mitogenome was 374,023 bp (Figure 3 and Table 1), which differed from published Malus species (385–423 Kb) (Table 1). Our results identified a total of 54 genes, including 24 core protein-coding genes that were similar to those of other Malus species [53,54] (Figure 3, Tables 2 and S4). Despite conserved coding genes across different mitogenomes, inconsistent gene arrangement is common due to structural and sequence differences. Although we found numerous collinear blocks in the sequence homology comparison, genome rearrangement events in the 10 Malus plants still require attention (Figures 6 and S6). A similar situation was reported in Fragaria [18], where mitochondrial genome data from 13 species were used to identify potential genome rearrangement events and large-scale structural variations were found. The relative synonymous codon usage index provides insight into codon usage patterns. Codon usage analysis revealed amino acid preferences for Ala, Arg, and Val in the PCGs of the 'ZA' mitogenome, with TAA as the most frequent stop codon, similar to studies of Punica granatum and Camellia sinensis [16,71] (Figure 5). Repetitive sequences are significant indicators of mitogenome evolution, and investigating their quantitative differences across species is instrumental in uncovering deeper genomic variation. In Sorghum mitogenomes [17], A/T, AC/GT, AG/CT, and AT/AT motifs were identified as different types of SSRs, and A/T was the most abundant category. The same pattern was found for 'ZA' in this study. MTPT transfer DNA reflects the exchange of genetic material between organelles. In this study, segments highly similar to the 'ZA' cp genome were identified (Figure 7 and Table 3), constituting 0.517% of the mitogenome. Comparable findings were observed in other species, with percentages of 1.56% (Camellia Duntsa), 0.54% (Punica granatum), and 2.10% (Ilex metabaptista) [16,71,72]. Population genetic analysis revealed low nucleotide diversity among mitochondrial coding genes in the compared Rosaceae species (Figure 10), with most genes showing no evidence of positive selection during evolution (Figure 11). Taken together, these data indicate a high level of conservation in mitochondrial genes across 'ZA' and different Malus species, including both cultivated and wild varieties, as evidenced by gene count, codon usage, variation sites, and selection pressure metrics. However, further exploration is needed to understand the complexity of the repeat sequences and transferred fragments responsible for high polymorphism and structural variation within non-coding regions [53].
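As an illustration of the relative synonymous codon usage (RSCU) index discussed in this paragraph, the Python sketch below computes RSCU as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and ignores RNA editing, and the input sequence is a hypothetical toy example rather than a gene from the 'ZA' mitogenome. It is a minimal sketch of the definition, not a reimplementation of CodonW.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order (assumed here for illustration).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group synonymous codons by the amino acid (or stop, '*') they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(coding_sequences):
    """Return {codon: RSCU} for in-frame codons of the given coding sequences."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        for pos in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed, RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical toy CDS: five alanine codons (GCT x3, GCA, GCC).
toy_rscu = rscu(["GCTGCTGCTGCAGCC"])
for codon in sorted(SYNONYMS["A"]):
    print(codon, round(toy_rscu[codon], 2))
```

An RSCU above 1 marks a codon used more often than expected under equal usage (GCT in the toy example), and a value below 1 marks an under-used codon.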
The interspecific status and species classification of M. baccata have long attracted attention. Apart from M. baccata 'ZA' discussed in this article, various forms of M. baccata (e.g., M. baccata f. gracilis, var. latifolia, f. villosa) and geographically diverse individuals serve as representatives within this category. By reconstructing the evolutionary relationships among chloroplast genomes of different Malus species, it was observed that M. baccata f. gracilis clustered together with four other species (M. hupehensis, M. sikkimensis, M. toringoides, and M. rockii) [9]. Based on the genomic assembly of a sample from Shanxi province in China, approximately 47.56% of the genes in M. baccata exhibited a one-to-one orthology relationship with those found in the genome of M. domestica [73]. Through SSR amplification and Fst calculation involving 391 Malus accessions, it was determined that both M. baccata and M. × robusta displayed greater similarity to DomSoviet (M. domestica originating from former Soviet regions) while exhibiting more distant genetic relatedness to Chinese and Western varieties of domesticated apples [74]. In an analysis conducted on twelve individuals of M. baccata [3], both the maximum likelihood tree and the Bayesian inference tree revealed two primary branches within the phylogenetic structure of this species. In this study, the maternal genetic characteristics of M. baccata 'ZA' were found to be closely associated with M. baccata, M. sieversii, and other closely related species (Figures 9 and 12), which enhances our understanding of molecular genetics in both M. baccata and Rosaceae. These examples demonstrate that, in the era of extensive phylogenetic research facilitated by big data [20–23], relying solely on partial data is insufficient for comprehensive analysis. Therefore, additional reference sequences and molecular datasets are needed to enable accurate inference of complex interspecies relationships within Malus.
However, this study still has some limitations. For instance, the single master circle model fails to fully and accurately depict the diverse and dynamic structural information of mitogenomes [17,75]. Fortunately, advancements in sequencing technology (PacBio high-fidelity reads, HiFi) and assembly tools (graph-based sequence assembly toolkit, GSAT; plant mitogenome assembly toolkit, PMAT) will help us improve our experimental methods and resolve these challenges in the future [75,76]. Moreover, the development of the ptGAUL process serves as a reference case for improving continuity and accuracy in chloroplast genome studies [77]. Furthermore, the release and publication of the M. baccata 'ZA' mitogenome offers novel insights into complex evolutionary relationships within Malus and even Rosaceae. To some extent, it also establishes a theoretical foundation for improving varieties and utilizing production materials, particularly valuable wild germplasms.

Conclusions
The complete mitogenome of M. baccata 'ZA' was decoded through high-throughput sequencing and assembly in this study. Detailed comparative genomics analysis characterized the similarities and differences among the mitogenomes of Malus, including genome GC content, number of core genes, distribution of repeat sequences, and relative synonymous codon usage. In the mitogenomes of M. baccata 'ZA' and M. baccata, homology blocks are widely distributed, while there are similar regions within the 'ZA' mitogenome and between the mitogenome and plastome of 'ZA'. Furthermore, clear rearrangement events were observed in the mitogenomes of Malus. By mapping the evolutionary position of M. baccata 'ZA' within Malus and Rosaceae based on rich interspecific

and deletions; IPMGA, intelligent plant mitochondrial genome annotator; IRs, inverted repeats; Ka, nonsynonymous substitution rates; Ks, synonymous substitution rates; LSC, large single-copy region; LTRs, long tandem repeats; MEGA, molecular evolutionary genetics analysis; ML, maximum likelihood; mt, mitogenome; mtDNA, mitochondrial DNA; MTPTs, mitochondrial plastid DNAs; PCGs, protein-coding genes; PE, paired-end reads; PMAT, plant mitogenome assembly toolkit; PP, posterior probability; pt, plastome; rRNAs, ribosomal RNAs; RSCU, relative synonymous codon usage; SNPs, single nucleotide polymorphisms; spp., species; SSC, small single-copy region; SSRs, simple sequence repeats; STRs, short tandem repeats; tRNAs, transfer RNAs; WT, wild type.

Figure 1. Morphological structure identification of M. baccata 'ZA' and wild type (M. baccata, MB). (A) Appearance of the 'ZA' mutant (stems, leaves, and flowers). (B) Growth status and plant height of 'ZA' and MB seedlings. The middle two plants are WT and the outer two are 'ZA'. (C) Leaf morphological characteristics of the two types: WT on the left and 'ZA' on the right. For (A-C), the white short line represents 1 cm. (D-I) Comparison of the ultrastructure of the upper epidermis between MB (D-F) and 'ZA' (G-I). Each sample is displayed at three magnifications (100×, 1000×, and 3500×).

Figure 2. Photosynthetic pigment characteristics of M. baccata 'ZA' and wild type MB. (A) Chlorophyll a/b and carotenoid contents of young and mature leaves in 'ZA' and MB. (B) Chlorophyll a/b values of young and mature leaves in 'ZA' and MB. (C) The difference in chlorophyll extraction time between the two plants (MB and 'ZA'). The interval is one quarter hour.

Figure 3. The mitogenome map of M. baccata 'ZA'. The shaded parts (orange) in the figure represent the GC content of each region of the genome. Different classes of mitochondrial genes are represented by different colors, and annotated genes with introns are marked with parentheses. The short lines in the inner circle represent the repeat sequences of the mitogenome.

Figure 4. Identification and comparison of mitogenome repeats in M. baccata 'ZA' and nine Malus species. (A,B) Frequency statistics of SSRs with different repeat units and repeat times in the 'ZA' mitogenome. (C) The proportion of the five SSR types (mono-, di-, tri-, tetra-, and penta-) in the 'ZA' mitogenome. (D) Comparison of the number of LTRs in the mitogenomes of 10 Malus species. (E) Four different classes of DR repeats in the 'ZA' mitogenome. (F,G) The number distribution of DRs with different groups and lengths in different mitogenomes.

Figure 5. Relative synonymous codon usage of mitochondrial coding sequences in M. baccata 'ZA' and four other Malus species. Different codons encoding the same amino acid are distinguished by different colors. The codons corresponding to the six color blocks in the stacked column diagram are described in the box at the lower-right corner of the figure.

Biomolecules 2024, 14, x FOR PEER REVIEW
[...] Figure 6, molecular rearrangement led to a more dispersed distribution of collinear regions and hinted at instability and frequent recombination of Malus mitogenomes.

Figure 6. Interspecies collinearity comparison of Malus based on the mitogenome. Different colors represent different collinear blocks, and the different species are connected by lines. For ease of representation, species names are reduced to two characters (see Figure 4).

Figure 7. The overall features of the chloroplast genome in M. baccata 'ZA' and MTPT transfer fragment analysis. (A) Gene classification and repeat sequence distribution in the 'ZA' chloroplast genome. The genome map contains six layers of annotation information from the inside out, corresponding to dispersed repeats (D in red, P in green), long tandem repeats (colored blue), short tandem repeats (the seven types of microsatellite sequences are labeled as green_p1, yellow_p2, purple_p3, blue_p4, orange_p5, red_p6, and black_c), tetrad composition (LSC, SSC, IRa, and IRb), GC content, and gene name (codon usage bias is marked in parentheses), respectively. The lower-left corner of the map lists the color markers used for different functional genes. The gray arrows in the figure indicate the transcription direction of genes. (B) Transferred fragments from plastome to mitogenome in 'ZA'. The red and blue half rings represent cpDNA and mtDNA, respectively. For both the mitogenome and the plastid genome, the direction is clockwise. The color of the transferred fragments in the figure is determined according to the alignment results (BLAST identity).

Figure 8. Distribution and type of Malus population variation based on the mitogenome. (A) Distribution of high-quality variants in the mitogenome (the M. baccata 'ZA' mitogenome was set as the reference genome, and data were counted per 10 Kb). (B) The genomic region and type of population molecular variation.

Figure 9. Population topology of 106 Malus germplasms based on molecular variations of the mitogenome. (A) Single-nucleotide polymorphism tree. (B) Insertion/deletion tree. The position of M. baccata 'ZA' is highlighted with solid red circles (the two trees were constructed from filtered SNP/INDEL data), and the details of the other species are listed in Table S1. In the phylogenetic analysis, the transformation from distance matrix to tree was calculated using the TaxAdd_BalME algorithm, and the corresponding scale represents the genetic distance.

Figure 10. Haplotype diversity and nucleotide polymorphisms of 24 shared protein-coding genes in mitogenomes of 42 Rosaceae species.

Figure 11. Distribution of the Ka/Ks ratio of 24 mitochondrial genes in 42 Rosaceae species (including M. baccata 'ZA'). The Ka/Ks values of single genes in different species pairs were calculated and counted separately, and 0 and invalid values were treated as missing data. Different genes are represented by different colored violin plots, where the width indicates the density of the data. In addition, extreme values, quartiles, and medians are indicated with box plots.

Figure 12. Phylogenetic topologies of M. baccata 'ZA' and other species of Rosaceae based on mitochondrial shared single-copy genes (23,838 nucleotide sites). (A) Maximum likelihood tree. (B) Bayesian inference tree. For clarity, the position of 'ZA' in the topology is shown in bold red font. The outgroup of the unrooted tree was generated based on the first species in the multiple sequence alignment (for this study, the tree was drawn at the outgroup Geum urbanum). The red, orange, yellow, green, and cyan blocks represent the Malus, Pyrus, Prunus, Rosa, and Fragaria genera, respectively. The numbers on the branches represent support: SH-aLRT support (%)/standard bootstrap percentage (%) for the ML tree, and the posterior probability density for the BI tree. The scale bar in the figure indicates the number of substitutions per site.

Table 1. Comparison of mitogenomes assembled in this study with published sequences of Malus.
Note: The asterisk (*) indicates that the record was mentioned in the Wellcome Sanger Tree of Life Programme. The accession number marked with a superscript (1) is the NCBI reference sequence.

Table 3. The identification of MTPT transfer fragments in M. baccata 'ZA'.