1. Introduction
Tylosema esculentum, also known as the marama bean and the gemsbok bean, is a non-nodulated legume from the Fabaceae family and the Cercidoideae subfamily [
1,
2]. Marama is endemic to the Kalahari Desert and the surrounding arid and semi-arid regions of Botswana, Namibia, and South Africa [
3]. Marama has developed a special drought avoidance strategy by growing giant tubers that can weigh more than 500 pounds to store water, helping them survive the harsh conditions of long-term drought and little rainfall [
4]. The seeds of marama are edible and nutritious, with a protein content of approximately 30–39% dry matter (dm) and a lipid content of 35–48% dm, comparable to those found in commercial crops such as soybean and peanuts, respectively [
5,
6,
7]. Marama also contains many micronutrients and phytochemicals that are beneficial to human health [
8,
9,
10]. However, marama is still mainly collected from wild plants, and the domestication of marama has long been thought to have the potential to improve local food security. Marama usually does not flower and produce seeds until two to four years after planting, making traditional breeding very inefficient [
11]. Therefore, selecting improved marama individuals and exploring the underlying genetic diversity is considered of great significance for the improvement of the bean [
12].
Mitochondria in eukaryotic cells are generally considered to have originated from the endosymbiosis of alpha-proteobacteria, although a number of changes have occurred since then, including the loss of many genes and their transfer to the nuclear genome [
13]. The mitochondrial genomes of animals and plants have been found to vary greatly. Animal mitogenomes are usually small, only about 16 kb in size, and contain 37 genes [
14]. Plant mitogenomes are commonly larger, with 50 to 60 genes and expanded intergenic non-coding regions that result from the DNA transfer from other cellular compartments or even from different organisms [
15,
16,
17]. In land plants, they range in size from 66 kb in
Viscum scurruloideum to 11.3 Mb in
Silene conica [
18,
19].
Plant mitochondrial genes are very conserved, are considered to play important roles in ATP synthesis, and are also related to plant fertility and environmental adaptation [
20,
21,
22]. The base substitution rate of mitochondrial genes is lower than that of chloroplast genes, and it is far lower, only about one-tenth of the rate, when observed in nuclear genes [
23,
24]. Plant mitogenomes evolve even up to 100 times slower than animal mitogenomes [
25]. However, the structure of plant mitochondrial genomes is very dynamic, with a large number of sequence rearrangements, and repeat-mediated homologous recombination plays an important role in its structural variations [
26,
27].
In the mitochondrial genomes of many angiosperms, repetitive sequences account for 5–10% of the total genome size, and in a few plants such as
S. conica and
Nymphaea colorata, the proportion can exceed 40% with repetitive fragments up to 80 kb in length [
18,
28]. Recombination mediated via short or intermediate length repeats is considered to be less frequent, but recombination on long repeats (>1 kb) is thought to occur more frequently and usually generates equimolar-recombined molecules in the plant mitogenomes [
29,
30,
31].
The third-generation sequencing technology, such as PacBio, provides long reads with an average length of 10–25 kb, spanning the long repeats in the plant mitogenome. This makes the study of structural variations caused by the long repeat-mediated recombination possible [
32,
33]. High sequencing coverage is also important, not only to make the genome assembly more reliable, but also to determine the proportion of different chromosome structures. It is also indispensable for the accurate assessment of nucleotide polymorphisms [
34]. The latest PacBio HiFi sequencing with an extremely high accuracy of 99.9%, which needs little correction via the data generated from other sequencing platforms, further promotes the genome assembly [
35,
36].
Although plant mitochondrial genomes are often reported as one master circular chromosome, in reality, they often exist as multipartite structures. This includes a combination of linear molecules, branched molecules, and subgenomic circular molecules [
37,
38]. For example, the mitogenome assembly of
Solanum tuberosum was found to contain at least three autonomous chromosomes, including two small circular molecules and a long linear chromosome of 312,491 bp in length [
39].
The genome of
T. esculentum is small, with an estimated haplotype genome size of only 277.4 Mb [
40]. Previous studies of its mitogenome found that two different equimolar structures coexist in the same individual: two autonomous rings with a total length of 399,572 bp, or five smaller circular molecules [
31]. These two structures are believed to be interchangeable via recombination on three pairs of long direct repeats (3–5 kb in length). As described in the study of the
Brassica campestris mitogenome, recombination on a pair of 2 kb repeats was postulated to split the 218 kb master chromosome into two subgenomic circular molecules of 135 kb and 83 kb in length [
41].
The previous comparative analysis of 84
T. esculentum chloroplast genomes has found two distinct germplasms. The two types of chloroplast genomes are different from each other at 122 loci and at a 230 bp inversion [
42]. Among many of these loci, heteroplasmy, the existence of two or more different alleles, could be seen, albeit one generally at a frequency below 2%. The occasional paternal leakage is considered to cause this phenomenon [
43,
44]. The reason for its stable inheritance is thought to be related to the developmental genetic bottleneck, but the specific mechanism is still unclear [
45].
Research on heteroplasmy should avoid interference from homologous sequences of mitochondrial DNA in other organelles. In fact, the horizontal gene transfers of DNA from the chloroplast genome to the mitogenome and the nuclear genome, or between the mitogenome and the nuclear genome, are very common [
46,
47]. The transfer of DNA from the mitogenome to the chloroplast genome is very rare, but it has been reported in a few studies [
48,
49]. Many chloroplast genes have been found to become pseudogenes and lost function after being inserted into the mitogenome, but the reason behind it is still unclear [
50].
Although the mechanisms underlying the effects of cytoplasmic activities on agronomic traits are not well understood, previous studies on potato cytoplasmic diversity have found that cytoplasmic types are directly related to traits such as tuber yield, tuber starch content, disease resistance, and cytoplasmic male sterility [
51,
52,
53]. Furthermore, certain mtDNA types and certain chloroplast DNA types were found to be linked [
54]. A comparative genomics analysis based on the chloroplast genomes of 3018 modern domesticated rice cultivars found that their genotypes fall into two distinct clades, suggesting that the domestication of these cultivars may have followed two distinct evolutionary paths [
55]. These studies have the potential to reveal important selections occurring in organelle genomes, improving the understanding of plant adaptation to different environments and providing a basis for crop breeding to increase the yield in the corresponding environments.
The chloroplast genome is widely used in research on plant evolution, but the comparative analysis based on the plant mitochondrial genome is not so extensive, and there is even less research on the mitochondrial diversity within the same species [
56,
57]. In this study, the diversity of the marama mitochondrial genome was analyzed by mapping the WGS reads of 84
T. esculentum individuals to the previously assembled reference mitogenome aiming to (1) discover possible mitogenome structural diversity and the impact of structural variations on the gene sequence and copy numbers; (2) compare the differential loci in the mitogenomes of 43 independent marama individuals collected from different geographical locations in Namibia and South Africa to explore the divergences that have occurred and the possible decisive environmental factors behind them; (3) look at heteroplasmy, the co-existence of multiple types of mitogenomes within the same individuals, and compare the allele frequencies in related individuals to better understand the underlying cytoplasmic inheritance; and (4) track polymorphisms accumulated in the chloroplast DNA insertion and interpret the fate of the inserted gene residues. In addition, the conserved protein-coding genes from the mitogenomes of
T. esculentum and other Fabaceae species were compared to explore the evolutionary relationship between them.
3. Results
3.1. Genome Structure and Rearrangement
When the WGS Illumina reads from the 84 individuals were mapped to the two chromosomes of the reference mitogenome of
T. esculentum, LS1 (OK638188) and LS2 (OK638189), two distinct mitogenomes were found. The mitogenomes of 45 individuals were similar to the reference mitogenome, with only a few substitutions and indels seen, termed as type 1. However, the mitogenomes of the other 39 individuals were similar to each other but differed substantially from the reference, including both structural and sequence differences, termed as type 2 (
Table S1). This is consistent with the previously published study of
T. esculentum chloroplast genomes, where these 84 individuals were found to contain two distinct germplasms [
42]. These two cytotypes actually differ not only in the chloroplast genome but also in the mitochondrial genome, but no individuals with a type 1 chloroplast genome and a type 2 mitogenome, or vice versa, have been identified. The PacBio HiFi reads from the type 2 individual Sample 4 were assembled via Canu to generate three circular molecules M1 (OP795449), M2 (OP795450), and M4 (OP795447), and one linear chromosome M3 (OP795448), with a total length of 436,568 bp and a GC content of 44.8% (
Table 1 and
Figures S1–S5). The four chromosomes consisted of 21 contigs assembled directly from the Illumina reads of type 2 individuals, containing four double-copy regions and one triple-copy region (
Figure 1 and
Table 2). Among these multi-copy regions, homologous sequences of contigs H and I were also doubled in coverage in the type 1 mitogenome of
T. esculentum, but the rest were present as single-copy sequences in the type 1 mitogenome [
31].
Both ends of chromosome M3 were found to be long repeats that were homologous to parts of other chromosomes (
Figure 2). In addition, in a very long range of 15 to 20 kb, the sequencing depth gradually decreased towards both ends with many PacBio reads reaching the ends within this range. This furthers confirms that this is a linear chromosome that exists in different lengths in cells because of the lack of telomere protection. Linear chromosomes have been found to stably exist in eukaryotic cells even in the absence of telomeres via strand invasion between terminal sequences and their homologous internal sequences to form t-loops to protect the chromosomes from degradation [
78]. Since the repeats at both ends of M3 are very long, the PacBio reads we obtained cannot span them to verify whether this linear molecule recombines with other chromosomes. Long range PCR amplification that can amplify sequences above 20 kb can be considered here to answer this question.
Numerous differences were found between the type 1 and type 2 mitogenome structures (
Figure 3). One is a rare recombination on a pair of
trnfM genes on the large circular molecule M1, which inverts a 12,846 bp sequence. The sequence before inversion also appeared in type 2 plants, but with a frequency less than 2%. Furthermore, the three small rings of the type 1 mitogenome exist in different forms in the type 2 mitogenome, and four gaps have been found on them. New type 2 exclusive fragments were discovered, including a 2108 bp segment, which connected the originally distantly located contigs C1 and B. In addition, the recombination on a pair of 35 bp direct repeats joined contigs C1 and D2 to form a new circular molecule M2. C2 was found to connect with K1 and then further extended to N and L2 to form a linear chromosome, but the mechanism behind this is unclear. It can be seen that the deletion and insertion of the entire DNA segment, alongside repeat mediated recombination, can lead to dramatic changes in the mitogenome structure.
The two types of
T. esculentum mitogenomes were compared via the NUCmer alignment and then visualized via a synteny diagram, showing a high degree of similarity (
Figure 4). The basic blocks making up the two mitogenomes are highly similar, except for the 2108 bp type 2 unique fragment and some other short insertions and deletions, but the order of these blocks has been changed via recombination. When the WGS Illumina reads from the 84 individuals were mapped to the region where the 2108 bp type 2 unique fragment resides, 39 samples were found to contain this fragment while 45 plants did not (
Figures S19–S28). Furthermore, in the type 1 mitogenome, LS1 and LS2 are two autonomous circular chromosomes that do not recombine into one master circle, but in the type 2 mitogenome, a linear chromosome M3 was found to contain homologous sequences from both LS1 and LS2, suggesting that these molecules may not be independent of each other in all marama individuals.
Blast results showed that this 2108 bp type 2 mitogenome exclusive fragment was highly similar to the mitochondrial sequences of the Fabaceae species,
Lupinus albus and
Indigofera tinctoria, suggesting that they share a common ancestor (
Figure 5A). Two pairs of primers were designed and found to effectively identify this 2108 bp fragment (
Figure 5B). As shown in
Figure 5C, samples A and B, out of the five randomly selected samples, contained both the 990 bp left end band and the 289 bp right end band after amplification, indicating that only these two of the five had type 2 mitogenomes.
3.2. Gene Annotation
Both types 1 and 2 mitochondrial genomes of
T. esculentum share identical sets of genes but differ in the copy numbers of certain genes. These genomes contain 35 unique protein-coding genes, 3 unique rRNA genes, and 16 distinct tRNA genes (
Figure 6;
Table 3) [
31]. The type 2 mitogenomes have two copies of
nad9 and
atp8. The gene
nad9 is located on contig K1, a long repeat present on both chromosomes M3 and M4 in type 2 mitogenomes, while there is only one copy of K1 in type 1 mitogenomes. The gene
atp8 is located on a pair of long repeats J, so its copy number is doubled in both types of mitogenomes. The copy number of exon 3 and 4 of gene
nad5 is also doubled in type 2 mitogenomes but not in type 1, and it is not known whether this affects its expression level. In addition, there are two copies of
rrn5 and
rrnS in type 2 mitogenomes but only one copy in type 1. A total of 26 tRNA genes were found in type 2 mitogenomes, including four copies of
trnfM-CAT, three copies of
trnM-CAT, three copies of
trnC-GCA, two copies of
trnP and
trnQ, and 12 single-copy tRNA genes.
The atypical start codon ACG was used by three genes
nad1,
nad4L, and
rps10, and ATT was used by the gene
mttB. This is consistent with the research on the mitogenome of common beans from which these four genes were all reported to use an alternative initiation codon ACG [
79]. C-to-U editing was found to be widely used in mitochondrial and chloroplast genes in land plants [
80]. ATT is also usually used as an alternative start codon in the mitogenome. For example,
mttB in
Salix purpurea was reported to use an ATT [
81].
3.3. Mitogenome Divergence
The mitogenome of
T. esculentum was highly divergent from those of the six selected legume species, which covered 23% to 48% of the marama mitogenome, ranging in length from 91.9–191.8 kb (
Figure 7). Of these species,
C. canadensis was most closely related to marama, while
M. sativa was the least similar to marama. The mitogenomes of
G. max,
L. japonicus, and
V. radiata all contain homologous sequences covering 25% of the marama mitogenome, equal to 99.89 kb in length.
Bauhinia variegata is closer to marama than
C. canadensis in the phylogenetic tree, but its mitogenome sequence is not available in NCBI GenBank [
82].
The loss of mitochondrial protein-coding genes and the functional transfer of these genes from the organelle genome to the nuclear genome are common in the evolution of angiosperms, but some plants tend to retain a more complete set of mitochondrial genes [
83]. The mitogenomes of the Cercidoideae species,
T. esculentum and
C. canadensis contain functional protein-coding genes
sdh3,
sdh4, and
rpl10, which have been lost in many other legumes (
Table 4). Rare gene losses were also seen in the mitogenomes of some legumes, such as
rpl5 in
G. max,
cox2 in
V. radiata, and
rps1 in
L. japonicus, but these genes all remain intact and functional in
T. esculentum and
C. canadensis [
84,
85,
86].
In a pairwise comparison of the mitogenomes of seven legume species, the synteny plot revealed numerous rearrangements and a high degree of divergence among the mitogenomes of even closely related species (
Figure 8). The mitogenomes of
C. canadensis and
T. esculentum contain many distinct regions but also some long homologous segments.
The phylogenetic tree shown in
Figure 9 was built on the 24 conserved mitochondrial protein-coding genes
atp1,
atp4,
atp6,
atp8,
atp9,
nad3,
nad4,
nad4L,
nad6,
nad7,
nad9,
mttB,
matR,
cox1,
cox3, cob,
ccmFn,
ccmFc,
ccmC,
ccmB,
rps3,
rps4,
rps12, and
rpl16, which are present in all these eight species. This tree is consistent with previously published phylogenetic trees constructed on chloroplast protein-coding genes [
87,
88]. As another species of Cercidoideae,
C. canadensis was expected to be the closest relative in these plants to
T. esculentum. Among the Faboideae species,
M. sativa and
L. japonicus are closely related, and
V. radiata,
G. max, and
M. pinnata belong to another clade.
3.4. Nucleotide Polymorphism
Seventeen haplotypes were found in these 47 plants, which could be clearly divided into two groups: namely, the type 2 plants from the Namibian farm and the UP farm, and the remaining type 1 plants (
Figure 10). The mitogenomes of type 2 plants are relative conserved, which may be caused by sampling errors, and a larger sample size is needed to verify. The only differences between type 2 plants were four deletions at four closely located loci on chromosome LS2. However, the type 1 plants can be divided into many groups based on the variations. Some geographical patterns can be seen in the distribution of variation. For example, in the four plants from Osire, there were three substitutions on chromosome LS1, A > C at 146,140 bp, C > A at 173,053 bp, and C > A at 217,508 bp, and a deletion at 128,192 bp on chromosome LS2. These are variations unique to Osire plants. Similar geographic-specific variation can be seen in plants elsewhere. However, our data may still have sampling issues. For example, only a single sample was collected in some regions including Otjiwarongo and Okamatapati. In addition, although distant wild individuals in each geographic region were intentionally selected, there is no guarantee that they are not related. The findings here still need to be validated by sequencing more samples and studied alongside the phenotypic performance of the plants to determine whether any of these variations are the result of plants evolving to better adapt to different environments.
Type 2 refers to the seven Index plants from the Pretoria Farm excluding Index8 (initially collected from Namibia but the exact location is unknown), 29 M of descendent plants originally from the Namibia Farm excluding M40, and two individuals R1R2 and nar16 of unknown origin. Type 1 represents all remaining plants, including A plants, S plants, Index8, and M40. Numbers in parentheses indicate the counts of exclusive variants of this type. LS1 and LS2 are type 1 marama reference mitochondrial chromosomes in GenBank with accession numbers OK638188 and OK638189.
A total of 254 differential loci were found in the mitogenomes of the 84
T. esculentum individuals, including 143 SNPs, 52 insertions, and 59 deletions (
Table 5). Type 1 and type 2 mitogenomes differed at 230 loci, including 129 substitutions, 50 deletions, and 51 insertions. The mitogenomes of type 2 plants differed at only 4 loci, whereas that of type 1 plants differed at 24 loci.
The mitochondrial gene sequence of
T. esculentum is very conserved. A total of 11 variations were found in the mitochondrial gene sequence, and only one of them was in the coding sequence, which was a 2368 A > G substitution, which resulted in a N303D change in the gene
matR (
Table 6). Furthermore, 10 of the 11 variations were found on one subgenomic ring LS2 of the reference mitogenome of
T. esculentum. Whether chromosome LS1 is more conserved than chromosome LS2 is unknown. Although the intergenic spacer of LS1 contained more variations than LS2, the gene sequence on LS1 appeared to be more conserved, and the cpDNA insertions were also found rarely in LS1 but abundantly in chromosome LS2.
In the phylogenetic tree constructed on the differential loci of the mitogenomes of
T. esculentum, the two types of germplasms fell into two clusters, as expected (
Figure 11). However, using PCR amplification on the newly collected samples, it was found that there may not be obvious differences in geographical distribution between type 1 and type 2 plants. Both types were detected in Otjiwarongo and the Namibia Farm. Among them, three out of eight plants collected in Otjiwarongo were identified as containing the type 2-specific fragment, and 2 out of 13 plants from the Namibian Farm were identified as type 2 individuals.
Furthermore, the type 1 plants were then divided into groups. Plants from Epukiro, Osire, Ombujondjou, and Otjiwarongo belonged to one clade, and they all had GCC at positions 127,684 to 127,686 on chromosome LS1 (OK638188). The remaining plants all had the alternative alleles TAA. However, Sanger sequencing of the newly collected samples found that two out of eight individuals from Otjiwarongo also contained TAA, eight plants from Tsjaka instead contained GCC, 5 out of 13 plants from the Namibian Farm contained TAA, and the remaining eight contained GCC. Thus, this widespread difference in marama mitogenomes may not be geographically restricted either. In addition, despite Tsumkwe and Tsjaka being two distant sampling sites, the plants from these locations were clustered in one clade, as also observed in the phylogenic tree constructed using the complete chloroplast genome. This raises questions about whether there are factors other than geographical distance that are determining the grouping of these plants.
3.5. SSRs and Heteroplasmy Analyses
A total of 48 SSR motifs were found via MISA in the reference mitogenome of
T. esculentum, LS1 (OK638188) and LS2 (OK638189), of which 38 were simple mononucleotide microsatellites, accounting for 79.2% of all discovered SSR motifs (
Figure 12). Among them, 37 are A/T mononucleotide repeats, and only one is a G/C repeat. There are eight dinucleotide repeats and two trinucleotide repeats. No simple sequence repeats with the core motifs of four nucleotides or longer were found. There are three microsatellites in the coding sequence of the gene, including two 10 bp A/T repeats, one located at the boundary of the coding sequence of the gene
sdh3, and the other located in the exon 4 of the gene
nad1, and a 12 bp AT repeat siting in the coding sequence of
mttB. The type 1 mitogenome was dominant in the samples collected from Aminuis, and there was one interesting finding that the individual A11 had approximately 96% type 1 alleles, as well as 4% type 2 alleles at all differential loci between the two types of marama mitogenomes (
Figure 13), which was also reflected in the 2k type 2 extra piece not shown in the figure. It has been confirmed that these minor alleles are not from homologous sequences in the nuclear genome or the chloroplast genome, suggesting the presence of two different mitogenomes in the same individual. It was previously reported that heteroplasmy also exists in the chloroplast genome of A11, with a minor genome frequency reaching 11% [
42]. This means that in the chloroplasts and mitochondria of A11, both major and minor genomes exist, and the frequency of the minor genome is higher than that of the other studied plants. This is most likely to be caused by an accidental occurrence of paternal leakage. The proportions of mitochondrial and chloroplast minor genomes differed in A11, indicating that this may be true heterogeneity rather than due to the accidental mixing of samples. However, this is the only sample with high overall organelle genome heterogeneity, and more plants from Aminuis need to be sequenced and studied, especially from the vicinity of the collection of A11.
In the mitogenomes of other individuals, heteroplasmy was found to be less common than in their chloroplast genomes, and even low proportions of base substitutions below 2% were relatively rare, and even if present, many were found from the mitochondrial homologous segments in the nuclear genome. However, at a few differential loci, including the three consecutive substitutions at positions 127,684 to 127,686 and another substitution at 140,922 on chromosome LS1 (OK638188), obvious heteroplasmy could be seen in multiple individuals, with the proportion of minor alleles even up to 35%. The possible role, if any, of these loci in the evolution of individuals under environmental selection has also not been determined.
3.6. Sequence Transfer between Chloroplast and Mitochondrial Genomes
It was interesting to find that the chloroplast DNA insertions were concentrated in one of the subgenomic rings of the mitogenome (
Figure 14). A low collinearity between the two genomes in these regions of similarity was seen, possibly due to each of the inserted cpDNA fragments being independently transferred. The largest of these fragments was 9798 bp containing the chloroplast pseudogenes
psbC,
rps14,
psaB, and
psaA, and the unchanged transfer RNA genes
trnG,
trnM, and
trnS. A large number of variations were observed between the chloroplast and mitochondrial versions. Primers were designed to amplify across the two ends of this fragment from both the plastome and the mitogenome to verify its presence (
Figure 15). On this long homologous DNA fragment, 20 loci showed differences between types, 13 of which were speculated to occur in the mitogenome and seven in the chloroplast genome (
Table 7). In addition, the two organelle genomes differed at another 72 loci in this segment (this included 22 deletions, 6 insertions, and 44 SNPs), which are the same for both germplasms, making it difficult to tell where these variants arose. Ten variations were found in the gene sequences on this segment, of which only two synonymous substitutions occurred in the chloroplast genome, and the remaining eight were in the mitogenome, some of which may have a strong impact on transcription, including the introduction of early stop codons (
Table S3). However, only the sequences of the protein-coding genes were affected, while the sequences of the tRNA genes remained unchanged.
The mitogenome of T. esculentum was 399,572 bp (type 1) in length, and a total of 254 variations were found in the mitogenomes of the 84 individuals. The length of the chloroplast genome was 161,537 bp (type 1), and a total of 147 variations were found in the plastomes of these plants. The value of variation per nucleotide of the chloroplast genome is higher than that of the mitogenome. However, these chloroplast genes are originally conserved in the chloroplast genome, but after being inserted into the mitochondria, they become prone to mutation accumulation, which may be related to the loss of the original protection mechanism after entering the new environment, thus rendering them nonfunctional pseudogenes.
4. Discussion
The comparative analysis of the organelle genomes of 84
T. esculentum individuals revealed two germplasms with distinct mitochondrial and chloroplast genomes. The type 1 mitogenome contains two autonomous rings or five smaller subgenomic circular molecules with a total length of 399,572 bp. These two equimolar structures are thought to be interchangeable via recombination on three pairs of long direct repeats [
31]. The type 2 mitogenome contains three circular molecules and one linear chromosome with a total length of 436,568 bp. The size of the assembled marama mitogenomes are close to that of some other legumes, 402.6 kb for
G. max, 401.3 kb for
V. radiata, 425.7 kb for
Pongamia pinnata, and 380.9 kb for
L. japonicus [
59,
85,
86]. A mix of linear and circular molecules has also been previously proposed in the mitogenomes of plants such as
Lactuca sativa and
S. tuberosum [
32,
39]. Read alignment revealed a progressive decrease in coverage starting up to 18 kb from the chromosome ends, supporting the presence of linear molecules in the marama mitogenome. Both ends of the marama linear mitochondrial chromosome are long repetitive sequences, also present in other molecules, on which sequence invasion and recombination may occur to protect the linear chromosome from degradation [
78]. The type 2 mitogenome also has a unique fragment of 2108 bp in length, likely derived from the same ancestral sequence as in other legumes such as
Lupinus and
Indigofera. This fragment was not found in any type 1 marama individuals and may be useful for the studies of marama germplasm typing and phylogeny. Since the phenotypic performances vary greatly among wild marama individuals, it is unclear whether these germplasmic genomic differences are associated with any phenotypes. In the future, it is expected that more samples will be collected, and genomic differences will be studied together with phenotypes to facilitate breeding efforts.
The structural variation resulted in an increased copy number of the genes
atp8,
nad5 (exon3 and exon4),
nad9,
rrnS,
rrn5,
trnC, and
trnfM in the type 2 mitogenome, but it is unknown whether this change is reflected in the gene expression level. The genes
nad1,
nad4L,
rps10, and
mttB were found to use alternative start codons ACG and ATT, similar to the mitogenome of the common bean [
79]. Trans-splicing is considered to possibly play an important role in the activities of marama mitochondrial
nad genes including
nad1,
nad2, and
nad5 because the exons of these genes were found to exist in different subgenomic structures, which is consistent with the previous study [
89]. A total of 254 differential loci were found in the mitogenomes of the 84
T. esculentum individuals. Type 1 and type 2 mitogenomes differ from each other at 230 of these loci. However, the mitochondrial gene sequence is very conserved, in which only one of these 254 variations was found, which altered the amino acid sequence synthesized by
matR, whose function is thought to involve the splicing of various group II Introns in Brassicaceae mitochondria [
90].
The evolutionary study of the differential loci in the mitogenomes of
T. esculentum found that the two types of plants fell into two clusters, as expected. The type 1 plants can be further divided into several clades; however, some of these geographic distribution-related differences may arise from sampling errors. For example, the sequencing of a larger number of samples confirmed that the widespread variation in three consecutive substitutions at positions 127,684 to 127,686 on chromosome LS1 (OK638188) may not be related to the geographic distribution of the plants. The phylogenetic tree constructed based on the mitochondrial genes of
T. esculentum and other related Fabaceae species was consistent with the previously published one built on chloroplast genes [
87]. The study found that
C. canadensis and
T. esculentum from the same subfamily Cercidoideae are closely related, and that they have more complete sets of mitochondrial genes than the Faboideae species, including
L. japonicus,
M. sativa,
M. pinnata,
G. max, and
V. radiata. This is in line with previous studies where multiple gene losses occurred during the evolution of the Faboideae species [
91].
Heteroplasmy in the mitochondrial genome of
T. esculentum is not as prevalent as in its chloroplast genome, and most of the alternative alleles were actually found on the nuclear mitochondrial DNA segments [
42]. In the mitogenome, heteroplasmy appears to be stably inherited, with very low levels, generally below 2%, but also with higher levels at certain loci, such as loci 127,684 to 127,686 on chromosome LS1. Among all samples, only one individual A11 from Aminuis had a generally high degree of heteroplasmy at most differential loci, consistent with what was seen in its chloroplast genome. Whether it is because of some of its own characteristics that A11 escaped the genetic bottleneck at the developmental stage is still unclear.
Chloroplast insertions are shown to be concentrated in one of the subgenomic rings of the mitogenome of
T. esculentum with low collinearity. The mitogenomes of all studied marama individuals were found to contain a long chloroplast DNA insertion over 9 kb in length. Polymorphism studies of this fragment implied that the sequence in the chloroplast genome was protected by some mechanism, so only a small number of synonymous substitutions remained, but after being inserted into the mitogenome, a large number of mutations accumulated on it, progressively resulting in the loss of gene function, but the sequence of the tRNA gene has not changed and may still be functional [
92,
93,
94].
5. Conclusions
The comparative analysis of 84 T. esculentum individuals revealed two distinct germplasms with significant differences in both the mitochondrial and chloroplast genomes. Unlike the previously known type 1 mitogenomes, the newly discovered type 2 mitogenomes consist of three circular molecules and one long linear chromosome. The type 2 mitogenomes also possess a unique 2108 bp fragment homologous to the Lupinus mitogenome. The structural variation increased the copy number of certain genes, including nad5, nad9, rrnS, rrn5, trnC, and trnfM, but the impact on their expression level remains uncertain. A total of 254 differential loci were identified in the populations, with 230 differences observed between the two germplasms. However, only one nonsynonymous mutation was found in the coding sequence of matR, indicating a conserved nature of plant mitochondrial genes. The evolutionary relationship between the mitogenome of marama and its related legume species is consistent with relationships built on chloroplast genes. Notably, both C. canadensis and T. esculentum, as species from the Cercidoideae subfamily, tend to possess a more complete set of mitochondrial genes compared to the Faboideae species, suggesting a loss of mitochondria genomic content during the evolution of the modern legume species. Heteroplasmy in the mitogenome of marama is not as prevalent as in its chloroplast genome, but it is concentrated at specific loci, including 127,684 to 127,686 on chromosome LS1, with the cause and effect remaining unclear. The mitogenome of marama contains a long chloroplast DNA insertion with a large number of polymorphisms. The study of this segment reveals the accumulation of mutations, leading to the loss of gene function, which possibly only occurs after the transfer into the new environment.