Characterization and Analysis of the Mitochondrial Genome of Common Bean (Phaseolus vulgaris) by Comparative Genomic Approaches

The common bean (Phaseolus vulgaris) is a major source of protein and essential nutrients for humans. To explore the genetic diversity and phylogenetic relationships of P. vulgaris, its complete mitochondrial genome (mitogenome) was sequenced and assembled. The mitogenome is 395,516 bp in length, including 31 unique protein-coding genes (PCGs), 15 transfer RNA (tRNA) genes, and 3 ribosomal RNA (rRNA) genes. Among the 31 PCGs, four genes (mttB, nad1, nad4L, and rps10) use ACG as initiation codons, which are altered to standard initiation codons by RNA editing. In addition, the termination codon CGA in the ccmFC gene is converted to UGA. Selective pressure analysis indicates that the ccmB, ccmFC, rps1, rps10, and rps14 genes were under evolutionary positive selection. The proportions of five amino acids (Phe, Leu, Pro, Arg, and Ser) in the whole amino acid profile of the proteins in each mitogenome can be used to distinguish angiosperms from gymnosperms. Phylogenetic analyses show that P. vulgaris is evolutionarily closer to the Glycininae than other leguminous plants. The results of the present study not only provide an important opportunity to conduct further genomic breeding studies in the common bean, they also provide valuable information for future evolutionary and molecular studies of leguminous plants.


Introduction
Mitochondria (mt) are semi-autonomous organelles that are part of almost all eukaryotic cells (cells with clearly defined nuclei). Their primary function is to produce a steady supply of adenosine triphosphate (ATP). Mitochondria are thus termed the 'powerhouses' or 'energy factories' of cells. Chloroplasts (cp) and mitochondria most likely originated from formerly free-living bacteria through endosymbiotic acquisition, which can explain the presence of their own genomes [1,2]. With rapid developments in sequencing and genome assembly methods, an increasing number of complete organelle genomes have been assembled in the last decade. Thus far, over 4900 complete chloroplast and plastid genomes have been assembled but only 321 plant mitogenomes have been assembled and deposited in GenBank Organelle Genome Resources (as of 14 May 2020; https://www.ncbi.nlm.nih. gov/genome/browse/), suggesting that their assembly is complex and difficult.
Mitochondria are specific to each plant and have complex genome structures [3][4][5], variable genome sizes [6,7], numerous repetitive sequences [8,9], multiple RNA editing modifications [10,11], and frequent gene gains or losses during evolution [9,12,13]. In seed plant mitogenomes, the genome sizes are highly variable, ranging from an exceptionally small genome of 66 kb in the parasitic plant Brazil [30]. Root tips obtained from germinated seeds were pre-treated with 2 mM 8-hydroxyquinoline for 18 h at 10 • C, fixed in ethanol-acetic acid (3:1 v/v), and stored in fixative at −20 • C for up to several weeks. Total genomic DNA was extracted from root tips using DNAeasy Plant Mini Kits (Qiagen). To construct the shotgun library, DNA was fragmented by nebulization. The raw reads were sequenced with a combination of Roche/454 GS FLX sequencing reads, Illumina HiSeq-2500 sequencing short reads (primarily to correct 454 sequencing errors) and PacBio RS II sequencing long reads (primarily to validate the assembly of the master conformation). The raw reads of P. vulgaris used in this study were available in the NCBI Sequence Read Archive (SRA) under accessions SRR069592, SRR5628227, and SRR2912756.

Mitogenome Assembly and Annotation
An efficient procedure for plant mitochondrial genome assembly using whole-genome data from the 454 GS FLX sequencing platform has been applied in many plants, such as Boea hygrometrica [31], Daucus carota [32], Gossypium raimondii [26], and Salix suchowensis [33]. Briefly, as shown in Figure S1, we first assembled all the Roche/454 GS FLX sequencing reads using Newbler (version 3.0) [34] with the following parameters: -cpu 20, -het, -sio, -m, -urt, -large, and -s 100. Then, we used custom Perl scripts to construct a draft assembly graph from the file "454AllContigGraph.txt" generated from Newbler. As shown in Figure 1, we obtained six contigs to construct the completed draft mitochondrial graph for assembling the P. vulgaris mitogenome. Among the six selected contigs, two (Contig15 and Contig40) were assembled into the mitogenome twice, while the others were assembled only once. To assemble the master conformation (MC), we mapped the PacBio sequencing reads to the mt contigs that spanned repetitive contigs using BLASTN to obtain a major contig relationship map for the repeat regions [35,36].
Specifically, for each repeat pair (Contig15 and Contig40), we built four reference sequences according to Dong et al. [37], each with 200 bp up-and down-stream of the two template sequences (original sequences). Then, we searched the PacBio long reads against the database built up from the reference sequences and extracted the matching reads with a blast identity above 80%, an e-value cut-off of 1e −100 , and a hit length of over 3000 bp. Next, we mapped the best-matched reads to the four reference sequences in MacVector v17.0.7. As shown in Figure 1, we obtained one master genome and two isomeric genomes (ISO) based on the number of PacBio reads that were mapped to both end contigs of the repetitive contigs (Table S1). We then mapped Illumina sequencing reads to the draft MC mitogenome with BWA [38] and SAMtools [39] softwares to correct the homopolymer length errors (especially in A/T enriched regions) from 454 GS FLX Titanium [26]. Finally, the complete mitogenome sequence of P. vulgaris was obtained.
The mitogenome was annotated using the public MITOFY analysis web server (http://dogma.ccbb. utexas.edu/mitofy/) [8]. The putative genes were manually checked and adjusted by comparing them with other legume mitogenomes in MacVector v.17.07. All transfer RNA genes were confirmed by using tRNAscan-SE with default settings [40]. The start and stop codons of PCGs were manually adjusted to fit open reading frames. The relative synonymous codon usage (RSCU) values and amino acid composition of PCGs were calculated by MEGA X [41]. The OrganellarGenomeDRAW (OGDRAW) program was used to visualize the circular map of the P. vulgaris mitogenome [42].

Selective Pressure Analysis
To reflect the selective pressure of PCGs, we calculated the nonsynonymous (K a ) and synonymous (K s ) substitution rates of each PCG between P. vulgaris and other higher plants. Arabidopsis thaliana (A. thaliana; Brassicaceae) is a popular model organism in plant biology and genetics. Citrullus lanatus (C. lanatus; Cucurbitaceae) and Vitis vinifera (V. vinifera; Vitaceae) are highly cultivated fruits worldwide and belong to the Rosids clade, like leguminous plants. Therefore, we selected the mitogenomes of A. thaliana, V. vinifera, and C. lanatus as references to infer the direction and magnitude of natural selection acting on PCGs during the evolution of P. vulgaris. The orthologous gene pairs from P. vulgaris, A. thaliana, V. vinifera, and C. lanatus were aligned and formatted by ParaAT2.0 with default parameters [43]. The K a , K s , and K a /K s values were calculated using KaKs_Calculator v.2.0 based on the YN method, and Fisher's exact test was performed to justify the validity of the K a and K s values [44,45]. The master genome and two isomeric genomes observed from P. vulgaris mitogenome mediated by two pairs of large repeats (Contig15 and Contig40). The mt contigs were generated and selected from Newbler assembly software. MC and ISO mean the master and isomeric conformations, respectively. Arrows denote the sequence orientation of assembled contigs.
Specifically, for each repeat pair (Contig15 and Contig40), we built four reference sequences according to Dong et al. [37], each with 200 bp up-and down-stream of the two template sequences (original sequences). Then, we searched the PacBio long reads against the database built up from the reference sequences and extracted the matching reads with a blast identity above 80%, an e-value cutoff of 1e -100 , and a hit length of over 3000 bp. Next, we mapped the best-matched reads to the four reference sequences in MacVector v17.0.7. As shown in Figure 1, we obtained one master genome and two isomeric genomes (ISO) based on the number of PacBio reads that were mapped to both end contigs of the repetitive contigs (Table S1). We then mapped Illumina sequencing reads to the draft MC mitogenome with BWA [38] and SAMtools [39] softwares to correct the homopolymer length errors (especially in A/T enriched regions) from 454 GS FLX Titanium [26]. Finally, the complete mitogenome sequence of P. vulgaris was obtained.
The mitogenome was annotated using the public MITOFY analysis web server (http://dogma.ccbb.utexas.edu/mitofy/) [8]. The putative genes were manually checked and adjusted by comparing them with other legume mitogenomes in MacVector v.17.07. All transfer RNA genes were confirmed by using tRNAscan-SE with default settings [40]. The start and stop codons of PCGs were manually adjusted to fit open reading frames. The relative synonymous codon usage (RSCU) values and amino acid composition of PCGs were calculated by MEGA X [41]. The Figure 1. The master genome and two isomeric genomes observed from P. vulgaris mitogenome mediated by two pairs of large repeats (Contig15 and Contig40). The mt contigs were generated and selected from Newbler assembly software. MC and ISO mean the master and isomeric conformations, respectively. Arrows denote the sequence orientation of assembled contigs.

Prediction of RNA Editing Sites
The online PREP-Mt (predictive RNA editors for plants) suite of servers (http://prep.unl.edu/) was used to predict the possible RNA editing sites in the PCGs of P. vulgaris and the other four leguminous mitogenomes (G. max, L. japonicus, V. radiata, and M. pinnata). In order to predict more true RNA editing sites, the cut-off for prediction score was set as C = 0.2, which has been proven to be a slight optimum [46]. A low cut-off value will predict more true edit sites but will also increase the probability of misidentifying an unedited site as an edited one.

Phylogenetic Analyses
In order to accurately infer the phylogenetic relationships of P. vulgaris within the Fabaceae family, maximum likelihood (ML) analysis was performed based on the conserved mitochondrial PCGs (amino acid sequences) of 23 higher plants. The NCBI accession numbers and abbreviations of all these observed mitogenomes are listed in Table S2. Apart from the 19 representative Fabaceae species, taxon sampling also included two species of Solanales (C. annuum and N. tabacum) and two species of Malpighiales (P. tremula and S. suchowensis) as outgroups. The single-copy orthologous PCGs common among the 23 analysed species were selected with local Perl scripts. All conserved mitochondrial PCGs were extracted from each mitogenome. The conserved gene sequences from the mitogenome were concatenated into a single dataset and aligned using Muscle software with default settings [50]. Poorly-aligned sequences were deleted or manually adjusted for each alignment. Prior to constructing the phylogenetic tree, we applied MEGA X to determine the most appropriate amino acid substitution model [41]. Based on the model selection results, the ML tree based on a JTT + F model with a gamma distribution was constructed using MEGA X. The bootstrap index value (%) in which the associated taxa clustered together was shown next to the branches and was calculated from 1000 replications.

Genomic Features of the P. vulgaris Mitogenome
The complete genomic sequence of the P. vulgaris mitogenome was submitted to the NCBI Genome Database (https://www.ncbi.nlm.nih.gov/genome/browse/) under accession number NC_045135.  Table S2). In fact, the mitogenome sizes vary considerably among the papilionoid legumes, ranging from 271,618 bp in Medicago truncatula to 588,000 bp in Vicia faba. Mitogenome sizes can vary greatly in different cultivars of the same species. For example, the mitogenome size of G. max Aiganhuang (N21249) is 402,558 bp, whereas that of G. max cultivar Zhonghuang 13 is 513,779 bp [51].
The nucleotide composition of the whole mitogenome is A: 27.37%, C: 22.40%, G: 22.71%, and T: 27.52% (Table 1). The overall GC content is 45.11%, which is consistent with other leguminous plants (G. max: 45.03%, V. faba: 45.04%, and V. radiata var. radiata: 45.11%). Strikingly, the GC content of the PCGs is very small compared to those of other regions. As shown in Table 2, a total of 49 unique genes were detected in the P. vulgaris mitogenome, comprising 31 PCGs, 15 tRNA genes and 3 rRNA genes. However, none of the genes encodes subunits of Complex II (succinate dehydrogenase), which has also been lost in some other leguminous plants. Additionally, two tRNA genes located in repeat sequences were found to contain two or three copies (trnC-GCA and trnfM-CAU). The total lengths of the PCGs and cis-spliced introns comprise 7.26% and 8.24% of the whole mitogenome, while tRNA and rRNA genes only comprise 0.34% and 1.33% of the mitogenome, respectively. Most PCGs have no introns; however, eight genes ( Table 2; nad1, nad2, nad4, nad5, nad7, ccmF C , rps3, and rps10) were found to contain one or more introns. Three genes (nad1, nad2, and nad5) required trans-splicing to assembly fully-translatable mRNA ( Figure 2).  Table 1). The overall GC content is 45.11%, which is consistent with other leguminous plants (G. max: 45.03%, V. faba: 45.04%, and V. radiata var. radiata: 45.11%). Strikingly, the GC content of the PCGs is very small compared to those of other regions. As shown in Table 2, a total of 49 unique genes were detected in the P. vulgaris mitogenome, comprising 31 PCGs, 15 tRNA genes and 3 rRNA genes. However, none of the genes encodes subunits of Complex II (succinate dehydrogenase), which has also been lost in some other leguminous plants. Additionally, two tRNA genes located in repeat sequences were found to contain two or three copies (trnC-GCA and trnfM-CAU). The total lengths

Codon Usage Analysis of PCGs
In the P. vulgaris mitogenome, most of the PCGs use ATG as the start codon, while mttB and nad1 start with ACG (C to U RNA editing on the second site is presumed) as the start codon (Table 3). Four types of stop codons were found in the PCGs: (1) TAA (15 genes; atp4, atp8, atp9, cox1, nad1, nad2, nad3, nad4L, nad5, nad6, nad9, rpl5, rpl16, rps1, and rps4), (2) TGA (10 genes; atp1, ccmB, ccmC, ccmF N , cox3, matR, mttB, nad4, rps10, and rps12), (3) TAG (5 genes; atp6, cob, nad7, rps3, and rps14), and (4) CGA (ccmF C ; C to U RNA editing on the first site is presumed). As shown in Figure 3, the codon usage analysis revealed that leucine (Leu) and serine (Ser) are the most frequently-used amino acid residues, while cysteine (Cys) and tryptophan (Trp) are the least-used amino acid residues in the plant mitochondrial proteins. By comparison of the composition of P. vulgaris with other angiosperms plants, we found that the distribution of amino acid residues across the mitochondrial proteins are very similar in angiosperms ( Figure 3). In addition, most of the amino acid residues were found to be very conserved between angiosperms (P. vulgaris, G. max, L. japonicus, V. radiata, V. faba, A. thaliana, C. lanatus, and T. aestivum) and gymnosperms (Ginkgo biloba and Cycas taitungensis), except for five of them (Phe, Leu, Pro, Arg, and Ser).    The relative synonymous codon usage (RSCU) analysis for the P. vulgaris mitogenome is shown in Figure 4, which indicates that all codons are present in the PCGs. Excluding the termination codons, the 31 PCGs in the P. vulgaris mitogenome consist of 9545 codons in total. Additionally, the codon usage showed that the RSCU values of the NNT and NNA codons are higher than 1.0 except for Ile (AUA) and Leu (CUA; Figure 4), suggesting a strong As or Ts bias in the third codon position of P. vulgaris mitochondrial PCGs, which is a very common phenomenon observed in all studied mitogenomes (Table S3). The codon usage pattern of P. vulgaris mitogenome is highly consistent with two other papilionoid legumes. The distributions of some codons encoding Pro (CCU, CCA, and CCG) differ between dicotyledons (P. vulgaris, G. max, V. angularis, C. lanatus, and A. thaliana) and monocotyledons (T. aestivum), and some codons (UCG, AGU, AGC, CCU, CCG, ACG, CGG, and AGA) are distributed differently between angiosperms and gymnosperms.

Selective Pressure Analysis
In genetics, the K a /K s ratio is useful for inferring the direction and magnitude of natural selection acting on homologous PCGs across diverged species. The ratio is a more powerful test of the neutral model of evolution than many others available in population genetics as it requires fewer assumptions [52]. A K a /K s ratio <1 implies purifying or stabilizing selection (acting against change), while a ratio of >1 implies positive or Darwinian selection (driving change) and a ratio of exactly 1 indicates neutral selection. Importantly, the K a /K s ratio is unlikely to be significantly above 1 without at least some of the mutations being advantageous.
In this study, the K a /K s ratio was determined for all 31 PCGs following comparison of the P. vulgaris mitogenome with those of C. lanatus, V. vinifera and A. thaliana ( Figure 5). Nearly all of the K a /K s ratios were <1.0, suggesting that most of the PCGs were under stabilizing selection during evolution.
Combining the information in Figure 5 and Table 1, the K a /K s ratios of all Complex I-V genes were <1, indicating that these genes were highly conserved in the evolutionary process of higher plants. The large number of mitochondrial genes under stabilizing selection (K a /K s < 1) may play important roles in stabilizing the normal functioning of mitochondria [53,54].
The relative synonymous codon usage (RSCU) analysis for the P. vulgaris mitogenome is shown in Figure 4, which indicates that all codons are present in the PCGs. Excluding the termination codons, the 31 PCGs in the P. vulgaris mitogenome consist of 9,545 codons in total. Additionally, the codon usage showed that the RSCU values of the NNT and NNA codons are higher than 1.0 except for Ile (AUA) and Leu (CUA; Figure 4), suggesting a strong As or Ts bias in the third codon position of P. vulgaris mitochondrial PCGs, which is a very common phenomenon observed in all studied mitogenomes (Table S3). The codon usage pattern of P. vulgaris mitogenome is highly consistent with two other papilionoid legumes. The distributions of some codons encoding Pro (CCU, CCA, and CCG) differ between dicotyledons (P. vulgaris, G. max, V. angularis, C. lanatus, and A. thaliana) and monocotyledons (T. aestivum), and some codons (UCG, AGU, AGC, CCU, CCG, ACG, CGG, and AGA) are distributed differently between angiosperms and gymnosperms. .

Selective Pressure Analysis
In genetics, the Ka/Ks ratio is useful for inferring the direction and magnitude of natural selection acting on homologous PCGs across diverged species. The ratio is a more powerful test of the neutral model of evolution than many others available in population genetics as it requires fewer assumptions [52]. A Ka/Ks ratio <1 implies purifying or stabilizing selection (acting against change), while a ratio of >1 implies positive or Darwinian selection (driving change) and a ratio of exactly 1 indicates neutral selection. Importantly, the Ka/Ks ratio is unlikely to be significantly above 1 without at least some of the mutations being advantageous.
In this study, the Ka/Ks ratio was determined for all 31 PCGs following comparison of the P. vulgaris mitogenome with those of C. lanatus, V. vinifera and A. thaliana ( Figure 5). Nearly all of the Ka/Ks ratios were <1.0, suggesting that most of the PCGs were under stabilizing selection during evolution. Combining the information in Figure 5 and Table 1, the Ka/Ks ratios of all Complex I-V genes were <1, indicating that these genes were highly conserved in the evolutionary process of higher plants. The large number of mitochondrial genes under stabilizing selection (Ka/Ks < 1) may play important roles in stabilizing the normal functioning of mitochondria [53,54].  As shown in Figure 5, the Ka/Ks ratios of ccmB were > 1 between P. vulgaris and all of the three selected species, indicating that ccmB may have suffered from positive selection since divergence from their last common ancestor. Particularly, the Ka/Ks ratio of ccmB between P. vulgaris and V. vinifera was significantly >1 (4.01), suggesting that some advantage occurred during evolution. Additionally, the Ka/Ks ratios of ccmFC, rps1, rps10, and rps14 were also >1, indicating that these genes were under positive selection after divergence of the last common ancestor. Since CcmB and ccmFC genes encode for some important components of the c-type cytochrome maturation pathway in mitochondria, we speculate that the adaptive evolution of P. vulgaris is closely related to the roles of c-type cytochromes in respiratory and photosynthetic electron transport [55][56][57]. Additionally, rps1, rps10, and rps14 genes encode small mitoribosomal subunit proteins, which have been reported to play crucial roles in various biological processes in eukaryotic organisms, such as embryogenesis, leaf morphogenesis, and the formation of reproductive tissues [58][59][60]. The high Ka/Ks ratios of rps genes observed here may be very important for the evolution of P. vulgaris. Ka/Ks ratios >1 have also been reported for some other mitochondrial genes, including atp8, ccmFN, matR, and mttB [26,33,61,62], indicating that mitochondrial genes in different plant species may be subjected to diverse selection pressures during evolution. Most importantly, the Ka/Ks ratio of the orthologous gene-pairs is an average over all sites and, even under positive selection, it can be < 1 because some Figure 5. K a /K s ratios for 31 protein coding genes of P. vulgaris, C. lanatus, V. vinifera, and A. thaliana. The blue, orange, and gray boxes indicate K a /K s ratios of P. vulgaris vs. C. lanatus, P. vulgaris vs. V. vinifera, and P. vulgaris vs. A. thaliana. Figure 5, the K a /K s ratios of ccmB were >1 between P. vulgaris and all of the three selected species, indicating that ccmB may have suffered from positive selection since divergence from their last common ancestor. Particularly, the K a /K s ratio of ccmB between P. vulgaris and V. vinifera was significantly >1 (4.01), suggesting that some advantage occurred during evolution. Additionally, the K a /K s ratios of ccmF C , rps1, rps10, and rps14 were also >1, indicating that these genes were under positive selection after divergence of the last common ancestor. Since CcmB and ccmF C genes encode for some important components of the c-type cytochrome maturation pathway in mitochondria, we speculate that the adaptive evolution of P. vulgaris is closely related to the roles of c-type cytochromes in respiratory and photosynthetic electron transport [55][56][57]. Additionally, rps1, rps10, and rps14 genes encode small mitoribosomal subunit proteins, which have been reported to play crucial roles in various biological processes in eukaryotic organisms, such as embryogenesis, leaf morphogenesis, and the formation of reproductive tissues [58][59][60]. The high K a /K s ratios of rps genes observed here may be very important for the evolution of P. vulgaris. K a /K s ratios >1 have also been reported for some other mitochondrial genes, including atp8, ccmF N , matR, and mttB [26,33,61,62], indicating that mitochondrial genes in different plant species may be subjected to diverse selection pressures during evolution. Most importantly, the K a /K s ratio of the orthologous gene-pairs is an average over all sites and, even under positive selection, it can be <1 because some sites might be under positive selection while others are under purifying selection [53,61,63].

Prediction of RNA Editing Sites in PCGs
Many previous studies have documented that RNA editing is one of the necessary steps for gene expression in the mitochondrial and chloroplast genomes of higher plants [64][65][66][67]. RNA editing is a post-transcriptional modification that converts specific cytidines (C) to uridines (U) and uridines to uridines in the transcripts of nearly all mitochondrial PCGs. Based on the web-based PREP-mt program, we predicted a total of 486 RNA editing sites in 31 PCGs and 100% C-to-U RNA editing. Among the 486 RNA editing sites, 34.57% (168 sites) were predicted at the first base position of the codon and 65.43% (318 sites) were found in the second position, while none were found in the third position. The lack of predicted RNA editing sites in the silent position is probably due to the limitation of the PREP-Mt predictive methodology rather than there being no RNA editing in this position. Since most of the RNA editing sites in third codon positions did not change the amino acid encoded by the codon, the tie-breaking rules used by PREP-Mt could not select the edited state [68]. Therefore, RNA editing in the silent editing position needs to be further identified by experimental methods.
The occurrence of RNA editing can cause alteration of initiation and termination codons in PCGs, and the frequency of their generation is much higher than that of their removal. As shown in Table 3, mttB, nad1, nad4L, and rps10 genes use ACG as their initiation codons, which may be altered to the normal AUG by RNA editing modification. Additionally, the ccmF C gene uses CGA as its termination codon, which may be altered to UGA by RNA editing modification. As shown in Figure 6, the number of RNA editing sites in different genes varies greatly, and the Complex I (NADH dehydrogenase) and Cytochrome c biogenesis genes (ccmB, ccmC, ccmF C , and ccmF N ) encode the most predicted RNA editing sites. Based on a comparison of the predicted RNA editing sites in five leguminous plants, the nad4 gene encodes the most RNA editing sites, while atp1 encodes the fewest ( Figure 6). The occurrence of RNA editing can cause alteration of initiation and termination codons in PCGs, and the frequency of their generation is much higher than that of their removal. As shown in Table 3, mttB, nad1, nad4L, and rps10 genes use ACG as their initiation codons, which may be altered to the normal AUG by RNA editing modification. Additionally, the ccmFC gene uses CGA as its termination codon, which may be altered to UGA by RNA editing modification. As shown in Figure  6, the number of RNA editing sites in different genes varies greatly, and the Complex I (NADH dehydrogenase) and Cytochrome c biogenesis genes (ccmB, ccmC, ccmFC, and ccmFN) encode the most predicted RNA editing sites. Based on a comparison of the predicted RNA editing sites in five leguminous plants, the nad4 gene encodes the most RNA editing sites, while atp1 encodes the fewest ( Figure 6). Previous studies have shown that the frequency and type of RNA editing in each organelle is highly lineage-specific [26,65,69,70]. As shown in Figure 6, the number of predicted RNA editing sites in different papilionoid legume mitogenomes is very conserved, ranging from 486 sites in P. vulgaris to 503 sites in Lotus japonicus, suggesting that they share extremely conserved PCGs. In angiospermous mitogenomes, nearly all of the RNA editing sites are C to U, and the number of RNA editing sites is concentrated between 400 to 500. For example, 463 and 444 RNA-editing sites were found in the C. lanatus and C. pepo mitogenomes, of which 394 are shared [8]; 441 and 427 RNAediting sites were found in A. thaliana and B. napus mitogenomes, of which 358 are shared [10]. In the Previous studies have shown that the frequency and type of RNA editing in each organelle is highly lineage-specific [26,65,69,70]. As shown in Figure 6, the number of predicted RNA editing sites in different papilionoid legume mitogenomes is very conserved, ranging from 486 sites in P. vulgaris to 503 sites in Lotus japonicus, suggesting that they share extremely conserved PCGs. In angiospermous mitogenomes, nearly all of the RNA editing sites are C to U, and the number of RNA editing sites is concentrated between 400 to 500. For example, 463 and 444 RNA-editing sites were found in the C. lanatus and C. pepo mitogenomes, of which 394 are shared [8]; 441 and 427 RNA-editing sites were found in A. thaliana and B. napus mitogenomes, of which 358 are shared [10]. In the gymnosperm Cycas taitungensis, 1084 RNA editing sites were found in its mitogenome [71]. The clearly descending number of RNA editing sites is in accordance with gene losses from gymnosperms to angiosperms. In contrast to angiosperms and gymnosperms, both types of C-to-U and U-to-C conversions are found in the mitochondrial transcripts of ferns and hornworts [69,72].

Analysis of Repeat Sequences in the P. vulgaris Mitogenome
The vast majority of the variance in genome size of plant mitogenomes can be explained by differences in the sizes of repeat sequences, which are composed of SSRs, tandem repeats and dispersed repeats. Plant mitogenomes, particularly those of angiosperms, were already well known for its sizeable fractions of repetitive sequences even before any complete mitogenomes were available. SSRs are DNA tracts of tandem-repeated motifs of one to six bases that are useful molecular markers in studying genetic diversity and species identification [21]. In this study, a total of 314 perfect SSRs were identified in the P. vulgaris mitogenome, including 139 mono-, 140 di-, 5 tri-, 22 tetra-, 3 penta-, and 5 hexa-nucleotide repeats ( Table 4). The mononucleotide repeats of A/T (129 repeats) were found to be more prevalent than other repeat types. The dinucleotides repeats, TA/AT, are the second most numerous (50 repeats), while tri-, tetra-, penta-, and hexa-nucleotide repeats are fewer in number and only observed in intronic or intergenic regions. As shown in Table 5, seven tandem repeats with lengths ranging from 13 bp to 57 bp were also detected in the P. vulgaris mitogenome. Among these seven tandem repeats, only one is localized in a coding region (rrnL), while the others are all found in intergenic spacers.
Besides SSRs and tandem repeats, 143 dispersed repeats with lengths > 30 bp (total length: 35,000 bp; 8.85% of the genome) were also identified in the P. vulgaris mitogenome (Figure 7; Table S4). Most of the repeats (77 repeats, 53.85%) are 30 bp to 59 bp long, and 25 repeats are longer than 100 bp, with only two longer than 1 kb (R1: 4866 bp; R2: 3529 bp). Previous studies have documented the importance of large repeats (>1 kb) in genomic structural changes, and pairwise direct and inverted large repeats may produce two small subgenomic conformations or isomeric conformations, respectively. As shown in Figure 1, the largest repeat was assembled as Contig15 and the second largest was assembled as Contig40, both of which were inverted repeats. By aligning the PacBio long reads to both ends of the two large repeats, we constructed the master circle and two isomeric molecules ( Figure 1). Repeats are commonly found in plant mitogenomes but are poorly conserved across species, even within the same family. As shown in Figure 7 and Table S4, the total number of repeats ranges from 59 in V. angularis to 215 in M. pinnata, and the total length of repeats ranges from 9224 bp (2.28% of the whole genome) in V. angularis to 411,265 bp (69.94% of the whole genome) in V. faba. Mitogenome enlargement in V. faba is mainly caused by the expansion of repeated sequences. Thirteen large (>1 kb) repeats covered 398.8 kb or 68% of the whole mitogenome size [17]. However, when all but single copies of the large repeat sequences were excluded, the V. faba mitogenome size is 388.6 kb, which is similar to other Papilionoideae mitogenomes [18]. The extremely complex repeat patterns should be responsible for the various genome sizes of the plant mitogenome. However, genome size is by no means only determined by the size of repeats. The mitochondrial genome of Vitis vinifera has only 7% repeats despite a genome size of nearly 773 kb [73], while the moderately-sized (404.5 kb) V. angularis genome has fewer and smaller repeats than those found in the much smaller genomes of Brassica napus (222 kb) and Silene latifolia (253 kb; Table S2) [10,74].  is 388.6 kb, which is similar to other Papilionoideae mitogenomes [18]. The extremely complex repeat patterns should be responsible for the various genome sizes of the plant mitogenome. However, genome size is by no means only determined by the size of repeats. The mitochondrial genome of Vitis vinifera has only 7% repeats despite a genome size of nearly 773 kb [73], while the moderatelysized (404.5 kb) V. angularis genome has fewer and smaller repeats than those found in the much smaller genomes of Brassica napus (222 kb) and Silene latifolia (253 kb; Table S2) [10,74].

Phylogenetic Analyses and Multiple Losses of PCGs during Evolution
With rapid developments in sequencing technology and assembly methods, an increasing number of complete plant mitogenomes has been assembled, providing an important opportunity for phylogenetic analyses using mitogenomes. In this study, to determine the phylogenetic position of P. vulgaris, we downloaded 23 plant mitogenomes from the GenBank database (https://www.ncbi.nlm. nih.gov/genome/browse/), including 19 species of Fabales, two species of Solanales, and two species of Malpighiales. A set of 26 conserved single-copy orthologous genes (atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, ccmF C , ccmF N , cob, cox1, cox3, matR, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, rps3, rps4, and rps12) present in all of the 23 analyzed mitogenomes was used to construct the phylogenetic tree, and species from the Solanales and Malpighiales were designated as the outgroup. As shown in Figure 8, the bootstrap values of each node are all over 70% supported and 15 nodes are supported 100%. The ML phylogenetic tree strongly supports that P. vulgaris is evolutionarily close to the clade formed by two Vigna species. The tree also strongly supports the separation of Fabales from the clade composed of Solanales and Malpighiales (100% bootstrap value), as well as the separation of Papilionoideae from the clade composed of Cercidoideae, Detarioideae, and Caesalpinioideae (100%). The bootstrap value for the separation of Detarioideae and Caesalpinioideae is 80%, and the value for the separation of Cercidoideae from the clade composed of Detarioideae and Caesalpinioideae is 70%.
As described by Richardson et al. [75], the mitochondrial genomes of higher plants vary significantly in genome size, gene content and order. Losses of PCGs occurred frequently during the evolution of higher plants. The phylogenetic tree provides a backdrop for the further analysis of gene loss during evolution, and the gene contents of all observed species are summarized in Figure 9. Most of the PCGs were conserved in different plant mitogenomes, especially for the genes in the groups of Complex I, Complex III, Complex V, cytochrome c biogenesis, maturases, and transport membrane protein [13]. The conservation of these genes suggests that they play crucial roles in the function of mitochondria. However, the ribosomal proteins and succinate dehydrogenase genes were highly variable. As shown in Figure 9, the cox2 gene was only lost in the subfamily Phaseolinae (V. angularis, V. radiata, and P. vulgaris) but retained in other leguminous plants, suggesting that this gene was lost after separation from the subfamily Glycininae. The rpl2 gene was lost in most leguminous plants but regained in A. ligulate, L. trichandra, H. brasuletto, and L. coriaria, suggesting that this gene was lost before the emergence of Fabales but could be regained in some leguminous plants. Similar phenomena were found in many ribosomal proteins (rpl10, rpl16, rps7, rps10, and rps19). Additionally, rpl6 and rps8 genes were lost from liverworts (M. polymorpha) during evolution [76], the rps11 gene was lost from gymnosperms (G. biloba) and liverworts during the divergence of the angiosperms and gymnosperms [77], and the rpl10 gene was lost in monocots and gymnosperms but regained in dicots [33,78]. The enhanced loss of ribosomal proteins in plant mitogenomes indicates that these genes were encoded partly by mitochondrial native genes and partly by nuclear genes, due to the gene transfer between mitochondria and nucleus [79][80][81]. nad5, nad6, nad7, nad9, rps3, rps4, and rps12) present in all of the 23 analyzed mitogenomes was used to construct the phylogenetic tree, and species from the Solanales and Malpighiales were designated as the outgroup. As shown in Figure 8, the bootstrap values of each node are all over 70% supported and 15 nodes are supported 100%. The ML phylogenetic tree strongly supports that P. vulgaris is evolutionarily close to the clade formed by two Vigna species. The tree also strongly supports the separation of Fabales from the clade composed of Solanales and Malpighiales (100% bootstrap value), as well as the separation of Papilionoideae from the clade composed of Cercidoideae, Detarioideae, and Caesalpinioideae (100%). The bootstrap value for the separation of Detarioideae and Caesalpinioideae is 80%, and the value for the separation of Cercidoideae from the clade composed of Detarioideae and Caesalpinioideae is 70%.

Conclusions
In this study, we first assembled and characterized the complete mitogenome of P. vulgaris. By aligning the PacBio sequencing reads to the draft mitogenome, one master circle and two isomeric

Conclusions
In this study, we first assembled and characterized the complete mitogenome of P. vulgaris. By aligning the PacBio sequencing reads to the draft mitogenome, one master circle and two isomeric molecules were assembled based on two large repeats. Selective-pressure analysis of PCGs indicates that ccmB, ccmF C , rps1, rps10, and rps14 genes with K a /K s ratios > 1 might play important roles during evolution, whereas all Complex I-V genes with K a /K s ratios < 1 were highly conserved in the evolutionary process of higher plants. The C-to-U conversions may generate initiation, termination, or internal codons with completely unpredictable functions. The prediction of RNA editing sites in P. vulgaris mt PCGs will provide important clues for the investigation of gene functions with novel codons. The comparison of genomic features in all sequenced leguminous plants should contribute to a comprehensive understanding of the evolutionary process of legumes. The sequencing of the P. vulgaris mitogenome not only provides an important opportunity to conduct further genomic breeding studies in the common bean, it also provides valuable information for future evolutionary and molecular studies of leguminous plants.