Allelic Diversity of Acetyl Coenzyme A Carboxylase accD/bccp Genes Implicated in Nuclear-Cytoplasmic Conflict in the Wild and Domesticated Pea (Pisum sp.)

Reproductive isolation is an important component of species differentiation. The plastid accD gene coding for the acetyl-CoA carboxylase subunit and the nuclear bccp gene coding for the biotin carboxyl carrier protein were identified as candidate genes governing nuclear-cytoplasmic incompatibility in peas. We examined the allelic diversity in a set of 195 geographically diverse samples of both cultivated (Pisum sativum, P. abyssinicum) and wild (P. fulvum and P. elatius) peas. Based on deduced protein sequences, we identified 34 accD and 31 bccp alleles that are partially geographically and genetically structured. The accD is highly variable due to insertions of tandem repeats. P. fulvum and P. abyssinicum have unique alleles and combinations of both genes. On the other hand, partial overlap was observed between P. sativum and P. elatius. Mapping of protein sequence polymorphisms to 3D structures revealed that most of the repeat and indel polymorphisms map to sequence regions that could not be modeled, consistent with this part of the protein being less constrained by requirements for precise folding than the enzymatically active domains. The results of this study are important not only from an evolutionary point of view but are also relevant for pea breeding when using more distant wild relatives.


Introduction
Reproductive isolation is an important component of species differentiation. Mechanisms that create reproductive barriers between once-conspecific organisms have long been a focus of evolutionary biology [1]. Although geographical separation plays a vital role in speciation [2], and taxonomically widespread among plant species. Plastids can also contribute to nucleo-cytoplasmic incompatibility. Although cytonuclear chlorosis or albinism of hybrids is not as common as CMS, these have been widely observed, and their implications for speciation were recognized early on [32][33][34][35]. The role of plastids in speciation processes is known from species with a biparental mode of plastid inheritance, e.g., Geranium, Pelargonium and Medicago [36], and mainly from genus Oenothera, which became one of the models for studying plant evolution [24]. Various incompatible phenotypes have also been reported from Rhododendron, Hypericum, Trifolium, Zantedeschia, and Pisum [24]. Cyto-nuclear co-adaptation has been described in Arabidopsis thaliana [18] and demonstrated to affect its adaptive traits [37]. Interestingly, crop domestication may also increase the likelihood that genes causing incompatibility become fixed in the population through genetic hitchhiking [38].
The plastid accD gene coding for the acetyl-CoA carboxylase beta subunit and the nuclear gene bccp coding for the biotin carboxyl carrier protein of acetyl-CoA carboxylase were nominated as candidate genes responsible for nuclear-cytoplasmic incompatibility in peas based on data from crosses between wild and domesticated pea forms [39]. Incompatible hybrids exhibit chlorophyll deficiency, reduction of leaf size low pollen fertility, low seed set, and poorly developed roots [40]. The acetyl-CoA carboxylase (ACCase) complex is involved in the biosynthesis of fatty acids, which takes place in the plastids [40]. ACCase belongs to a group of biotin dependent carboxylases, catalyzing acetyl-coenzyme A carboxylation to malonyl coenzyme A and providing the only entry point for all carbon atoms in the fatty acid synthesis pathway [41]. Uniquely in Eukaryota, plants have two distinct ACCases: one eukaryotic-like homomeric multidomain ACCase in the cytosol and a bacterial-like heteromeric ACCase within the plastids [41]. The heteromeric form of ACCase is found in prokaryotes and the plastids of Viridiplantae. Presumably, all genes encoding ACCase subunits initially resided in the plastid genome after the original endosymbiotic event in algae and underwent sequential transfer to the nuclear genome [42]. Plastid ACCase participates in fatty acid synthesis, whereas the cytosolic enzyme is engaged in the synthesis of very long chain fatty acids, phytoalexins, flavonoids, and anthocyanins. Plastid-localized ACCD enzyme is responsible for catalyzing the initial tightly-regulated and rate-limiting step in fatty acid biosynthesis. Nuclear encoded Biotin Carboxyl Carrier Protein (BCCP) is a part of the enzyme Acetyl-CoA carboxylase complex and serves as a carrier protein for biotin and carboxybiotin throughout the ATP-dependent carboxylation of acetyl-CoA to form malonyl-CoA. The resulting Acetyl-CoA carboxylase is a heterohexamer composed of the biotin carboxyl carrier protein, biotin carboxylase, and two subunits each of the ACCase subunit alpha and the ACCase plastid-coded subunit beta [40].
The plastid ACCase of legumes (Papilionoideae) consists of four subunits, each coded by a separate gene: biotin carboxylase (accC), biotin carboxyl carrier protein (accB=bccp), alpha-carboxyltransferase (accA), and beta-carboxyltransferase (accD). The genes coding accC, accB, and accA are localized in the nuclear genome, whereas the accD gene is localized in the plastid genome [42]. Multiple independent lineages have experienced accelerated rates of substitution in similar subsets of non-photosynthetic genes, including accD (in legumes [43][44][45] and in Oleaceae [46]). In Silene (Caryophyllaceae) species with accelerated plastid genome evolution, the nuclear-encoded subunits of the ACCase complexes are also evolving rapidly, indicating a strong positive selection [47]. Such patterns of molecular evolution in these plastid-nuclear complexes are unusual for ancient conserved enzymes but resemble cases of antagonistic coevolution between pathogens and host immune genes. Genetic characterization of hybrid necrosis in crosses between tomato species [48] and between Arabidopsis ecotypes [49,50] has revealed that incompatibilities among complementary disease resistance genes might play such a role in the evolution of hybrid inviability [51].
In this work, we explored the allelic diversity of accD/bccp in the geographically diverse set of wild pea (Pisum sp.). The accD/bccp are recently identified genes underlying nuclear-cytoplasmic incompatibility in Pisum sp. [39]. We sought to map the allelic combinations of accD/bccp occurring in nature to determine geographic patterns in their distribution, and to identify possible relationships to pea genetic diversity.

Structure and Variation of accD Gene
The accD gene is located between positions 70,882 and 72,654 in the P. sativum cv. Feltham First (HM029370) reference chloroplast genome, resulting in a 1772 bp DNA encoding a protein of 432 amino acid residues. The primers used in our study were designed to match the most conserved region and were located close to the ends of the accD coding sequence. Consequently, we did not capture the very 5 and 3 end of the coding sequence due to quality trimming. The beginning and end of the accD sequence, comprising 48 nt from the start codon and 58 nt from the stop codon, consequently missing the first 16 and last 19 codons, were thus excluded from the subsequent analysis.
The length of the accD gene within our studied material ranged from 1403 bp to 1859 bp at DNA level and from 467 to 619 amino acid residues, respectively (GenBank accession numbers MK619486-MK619678). In the studied set of 195 accessions, there was extraordinary variation in the gene length, due to the occurrence of 13 indels whose length varied between 3 and 167 nucleotides. This variation is due to insertions consisting of tandem repeats of 10-150 bp units present in 1 to 37 nearly identical copies, all in the same (i.e., direct) orientation relative to each other ( Figure 1). The repetitive sequences can be divided into 6 categories. In the shortest 1403 bp allele (JI1010, P. fulvum) there are four, three, and one repeats of 9 to 12 bp long. These expand in the longest 1859 bp allele (JI267, P. elatius), which has 37 repeats of 10 to 33 bp, 1 repeat of 57 bp, 1 repeat of 102 bp, and 1 repeat of 149 bp. We identified the main five longest tandem repeats blocks, which consist of two or three individual blocks of different lengths and degrees of identity. These blocks are not identical and contain many nucleotide changes and triplet duplications. Such repeats were identified by the presence of small, almost identical blocks, that are part of larger tandem repeats. The first tandem repeat block is the most complex and most degenerate, consisting of three sequential blocks (highlighted in yellow in Figure 1, Figure S1). These blocks are of different lengths and are degenerate to varying degrees from each other. The most similar are first two blocks (89%), which differ by 3 amino acids and by the insertion L-I-L-I for a total of 64 amino acid residues. Characteristic for this tandem repeat is the presence of multiple duplications of three amino acids D-T-N alone or together with D-I-S. The complex, degenerate, and mixed tandem repeat is also the penultimate (3 and 4 grey blocks). This tandem repeat has multiple duplications of five amino acid stretch of S-E-E-E-K. The remaining repeats consist of two blocks separated from each other by 7 or 9 amino acids ( Figure 1).  - DRDDIYETNIKHIWERYSEIYRRNREKSTFVTIDYSDPNCMEKLARLWVQCKTCYGLNFQQFFRPKMNICEHCGEHLKMSSSDRIDLSIDRDTWNPMDEDMVSLDPIQFDS -IKELSSED   370  380  390  400  410  420  430  440  450  460  470 Figure S1).

Variation in Nuclear bccp Gene
The predicted ORF of the bccp gene encoding the biotin carboxyl carrier protein of P. sativum cv. Cameor from the pea RNA atlas is 873 bp long and encodes a protein of 290 amino acids. In the pea RNA atlas, this is represented by the ubiquitously expressed PsCam051640 transcript, which corresponds to Tayeh et al. (2015) map PsCam051640 at LGIII. The genomic DNA extracted from the shotgun genome sequence is 5906 bp, with 9 exons interspersed by 8 introns (Exon 1 is 234 bp, exon 2 is 206 bp, exon 3 is 76 bp, exon 4 is 54 bp, exon 5 is 262 bp, exon 6 is 62 bp, exon 7 is 69 bp, exon 8 is 46, and exon 9 is 265 bp). The respective introns are 1170, 541, 263, 874, 111, 856, 84, and 733 bp. The following analysis was conducted on cDNA, avoiding introns. The detected polymorphism, thus, only concerns the coding sequence, and is correspondingly lower than that expected for the complete locus. Notably, to obtain sufficient PCR product we had to perform out two consecutive nested PCR amplifications. This likely reflects the relatively low expression level of the gene in young leaf tissue. There were altogether 39 variable positions and no indels in a total of 195 studied accessions (NCBI accession numbers MK644626-MK644819). These identified 31 protein bccp variants (Table S1). Sixteen analyzed P. fulvum accessions had three bccp alleles (bccp1/2/3) separated by 4 to 10 amino acid changes from the nearest P. elatius alleles. From domesticated P. sativum landraces (60 acc.), 16 had the bccp_22, and six had the bccp_18 allele. From the independently domesticated Ethiopian pea P. abyssinicum (24 acc.), 19 had the specific bccp_26 allele, shared with two P. elatius accessions (PI343978, PI343979 from Turkey), four had the bccp_20 allele, separated by one or two amino acid exchanges from nearest P. elatius. Ninety-five analyzed P. elatius accessions had the largest diversity (all together 28 distinct bccp alleles, Table S1).

Network and Maximum Parsimony Analyses
Various approaches in the visualization of the data through networks and maximum parsimony (MP) analysis produced a very similar view, with only minimal differences. For further interpretation of clustering of identified alleles into larger groups, the consensus maximum parsimony tree method was used. This produced a very similar clustering of alleles as inspected networks (Median network, NeighborNet, SplitDecomposition networks; not shown). The MP analysis found 18 equally parsimonious trees for the accD gene (length 73 steps) and 19 for the bccp (42 steps) ( Figure 2, Figure 3). The resulting trees contained several polytomies. This is because of a large part of the total sequence variability being due to indels in the case of accD, and this information was not included into the MP analysis. In addition, a number of homoplasious mutations were also excluded, with the resulting trees contained several polytomies. However, as we were not interested in the assessment of the gene phylogeny, we did not try to interpret these polytomies. Produced clades (with a rather high bootstrap support) were very similar to groups inferred from the network analyses. Based on the similarity in the grouping of alleles between inspected networks and the MP analysis, the groups of alleles were inferred from the consensus MP tree for both investigated genes. For the accD gene 10 groups (A-J) were inferred; 15 groups were inferred for the bccp gene (A-O) were inferred ( Figure 2, Figure 3, Table S1). The accD gene group D (comprising alleles accD_13 and accD_14) was specific for P. abyssinicum, except for one sample of P. sativum from Montenegro (accession n • PI357292), which also possessed the accD_14 allele. Group F (comprising alleles accD_17/18/19/20/21) was specific for P. fulvum. Accessions of P. elatius and landraces of P. sativum were represented by multiple alleles belonging to different groups.
In the case of the bccp gene, P. abyssinicum was represented by groups J (allele bccp_26) and G (alleles bccp_20/22). However, in contrast to the accD gene, inferred alleles were not specific for P. abyssinicum, but were also found within P. elatius and samples of P. sativum ( Figure 3, Table S1). The three identified alleles observed for P. fulvum (bccp_1/2/3) clustered together and represented group A. Two of these alleles were specific (bccp_1/2) for P. fulvum and one (bccp_3) was shared with two samples of P. sativum from Greece (JI1525 and JI2573). The identified alleles for the investigated accessions of P. elatius fall within 12 groups and for P. sativum within six groups, which were shared between these two species (Table S1)  Branch coloring follows the species presence of particular alleles: olive green = alleles observed only within P. fulvum; grey = alleles shared among P. fulvum and P. elatius; orange = alleles shared among P. sativum and P. elatius; red = alleles observed only for P. sativum; turquoise = alleles shared among P. abyssinicum and P. sativum; yellow = alleles observed only within P. abyssinicum; blue = alleles observed only within P. elatius. Bootstrap support ≥ 50 is shown above branches.  Branch coloring follows species presence of particular alleles: olive green = alleles observed only within P. fulvum; magenta = alleles shared among P. fulvum and P. sativum; orange = alleles shared among P. sativum and P. elatius; red = alleles observed only for P. sativum; green = alleles shared among P. abyssinicum, P. sativum and P. elatius; blue = alleles observed only within P. elatius. Bootstrap support ≥ 50 is shown above branches.

Frequency of Amino Acid Substitutions and Their Distribution
Analysis We next attempted to investigate the location of the individual amino acid substitutions, and the conspicuous indels found in accD. This was performed with respect to the 3D folding of both ACCD and BCCP proteins, to the extent that we were able to predict their spatial structure by threading on experimentally characterized related templates. We could produce only partial models for both proteins (File S1, S2 For ACCD, the model covered approximately 43% of the sequence, corresponding to the C-terminal portion of the protein). The N-terminal region and an additional loop within the modelled segment were disordered in the prediction. For the BCCP protein, approximately 45% of the sequence was covered by the best templates but only two short separate fragments from this domain could be reliably modeled; the rest of the molecule was disordered in the prediction (Figure 4, Table 1). Asterisks denote significant differences in the frequency of the given category of mutations in non-modelled (disordered) parts of the protein compared to the modelled ones (*-p < 0.05, **-p < 0.01).
Remarkably, mapping of the identified protein sequence polymorphisms revealed that most of the above-described repeat and indel polymorphisms in the ACCD sequence map to sequence regions could not be modelled due to the lack of suitable templates and intrinsic disorder. This is consistent with this part of the protein being less constrained by requirements for precise folding than the enzymatically active domain. Point mutations were also somewhat enriched in the part of the ACCD protein that was not modeled. However, no such bias was detected for BCCP ( Figure 4, Table 1).  Asterisks denote significant differences in the frequency of the given category of mutations in nonmodelled (disordered) parts of the protein compared to the modelled ones (*-p < 0.05, **-p < 0.01).

Relationship to Pisum Genetic Diversity
Having previously analyzed genetic diversity based on genome-wide sampled polymorphism [52,53], we examined the distribution of both accD and bccp alleles within respective genetic groups. Cultivated Pisum sativum accessions can be divided into two (nr. 3 and 6) equally abundant (24 and, 27 accessions, respectively) groups. The independently domesticated Ethiopian pea ( P. abyssinicum) forms a separate (nr. 7) group (Table S1). With respect to accD/bccp alleles, accD_29 and bccp_22 alleles predominate in 60 analyzed P. sativum accessions (41, 38 accessions respectively) ( Supplementary  Table S1), while all 24 P. abyssinicum accessions had single unique accD_14 and bccp_26 (17 acc.), bccp_20 (5 acc.) and bccp_22 (JI1974) alleles corresponding to its separate domestication history and associated bottleneck. P. fulvum as a separate species forms a separate genetic group (nr. 2) and has also distinct and the most distant accD (accD_17-21) and bccp (bccp_1-3) alleles, separated by 39 to 40, and 7 to 8 amino acids, respectively, from the closest P. elatius alleles. On the contrary, wild P. elatius is genetically the most diverse and has seven genetic groups ), one of which (nr. 3) overlaps with P. sativum. This diversity is also reflected with 22 different accD and 25 bccp alleles, respectively. The most abundant are accD_25 (13 acc.), accD_29 (9 acc.), accD_2 (11 acc.), and bccp_22 (28 acc.), and bccp_31 (16 acc.) (Table S1). There is only a partial relationship between the genome wide DARTseq and accD/bccp based diversity. Genetic group nr. 10 of P. elatius accessions from the Caucasus region has the most distinct accD_30, 31, 34, but not bccp alleles. Similarly, genetic groups nr. 4 and 5 have a high proportion of accD_2 (8 acc.) and accD_15/16 (5 acc.) alleles in samples from Israel or eastern Turkey and Georgia, respectively. No clear genetic group assignment was found for bccp alleles within P. elatius accessions.

Geographic Distribution of accD/bccp Alleles
Pisum fulvum (16 acc.) is geographically restricted to Israel (7 acc.), Syria (7 acc.), Jordan (1 acc.), and southeastern Turkey (1 acc.), and displays distinct accD/bccp alleles. Genetically and geographically the most diverse set is from P. elatius (96 acc.). Of these, there were 34 accessions from Turkey, which had the highest genetic diversity ( Figure 6, Table S1). These accessions have various accD/bccp alleles, although the combination accD_25 and bccp_20 is the most frequent (10). The next large group is P. elatius from Israel, which had 25 accessions that belong to various genetic groups.

Relationship to Pisum Genetic Diversity
Having previously analyzed genetic diversity based on genome-wide sampled polymorphism [52,53], we examined the distribution of both accD and bccp alleles within respective genetic groups. Cultivated Pisum sativum accessions can be divided into two (nr. 3 and 6) equally abundant (24 and, 27 accessions, respectively) groups. The independently domesticated Ethiopian pea (P. abyssinicum) forms a separate (nr. 7) group (Table S1). With respect to accD/bccp alleles, accD_29 and bccp_22 alleles predominate in 60 analyzed P. sativum accessions (41, 38 accessions respectively) (Supplementary Table S1), while all 24 P. abyssinicum accessions had single unique accD_14 and bccp_26 (17 acc.), bccp_20 (5 acc.) and bccp_22 (JI1974) alleles corresponding to its separate domestication history and associated bottleneck. P. fulvum as a separate species forms a separate genetic group (nr. 2) and has also distinct and the most distant accD (accD_17-21) and bccp (bccp_1-3) alleles, separated by 39 to 40, and 7 to 8 amino acids, respectively, from the closest P. elatius alleles. On the contrary, wild P. elatius is genetically the most diverse and has seven genetic groups ), one of which (nr. 3) overlaps with P. sativum. This diversity is also reflected with 22 different accD and 25 bccp alleles, respectively. The most abundant are accD_25 (13 acc.), accD_29 (9 acc.), accD_2 (11 acc.), and bccp_22 (28 acc.), and bccp_31 (16 acc.) (Table S1). There is only a partial relationship between the genome wide DARTseq and accD/bccp based diversity. Genetic group nr. 10 of P. elatius accessions from the Caucasus region has the most distinct accD_30, 31, 34, but not bccp alleles. Similarly, genetic groups nr. 4 and 5 have a high proportion of accD_2 (8 acc.) and accD_15/16 (5 acc.) alleles in samples from Israel or eastern Turkey and Georgia, respectively. No clear genetic group assignment was found for bccp alleles within P. elatius accessions.

Geographic Distribution of accD/bccp Alleles
Pisum fulvum (16 acc.) is geographically restricted to Israel (7 acc.), Syria (7 acc.), Jordan (1 acc.), and southeastern Turkey (1 acc.), and displays distinct accD/bccp alleles. Genetically and geographically the most diverse set is from P. elatius (96 acc.). Of these, there were 34 accessions from Turkey, which had the highest genetic diversity ( Figure 6, Table S1). These accessions have various accD/bccp alleles, although the combination accD_25 and bccp_20 is the most frequent (10). The next large group is P. elatius from Israel, which had 25 accessions that belong to various genetic groups. These also have different accD_2 and bccp_5 (22 alleles occurring in 13 accessions). European samples cover a large region of Western (Spain, Portugal, France), Central (Italy), and Eastern (Greece, Hungary, Serbia) Europe (Table S1). The later samples are distinct by both by genome wide analyses and by accD/bccp alleles analysis. Finally, the most separate group of P. elatius is from Armenia, with unique accD_34 and bccp_21/22 alleles (Figure 6).
These also have different accD_2 and bccp_5 (22 alleles occurring in 13 accessions). European samples cover a large region of Western (Spain, Portugal, France), Central (Italy), and Eastern (Greece, Hungary, Serbia) Europe (Table S1). The later samples are distinct by both by genome wide analyses and by accD/bccp alleles analysis. Finally, the most separate group of P. elatius is from Armenia, with unique accD_34 and bccp_21/22 alleles (Figure 6).  Table S1) within the Middle East.
The cultivated pea is geographically less precisely localized, except for P. abyssinicum, which is found only in Ethiopia and Yemen. All P. abyssinicum accessions have accD_14/bccp_20/26 alleles. Landraces of P. sativum originate from 24 countries and span a large geographical area from the Western Mediterranean to Central and Southern Asia. They predominantly have accD_29 (41 acc.) and bccp_22 (36 acc.) alleles typical for cultivated pea. There are few distinct accessions that have different alleles. Two were from Algeria (accD_32/bccp_11/12), and two accessions were from Greece (specific bccp_3 allele). Two accessions from China (ATC6925, ATC6937) have a accD_6 allele shared with P. elatius, while PI560969 from Nepal has distinct accD_2/bccp_5 alleles (Table S1).

Discussion
Here we report the allelic composition and geographical distribution of two genes involved in postzygotic reproductive isolation in the pea [39]. Taking advantage of the available germplasm resources [52,53], we analyzed the allelic composition of chloroplast localized accD and nuclear encoded bccp genes. Our results extend the experimental data of Bogdanova et al. [39]. We analyzed the allelic composition of accessions collected from the wild (including all recognized Pisum species) and domesticated peas of various geographical origins.
Postzygotic reproductive isolation, expressed as hybrid sterility or inviability, hybrid weakness or necrosis, and hybrid breakdown, is considered one of the two major fundamental processes leading to speciation [2,9]. The plastome-genome dysfunctions concern various kinds of albinism. Generally, incompatible hybrid materials suffer from reduced pigment content, lower rates of photosynthesis, and an impaired thylakoid structure. We detected the occurrence of albinotic plants in crosses of wild Pisum fulvum or P. elatius with the cultivated pea P. sativum, which upon identification of the respective genes [39] prompted this study.  Table S1) within the Middle East.
The cultivated pea is geographically less precisely localized, except for P. abyssinicum, which is found only in Ethiopia and Yemen. All P. abyssinicum accessions have accD_14/bccp_20/26 alleles. Landraces of P. sativum originate from 24 countries and span a large geographical area from the Western Mediterranean to Central and Southern Asia. They predominantly have accD_29 (41 acc.) and bccp_22 (36 acc.) alleles typical for cultivated pea. There are few distinct accessions that have different alleles. Two were from Algeria (accD_32/bccp_11/12), and two accessions were from Greece (specific bccp_3 allele). Two accessions from China (ATC6925, ATC6937) have a accD_6 allele shared with P. elatius, while PI560969 from Nepal has distinct accD_2/bccp_5 alleles (Table S1).

Discussion
Here we report the allelic composition and geographical distribution of two genes involved in postzygotic reproductive isolation in the pea [39]. Taking advantage of the available germplasm resources [52,53], we analyzed the allelic composition of chloroplast localized accD and nuclear encoded bccp genes. Our results extend the experimental data of Bogdanova et al. [39]. We analyzed the allelic composition of accessions collected from the wild (including all recognized Pisum species) and domesticated peas of various geographical origins.
Postzygotic reproductive isolation, expressed as hybrid sterility or inviability, hybrid weakness or necrosis, and hybrid breakdown, is considered one of the two major fundamental processes leading to speciation [2,9]. The plastome-genome dysfunctions concern various kinds of albinism. Generally, incompatible hybrid materials suffer from reduced pigment content, lower rates of photosynthesis, and an impaired thylakoid structure. We detected the occurrence of albinotic plants in crosses of wild Pisum fulvum or P. elatius with the cultivated pea P. sativum, which upon identification of the respective genes [39] prompted this study.

Hypervariability of the Chloroplast accD Gene
The region of the chloroplast genome around the accD gene has been found to be prone to accumulation of repeats, resulting in high interspecific variability in numerous species (Pisum and Lathyrus [45], Capsicum [54], Glycine [43], Silene [47], Oenothera [55,56], Cupressophytes [57]) but much less variability at the intraspecific level (Medicago truncatula [44], tea, Camellia sinensis [58], and pea, Pisum sp. [39,40]). Our present study substantially expands the previous reports [39,40] by analyzing 195 pea samples covering the entire geographical and species range [52,59]. Our results on the ratios of nonsynonymous to synonymous substitutions (Ka/Ks) in the pea accD gene agree with data from Oenothera, Silene, and Cupressophytes [47,55,57]. This indicates positive selection, since Ka/Ks values significantly above 1 are unlikely to occur without at least some of the mutations being advantageous. The large variation in plastid-encoded accD gene sequences, both between and within the Pisum species, is consistent with findings in Silene, where positive selection in the phylogenetic context has been detected [47]. In many cases of plastid genome evolution, mutations have disproportionately affected nonsynonymous sites, resulting in elevated ratios of nonsynonymous to synonymous substitution rates. Notably, plastid genome comparison between Lathyrus sativus and Pisum sativum resulted in identification of a region spanning the accD gene with increased mutation rate [45]. Analysis of publicly available accD sequences for Lathyrus and Vicia species supported these findings (unpublished).
Variation detected in the Pisum sp. accD sequence is mainly caused by the insertion of multiple tandem repeated sequences, as found in Cupressophytes [57] and Medicago [44]. In particular, the later study corresponds well to our pea accD data, since each of the 24 studied Medicago truncatula genotypes appears to have a different accD sequence, yet with maintained reading frames despite the high variability. Mapping of the insertion sites onto the predicted protein structure indicated their clustering within the N-terminal part of the ACCD protein that could not be reliably modelled due to intrinsic disorder. Such disordered protein regions are known to be extremely flexible and dynamic, alleviating some structural constraints [60], and were reported to be prone to insertions and deletions [61]. It has been suggested that regions surrounding tandem repeats evolve faster than other non-repeat-containing regions, which results in increased frequency of substitutions near the flanking sequences [62]. As shown in tobacco, a functional accD is essential for development [63]. Interestingly, the relationship to biparental inheritance of plastids was proposed to be related to the plastid competition [56]. Since about 20% of all angiosperms contain plastid DNA in the sperm cell, it is likely that this mechanism of cytonuclear conflict is also present in other systems [64][65][66][67].

Allelic accD/bccp Combinations Found in Wild and Domesticated Peas
One of our major aims was to detect allelic combinations of both genes occurring in wild peas, as well as in cultivated pea crop. Altogether we found 36 accD and 35 bccp alleles in the set of 195 accessions. Within the wild pea (P. elatius) these occurred in 60 out of 671 possible combinations, indicating a high diversity, while both domesticated P. sativum and P. abyssinicum had only a reduced subset. There was no overlap between P. fulvum and P. elatius, except for one P. fulvum JI2539 accession from Israel, which had accD_22 (G lineage) allele shared with three P. elatius samples from Turkey. Notably, in our previous study [52], we have found in this accession a typical P. elatius trnSG_E6 allele, suggesting some past hybridization event between P. fulvum and P. elatius. Interestingly, in another two P. fulvum accessions (JI2510, JI2521) that also have the trnSG_E6 allele [52], the accD allele was canonical to P. fulvum (accD_20, 21, e.g., F lineage). P. abyssinicum had accD alleles and combinations distinct from P. sativum, supporting its independent domestication [53]. The accD_14 allele of P. abyssinicum was not found in any of P. elatius or P. sativum samples. Notably, two of the most frequent alleles of each gene, accD_29 and bccp_22, contributed to the most frequent combination of accD_30/bccp_25 found in domesticated P. sativum.
It remains to be experimentally tested by crosses if the allelic combinations detected in the natural conditions create barriers against gene flow in natural pea populations. Some experimental crosses between cultivated pea and selected P. fulvum and P. elatius accessions were conducted by Bogdanova et al. [68]. These crosses revealed hybrid sterility, ultimately leading to identification of the respective genes [39]. In our work, we made reciprocal crosses between P. elatius L100 (accD_2/bccp_5) and P. sativum cv. Cameor (accD_29/bccp_22), which resulted in the appearance of albinotic plants (Smýkal, unpublished), while a cross between P. elatius JI64 (accD_30/bccp_5) and P. sativum JI92 (accD_29/bccp_22) was fully viable and fertile [69,70]. This corresponds to the findings of Bogdanova et al. (2015) [39] of a incompatible cross between P. elatius L100 (accD_2/bccp_5) and P. sativum WL12238 (accD_29/bccp_22); a cross between P. elatius JI1794 (accD_25/bccp_27), 721 (accD_5/bccp_22), and P. abyssinicum VIR 2759 (accD_14/bccp_26) were compatible with the cultivated pea P. sativum WL12238 (accD_29/bccp_22) [68]. Moreover, the existence of a second, unlinked, and yet unidentified nuclear scs2 locus also involved in nuclear-cytoplasmic conflict has been proposed [39]. In this study, the authors proposed a model of determinants, based on seven substitutions and three deletions in ACCD and four amino acid substitutions in the biotinyl domain of BCCP protein.
The results of our study add to this complexity, as there are far more possible combinations.

Domestication and Hybrid Incompatibility
In crops, artificial selection and hybridization accelerate the evolutionary process [71]. The majority of economically important crops were isolated from their progenitors through the existence of prezygotic or postzygotic reproductive barriers (or both), even though geographic isolation was absent during the domestication [38]. The reproductive barriers between wild crop progenitors and domesticated crops might be attributed to several mechanisms, including differences in karyotype or chromosomal rearrangements. Such karyotype differences are reported between P. fulvum and P. elatius, P. sativum, and between P. sativum and P. abyssinicum [72,73], and contribute to the partial fertility of the respective hybrids. Much less is known about the interactions between nuclear and cytoplasmic genomes. To date, only a few genes implicated in hybrid incompatibility have been isolated in crops. In maize, Tcb1, Ga1, and Ga2 alleles influence interaction of pollen tubes with silk tissue and confer prezygotic barriers in crosses between cultivated Zea mays and the wild teosinte Z. m. mexicana [74]. About 50 loci controlling postzygotic reproductive barriers between rice subspecies have been identified and molecular products of some genes have been characterized [22]. For example, the S5 locus, a determinant of japonica-indica sterility, is located in proximity to the domestication OsC1 gene [75]. Similarly, the Gn1a gene involved in rice yield formation is linked with S35, which determines pollen sterility of japonica-indica hybrids [76]. Another example was shown in the tomato, where the Cf-2 gene from wild Lycopersicon pimpinellifolium confers resistance to the fungus Cladosporium fulvum in an Rcr3 dependent manner [48]; these two genes interact with each other to induce hybrid necrosis syndrome in the hybrids. Although the occurrence of albino plants in many interspecific crosses in crops is widely documented [77,78], its causes have not been studied in most cases. Notably, crosses between cultivated chickpea (Cicer arietinum) and its progenitor (C. reticulatum) yielded yellow and albino plants and a biparental plastid inheritance [77,78]. We speculate that this was caused by a similar mechanism as in the pea.
The results of this study might be relevant for breeding, particularly using more distant crop wild relatives, as well as hybrid crop breeding [79,80], but it remains to be tested by experimental crosses to identify causal effectors.

Plant Material
We analyzed 195 previously described pea accessions (Smýkal et al. 2017) [52,53,59], consisting of wild P. elatius (95) and P. fulvum (16) accessions (Table S1). Sixty domesticated P. sativum landraces and 24 domesticated P. abyssinicum accessions were selected to maximize the genetic diversity and to cover the entire range of the wild and landrace pea habitats. This span is approximately 5000 km in longitude from Morocco to Iran, and in latitude from Tunisia to Hungary; altitude ranged from sea level to about 2000 m. This material was previously morphologically described and assessed for its genetic diversity structure [52,53]. Plants were grown in 5 L pots with peat-sand (90:10) substrate mix (Florcom Profi, BB Com Ltd. Letohrad, Czech Republic), in glasshouse conditions (UP campus, Olomouc, Czech Republic).

DNA and RNA Analysis
Genomic DNA was isolated from a single plant per accession from approximately 100 mg of dry leaf material using the Invisorb Plant Genomic DNA Isolation kit (Invisorb, Berlin, Germany) and standard protocol [52,59]. Total RNA was isolated from young leaves using plant RNA kit (Macherey-Nagel, Düren, Germany). Isolated RNA was treated with DNaseI to remove genomic DNA. The accD gene was amplified directly from genomic DNA using primers (F1-GCATTAGTTTTCATTTTCAGTCC located 27 bp upstream of stop codon, R4-CTTTAATAGGGGTTTAGAATACA, located 94 bp upstream of ATG codon) [39]. We used cDNA as a template to avoid large intron sequences present in the bccp3 gene. One microgram of a total RNA was reversely transcribed with Oligo(dT) primer and AMV reverse transcriptase (Promega, Madison, USA) according to manufacturer´s protocol (Hradilová et al. 2017) [71]. Two step nested PCR amplification was used. After the first PCR (with primers F-CTAATGAAAGTGGCGGAAATC, R-CCTTATTACGCGTCTTAGTGAATG), the product was diluted (1:100) and the second PCR was performed (F33-CCATTCTCTGCACTCCCTTTCGCG, R1113-CAATTATTTCTCAATCTATTCAAA ACG), using the conditions as described in Hradilová et al. [71]. PCR products were verified on a 1.5% agarose gel, treated with Exonuclease-Alkaline Phosphatase (Thermo Scientific, Brno, Czech Republic) and sequenced at Macrogene.

Sequence Analysis
For initial analysis, Geneious 7.1.7 (Biomatters Ltd., Auckland, New Zealand ) was used to edit and align sequences. Due to the presence of large gaps in the accD gene, sequences were translated into protein sequences, which reduced the overall length of the accD nucleotide alignment and partially helped to eliminate large gaps. This procedure reduced the complexity of the accD sequences. Sequences of the bccp gene were treated in the same manner, although these sequences were largely devoid of large indels. The translated protein sequences were aligned in Geneious using the MAFFT algorithm and the final alignment was manually adjusted. From the final alignment, different alleles and their frequencies were identified using the online tool FABOX [81].
To explore possible connections or relationships among the identified alleles, the reduced dataset (including each allele defined only once) was used for the network analysis. Several approaches of network construction were used (based on characters, Median network, Median-joining; based on distances, Neighbor network, Split decomposition) and implemented in SplitsTree [82]. The results were then compared. To compare the results of network analysis with a classically constructed bifurcating tree, a maximum parsimony (MP) tree was built using MEGA 6 with 1000 bootstrap replicates [83]. Because of the complex pattern of gaps within the accD gene, indels were treated as "partially deleted" (pairwise deletion, option implemented in MEGA) during the MP analysis. The final consensus tree was computed from all the equally parsimonious trees found during the analysis and was midpoint rooted. The tree topology was compared against the constructed networks. To simplify or reduce the number of identified alleles, groups of related alleles were inferred based on the constructed networks and the final consensus MP tree for both investigated genes. DnaSP v5.10 was used to determine nucleotide diversity and synonymous/non-synonymous sites ratios [84]. All studied accD and bccp sequences were deposited in the GenBank database under the accession numbers MK619486 to MK619678, and MK644626 to MK644819, respectively.

Tandem Repeat Analysis
Tandem repeats within DNA and protein sequences were identified in a combination of two algorithms (FastPCR [85] and RADAR [86]). The consensus DNA sequence of accD gene was first scanned by FastPCR at a repeat length ≥20 bp (k-mer = 12 with a tolerance for up to one mismatch within k-mer) with a similarity of above 70%. Potential tandem repeats for consensus protein sequence were further identified by RADAR software. Both methods complemented each other, since the boundaries of some degenerate and mixed tandem repeats were difficult to identify separately.

Protein Sequence Analysis and Structure Modelling
To identify the domains we used InterPro (www.ebi.ac.uk/interpro) and SMART databases (http://smart.embl-heidelberg.de). To generate molecular models of both proteins, standard sequences of the pea accD (GenBank YP_003587558.1) and bccp (GenBank DR89228.1) were used as queries to identify suitable templates and to perform molecular modelling by threading using Phyre2 in "normal" mode [87]. Only a partial model was generated for each protein, as portions of the sequence predicted to be disordered or lacking a suitable template (including some internal loops) could not be reliably modeled. In the case of ACCD, the structure of Staphylococcus acetyl-CoA carboxylase carboxyltransferase (PDB 2F9I) was identified as the best template. The second best template (PDB 2F9Y, also of bacterial origin) yielded a model of similar coverage and spatial organization. A similar model, also based on the PDB 2F9I template, was obtained for the same part of ACCD using another algorithm, RaptorX [88]. For BCCP, the best template identified by Phyre2 was the pyruvate carboxylase from Methylobacillus flagellatus (PDB 5KS8). The same template was also found by RaptorX as second best; namely, pyruvate carboxylase from Listeria monocytogenes (PDB 4QSH) yielded a spatially similar model. The Phyre2-generated models were subjected to additional refinement in the DeepView environment [89] to eliminate amino acid sidechain clashes. Subsequent evaluation of the resulting models using the WHAT_CHECK tools [90] revealed no critical errors, with scores for some parameters only slightly poorer than observed for the template for both proteins.

Mapping Protein Sequence Polymorphisms on Predicted Structure
Unique protein sequences encoded by alleles, each of the two loci were identified within aligned protein sequence sets using the ElimDupes tool at the Los Alamos HIV database website (https: //www.hiv.lanl.gov/content/sequence/elimdupesv2/elimdupes.html). A map of polymorphisms was then generated manually from the resulting unique sequence alignments. A distribution of the polymorphisms between the modeled and non-modeled portions of the protein was statistically evaluated using the Chi-square test.
Supplementary Materials: The following are available online at http://www.mdpi.com/1422-0067/20/7/1773/ s1. Table S1: List and description of analyzed material, Table S2: Table of accD/bccp combinations, Figure S1: The alignment of amino acid sequences of all identified accD alleles, File S1: Theoretical model of pea BCCP protein structure, (co-ordinates in standard PDB format), File S2: Model of BCCP protein structure. Theoretical model of pea ACCD protein structure, (co-ordinates in standard PDB format). File S3: Theoretical model of pea BCCP protein structure, (co-ordinates in standard PDB format).