Legume Cytosolic and Plastid Acetyl-Coenzyme—A Carboxylase Genes Differ by Evolutionary Patterns and Selection Pressure Schemes Acting before and after Whole-Genome Duplications

Acetyl-coenzyme A carboxylase (ACCase, E.C.6.4.1.2) catalyzes acetyl-coenzyme A carboxylation to malonyl coenzyme A. Plants possess two distinct ACCases differing by cellular compartment and function. Plastid ACCase contributes to de novo fatty acid synthesis, whereas cytosolic enzyme to the synthesis of very long chain fatty acids, phytoalexins, flavonoids, and anthocyanins. The narrow leafed lupin (Lupinus angustifolius L.) represents legumes, a plant family which evolved by whole-genome duplications (WGDs). The study aimed on the contribution of these WGDs to the multiplication of ACCase genes and their further evolutionary patterns. The molecular approach involved bacterial artificial chromosome (BAC) library screening, fluorescent in situ hybridization, linkage mapping, and BAC sequencing. In silico analysis encompassed sequence annotation, comparative mapping, selection pressure calculation, phylogenetic inference, and gene expression profiling. Among sequenced legumes, the highest number of ACCase genes was identified in lupin and soybean. The most abundant plastid ACCase subunit genes were accB. ACCase genes in legumes evolved by WGDs, evidenced by shared synteny and Bayesian phylogenetic inference. Transcriptional activity of almost all copies was confirmed. Gene duplicates were conserved by strong purifying selection, however, positive selection occurred in Arachis (accB2) and Lupinus (accC) lineages, putatively predating the WGD event(s). Early duplicated accA and accB genes underwent transcriptional sub-functionalization.


Introduction
Legumes are the third largest family of higher plants and are second to cereals in agricultural importance. Besides being a well-known source of dietary protein, grain legumes provide about one-third of vegetable oil for human consumption [1]. They are also appreciated as organic fertilizer plants, due to nitrogen fixation in a symbiotic system with bacteria [2]. The origin and diversification of this ecologically and economically important family has been recognized by time-calibrated phylogenies, based on multiple DNA sequences from nuclear and plastid genomes [3][4][5][6]. The last decade has witnessed spectacular progress in the development of next-generation sequencing

Contig Assembly and Bacterial Artificial Chromosome Clone Sequencing
BAC clones showing positive hybridization signals were verified by PCR amplification using BAC DNA as a template and the same primer pairs as those used for probe amplification. The PCR-verified clones were subjected to BAC-end sequencing on the ABI PRISM 3130 XL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) using pIndigoBAC5 sequencing primers (Table S1).
Sequences obtained with BAC3 and BAC5 primers were named with suffixes "3" and "5", respectively, and deposited under GenBank NCBI accession numbers MF346806-MF346859. BAC-end sequences (BESs) were analyzed and manually edited with Chromas Lite software (Technylesium, South Brisbane, Australia). BESs were aligned using BLASTn algorithm [59] in Geneious 8.1 [60] to the draft assembly of the L. angustifolius genome [23] and overlapping BAC clones were identified. The contigs were verified by PCR amplification using BAC DNA and primers designed on the templates of appropriate BESs. The whole insert sequencing was performed using Miseq platform (Illumina, San Diego, CA, USA) in paired-end 2 × 250 bp approach (Genomed, Warsaw, Poland). Nucleic acid fragmentation was performed in E210× (Covaris, Woburn, MA, USA). Sequencing libraries were constructed using NEBNext ® DNA Library Prep Master Mix Set for Illumina (E6040, NEB, Ipswich, MA, USA). Ten BAC clones per lane were sequenced with expected output min. 7.5 Gb. Sequence trimming was done in Cutadapt (https://cutadapt.readthedocs.io/en/stable/) whereas the assembly was in CLC Genomics Workbench.

Functional Annotation of Bacterial Artificial Chromosome Sequences
The process of in silico annotation included de novo detection of specific marks located on the genomic sequences as well as a comparative analysis. Narrow-leafed lupin genome contains more than 57% of repetitive content [23]. Prior to gene prediction, it is recommended to mask repetitive sequences including low-complexity regions and transposable elements [61,62]. Repetitive elements were identified using RepeatMasker Web Server version 4.0.6 with implemented repeat libraries RMLib 20160829 and Dfam 2.0 (A.F.A. Smit, R. Hubley & P. Green, Institute for Systems Biology, Seattle, WA, USA, http://www.repeatmasker.org). The following options were chosen: search engine, cross_match; speed/sensitivity, slow; DNA source, Arabidopsis thaliana. DNA sequences were also submitted to Censor [63]. The following settings were applied: Viridiplantae sequence source, report simple repeats, force translated search. Repetitive content was masked according to Repeatmasker/Censor results. To find reference protein sequences BLAST to RefSeq database (NCBI, https://www.ncbi.nlm.nih.gov/refseq/) was performed. The following parameters were applied: e-value cut-off, 1E-10; word size, 8; gap existence cost, 5; gap elongation cost, 2; nucleotide match/mismatch scores, 1/−2; max matches in a query range, 1. For each putative gene, protein-based hidden Markov model gene prediction in FGENESH+ [64] was executed, using matching reference RefSeq proteins. Annotated BAC sequences were deposited in NCBI GenBank under accession numbers (MK045264-MK045274).

Fluorescent In Situ Hybridization
Localization of BAC clones in metaphase mitotic chromosomes was performed with the use of fluorescent in situ hybridization (BAC-FISH). BAC probe labelling, chromosome squash preparation, and the FISH procedure was done according to the protocol [34]. The quality of chromosome slides was checked under a phase-contrast microscope BX41 (Olympus, Tokyo, Japan). After FISH, slides were examined with an epifluorescence microscope BX60 (Olympus) using the Cell_F software (Olympus). Images were captured using a CCD monochromatic camera and superimposed using Micrografx Picture Publisher 8 (Corel, Ottawa, ON, Canada).

Droplet Digital PCR
The number of gene copies was determined by the PCR multiplex reaction performed using QX200 Droplet Digital PCR System (Bio-Rad, Hercules, CA, USA). Briefly, each of the 20 µL reactions contained 1× QX200 ddPCR EvaGreen Supermix (Bio-Rad), 200 nM gene-specific primers, 50-80 nM reference (aspartate aminotransferase, AAT) gene-specific primers, and L. angustifolius cv. Sonet DNA template (in dilutions ranging from 0.125 to 2.0 ng/µL). The PCRs were performed in a C1000 Touch Thermal Cycler (Bio-Rad) with the following cycling conditions: 1× (95 • C for 5 min), 40× (95 • C for 30 s, 57-60.1 • C for 30 s, 72 • C for 45 s), 1× (4 • C for 5 min, 90 • C for 5 min) with 2 • C/s ramp rate. Primer annealing for ACC, accA, and accB/accC genes was performed in 60 • C, 60.1 • C, and 57 • C, respectively. The number of droplets and their fluorescence intensity were measured with the QX200 Droplet Reader (Bio-Rad). Data analysis was performed in QuantaSoft (Bio-Rad). To determine the number of gene copies in 1 µL of the analyzed samples, the fraction of positive droplets was background-corrected using template-free control samples and evaluated using Poisson statistics. Primers used for Droplet Digital PCR are provided in Table S1.

Genetic Mapping
Annotated BES and BAC sequences were used to design PCR primers for genetic marker development (Table S1). Amplification was performed using DNA isolated from the parental lines of the L. angustifolius mapping population (83A:476 and P27255). Amplicons were recovered directly from the post-reaction mixtures (QIAquick PCR Purification Kit; Qiagen) and sequenced. Allele-specific PCR (AS-PCR) polymorphisms were visualized by 1% agarose gel electrophoresis, whereas nucleotide substitution polymorphisms were solved by the Cleaved Amplified Polymorphic Sequence (CAPS) approach. Restriction sites were identified using dCAPS Finder 2.0 [65]. Digestion products were separated by 2% agarose gel electrophoresis. The segregation data of skeleton markers from the most recent genetic map of the narrow-leafed lupin [23] was used for linkage mapping (Map Manager QTXb20) [66]. Graphical illustration of linkage groups was performed using MapChart [67].

Microsynteny Analysis
Regions (~200,000 nt) carrying the genes encoding cytosolic ACCase and subunits of plastid ACCase were extracted from the L. angustifolius genome, repeat-masked, and blasted against the same genome assemblies as those used in gene prediction (Section 2.9). An identical scenario was applied for BAC clone sequences carrying analyzed genes. Sequence similarity search was performed using CoGe BLAST with parameters: e-value, 1E-10; word size, 8; gap existence/extension cost, 5/2; low complexity filter, yes; match/mismatch scores: 1/2. Sequences producing excessive numbers of alignments to loci dispersed over numerous chromosomes were considered as repetitive and removed from total score calculation and collinearity visualization. Genome Synteny Viewer (GSV) [72] and Circos [73] were used for drawing synteny graphs.

Phylogenetic Survey
CDSs with trimmed stop codons were aligned in Geneious using MAFFT 7.017 [74] with the following parameters: alignment, translation; genetic code, standard; translation frame, 1; algorithm, E-INS-I; scoring matrix, blosum 80; gap open penalty, 1.25; offset value, 0. The same parameters were applied for protein sequence alignment. Selection of best-fit model for nucleotide substitution was performed in jModelTest 2.1.10 [75]. The mRNA-based phylogenetic inference was conducted in Mr Bayes 3.2.2 [76] with the following settings: chain length (number of iterations), 1,000,000; subsampling frequency, 200; burn-in length (number of initial iterations excluded), 200,000; nchains (number of chains used for Metropolis-coupled Markov chain Monte Carlo analysis), 6; heated chain temperature 0.30; rate variation, gamma; gamma categories, 4; unconstrained branch lengths, exponential 10; shape parameter, exponential 10. Two substitution models were tested, GTR+I+G and codon M1 (site-specific neutral). For protein-based phylogeny, LG [77] and JTT substitution model were selected [78]; all other parameters were identical. Majority consensus trees were drawn in Geneious. The conservation of selected protein regions was calculated as a percent of identical amino acids at all positions within given region. The analysis was performed using conserved domain database (CDD, https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml).

Selection Pressure Analysis
Following the topology of the tree, pairs of sequences were selected for selection pressure test. Pairwise translation sequence alignments were done in MAFFT v7.017, using E-INS, BLOSUM 80 and gap open penalty 1.25. Selection pressure parameters, Ka (the number of nonsynonymous substitutions per nonsynonymous site), Ks (the number of synonymous substitutions per synonymous site), and Ka/Ks ratios were calculated in DnaSP 5 [79]. To address the selection pressure in a wider phylogenetic context, branch-site test of positive selection was performed in PAML4 [80] for Lupinus and Arachis clades. The following foreground branches were analyzed: Arachis spp. ACC, G. max ACC, Lupinus spp. ACC, Arachis spp. accA1 and accA2, G. max accA, Lupinus spp. accA, B. tomentosa accA2 (with descending branches), Arachis spp. accB1, G. max accB1, Lupinus spp. accB1, A. duranensis accB2a + A. ipaensis accB2b, A. duranensis accB2b + A. ipaensis accB2a, G. max accB2, Lupinus spp. accB2, Arachis spp. accC1, Arachis spp. accC2, G. max accC, and Lupinus spp. accC. The numbers of sequences in the alignments were as follows: ACC, 24; accA, 32; accB1, 18; accB2, 38; accC, 26. Two models were considered: a null model (model = 2, NSites = 2, fix_omega = 0), in which the foreground branch might have different proportions of sites under neutral selection to the background (i.e., relaxed purifying selection), and an alternative model (model = 2, NSites = 2, fix_omega = 1), in which the foreground branch might have a proportion of sites under positive selection. Verification of hypothesis was based on the likelihood ratio test (alternative vs. null model) and p-value under Chi-square distribution and 1 degree of freedom. False discovery rate (FDR) correction (0.05) was applied in Q-value, https://github.com/StoreyLab/qvalue.

Both Homologous and Heterologous Probe(s) Were Applicable to Select BAC Clones Carrying Genes Encoding Cytosolic ACCase and Subunits of Plastid ACCase
Screening of the L. angustifolius genomic BAC library with probes carrying fragments of ACCase gene sequences (Table 1 and Table S1) resulted in selection of seven clones for cytosolic ACCase gene (ACC) and 32 for plastid ACCase subunit genes (accA, accB and accC). PCR verification using DNA isolated from BACs as a template, followed by amplicon sequencing, confirmed the presence of ACC gene sequence in four, accA in six, and accB in five clones. As the accC probe originated from M. truncatula cDNA sequence, no PCR verification with L. angustifolius primers was possible and all 12 clones tagged by hybridization were used in the subsequent analyses.
BESs were obtained for all 27 clones analyzed (accessions MF346806-MF346859). Mapping of BESs to the L. angustifolius genome sequence anchored all clones to this assembly and revealed that clones carry two ACC (Lan_ACC1 and Lan_ACC2), two accA (Lan_accA1 and Lan_accA2), one accB (Lan_accB1), and one accC (Lan_accC1) genes. BAC clones that were revealed to carry no target genes were named as "random". BES mapping confirmed the results of PCR-based verification and yielded in construction of contigs carrying 2 Lan_accC1 clones, 7 Lan_accA2 clones, 3 Lan_ACC1 clones, and 5 Lan_accB1 clones, as well as two contigs carrying "random" clones (Table S2). Eleven singletons were identified, including those carrying Lan_ACC2 and Lan_accA1 genes. BAC-FISH was performed to localize BAC clones in the mitotic chromosomes of L. angustifolius as well as to determine if particular pairs of clones are located in the same regions of the genome. Out of 27 clones tested, 15 yielded single locus signals, including one clone from the each of Lan_ACC1, Lan_ACC2, Lan_accA1, and Lan_accB1 sub-libraries, two clones from the Lan_accC1 and four clones from the Lan_accA2 sub-libraries (Table 2). BAC clones originating from the same contig gave overlapping single locus signals on one pair of homologous chromosomes, whereas those from different contigs showed single locus signals on two different chromosome pairs (Table 2, Figure 1). Singletons were confirmed to be unlinked to any other clone from the analyzed set.

Genes Encoding Cytosolic ACCase and Subunits of Plastid ACCase are Located in Different L. angustifolius Chromosomes
BAC-FISH was performed to localize BAC clones in the mitotic chromosomes of L. angustifolius as well as to determine if particular pairs of clones are located in the same regions of the genome. Out of 27 clones tested, 15 yielded single locus signals, including one clone from the each of Lan_ACC1, Lan_ACC2, Lan_accA1, and Lan_accB1 sub-libraries, two clones from the Lan_accC1 and four clones from the Lan_accA2 sub-libraries (Table 2). BAC clones originating from the same contig gave overlapping single locus signals on one pair of homologous chromosomes, whereas those from different contigs showed single locus signals on two different chromosome pairs (Table 2, Figure 1). Singletons were confirmed to be unlinked to any other clone from the analyzed set. Table 2. Direct visualization of bacterial artificial chromosome (BAC) clones distribution in Lupinus angustifolius chromosomes. 009k09 073e17 002f03 005g02 009a01 049f04 089l06 096g16 126d16 011g20 016j11 046i04 048n08 051f15 060f02 009k06  Chromosomes were counterstained with DAPI (blue). BACs were labeled with digoxigenin-11-dUTP (green signals) and tetramethylrhodamine-5-dUTP (red signals); (a) 009a01 (green) and 126d14 (red) (accC sub-library, NLL-06); (b) 046i04 (green) and 060f02 (red) (accA sub-library, NLL-11); (c) 080b11 (green) and 009a01 (red) (NLL-06). Scale bar = 5 μm.
To anchor contigs and singletons to the linkage groups of the narrow-leafed lupin genetic map, molecular markers were developed. Fifty-four sequences derived from 27 clones were used for primer design (Table S1). Expected PCR amplicons were obtained for 63 of 71 primer pairs. The presence of sequence polymorphism between parental lines of the mapping population (83A:476 and P27255) was found for 14 primer pairs, allowing design of twelve CAPS and two AS-PCR molecular markers (accessions: MF398274-MF398299). Segregation in RIL population revealed statistically significant deviation from Mendelian under Chi-square distribution (p = 0.05) for two markers, 016J11_5 and Genes 2018, 9, 563 9 of 30 060F02_3 (Table 3). All markers were localized in linkage groups with LOD (logarithm of odds) values from 9.2 to 22.6. Taking into consideration the RIL number in the population (N = 112) and several lines with missing marker data between datasets (resulting from different RIL sets used by research teams during 12 years of the narrow-leafed lupin studies), the statistical significance is conferred by LOD values of about 3-4. LOD values of about 10 can be considered as very high, whereas LOD of about 23 is the maximum possible value. Overlapping clones showed close linkage, as follows: markers 009K06_5 and 040M06_3 (linkage group NLL-14, distance between markers 0.0 cM), 016J11_5 and 060F02_3 (NLL-11, 0.7 cM), 011G20_5 and 069L17_3 (NLL-13, 0.7 cM). All BAC-derived ACCase gene sequences were mapped to linkage groups, as follows: Lan_ACC1 to NLL-14, Lan_ACC2 to NLL-15, Lan_accB1 to NLL-13, and Lan_accC1 to NLL-06. New markers anchored 12 clones showing single locus BAC-FISH signals, making a considerable improvement of the integrated map of the species, carrying hitherto 73 chromosome-specific landmarks [33]. Segregation data for developed molecular markers was provided in the Table S3.

Lupins and Soybean
Have Duplicates of All Nuclear ACCase Genes (ACC, accA, accB and accC) Twelve BAC clones were selected for sequencing including those carrying genes in question (Lan_ACC1, Lan_ACC2, Lan_accA1, Lan_accA2, and Lan_accB1) and a few "random" ones. The sequencing provided 2.6 mln reads per clone (ranging from 0.4 to 5.1 mln), resulting in an average sequence coverage of 9700×. Full-length insert sequences were recovered for all BACs except two clones from "random" sub-library (089L06 and 096G16). Excluding these two truncated inserts, the length of BAC sequences varied from 31,722 to 162,642 bp, with an average value of 65,220.1 bp (Table 4). This is about 35% less than the average value calculated for the genome BAC library as well as 32.5% less than the average value for clones selected by hybridization from this library in previous studies [31][32][33]90]. BAC clone 011G20 was shown to be fully nested within the 075C05 clone and therefore was discarded from further analyses.
Repetitive content varied from 2.2% (009K06) to 36.2% (092K09). No correlation between the abundance of repeats and the type of BAC-FISH signal was found, i.e., the low-repeat clone 070D20 (7.1% of repetitive content) yielded dispersed signals whereas the high-repeat clone (37.3% of repetitive content) gave single locus signals. There was no correlation with the type of repeats as well; there were clones carrying LTR/Copia, LTR/Gypsy, and DNA/hAT (051F15) or LTR/Gypsy, LTR/Copia, and DNA/Helitron (060F02) and still yielding BAC-FISH signals single locus type. Gene annotation revealed that BAC clone sequences differed by the number of genes predicted, from 0 in the clone 089L06 to seven in 009K06 (Table 4, Figure 2). The presence of gene Lan_ACC1 in 070D20, Lan_ACC1 in 073E17, Lan_accA1 in 051F15, Lan_accA2 in 060F02, Lan_accB1 in 075C05 and Lan_accC1 in 009A01 clones was confirmed. Two clones carrying Lan_accA1/Lan_accA2 homologs were found to have partially similar gene content, whereas gene content of two clones carrying Lan_ACC1/Lan_ACC2 duplicates was different. Annotated BAC clones were deposited under accession numbers (MK045264-MK045274). Droplet Digital PCR profiling with primers anchored in conserved regions of multiple sequence alignment confirmed the presence of two ACC, three accA, one accB1, and two accC genes in the L. angustifolius genome (Table S4). accession numbers (MK045264-MK045274). Droplet Digital PCR profiling with primers anchored in conserved regions of multiple sequence alignment confirmed the presence of two ACC, three accA, one accB1, and two accC genes in the L. angustifolius genome (Table S4).  blue, BAC clone sequences (009A01, 009K06, 042E20, 051F15, 060F02, 070D20, 073E17, 075C05, 086L06, 092K09, 096G16); black, repetitive elements; green, genes; red, coding sequences. All sequences are drawn to scale provided at the top of the figure (bp).
Ten genome and thirteen transcriptome assemblies of species representing all major legume clades were analyzed by BLAST and protein-based gene prediction to reveal the presence of coding sequences (CDS) for cytosolic ACCase and subunits of plastid ACCase. Homologous sequences were identified in all datasets, however, due to low quality of the RNA-seq assembly, only truncated sequences were retrieved for eight species, namely Cladrastis lutea, Desmanthus illinoensis, Gleditsia sinensis, Gleditsia triacanthos, Gymnocladus dioicus, Lathyrus sativus, Senna hebecarpa, and Xanthocercis zambesiaca. Due to the unpublished stage of the one KP (1000 Plants consortium) initiative hosting these sequences, we were not allowed to access raw data and to re-assemble these transcriptomes using L. angustifolius genes as references. To keep the reasonable number of gapped positions in the multiple sequence alignment, species with truncated homolog sequences were not included in the phylogenetic analysis. The lowest numbers of genes encoding both types of ACCase proteins were identified for Cercis canadensis and Copaifera officinalis, whereas the highest for L. albus, G. max, and L. angustifolius. The latter three species revealed to retain duplicates of all four nuclear genes (ACC, accA, accB, and accC). Particular genes differed by copy number: ACC and accC were the rarest (an average of 1.4-1.5 copies per the assembly), whereas accB the most abundant (an average of 3.2 copies) ( Table 5, Tables S5 and S6).

The Structure of Nuclear Genes Encoding Cytosolic ACCase and Subunits of Plastid ACCase Is Highly Conserved among the Legume Family
The structure of ACCase genes was evaluated in ten legume species. The genes for cytosolic ACCase have an average length of 12,697 bp (min. 10,145 bp, max. 14,071 bp). Thirty-one exons were predicted for all cyto-ACCase homologs except both M. truncatula copies which had 29 exons (Table S5 and Figure S1). An average gene length of accA homologs, except Cca_accA1, was estimated as 4980 bp (min. 3666 bp, max 7368 bp). Cca_accA1 revealed a very long first intron, putatively artificially introduced by gene prediction algorithm due to the lack of expected transcription start site. The number of exons varied from 10 in thirteen homologs through 11 in four sequences up to 12 in Mtr_accA (Table S5 and Figure S2). The average length of accB genes was estimated to be 3830 bp (min. 1653 bp, max. 7463 bp). Exon number was also highly conserved, six in five homologs and seven in thirty (Table S5 and Figure S3). Gene length of accC homologs was relatively even, with an average value of 7242 bp (min. 5114 bp, max 9656 bp). The number of predicted exons was 15 for three accC genes and 16 for eleven (Table S5 and Figure S4).
The analysis of conserved domain revealed that cytosolic ACCase is composed of three parts of near equal length (Figure 3a). The N-terminal part contains domains of BC and BCCP activity, and the C-terminal region contains CT activity. Between these two parts there is the ACC central domain, pfam08326, specific for Eukaryotes, which does not possess any bacterial homologs and whose function is unknown. The conservation degree of this domain, only slightly lower than this of CT domain, suggests some stabilizing function during evolution.
in thirty (Table S5 and Figure S3). Gene length of accC homologs was relatively even, with an average value of 7242 bp (min. 5114 bp, max 9656 bp). The number of predicted exons was 15 for three accC genes and 16 for eleven (Table S5 and Figure S4).
The analysis of conserved domain revealed that cytosolic ACCase is composed of three parts of near equal length (Figure 3a). The N-terminal part contains domains of BC and BCCP activity, and the C-terminal region contains CT activity. Between these two parts there is the ACC central domain, pfam08326, specific for Eukaryotes, which does not possess any bacterial homologs and whose function is unknown. The conservation degree of this domain, only slightly lower than this of CT domain, suggests some stabilizing function during evolution.  light grey, 20-59%; dark grey, 60-79%; black, 80-100%. Legume sequences used for calculation are provided in the Tables S11, S13 and S14. Subunits of plastid ACCase revealed different degrees of conservation resulting from different selective pressures. The BCCP (accB) homologs possess two conserved regions: one shorter in the center of the protein and another longer located on the C-terminus (Figure 3b,c). The biotinylation site is located in the C-terminal conserved region. The C-terminal region of plant BCCP carrying both conserved regions is homologous with cyanobacterial protein, whereas the N-terminal region does not show any significant similarity to bacterial ancestor. The L. angustifolius BCCP1 is more conserved than BCCP2, but the difference is very small (60% vs. 52% in the middle domain) and might result from lower number of analyzed proteins in the BCCP1 clade.
The BC subunit (accC) is very conserved (large block with~80% sequence identity) and does not reveal any major remodeling between cyanobacterial and plant proteins (Figure 4a). In CT-alpha (accA), the conservation degree decreases gradually from N-to C-terminus, starting from ca. 80% at the N-end part (Figure 4b). The transition from cyanobacterial to eukaryotic protein is associated with multiple inserts scattered along the N-terminal part of the protein.
are provided. Blocks are colored according to sequence conservation level, namely: white, 4-19%; light grey, 20-59%; dark grey, 60-79%; black, 80-100%. Legume sequences used for calculation are provided in the Tables S11, S13 and S14. Subunits of plastid ACCase revealed different degrees of conservation resulting from different selective pressures. The BCCP (accB) homologs possess two conserved regions: one shorter in the center of the protein and another longer located on the C-terminus (Figure 3b, c). The biotinylation site is located in the C-terminal conserved region. The C-terminal region of plant BCCP carrying both conserved regions is homologous with cyanobacterial protein, whereas the N-terminal region does not show any significant similarity to bacterial ancestor. The L. angustifolius BCCP1 is more conserved than BCCP2, but the difference is very small (60% vs. 52% in the middle domain) and might result from lower number of analyzed proteins in the BCCP1 clade.
The BC subunit (accC) is very conserved (large block with ~80% sequence identity) and does not reveal any major remodeling between cyanobacterial and plant proteins (Figure 4a). In CT-alpha (accA), the conservation degree decreases gradually from N-to C-terminus, starting from ca. 80% at the N-end part (Figure 4b). The transition from cyanobacterial to eukaryotic protein is associated with multiple inserts scattered along the N-terminal part of the protein.

Whole-Genome Duplication Event(s) Shaped the Evolution of L. angustifolius Nuclear Genes for Cytosolic ACCase and Plastid ACCase Subunits
Sequence collinearity of genome regions carrying ACCase genes was analyzed in ten legume species. Highly conserved synteny, evidenced by high total score values of sequence alignments, was observed for Lan_accB2c, Lan_ACC2, and Lan_ACC1 regions. The least conserved collinearity was revealed for Lan_accB1. Regions surrounding ACC and accA genes revealed synteny to other genome regions carrying these genes as well as to other regions lacking any appropriate homolog. Genome regions carrying accB and accC genes were syntenic only to those encoding corresponding homologs. The pattern of synteny indicated lineage-specific duplications of ACC (L. angustifolius, M. truncatula and G. max), accA (L. angustifolius and G. max), accB1 (G. max), and accC (L. angustifolius and G. max) genes. accB2 genes revealed a complex pattern, indicating both lineage-specific and ancestral duplications. No synteny between accB1 and accB2 genome regions was observed. The representatives of accB1 and accB2 genes were found in all legume taxa having sequenced genomes. The presence of accB1 and accB2 genes in the genome of a model plant, A. thaliana, highlights early separation of these genes in plant evolution (Tables S7-S10). Patterns of synteny observed for sequences: Lan_ACC1 vs. Lan_ACC2, Lan_accA1 vs. Lan_accA2 vs. Lan_accA3, Lan_accB2a/Lan_accB2b vs. Lan_accB2c, and Lan_accC1 vs.

Legume ACC Genes Evolved by Lineage-Specific Duplications, Whereas accA, accB and accC Genes Both by Early and Lineage-Specific Duplications
The topologies of majority rule consensus trees of ACC, accA, accB1, accB2, and accC genes highlighted the recently resolved phylogeny of legumes, with the early diverging Cercideae, Detarieae, and mimosoid/MCC clades localized at top nodes (Tables S11-S15, Figure 5b, Figure 6b, Figure 7b,d and Figure 8b). Arachis and Lupinus clades localized at expected positions, ancestral (accB1 and accC) or parallel (ACC, accA and accB2) to the mirbelioid clade represented by G. polymorphum. Lupinus lineage was found to be descendant (accB1, accC1) or parallel (ACC, accA and accB2) to Arachis.
All lupin ACC, accA, accB1, accB2, and accC copies arose after the divergence of the Lupinus lineage. However, lupin ACC, accA, accB2, and accC sequences do not form monophyletic clades which indicates the occurrence of one duplication event before the divergence between L. angustifolius and L. albus. No such duplication was revealed for the accB1 clade. The split between accB1 and accB2 clades must have occurred very early, before the divergence of Arabidopsis. Therefore, we used the C. reinhardtii sequence to anchor all trees except ACC as the C. reinhardtii cytosolic ACCase possesses five long insertions scattered along the peptide. Tree topology indicated that one duplication event of accB2 might have occurred at the early evolution of Papillionoideae, putatively after the divergence of the mimosoid/MCC clade but before the appearance of dalbergioids (Figure 7b). One duplicate was well preserved in all descendant lineages analyzed, whereas the second was revealed to be currently absent in few of them, including Lupinus, Phaseolus, and Vigna. The issue whether this duplication encompassed genistoid lineage requires high-resolution phylogenetic inference involving extensive sampling of Old-and New World lupins. Such studies are currently hampered by the lack of high quality sequence data. genes. A pair of Adu_accC1/Aip_accC1 was revealed to evolve independently of all other legume homologs (Figure 8b). The possible explanation of such observation is ancient accC duplication before the origin of legumes and subsequent elimination of new duplicates in all descendant lineages except Arachis. This hypothesis requires further studies. A similar phenomenon was revealed for accA genes, where Bto_accA2, Cof_accA2, Cca_accA1, Adu_accA2, Aip_accA2, and Lja_accA2 were found to form a clade diverging before the A. thaliana lineage (Figure 6b).    Despite the general consistence of gene consensus trees with the recently resolved legume phylogeny [19], some exceptions involving one or more branches were revealed in accA and accC genes. A pair of Adu_accC1/Aip_accC1 was revealed to evolve independently of all other legume homologs (Figure 8b). The possible explanation of such observation is ancient accC duplication before the origin of legumes and subsequent elimination of new duplicates in all descendant lineages except Arachis. This hypothesis requires further studies. A similar phenomenon was revealed for accA genes, where Bto_accA2, Cof_accA2, Cca_accA1, Adu_accA2, Aip_accA2, and Lja_accA2 were found to form a clade diverging before the A. thaliana lineage (Figure 6b).

Purifying Selection Shaped Evolution of Nuclear Genes Encoding Cytosolic ACCase and Subunits of Plastid ACCase
According to the topology of the majority consensus tree, 53 pairs of duplicated sequences were analyzed, including those located at sister branches and those originating from different clades. The nonsynonymous to synonymous substitution rate (Ka/Ks) ratio analysis revealed that sequence pairs Lal_accB1a/Lal_accB1b and Lal_accB1a/Lal_accB1c were subjected to positive selection. These sequences were derived from the transcriptome assemblies, therefore they were confirmed to be expressed. The lack of premature stop codons and the presence of nonsynonymous mutations close to the 5 end indicate that these sequences are not pseudogenes [91]. For three pairs (Aip_accA1/Aip_accA2, Cca_accA1/Cca_accA2, Lja_accA1/Lja_accA2) the proportion of differences was too high to compute Jukes and Cantor correction. Overlapping parts of Lan_accB2a/Lan_accB2b sequences were identical. All the remaining pairs were revealed to be under purifying selection.
Average Ka/Ks value was the lowest for ACC genes (0.16) and the highest for accB genes (0.50) ( Table S16). The clade encompassing Bto_accA2, Cof_accA2, Cca_accA1, Adu_accA2, Aip_accA2, and Lja_accA2 sequences revealed considerably different rates of gene evolution than the clade carrying Ath_accA and descendant branches. This was manifested by 34.7-fold higher average Ks value calculated for alignments encompassing sequences from both clades than those carrying only sequences located below Ath_accA node.
To evaluate the selection pressure in a wider phylogenetic context, branch-site test of positive selection was performed for Arachis, Lupinus, and Glycine clades (Table S17). Statistically significant p-values (<0.05) under Chi-square distribution were revealed for accB2 Arachis spp. clade (FDR q-value 0.02) and accC Lupinus spp. clade (FDR q-value 0.07). Positively selected sites were F46 to H and A68 to N in Adu_accB2a/Aip_accB2b branch, S257 to G in Adu_accB2b/Aip_accB2a branch, and G445 to N in Lupinus spp. accC clade. The sites S257 and G445 are located in conserved regions, while the regions carrying F46 and A68 sites are relatively variable.
Branch-site model surveys revealed a very strong positive selection signal (p = 2.1E-07) in the clade composed of Bto_accA2, Cof_accA2, Cca_accA1, Adu_accA2, Aip_accA2, and Lja_accA2 genes (FDR q-value 3E-06). Affected positions were as follows: P75 (substitution to M or I), F182 (to W), K411 (to S), and P723 (to K). First three positions are located in conserved regions. This observation highlights the difference in evolutionary patterns of these homologs and those from the main clade, as visualized by distinct positions on the phylogenetic tree.

Transcription Profiles of accA, accB, and accC Duplicates are Different, Indicating the Possibility of Gene Sub-Functionalization
To address the issue of possible transcriptional sub-functionalization of ACCase duplicates, transcriptome-derived datasets were screened and gene expression across species representing all main descendant lineages of Papilionoideae was evaluated. Some duplicates were not found in gene expression atlases, mostly due to lack of corresponding probe set or annotated gene. However, the number of identified homologs was sufficient to provide insight into differential gene expression patterns among several tissues-nodules, roots, leaves, stems, flowers, seeds and pods (Figure 9, Tables S18 and S19).
High expression of cytosolic ACCase was observed in roots, young leaves, and seeds or pods during maturation in the majority of analyzed species (Figure 9a). The two G. max genes (Gma_ACC1 and Gma_ACC2) revealed very similar expression patterns which suggests that their divergence is very recent or that there is no pressure for functional differentiation.
L. japonicus, C. cajan, A. duranensis, and A. ipaensis possess two accA genes assigned into two clades: the well-separated minor one, grouping also B. tomentosa and C. officinalis homologs, and the major one grouped together with the remaining Papilionoideae accA genes (Figure 6b). Interestingly, accA genes from the distinct, minor clade (Adu_accA2, Aip_accA2, Lja_accA2, Cca_accA1) showed a similar expression profile, remarkably different from the main clade (Figure 9b). They were expressed on a much lower level with the highest expression in leaves whereas accA genes from the main clade were expressed in all analyzed organs with the highest level observed in leaves and during seed or pod development. L. japonicus and M. truncatula accA genes from the main clade were highly expressed also in nodules (on similar level as in leaves). High expression in roots, on a similar level as that in leaves, was revealed for Cca_accA2, Mtr_accA and V. unguiculata accA. In G. max, the expression pattern of three closely related accA genes was similar, however for Gma_accA3 it was almost stable, while Gm_accA2 considerably differed between organs, especially nodules and leaves.
A high level of accB transcripts was found in leaves, seeds and pods, in particular cases showing seed/pod developmental stage specificity (Figure 9c). L. japonicus, M. truncatula, and V. unguiculata also revealed high expression of accB genes in flowers, which might be associated with complex regulation during pollen-stigma interactions and on early stages of embryo development. Comparative analysis of the expression profile of accB duplicates highlighted hypothetical sub-functionalization as the accB1 genes had usually the highest level of expression in leaves, whereas for accB2 genes it was in seeds, pods, stems, or flowers. Such an observation was made for L. japonicus, C. cajan, C. arietinum, P. vulgaris, G. max, and V. unguiculata homologs. Organ specialization was visualized in L. japonicus by diverse expression profiles of three accB duplicates. On the other hand, expression of M. truncatula and P. vulgaris accB duplicates differed rather in level than in pattern.  [81], https://peanutbase.org/gene_expression/atlas and Lotus japonicus [85], https://ljgea.noble.org/v2/index.php, and recalculated to the level of ATPaseV0 gene. "Pattee" refers to developmental stage (original term used in the peanut database), 10d, 12d etc. are days after pod development.

Bacterial Artificial Chromosome-Based Approach Is Still Efficient in Current Genomic Analyses
Next generation sequencing (NGS) methods started the new era in genomics. Nevertheless, Transcriptional activity of accC genes was associated with leaf tissue and with maturation of seeds or pods (Figure 9d). In L. japonicus, C. cajan and M. truncatula high expression was observed in roots, nodules, or flowers. Summarizing, accC revealed expression patterns similar to those of accA. Moreover, transcription profiles of duplicated accC genes (from G. max, A. ipaensis, and A. duranensis) were generally similar to each other. However, the statistically significant difference was observed between Adu_accC1 and Adu_accC2 transcription level during seed development, which might be associated with their functional differentiation.

Bacterial Artificial Chromosome-Based Approach Is Still Efficient in Current Genomic Analyses
Next generation sequencing (NGS) methods started the new era in genomics. Nevertheless, traditional, more direct approaches are still incorporated in many studies concerning gene identification and characterization, especially for species without the sequenced genome or in a draft phase of the assembly. Current narrow-leafed lupin pseudochromosome assembly has a length of 470 Mbp, whereas the estimated genome size based on cytogenetics or k-mer analysis is about 924-1153 Mbp [22][23][24]. Therefore, this assembly provides information on 41-51% of the expected narrow-leafed lupin genome sequence. It should be noted that the first G. max pseudochromosome assembly covered~85% of the expected genome sequence, so twice more than the L. angustifolius one [10]. Moreover, the current narrow-leafed assembly was obtained only by linkage mapping and synteny-based approach with no verification by physical mapping methods. The assembly is composed of many small contigs (average length 758 bp), which are organized in larger scaffolds with numerous gaps and tracks of undefined nucleotides. Taking into consideration that ACCase genes are large structures carrying numerous exons, there was a non-negligible risk of a gap or misassembly inside a gene. Thus, in our study the genome sequence was used to localize analyzed genes. However, we decided also to use a BAC-based approach as a reliable method providing direct visualization of particular sequence probes on a chromosome.
Genome BAC libraries, including L. angustifolius libraries, served as very useful tool to select genome regions carrying genes of interest as well as to support high throughput data analysis and genome assembly [23,[30][31][32][33]. Moreover, BAC-based assembly is still considered as efficient approach for studying complex genomes, where polyploidy or high repeat content hampers whole-genome shotgun sequencing [92,93]. BAC libraries have also been recently used for gene family studies, i.e., fructokinase gene family in Saccharum [94] as well as phosphoethanolamine binding proteins in Lupinus [36,90]. The use of gene and species specific probes to screen BAC libraries ensures accurate selection of research material. In our studies the number of verified BAC clones selected based on ACC, accA, accB homogenic hybridization varied from four to six, whereas the use of heterogenic M. truncatula accC gene probe resulted in a higher number of hybridization signals, including several false positives. BAC library approach was beneficial, although did not allow us to find all accB genes due to high sequence differentiation between accB1 and accB2 clades. In this case, sequence comparative mapping among legumes was indispensable to reveal all lupin homologs. Sequence mining based on closely related reference genes is a commonly used approach in genome-wide studies [95,96].

Nuclear Genes Encoding Cytosolic ACCase and Plastid ACCase Subunits Evolved by Whole-Genome Duplication
Whole-genome duplications (WGDs) are hypothesized to have occurred frequently during plant evolution [97]. It is anticipated that a WGD event predated the differentiation of all extant seed plants (dated~319 million years ago, mya) and another such event occurred in the common ancestor of all extant angiosperms (~192 mya) [98]. Early diversification of the core eudicots could be associated with old genome triplication (~117 mya), as revealed by phylogenomic approach [99]. These duplications have putatively driven the evolution of regulatory genes controlling seed and flower development which resulted in further dominance of seed plants on Earth [100]. Legume expansion could be also anchored in an ancient WGD, which hypothetically occurred in the progenitor line of Papilionoideae. This WGD was dated to about 44-65 mya and putatively launched the divergence of ancient lineages of Papilionoideae [9,19,[101][102][103]. The remnants of such an event have been found in numerous clades including Xanthocercis, Cladrastis, dalbergioids (Arachis spp.), genistoids (e.g., L. angustifolius), millettioids (P. vulgaris, G. max, C. cajan, V. radiata), galegoids (M. truncatula, L. japonicus, C. arietinum), [19,101,[104][105][106]. Besides this major ancestral WGD, additional events occurred in early evolution stages of descendant lineages, including Mimosoideae-Cassiinae-Caesalpinieae, Detarieae, Cercideae and Lupinus clades, dated roughlỹ 30-55 mya [19]. Sequence-based comparative mapping provided compelling evidence for large-scale duplication and/or triplication in the L. angustifolius genome, highlighting one or more ancestral polyploidy events [26]. Lupinus spp. WGD is considered to have happened before the divergence of New World and Old World clades, however no genus-wide phylogenetic support has been provided hitherto [19,35]. It should be emphasized that some legume lineage-specific WGDs occurred relatively recently, i.e., 13 mya in G. max and several mya in Arachis [10,19]. Here, we demonstrated that L. angustifolius cytosolic ACCase duplicates Lan_ACC1 and Lan_ACC2 as well as plastid ACCase subunit duplicates of accA (Lan_accA1, Lan_accA2 and Lan_accA3), accB (Lan_accB2a/b and Lan_accB2c) and accC (Lan_accC1 and Lan_accC2) originated from lineage-specific WGD, whereas the pair of genes Lan_accB2a and Lan_accB2b evolved by local tandem duplication. It was well-evidenced by cytogenetic and comparative mapping that WGD shaped the evolution of numerous L. angustifolius genes, including genes from phenylpropanoid pathway (chalcone isomerase, chalcone isomerase like, fatty acid binding protein and isoflavone synthase) [35,37], phosphatidylethanolamine binding protein family [36] as well as genes involved in symbiotic interactions and nitrogen fixation (early nodulin, nodulin 26-like, phosphoenolpyruvate carboxylase, and glutamine synthetase genes) [33].
Direct counting of all WGD events which are assumed to have occurred during plant evolution would lead to the conclusion that even the relatively small genome of Arabidopsis should carry the remnants of at least five ancestral duplications but obviously it does not have so many duplicates [100]. There are evolutionary mechanisms influencing newly arisen gene homologs which result in their elimination, pseudogenization or survival, usually with some sub-or neo-functionalization [107]. We observed that L. angustifolius nuclear genome retained relatively high number of ACCase gene duplicates, involving both cytosolic enzyme and plastid subunits. The same phenomenon was observed for other legume species, particularly G. max and L. albus.

Functional Differentiation of Duplicated ACCase Genes
Homomeric ACCase provides malonyl-CoA for fatty acid elongation as well as polyketide biosynthesis, malonylation reactions, and other specialized metabolic processes [47]. G. max ACC genes, Gma_ACC1 and Gma_ACC2, are paralogs which arose after G. max speciation and did not reveal any functional specialization. The short time period from their divergence might be one of the factors responsible for high similarity of their expression profiles. M. truncatula also possesses two paralogous ACCases, which reveal stronger sequence diversification than G. max proteins, but expression pattern was accessible only for Mtr_ACC2, due to the lack of Mtr_ACC1-specific probe set. The high level of the cytosolic ACCase mRNA level observed in analyzed legume transcriptomes reflects a high demand for fatty acid elongation, needed for the synthesis of cuticular wax, which is deposited on the surface of developing leaves to protect them against water loss and environmental stresses [108,109]. Moreover, malonyl-CoA pool produced by the cytosolic ACCase is also used for biosynthesis of flavonoids including pigments, signaling, and defense compounds [48,49]. High expression of ACC in leaves might be associated with wax synthesis and cuticle formation whereas expression in nodules (as observed for Adu_ACC and Aip_ACC) might be part of plant mechanisms controlling symbiotic bacteria development within plant cells [110,111]. In seeds and/or pods, during their development, expression of cytosolic ACCase is associated with formation of embryo and protection of seeds against water (both loss and penetration) [112].
The accA gene encodes carboxyl transferase alpha subunit of plastid ACCase, which is associated with fatty acid formation. The activity of this gene is important for phospholipid synthesis and cellular and plastid membranes formation as well as oil deposition [38,[113][114][115]. Papilionoideae accA genes are divided into two sub-clades: the main clade carrying representatives from all studied species and the early diverging clade. The genes from early diverging clade (C. cajan, L. japonicus, A. duranensis and A. ipaensis) revealed much lower expression in all analyzed organs than the genes from the main clade. The strong difference in expression levels indicates evolutionary conserved functional specialization between these clades. The situation in G. max is quite different which has three accA homologs, all from the main accA clade, showing similar expression patterns. Gma_accA2 high activity in leaves and Gma_accA1 high activity at the end of seed maturation may suggest that these paralogous genes are at the early stage of functional diversification.
Genes encoding BCCP are divided into two clades, accB1 and accB2, which evolved before the origin of Papilionoideae. Whereas in A. thaliana BCCP1, At5g16390, is the major gene and the second homolog BCCP2, At5g15530, is not indispensable, in legumes genes from the clade BCCP1 appeared to be lacking in some lineages, whereas genes from BCCP2 were found in every species analyzed. Also the frequency of gene duplications in Papilionoideae BCCP2 clade was much higher than in BCCP1. A considerably higher number of accB genes found in all analyzed plants, in comparison with genes for other pt-ACCase subunits, indicates that the genes for pt-ACCase subunits are under different selection forces. These differences are probably associated with the BCCP role in regulation of the complex enzyme activity. The BCCP-dependent regulatory mechanisms involve posttranslational modification, BCCP dimerization, BCCP-related BADC negative regulatory proteins competition during enzyme assembly, BCCP interactions with signal integrator PII protein, as well as genetic control by the WR1 transcription factor [47,[116][117][118][119]. The other forces driving BCCP evolution might be associated with nuclear-cytoplasmic conflict between BCCP and CT-beta, as it was identified in Pisum sativum [120]. Indeed, the pair of nuclear Bccp3 and plastid accD genes seems to act like speciation genes in pea, hampering successful crossing of wild representatives with cultivated forms [120]. The diversity of regulatory mechanisms depending on BCCP might be responsible for frequent conservation of newly aroused homologs and their subsequent specialization. Nevertheless, in all analyzed plants, the organs of the highest expression of accB genes were leaves and seeds, probably associated with high demand for fatty acids either for phospholipid membranes in plastids of mesophyll cells or as a storage material in seeds [38,39,[121][122][123].
The majority of legume species with accessible transcriptomic data possess only one accC gene, except A. duranensis, A. ipaensis and G. max which carry two homologs. However, given the very low apparent expression of some loci (Figure 9), it is possible that a single transcriptome sequence for some taxa did not reveal some of the paralogues. Gm_ accC1 and Gm_accC2 are paralogs with similar expression patterns indicating that there was no pressure for their functional diversification or simply there was not enough time for such a process. In Arachis spp., accC duplication occurred before A. duranensis and A. ipaensis differentiation and these genomes contain two copies of accC1 and accC2. The similar expression patterns of Adu_accC1 and Aip_accC1 orthologs suggest that their functions were preserved during evolution. Contrarily, Adu_accC2 and Aip_accC2 genes revealed different expression patterns, visualized by high expression of Adu_accC2 in nodules and roots as well as different expression profile during seed development. These differences suggest that A. duranensis and A. ipaensis accC2 orthologues are under different selective pressures.

Selection Constraints of Genes Encoding Cytosolic ACCase and Subunits of Plastid ACCase
Calculation of the nonsynonymous to synonymous substitution rate (Ka/Ks) ratio performed for 53 pairs of duplicated sequences revealed that nuclear genes encoding cytosolic ACCase and subunits of plastid ACCase have been subjected to strong purifying selection. Only a few examples of positive (Lal_accB1a vs. Lal_accB1b/Lal_accB1c) or neutral (Aar_accC1 vs. Aar_accC2) selection in this pairwise comparison were identified. Cytosolic ACC-ase has lower average of Ka/Ks ratio (0.16) than plastid subunits (0.48). The observation of negative selection pressure is consistent with the enzyme position in the biosynthesis pathways as well as with general importance of ACCase as a key metabolic enzyme [124]. Plastid ACCase is the first enzyme in lipid biosynthesis pathway whereas cytosolic enzyme is pivotal for variety secondary metabolism reactions [38,39,49,108]. In general, genes encoding upstream enzymes in molecular pathways evolve more slowly and are subjected to stronger purifying selection than the downstream ones [125]. The relation between enzyme position in the pathway and selective pressure acting on its gene was identified in numerous studies from different metabolic pathways [126,127]. It has been hypothesized that natural selection may act discriminatively on enzymes with high flux control, usually located at the beginning of the pathway [128]. Moreover, nonsynonymous nucleotide substitutions in genes of upstream enzymes may not be preserved because they may have more deleterious impact on final products than similar changes in genes of downstream enzymes [129]. Both cytosolic and plastid ACCases are enzymes affecting numerous downstream products and as such should be under purifying selection. Hence, other factors might have contributed to observed difference of selection constraints between genes encoding cytosolic enzyme and plastid subunits of ACCase.
Branch-site model analysis evidenced that positive selection might have occurred during evolution of accB2 gene in Arachis spp. and accC gene in Lupinus spp. Positive selection had putatively acted in these lineages at their early evolution stage, before the divergence into particular species. Following gene duplication, further amino acid changes were preserved by purifying selection. Similar marks of transient evolutionary episode of positive selection in Lupinus spp. were revealed for isoflavone synthase genes and were suggested to occur before the whole-genome duplication episode and divergence of L. angustifolius, L. albus, and L. luteus [37].

Conclusions
Nuclear genes encoding cytosolic ACCase and plastid ACCase subunits in legumes evolved by whole-genome duplication. Further evolution of these genes was shaped by strong purifying selection. Legume plastid ACCase subunit genes differ by copy number: the most abundant are accB genes, what reflects the regulatory role of BCCP subunit in pt-ACCase activity. During evolution of particular legume lineages duplicated plastid ACCase subunit genes underwent transcriptional sub-functionalization.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/9/11/563/s1, Table S1: List of primers used in the study, Table S2: BAC end mapping to the L. angustifolius genome, Table S3: Segregation data for molecular markers developed for 83:476 × P27255 Lupinus angustifolius recombinant inbred line mapping population, Table S4: Copy number evaluation of L. angustifolius cytosolic ACCase and subunits of plastid ACCase genes by Droplet Digital PCR, Table S5: Sequence coordinates for cytosolic ACCase and subunits of plastid ACCase genes derived from legume genome assemblies, Table S6: Sequence coordinates for cytosolic ACCase and subunits of plastid ACCase genes derived from legume transcriptome assemblies, Table S7: Total score values calculated for legume ACC syntenic regions, Table S8: Total score values calculated for legume accA syntenic regions, Table S9: Total score values calculated for legume accB syntenic regions, Table S10: Total score values calculated for legume accC syntenic regions, Table S11: FASTA alignment of ACC sequences used for phylogenetic inference, Table S12: FASTA alignment of accA sequences used for phylogenetic inference, Table  S13: FASTA alignment of accB1 sequences used for phylogenetic inference, Table S14: FASTA alignment of accB2 sequences used for phylogenetic inference, Table S15: FASTA alignment of accC sequences used for phylogenetic inference, Table S16: Substitutions in ACCase paralogous comparisons, Table S17: Results of branch-site model analysis of positive selection, Table S18: Gene expression values of legume cytosolic ACCase genes and subunits of plastid ACCase genes (normalized), Table S19: Gene expression values of legume cytosolic ACCase genes and subunits of plastid ACCase genes standardized to the expression of ATPaseV0 gene, Figure S1: Structure of ACC genes, Figure S2: Structure of accA genes, Figure S3: Structure of accB genes, Figure S4: Structure of accC genes.
Author Contributions: A.S. performed DNA isolation from BAC clones and plant samples, molecular marker development and mapping, fluorescent in situ hybridization, digital droplet PCR assay, sequence analysis of BAC-ends and BAC clones, and partially drafted the manuscript. M.K. contributed to the general concept of the study, performed gene sequence prediction, comparative mapping, transcriptome in silico profiling, phylogenetic inference, selection pressure analysis, and partially drafted the manuscript. J.P. contributed to the general concept of the study and plan of experiments as well as designed probes for BAC library screening, and partially drafted the manuscript. K.B.C. performed BAC library hybridization. M.F. contributed to the general concept of the study and manuscript verification. B.N. contributed to the general concept of the study, design of experiments and manuscript preparation.